Tennessee Tech professor uses AI to help preserve Cherokee language
Jesse Roberts, assistant professor of computer science at Tennessee Tech, is part
of a collaborative effort using AI to help preserve the Cherokee language.
A Tennessee Tech University computer science professor is part of a collaborative
effort using artificial intelligence to help preserve the Cherokee language, one of
many endangered Indigenous languages in the United States.
Jesse Roberts is an expert in natural language processing and computational linguistics,
and his interest in language preservation was sparked by work being done in Ireland
to protect the Irish language, often called Gaelic, through AI-driven methods.
“My research looks at how we can use computers to process language, reason, and analyze
meaning,” Roberts said. “When I saw what was being done in Ireland with Gaelic, I
started thinking about how similar techniques could be used for Indigenous languages
– particularly Cherokee – here in the U.S.”
Cherokee is a polysynthetic language, which means complex words are created by
combining smaller units of meaning. It can be described as Lego blocks for language:
many small word parts snap together into a single long word that can express what
would take an entire sentence in English.
Cherokee also presents particular challenges for AI modeling. Unlike widely spoken languages
such as English, which have vast digital datasets for training AI, Cherokee lacks
the same breadth of resources. This makes it difficult to develop AI-driven tools
for language learning and preservation.
“This is a typical problem in low-resource languages,” Roberts explained. “With English,
we have immense amounts of data available to train AI models. Cherokee, on the other
hand, has very little online presence, making it difficult to teach AI how to understand
and generate it.”
In fact, Roberts notes, fewer than 140 first-language Cherokee speakers remain, most
of them over the age of 60.
One of the primary goals of the project, he said, is to develop AI systems that can
go beyond simply recording spoken Cherokee.
While documentation is essential, Roberts and his collaborators envision a more interactive
approach – one where AI can engage with users in conversation, serving as a tool for
language learners, educators, and even museum installations.
Ben Frey, a linguist at the University of North Carolina at Asheville and a member
of the Eastern Band of Cherokee Indians, is a key collaborator with Roberts. The
project also involves other Cherokee language educators and community leaders.
“Ben is classically trained in linguistics and has immersive experience in Eastern
Band Cherokee. His expertise helps us make sure we’re not just preserving words but
also the deeper cultural meanings embedded in the language,” Roberts said.
Community involvement is critical to the project. The researchers are working closely
with James “Bo” Taylor, cultural resource officer with the Eastern Band of Cherokee
Indians, and other representatives and organizations to ensure that the AI tools align
with the needs and expectations of Cherokee speakers.
“There are strong opinions on how preservation resources should be used,” Roberts
said. “It’s a delicate balance – time spent training an AI model is time that a fluent
speaker isn’t spending directly teaching another person. We want this project to be
synergistic, helping rather than hindering language transmission.”
That’s why the team is looking for work that serves both purposes at once, so that time
spent on the AI model overlaps as much as possible with direct language teaching.
The long-term vision for the project is to create an AI model that can facilitate
meaningful conversations in Cherokee. While this is still years away, incremental
progress could lead to AI-powered tools for distance learning, museum exhibits and
language education programs.
“Every step forward helps,” Roberts said. “The more we can document, analyze, and
model, the better equipped we are to preserve the language for future generations.”
Beyond Cherokee, the research has broader implications for other endangered languages
worldwide. Similar AI-driven projects could aid in revitalization efforts for African
languages, other Iroquoian languages and polysynthetic languages more broadly.
“There’s no silver bullet for language preservation,” Roberts said. “But AI gives
us leverage, room for innovation, and new ways to engage people who want to learn.
Language loss is a global issue, and our hope is that the methods we develop here
can be transferred to help other communities as well.”
According to a 2023 report by the Language Conservancy, approximately nine languages
are lost each year. That means a language dies on average every 40 days.
The stakes are high for Roberts and his collaborators. Without active intervention,
he estimates that the Cherokee language could disappear within a generation. As the
project moves forward, the team encourages those interested in language preservation,
AI or Cherokee culture to get involved.
“When a language dies, you don’t just lose words – you lose a worldview, a way of
understanding and interacting with the world,” Roberts said, quoting United Nations
Educational, Scientific and Cultural Organization (UNESCO) scholar Lucía Iglesias
Kuntz. “Our goal is to help ensure that doesn’t happen.”