July 22, 2024

Half the World’s Languages Are Endangered—But AI Can Help Save Them

Researchers are using AI to preserve endangered languages—including ones you might not think of as endangered.

There is a word in Kwak’wala—a’wilaxsila—that roughly translates to “taking the care of something seriously.” 

It’s both Sarah Child’s go-to example of the many concepts in her community’s language that English simply can’t translate, and a fair description of her passion for revitalizing the language of the Kwakwaka’wakw Nations on the land now known as British Columbia.

“We hope our children will grow up knowing some of the phrases, those values, those beliefs,” says Child, who adds that language and culture are intrinsically bound together. “Our goal is to fill our children up, fill their hearts and their souls and help them walk on the earth in the way our ancestors did.” 

Like many other Indigenous languages, Kwak’wala has been driven to the brink of extinction by the oppressive, suppressive forces of colonialism. It’s estimated that there are just 140 fluent speakers left, and many of them are advanced in age. At the same time, it’s also been preserved in archival recordings taken by anthropologists—including Child’s own great-great-grandfather—decades ago. 

It’s a race against time, combined with a mountain of data, that Child and her team at the Sanyakola Project, a language revitalization research initiative at B.C.’s North Island College, are looking to tackle in a thoroughly 21st-century way. By using artificial intelligence to comb through these recordings and transcribe them, the team hopes to help present-day Kwak’wala speakers further enrich their understanding of the language and the cultural concepts contained therein, ideally while there are still elders who are able to tease out the nuances and complexities that may have been lost, or just lost in interpretation.

“It’s archival data that has integral wisdom and knowledge of our ways of life and ways of being that have been severely damaged by colonization,” she says. “We’re trying to piece our knowledge systems back together.” It’s the difference, she adds, between learning the steps to a traditional dance and knowing the deeper meaning behind them. 

With funding from a government non-profit, the Sanyakola Project is hoping to develop a “voice to text” model that accelerates language learning, not just for Kwak’wala but as part of a larger revitalization model that could help preserve other endangered languages too.

Kwak’wala is not the only endangered language in the world, of course. About 43 per cent of the 7,000 languages that are still in use share this status, according to the UNESCO Atlas of the World’s Languages in Danger. What’s more, the United Nations estimates that an Indigenous language “dies” every two weeks, coinciding with the death of its last native speaker. And that’s before you get to the fact that 97 per cent of languages are considered to be “digitally disadvantaged.” According to the academics and tech experts behind SILICON (Stanford Initiative on Language Inclusion and Conservation in Old and New Media), a project that is trying to encourage linguistic inclusivity, most of the technology we’ve built our modern world on was created by, and continues to be maintained for, mostly English speakers using the Roman alphabet.

Case in point: Writing in Fortune earlier this year, Tamil speaker Karthik Chidambaram relayed the frustration, and the blunting of meaning, that comes when a speaker of Tamil, which has 247 characters, is forced to type using English’s measly 26-character alphabet. He likened it to a kind of “digital extinction,” and it’s why he founded a company, DCKAP, that is building a Tamil keyboard that saves speakers from having to “transliterate” into English.

SILICON is also helping build keyboards, working to ensure more languages are represented in Unicode, the international standard for digital text and characters, and developing AI translation apps. 

Especially that last one. AI is a “hot topic” in endangered language research, according to Anna Luisa Daigneault, a linguistic anthropologist who is starting a PhD on the topic at the University of Montreal in January 2025. 

“It’s enticing. It’s like, ‘How can AI help us speed up the work when there are so few of us focused on endangered language?’” she says, noting that there are some areas, particularly when it comes to compiling and sorting through data or digitizing analog records, that are promising. 

As with any other conversation around artificial intelligence, however, there is a flip side to all this potential. 

“In some cases, because these endangered languages are only partially documented or not documented at all, you get a really high potential for errors,” she says. “If you start introducing AI really early when analyzing a language, you could end up with a completely wrong model for this language.” 

It’s exacerbated by the fact that AI is a “very English-centric technology built by English-centric people,” meaning it struggles with the vastly different grammars and sounds of other languages, especially those that share no common linguistic roots. 

For example, she’s seen colleagues try to use AI to create a phonetic transcription of an Indian Indigenous language recording. “And it does a terrible job because it needs humans to understand not just the phonetic realization of the words but the phonological underpinnings of the words,” she says. “Languages have sounds that correspond to the words people are uttering, but those change in the context of how they’re saying them.” Translation: If it’s something that even highly experienced phonologists struggle through, there’s no way the technology we have today is up to the job unless a phonologist has already fully analyzed the language and rules that govern sound. 

And then there’s the issue of “digital colonization,” a phrase that encompasses the ways that Silicon Valley can act as an imperial force, from digital ads sold by tech giants decimating local media companies to the way ride-share apps disrupt local taxi companies, all while mining people’s data in a way that echoes the way 19th-century colonial powers exploited physical resources. “We have this new wave of colonization in the tech sector—and if Indigenous people get left behind on this wave, equity will never be achieved,” Child says. “We want to make sure that Indigenous people around the globe are not left behind on the next freeway of colonization…and we want to decolonize it.” 

This is why she and her colleagues are also working with lawyers to draw up agreements around data sovereignty. “If the big giants start stealing our languages, we’re going to get left behind again.” 

It’s for these reasons that Daigneault says these tools should be built “from the ground up” in a way that centres the communities themselves in the process. 

It’s an approach that guides her work with the Living Tongues Institute for Endangered Languages, a non-profit that has spent the last 19 years building, in collaboration with communities, online “living dictionaries” of more than 100 endangered languages in 15 countries. 

She’s currently based in North Carolina, working with the Haliwa-Saponi Nation as they try to bring back their language, which has been extinct for decades but preserved in archival records that have been used to build out their own “living dictionary.”

“The community does feel like they’re far away from bringing fluent speakers back, but even just having words, greetings, signage, names for plants and animals is so meaningful for the people who are reviving it,” she says. “There’s a thread of that cultural legacy that’s not being lost, but revived, and leads to a deeper understanding of the past.” 

Still, technology isn’t the solution to saving or reviving endangered languages, Daigneault stresses. It’s a tool, and one that she is optimistic about seeing people use to do this work. She points to a community in the Amazon that her team works with: its members don’t have access to electricity, but they have a community solar-powered battery cell that they use to charge their phones, and they access the internet via Starlink.

“And then they sit around when they have free time, and they talk with the older speakers, recording their voices on their phones, and they’re doing their dictionary entries themselves, one by one,” she says. “When I see people actually at work, doing their small projects using technology, for me I see that people are actively on the right track with language revitalization.” 

This mirrors Child’s own dream for her work with her own language. 

“Success is when my children, my grandchildren, my great-grandchildren can speak Kwak’wala, and we are able to live, love, laugh and work in our language, wherever we are,” she says. “And knowing really strongly who we are, where we are from—and language is a central part of that.”