Open Forum #73 Indigenous Peoples Languages in a Digital Age

26 Jun 2025 16:00h - 17:00h

Session at a glance

Summary

This panel discussion at the Internet Governance Forum focused on barriers to indigenous language technology and AI uptake, held during the International Decade of Indigenous Languages. The session brought together experts from various backgrounds, including representatives from the Sámi Parliament, UNESCO, Meta, and academic institutions, to address challenges facing indigenous languages in digital spaces.

The discussion revealed that while language technology for indigenous languages exists and is technically feasible, the main barriers are not technological but structural and systemic. Platform owners often make it difficult or impossible for indigenous communities to implement their language tools in mainstream applications and systems, even when the technology is available. This creates a significant gap between what is technically possible and what communities can actually deliver to their users.

Several key challenges were identified, including limited digital infrastructure, lack of written language systems for oral traditions, restrictive data protection regulations that complicate data collection from indigenous communities, and the dominance of global languages in online content. The panelists emphasized that large tech companies often don’t see indigenous languages as profitable markets, leading to their exclusion from digital platforms.

The discussion highlighted the importance of community involvement and data sovereignty, with speakers stressing that indigenous peoples must be partners and co-creators rather than passive users in technology development. The principle of free, prior, and informed consent was emphasized as essential for ethical data collection and use.

Meta’s representative presented several initiatives, including open-source AI models and translation tools for 200 languages, demonstrating how open-source approaches can enable communities to adapt and control their own language technologies. The session concluded with UNESCO’s call for a mindset shift toward viewing language rights as human rights that must be respected in digital spaces, emphasizing the need for inclusive, community-driven innovation rather than top-down technological solutions.

Keypoints

## Major Discussion Points:

– **Barriers to Indigenous Language Technology Access**: Despite existing language technology capabilities, indigenous communities face significant obstacles in implementing and distributing their language tools due to closed platforms, restrictive policies from major tech companies, and lack of accessible integration pathways for minority languages.

– **Data Sovereignty and Community Control**: The tension between needing large datasets to train AI models for indigenous languages while ensuring communities maintain ownership and control over their linguistic data, with emphasis on free, prior, and informed consent principles rather than simple permission-seeking.

– **AI’s Potential vs. Risks for Indigenous Communities**: AI presents opportunities to bridge equity gaps in education, healthcare, and adaptive learning for indigenous peoples, but also risks widening disparities if these communities are excluded from AI development and implementation processes.

– **Open Source vs. Proprietary Solutions**: The advantages of open-source AI models and technologies that allow communities to customize, refine, and maintain control over their language tools, contrasted with the limitations of closed, proprietary platforms that restrict community agency.

– **Need for Systemic Change Beyond Technology**: Recognition that the challenges aren’t primarily technical but structural, political, and ethical, requiring a fundamental mindset shift from treating indigenous languages as niche markets to recognizing language rights as human rights integral to digital inclusion.

## Overall Purpose:

The discussion aimed to examine barriers preventing indigenous and minority language communities from accessing and benefiting from language technology and AI, while exploring solutions for more equitable digital inclusion during the UN International Decade of Indigenous Languages.

## Overall Tone:

The discussion maintained a collaborative and solution-oriented tone throughout, with speakers demonstrating mutual respect and shared commitment to indigenous language rights. While participants acknowledged serious challenges and historical injustices, the tone remained constructive and forward-looking, emphasizing partnership, community empowerment, and the urgent need for systemic change in how technology platforms approach linguistic diversity.

Speakers

– **MODERATOR**: Session moderator (role unclear from transcript)

– **Sjur Norstebo Moshagen**: Head of Sámi language technology work at the University of Tromsø, panel debate moderator

– **Ole Henrik Bjorkmo Lifjell**: Member of the Governing Council of the Sámi Parliament

– **David Castillo Barra**: International consultant specializing in promotion of multilingualism, member of the Secretariat for the International Decade of Indigenous Languages at UNESCO

– **Lars Ailo Bongo**: Professor in health technology at the Department of Computer Science at the University of Tromsø, adjunct professor at the Sámi University College heading the Sámi AI Lab

– **Outi Kaarina Laiti**: Computer game researcher, designer, and media education specialist from the National Audiovisual Institute of Finland, blends Sámi culture with tech and education

– **Valts Ernstreits**: Livonian language activist developing digital tools for endangered languages, works at the University of Latvia Livonian Institute, focuses on global digital inclusion policies

– **Aili Keskitalo**: Former Sámi Parliament president, indigenous rights advocate focusing on climate and just transition in Sápmi, works for Amnesty International in Norway

– **Kevin Chan**: Works at Meta on global digital policy to empower indigenous languages online

– **Tawfik Jelassi**: UNESCO official (specific title not mentioned in transcript)

– **Audience**: Audience member, Henry Wang from Singapore IGF, founding member of Singapore Internet Governance Forum, co-founder of LingoAI

**Additional speakers:**

None identified beyond the speakers names list.

Full session report

# Indigenous Languages in the Digital Age: Barriers to Technology and AI Uptake

## Executive Summary

This panel discussion at the Internet Governance Forum examined the critical barriers preventing indigenous and minority language communities from accessing and benefiting from language technology and artificial intelligence during the UN International Decade of Indigenous Languages. The session brought together representatives from the Sámi Parliament, UNESCO, Meta, and leading academic institutions to address the challenges facing indigenous languages in digital spaces.

The discussion revealed that while language technology for indigenous languages is technically feasible, the primary barriers are structural, political, and ethical rather than technological. Platform owners often create obstacles for indigenous communities attempting to implement their language tools in mainstream applications, creating a gap between what is technically possible and what communities can actually deliver to their users.

## Opening Context and Moderation

**Sjur Norstebo Moshagen** from the University of Tromsø served as moderator, with **David Castillo Barra** from UNESCO’s Secretariat for the International Decade of Indigenous Languages as online co-moderator. The session opened with **Ole Henrik Bjorkmo Lifjell** from the Sámi Parliament setting the context for indigenous language challenges in the digital age.

## Key Presentations

### Sámi Parliament Perspective

Ole Henrik Bjorkmo Lifjell emphasized the political and rights-based dimensions of language technology access, highlighting how indigenous communities face systematic exclusion from digital platforms and services.

### Academic Research Insights

**Lars Ailo Bongo**, Professor at the University of Tromsø and head of the Sámi AI Lab, discussed AI’s potential to bridge equity gaps in healthcare and education for indigenous communities. He noted that “AI can bridge maybe the most important equity gap that indigenous people are exposed to, which is the lack of experts in fields like medicine or education that has the language and cultural knowledge needed to understand and provide equitable services.”

However, Bongo also highlighted regulatory challenges, explaining that “Indigenous people face a dilemma as data subjects requiring extra protection under GDPR, yet needing data collection to ensure AI works equitably for minorities.” He proposed regulatory sandboxes as a potential solution for ethical data collection.

### Educational Technology Integration

**Outi Kaarina Laiti** from Finland’s National Audiovisual Institute described Finland’s 10-year experience with programming education, including the development of Sámi programming guides and media archives for speech recognition training. She mentioned ongoing projects including the Sámi Game Jam and extended reality initiatives since 2018, while noting that questions about “how to teach programming in Sámi languages and what are the cultural aspects of computing” remain unresolved.

### Endangered Language Perspectives

**Valts Ernstreits** from the University of Latvia Livonian Institute provided insights from his work with the Livonians, Latvia’s indigenous population, and on global digital inclusion policies. He emphasized that technology currently caters primarily to the top 200 languages globally, leaving the vast majority of languages in secondary positions.

### Rights-Based Framework

**Aili Keskitalo**, former Sámi Parliament president and current Amnesty International advocate, provided a powerful rights-based perspective. She noted that “over 98% of the world’s languages lack basic digital tools, creating a threat of digital extinction rather than just a gap.” Keskitalo warned that “AI is not neutral and can replicate colonial logics if indigenous peoples are not involved from the beginning as rights holders, not just users.”

She called for “the shift from seeking permission to entering true partnerships with indigenous peoples as co-creators, applying free, prior and informed consent principles.”

### Industry Perspective

**Kevin Chan** from Meta outlined several company initiatives, including:

– Facebook translation capabilities for Inuktitut (developed over 5 years)

– The “No Language Left Behind” translator supporting 200 languages

– The Language Technology Partnership seeking community collaboration

Chan explained that the partnership seeks collaborators who can provide speech recordings with transcriptions (requiring about 10 hours of recordings) or written text samples to build new open-source speech technologies. He argued that open-source AI technologies “can be valuable for indigenous communities as they allow refinement, fine-tuning, and community ownership of adapted models.”
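To make the data requirement concrete, the following is a minimal sketch of how a community archive might check how much of its material already meets the "recordings with transcriptions" condition. It is not Meta's tooling or any partnership format: the `Recording` type, the file names, the durations, and the helper functions are all hypothetical, and the 10-hour target is simply the approximate figure mentioned in the session.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Recording:
    """One archive item (hypothetical structure, for illustration only)."""
    audio_path: str
    duration_seconds: float
    transcript: Optional[str] = None  # None when no text equivalent exists


def paired_hours(recordings: List[Recording]) -> float:
    """Total hours of audio that have a non-empty transcript."""
    paired = [r for r in recordings if r.transcript and r.transcript.strip()]
    return sum(r.duration_seconds for r in paired) / 3600


def hours_missing(recordings: List[Recording], target_hours: float = 10.0) -> float:
    """Hours of transcribed audio still needed to reach the target."""
    return max(0.0, target_hours - paired_hours(recordings))


# Made-up example archive: only two of the four clips have usable text.
archive = [
    Recording("clip_001.wav", 5400.0, "transcribed text"),
    Recording("clip_002.wav", 7200.0, None),    # audio only
    Recording("clip_003.wav", 1800.0, "   "),   # blank transcript
    Recording("clip_004.wav", 12600.0, "more transcribed text"),
]

print(f"Paired audio: {paired_hours(archive):.1f} h")
print(f"Still needed: {hours_missing(archive):.1f} h")
```

Only audio with a matching text version counts toward the target, which is why untranscribed media archives, however large, do not by themselves satisfy this kind of requirement.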

## Platform Restrictions and Technical Barriers

Moshagen articulated a key insight: “platform owners make life hard for most of the world’s languages, but probably mostly without realizing it. I don’t think there’s bad intent behind it. It’s just ignorance or negligence.” This observation highlighted how technological barriers are often artificial constructs created by platform policies rather than genuine technical limitations.

The panelists agreed that language technology for indigenous languages often works effectively in controlled environments but cannot be delivered through the applications and systems where users actually want to employ them due to platform restrictions and closed system architectures.

## Audience Engagement

**Henry Wang** from the audience raised an important question about alternative approaches, specifically mentioning the SOLID protocol and LingoAI as potential solutions for data ownership issues. This intervention highlighted ongoing technical discussions about decentralized approaches to language technology.

## UNESCO’s Global Vision

The discussion concluded with **Tawfik Jelassi** from UNESCO presenting the organization’s comprehensive vision for digital language equality. He emphasized that “indigenous communities must be central to technology design, development and governance, with their knowledge systems essential for ethical digital futures.”

Jelassi mentioned specific initiatives including the Mayan Language Preservation and Digitalization Project with Masterwords, and UNESCO’s Global Roadmap for Multilingualism, which aims to ensure that all language communities can thrive in the digital age with technology that is multilingual by design.

He concluded with a quote from Nelson Mandela: “If you talk to a man in a language he understands, that goes to his head. If you talk to him in his language, that goes to his heart,” emphasizing the importance of building “a digital future that requires linguistic justice, cultural dignity and inclusive technology that speaks to hearts through indigenous languages.”

## Key Themes and Challenges

### Data Sovereignty and Community Control

A fundamental theme was the need for indigenous communities to maintain control over their linguistic data and be involved as co-creators throughout technology development processes, not merely as data sources or end users.

### Regulatory Complexities

The discussion revealed tensions between regulations such as the GDPR and the EU AI Act on the one hand and the practical needs of indigenous language AI development on the other, where enhanced protection requirements can create barriers to the data collection needed for effective AI systems.

### Rights-Based Approach

Multiple speakers emphasized the need to recognize language rights as human rights in digital spaces, shifting focus from treating indigenous languages as optional features toward recognizing them as fundamental rights that platforms should support.

### Open Source Solutions

Several panelists expressed support for open-source approaches as a way to provide communities greater control over their language technologies while leveraging existing technological infrastructure.

## Conclusion

The panel demonstrated broad agreement on fundamental principles while revealing the complex technical, legal, and ethical challenges that must be addressed to ensure indigenous languages can thrive in the digital age. The discussion highlighted both the potential of AI and language technology to support indigenous communities and the risks of further marginalization if these communities are not involved as rights holders and co-creators in technology development.

The session concluded with calls for sustained commitment to fundamental changes in how the technology industry approaches linguistic diversity, moving beyond market-driven approaches toward rights-based inclusion that recognizes indigenous languages as essential components of human cultural heritage.

Session transcript

MODERATOR: [music]

Sjur Norstebo Moshagen: Hello, and welcome, everybody, both here and online. We’ll start by getting some very nice words from Ole Henrik Bjorkmo Lifjell, who is a member of the Governing Council of the Sámi Parliament. Please.

Ole Henrik Bjorkmo Lifjell: Buribije Bores. Dear participants, I have the honour of opening this panel discussion and welcoming the people present here and the online participants. This panel discussion will highlight the importance of the subject of indigenous languages, technology, and AI. To start with, let me express thanks to UNESCO and IDIL for putting these subjects on the agenda and promoting the visibility of the International Decade of Indigenous Languages. Indigenous and minority communities face barriers, as there is limited digital infrastructure and few digital tools supporting the use of our languages. Large tech companies may not see indigenous languages as profitable markets, and most online content is dominated by a handful of global languages. Many indigenous communities preserve their culture through oral traditions, and the lack of a written language makes digitization complex and sometimes inappropriate without community consent. To overcome these barriers, we need to follow up with the following actions. We need to reduce language loss and revitalize indigenous languages, and technological development for indigenous languages needs to ensure the digital inclusion of indigenous languages in digital platforms and AI as well. We also need to promote policy that uses human rights principles and ensures accountability through national laws that regulate and secure indigenous communities’ control and management of linguistic data collection, so that it benefits our own communities. AI-generated data innovations need to be used in a non-discriminatory way and respect indigenous cultures as well. We need to initiate further collaborations and foster dialogue with big tech companies and developers to include digital tools and language technologies for indigenous communities and speakers of indigenous languages.
We need to remind each other that no language is too small to matter, and by elevating the challenges faced by indigenous and minority language communities, the communities are helping to pave a path toward a more equitable digital future for everyone. This panel debate is now opened, and I encourage participants to establish connections for further exchange on the subjects raised at this Internet Governance Forum in the framework of the decade. Thank you very much.

Sjur Norstebo Moshagen: Thank you very much. My name is Sjur Nørstebø Moshagen, and I’m going to head this panel debate. In my daily life, I’m heading the Sámi language technology work at the University of Tromsø, but today we’re going to discuss barriers to indigenous language technology and AI uptake. And to help me with this, I have online David Castillo Barra as a co-moderator for the online participants. He’s an international consultant specializing in the promotion of multilingualism and currently serves as a member of the Secretariat for the International Decade of Indigenous Languages at UNESCO. He supports initiatives related to UNESCO’s recommendation on multilingualism in cyberspace with a strong focus on fostering linguistic diversity in digital space. And David, maybe you would like to say a few words to present yourself.

David Castillo Barra: Thank you. Good afternoon. Am I audible? I don’t know if you can hear me. Thank you very much. Hello. Good afternoon and greetings from Paris. Thank you, Sjur, for the presentation. I am here from the Secretariat of the International Decade of Indigenous Languages at UNESCO. I’ll be your online moderator today, so feel free to share your questions in the chat for those joining remotely, and I’ll pass them to our panelists during the question and answer session. So thank you very much. I would like to pass the floor again to Sjur to introduce our panelists. Thank you.

Sjur Norstebo Moshagen: Thank you very much, David. Yes, the panelists of today. On site, we have Lars Ailo Bongo, who is a professor in health technology at the Department of Computer Science at the University of Tromsø, as well as an adjunct professor at the Sámi University College heading the Sámi AI Lab there. Online, we have Outi Laiti from the National Audiovisual Institute of Finland. Outi is a computer game researcher, designer, and media education specialist blending Sámi culture with tech and education. Then, again, on site, we have Valts Ernstreits, who is a Livonian language activist developing digital tools for endangered languages, and he is working hard to shape the global digital inclusion policies. Then, the last one on site is Aili Keskitalo, who is a former Sámi Parliament president and an indigenous rights advocate now focusing on climate and just transition in Sápmi, and working for Amnesty International in Norway. And finally, online, we have Kevin Chan, who is working at Meta on global digital policy to empower indigenous languages online. So those are our panelists for today. And before they are given the floor, I will say a few words on the topic of today. Sorry about that. So, a starting point for this could be the global roadmap for multilingualism in the digital era that UNESCO is working on right now. They have a draft, and I’ll quote a few sentences from the introduction. And remember, this is only a draft, but I think it’s quite well formulated and goes to the heart of the topic of today. The global roadmap for multilingualism in the digital era provides a strategic framework for advancing language technologies, promoting linguistic diversity and multilingualism, and ensuring that all language users from all language communities thrive in the digital age. Recognizing that language rights are integral to human rights, the roadmap aims to empower every individual to use and preserve their language in digital spaces.
And the question today is, what’s the actual status, what are the problems, what are the obstacles to actually doing what they are trying to do in that part? So, I will say a very few words on this. I’ve been working on language technology for the Sámi languages for the last 20 years, and what we have seen is that the conditions for third-party language technology are very different from those for first-party language technology. So, tools by Apple and Microsoft are treated very differently from tools by everyone else. So, there are serious problems for these tools, and often they are completely blocked. So, independent localization, for example, is also not possible, or it might not be accessible, or if it’s possible or accessible, it’s not distributable. There are no platforms for providing translations to a piece of software without asking and getting permission from the original developer. And AI for indigenous languages and minority languages, that’s a quite open question. It’s open for probably many languages at the moment, but what we have seen so far is that bad output is dominating for these languages, partly due to lack of data, but also partly because of lack of community involvement and lack of quality assurance and testing and evaluation. And a major question in this discussion is how can one add one’s own language to models from big technology? How can Sámi or any indigenous language be added to the models from OpenAI, from Apple, from Microsoft, whoever? So what we can say is that, as I said, we have been doing this for 20 years. We know the technology, we know we can make it work on the technological level. What we cannot always do is deliver the tools in the apps and the systems and the context where users want to use them. That’s the major problem. Here are some examples that we have experienced. I’m not going to spend much more time on that, but just one short one is spellers and proofing tools in online office applications.
We have no possibility to install these tools so that they behave as people expect them to. So the conclusion is that language technology for indigenous languages is often not possible, even though the technology is there and we know the technology. That’s not the issue. Platform owners make life hard for most of the world’s languages, but probably mostly without realizing it. I don’t think there’s bad intent behind it. It’s just ignorance or negligence. So we need a new approach to how human languages are included and how they are approached in the digital world. And that’s what we are going to discuss today. So then next up is Lars Ailo Bongo.

Lars Ailo Bongo: Thank you. And thank you for inviting me to give this talk. So I’m going to talk about future issues that may sort of hinder the use of indigenous language and indigenous AI. So AI, it has a great potential to bridge maybe the most important equity gap that sort of indigenous people are exposed to, which is the lack of experts in fields like medicine or education that has the language and cultural knowledge needed to sort of understand and provide equitable services. So for instance, in psychology, there are very few tests that are normed on indigenous minority languages. So the tests basically don’t work well for indigenous people. So AI has the potential to provide something where there was nothing before. And also in education, AI has a great potential to provide adaptive learning, which is very important for minority language speakers because the level of language knowledge often has a greater variation than in the majority languages. So there’s a great potential for AI to sort of bridge this equity gap. But then again, if indigenous peoples are excluded from using AI, then we are at the risk that this equity gap will just widen when the majority people start using AI for their health service and their educational service. It’s very important that indigenous people are included in these new AI services. And luckily, this is regulated somewhat by law. So this is from the EU AI Act, which basically says that it’s not allowed to discriminate against minorities such as indigenous people. So if there is an educational service or a health service provided, it should work as well for minority people as for the other people. However, there is one big challenge, which is that the indigenous people and other minorities are, as data subjects, considered a special category. So this requires extra strong data protection.
And this is, for instance, regulated by the GDPR law that says that it’s not allowed even to collect this data unless you have a really good purpose. But also, the EU AI Act says that it is allowed to actually do this, to collect ethical data if the purpose is to prove that this AI works as well for indigenous people as for other minorities. And I just want to illustrate the dilemma that the indigenous people and also the AI providers are facing. So let’s say that we want to do adaptive learning, which is maybe the application that is highest on the priority list of many indigenous people. So in order to do that, AI can help. But to do that, you need to build this AI and adaptive learning. One important component of that is cognitive tests. And this includes IQ tests. But if you want to do an IQ test that is equitable and works well for minority languages and cultures, you need to build that with the minority language and culture in mind. And that means you need to collect data from these minorities. And let’s say that these are indigenous children, then you need to collect basically indigenous IQ tests from indigenous people. And this is, of course, very controversial, because being an indigenous person myself, I know that we have historically been exposed to basically racist research where they sort of attempted to show that indigenous people are less intelligent. But I guess that if we want to exploit all the opportunities that AI gives, including in the educational field, we must basically now start collecting this type of data. But luckily, we can do this in a much more ethical way than was done in the Dark Ages. So we can build some regulatory sandboxes that ensure that this data collection is done in an ethical and safe manner. And I think this is really important.
We need to really start working on this in order to not leave the indigenous and other minority people and languages behind when the new AI tools are going to be used in important services like health and education. Thank you.

Sjur Norstebo Moshagen: Thank you very much, Lars. Next one out is Outi Laiti. So. Yeah. Please go ahead.

Outi Kaarina Laiti: Thank you. My slide is somewhere. Is this the one? Probably we need to share this Zoom. It should be shared. I can see only myself. Okay, on the screen shared in the stream here, it’s both you and the slide. Yes, I can see it on mine too. Okay, now I can see it, thank you. I’m going to dive in and thank you for having me as an indigenous woman. I come from the margins and it’s always a pleasure to be talking about computing. Programming has not been my passion. Games are, but since I was like three or five or something like that, I wrote my first line of code because I wanted to play games like Commodore 64 was a huge hit in the 1980s. So when Finland introduced 10 years ago programming as a part of our basic education, it was the starting point of doing programming research because we have Sámi people living in North Finland and no one knew how to actually do this. The questions like how to teach programming in Sámi languages, what are the cultural aspects of computing, they still exist after 10 years of educating children in basic education. And this change was huge, like all teachers in all levels should teach programming, from crafts to gym teachers, they should all do it and starting from grade one. And I guess it has been 10 years, so we have like one generation of Sámi basic education programmers ready, or maybe they’re not. But anyway, can I get the next slide? And then the games. Nearly all the games I know that go under the digital Sámi game umbrella, they are all for education, and in that language education especially. We call these serious games. And then we have a lot of developing content in games. This can go under the same umbrella of Sámi games when we are developing content in platforms like Second Life, Minecraft and so on, which I call the indigenous metaverse. It’s growing rapidly and most of these platforms are privately owned. Then we have semi-private platforms like, for example, in universities.
York University has its own indigenous metaverse in development, and Helsinki University did Serendip, which is not indigenous, but it has some indigenous content. I have done extended reality projects since 2018 in Sámi Game Jam. The major issue is that you cannot actually use extended reality in language education, because we don’t have the tools to have discussions in virtual reality, if we don’t use like voice over IP or something like that. And it’s easier to use non-human centered design in games for multiple reasons, but ethical questions are one, for example, representation. If I’m doing non-playable or playable Sámi characters, what should I represent? And what are they talking if they are talking and how they are talking? These are all ethical questions. And next slide, please. There has been some progress because Finland introduced this programming in basic education. For example, the National Audiovisual Institute has published these guides for media education and programming in three Sámi languages spoken in Finland. The picture is actually from a Skolt Sámi programming guide, which is the coolest thing I have seen for a while. We have huge Sámi media archives that have been used to train like automatic speech recognition tools. But the problem is that we are missing the text equivalent that we could actually use. We should have archives combining both, like the textual version of speech and the actual speech. So this development is quite slow. Thank you.

Sjur Norstebo Moshagen: Thank you very much. The next speaker is Valts.

Valts Ernstreits: Thank you for reminding me to this panel, which is really crucial for the IJF as well. Just a couple of words about my background. I represent originally Latvian-Indigenous Sámi population. Next year we will celebrate 35 years since official recognition in Latvia. I have been active in promoting Livonian issues for past 30 years. But last six years I have been working at the University of Latvia, Livonian Institute, which is specially established. One of the key action areas that we work with is building digital resources and looking to approaches for extremely under-resourced, scattered data conditions. Because the Livonian community is actually very small, we have less than 20 speakers in general. So we have to find the ways. This is pretty logical that we have been also recently quite active in maybe more global initiatives. I wanted to present those instruments that currently exist supporting developments in the digital area for Indigenous languages. As you all know, this is the International Decade of Indigenous Languages. Just last year there was a specially designated ad hoc group established on digital equality and domains. We were sure that there were also participants. This year there was one very interesting initiative that went out, which is a global survey on Indigenous languages. Which is closing in a couple of next months, which provides both data or perception of what is the actual state of Indigenous languages globally in the digital area. But also motivates those participating to think about technologies and issues that they have on the path to the digital equality. In February in Paris, a conference took place, Language Technology for All 2025. From that conference grew out new, maybe the freshest UNESCO’s initiative, which is Global Roadmap for Multilingualism in the Digital Era. This is the document that might define the future for languages, and especially for digital languages. 
Because it envisions a future where equal opportunities for entering digital domains are ensured for all languages. Currently technology caters mostly to the top 200 languages of the world. But the majority of languages are somewhere in the second row, or maybe some in the last row. And the majority of those languages are Indigenous languages, so this is a mechanism that for the most part addresses Indigenous issues. Very shortly summarizing the roadmap: currently there is a roadmap consultation process, so you can easily look it up on UNESCO’s webpage and take part in it. But to summarize, there are three key elements in this roadmap: input issues, output issues, and everything regarding process. Input issues basically address the ability to produce and obtain digital data, which is kind of a precondition for any language to enter. So before we start with the technology, we have to start with being capable of producing anything in digital format, whether it’s sound data for spoken languages or written data, and there are lots of analog challenges before that. And also lots of restrictions. There are countries that, for example, do not allow digital usage of certain languages, and languages that simply don’t have any writing systems or access to technology. So this is one part. The second part is output. That was what Sjur was talking about. So imagine we have the ability to produce digital data, we even have technologies, as for the Sámi languages or Livonian, but we are not able to use them. We are not able to get them in. And not only in daily products, but also in cloud computing and in products like games and educational instruments. So this is another aspect that we need to tackle. And basically what we want to achieve as the end goal is that technology is multilingual by design, so that whatever the language, any technology can be adapted for use by users of that language.
And regarding process, in the middle of that roadmap sits the idea that communities, language speakers, have to be involved in one way or another in all the stages of technology development. This is not only an issue of how you handle data, but also of how technology is developed. It is a question of whether technology is published if it doesn’t meet, for example, the quality standards of the community, and many more. Thank you.

Sjur Norstebo Moshagen: Thank you very much, Valts. Then Aili Keskitalo, please.

Aili Keskitalo: Thank you for the floor. I’m here today as a Sámi language user and as a mother raising now young women in a language that has often been pushed aside in public systems, in education, and increasingly in technology. But I’m also here as an advocate for Indigenous peoples’ rights and human rights, believing that technology should serve rights and not markets. For us, it’s not just about innovation, it’s about justice. It’s about the right to exist fully in our own language, not only in traditional settings, but also in e-mails, in voice assistants, in learning apps, and eventually in AI systems. The ability to use your own language, including in digital spaces, is essential for dignity, for cultural continuity, and for meaningful participation in society. Today, over 98% of the world’s languages lack basic digital tools. And this is not a gap, it’s a threat. It means that unless we act, our languages risk going digitally extinct. But we see the potential, we have heard about it today. Sámi institutions, like the Sámi Parliament’s joint project, MIA Techno, are taking steps to develop language technology on our own terms, with open source tools, ethical frameworks, and strong demands for state responsibility. Still, we face barriers, and we have heard about them already today: closed code from big tech, lack of funding, and not enough access to data to train the systems that we need. At the same time, as Lars Ailo already explained to us, we must be careful. AI is not neutral. It can replicate colonial logics if we are not involved from the beginning, as rights holders, not just users. Language is power, and in this digital age, the right to speak your language must include the right to shape the tools that carry it forward. That is my message. Giitu. Thank you.

Sjur Norstebo Moshagen: Thank you very much, Aili. And the last speaker before we approach the questions, that is Kevin Chan from Meta. Please go ahead.

Kevin Chan: Thank you very much, Sjur. It’s good to see you again. Maybe we just move to the first, the next slide, if you will. If we’re able to. Oh, there we go. So I wanted to start by just sharing that, as has been referred to by a few other people on the panel, this is obviously an important decade to be thinking about these very important issues. We are in the decade for indigenous languages. And at Meta, we have been putting together some initiatives, working closely with indigenous peoples and with UNESCO and other language partners, to think through how we can help support this with, in particular, some of our open source AI technologies. The previous panelists talked a bit about open source versus closed source technologies. Open source technologies effectively are ones where we have built some kind of AI model, but then we make it freely available to anybody else who wants to use it. And what that allows you to do is take the model. You can refine it. You can fine-tune it. You can add additional functionality and features to it. And then you own what it is afterwards. And so we do believe, as was previously mentioned, that open source AI technologies can be very valuable in this context. So I really want to leave you with just three things to start the conversation. One is an initiative that we helped drive in Canada with Nunavut Tunngavik Incorporated, which is an entity up in Nunavut, which is Canada’s sort of Arctic territory, to translate the platform into Inuktitut. We also launched last year an online language translator for 200 languages with the help of UNESCO and Hugging Face, which is powered by Meta’s open source AI model called No Language Left Behind. And then there’s also a new language technology partnership that we recently announced as well. So if we can move to the next slide, please.
I just wanted to, again, call out one of the initial projects we did, which was announced, I guess, maybe two years ago. We launched, again, with NTI’s help. And it really was a long period of collaboration together because we wanted to do this properly. We wanted to do this in a way where we were welcomed by the community to do so. And, of course, the community had the expertise and we obviously didn’t. We wanted to make sure that we did this properly. And it did take about five years. But we were very, very pleased to be able to bring at least the desktop version of Facebook, in a kind of minimal way, I think, because I wouldn’t say that everything was translated. But we had some key parts of the platform translated into Inuktitut. We were very pleased that the Governor General of Canada, Mary Simon, who herself is Indigenous and Inuk, shared the good news on Facebook when we launched that day. And we were very pleased that our friends at UNESCO were very supportive of us championing and helping to drive this kind of initiative during the international decade. The next slide, if I may. This is just a video. I think we can play it. Can you hover around the video or move the cursor? Oh, there we go. It’s a non-audio video. It just kind of shows you what this next initiative is, which is the No Language Left Behind online translator. Again, it covers 200 languages in terms of translation. It’s text-to-text. It is, of course, nowhere near comprehensive of the thousands of Indigenous languages that exist. But among the 200, many of these languages are Indigenous languages. And you can see in the video, you select your origin language. You select the language you intend to translate to. And you just put the text in. And there’s another window below that gives you the translated output. This, again, is something that is open source.
And so folks are able to, for example, access the model on places like GitHub and then iterate on it. And so there is potential and opportunity to expand the language set to include other Indigenous languages. And you can do this freely. The technology is offered out to the world and to the community freely to do that. And then maybe the last slide, if I can. So this is the Language Technology Partnership, which is something we announced actually in Paris just earlier this spring, in February. And it is a project that we’re once again working on with partners around the world. And that is to try to really push the frontiers of language technology, in particular trying to help support low-resource languages. And so what we have been seeking collaboration on, and we could obviously only do this with the agreement of partners, is to have different portions of data for languages made available to help train a model that we hope will be quite powerful in terms of translating and transcribing different languages that are currently maybe not as supported as we would like. So what we are looking for are partners that can provide 10 hours of speech recordings with transcriptions, or some amount of written text, and here we’ve specified sort of 200 plus sentences. And what we hope to do in the coming months with these partnerships is to again build new open source speech technologies, and of course our commitment would be that if we are successful in making some of these breakthroughs in terms of translation and transcription, we would want to then make these technologies freely open to the global language community for them to build applications and further the research. And if there’s interest in this, of course, please do feel free to reach out to me and I will try to do my best to connect you with the right teams that are looking at this. You can also, of course, search for this online.
I think there is a portal where you can learn more and submit information if there is interest. So thank you very much, and I’ll pause here.

Sjur Norstebo Moshagen: Thank you very much, Kevin. Then all the panelists have introduced their slides, and we can go on to the questions. With over 7,000 languages in the world, it’s clear that no platform can realistically support them all by themselves. Platform owners constrained by security concerns and limited resources often end up centralizing control over language availability, leaving many communities without access to their own languages in the digital space. How can you shift this mindset, and what would it take for them to open up the platform so communities can manage their own languages in the digital space? No questions asked. I was thinking that maybe Lars Ailo could give the first comment on this one.

Lars Ailo Bongo: So maybe I’m approaching this issue a bit differently from the other panelists, because I’m not interested in building the models, and so I don’t have that need for a platform to run them, but more in the AI applications that will use the technology that is provided, hopefully, by this kind of platform. So my concern is more about the practical issues of being allowed to do this, and that we need to address the challenge of building not just the models but also the applications, especially the high-risk ones that will be used in education, health services, and other important public services.

Sjur Norstebo Moshagen: Thank you very much. Then Outi, what is your take on this question?

Outi Kaarina Laiti: I have a short answer to this question, and I’m speaking from the perspective of, for example, basic education, where we see language as a human right, of course. So we should shift the focus from seeing language as a feature, a localization, or a liability towards seeing language as a human right, and it is that on platforms as well. That strengthens the platforms. This is a philosophical question, so this is a philosophical answer, but we need to start seeing the possibilities in this and not talk about localizations anymore.

Sjur Norstebo Moshagen: Thank you very much. And Kevin, what do you think about what Outi just said?

Kevin Chan: Can you hear me? I just had to unmute it. It sounds like it’s okay. Yes, you can hear me? We hear you. Great. Yeah, I mean, I actually agree very much with what was expressed with the panelist intervention, which is that it may not be necessarily about the models themselves, but more about the application layer. There is, I think, again, going back to what I had mentioned about open source models, and there are many, Meta makes some, but there are other companies, obviously, that make them as well. This is, I think, going to be a very important vector by which Indigenous communities, people who are very committed to supporting, protecting, and promoting low resource languages, this is a very important way, I think, by which you can actually see applications built on top, precisely because the models are free for people to use. And so, with the right amount of training and work to build applications on top of these models, you very much, I think, can get models that are conversant in different languages.

Sjur Norstebo Moshagen: Okay, thank you very much. I think due to time constraints, we go on to the next question. AI technologies still rely heavily on large volumes of text data, even as those requirements gradually decrease as technology develops. How can you ensure that AI is developed for Indigenous and minority language communities in a way that keeps data ownership and control of linguistic data in the hands of those communities? And how can we ensure that AI-generated content is of such quality that it supports rather than harms the language and its speakers? Valts, what do you think about this?

Valts Ernstreits: Yeah, I would probably take this question the same way I would approach the previous one. So, this is basically a mind shift, because, well, there is this thing: in order to use a language in whatever technology, we need large amounts of data. And that data, especially for small communities, is always hypersensitive. You don’t even need sensitive data to actually collide with GDPR, which is already there. But what is actually needed is what I mentioned previously, community involvement in all stages of technology, because we need community contribution in order to get technology running. At the same time, we need to make sure that the technology that is produced is not harmful, that it is ready, that it corresponds to what the community needs. And so there is no other way around. And this is not done by legislation that much. This is really a mind shift, because we run into situations, for example with developers, even with academia, who should be very well aware of the issues, where we have to explain to them why this is not working, why this is not okay. And we do need a mind shift towards listening to indigenous people in all stages of the process.

Sjur Norstebo Moshagen: Thank you very much. Aili, would you like to say something?

Aili Keskitalo: Yes. I would start by agreeing with Valts on the need for a shift of mindset. And I think the shift will need to be from thinking about getting permission to entering into true partnerships with the indigenous peoples, with the language communities, so that the language users are not just passive users, but co-creators. And that would maybe build the trust that is needed for data collection, because it is, of course, about data sovereignty as well. The principle often used in other contexts when it comes to indigenous peoples’ rights is the principle of free, prior and informed consent. And that should be used also when it comes to data collection and the application of that data. Thanks.

Sjur Norstebo Moshagen: Thank you very much. Time is running way too fast for us. So, I think we should see… How is it, David? Do we have any questions from the online audience?

David Castillo Barra: No, we don’t have any online questions. So, I think you were very, very clear. Thank you.

Sjur Norstebo Moshagen: Do we have any questions from the audience in the room? Yes, we have one. Please.

Audience: Dear distinguished speakers, I’m so excited because this panel respects the indigenous languages and cultures so much, and also we have Meta working with UNESCO to protect the languages and the cultures. But I think there’s a strong conflict between data ownership and the way the data is collected. If we want to help the large language models work well for the indigenous languages, we have to turn over the data and then fine-tune the large language model. So the contributor becomes the user, unfortunately. That is because of the traditional architecture of the Internet. So Meta has to be a centralized platform. So there’s no way to solve this, but there’s a new paradigm shift. A new protocol was invented fully, and this year it will be triggered. So with data ownership, complying with GDPR, this data can also be collected in a way that is anti-digital colonization. So I’m Henry Wang from Singapore IGF. I’m a founding member of the Singapore Internet Governance Forum, and also the co-founder of LingoAI. LingoAI works with the founding father of the World Wide Web, Sir Tim Berners-Lee. He invented HTTP, and our Internet became centralized. Then he felt sorry about this, and he invented a new protocol to correct the Internet. The new protocol is called SOLID, and LingoAI works with SOLID and MetaLife, and we can collect the data with data ownership held by all the contributors and the users. And the datasets can also be used with authorization, with permission, by the large language model companies worldwide or locally. Local data is important because on-device models can work with local datasets and become everyone’s AI agent. It’s your personal agent. So we are working on this solution, and it’s also already available. For example, Sámi contributors can all contribute data, but they control the part that they own themselves.
They can authorize Meta, authorize OpenAI, authorize any large language model company, but they keep the ownership and fully comply with GDPR. So I’m so happy to join this as an attendee of this panel. So my question will be: if we have such solutions, if the Internet has such protocols, are you willing to try to work in this way to help protect the indigenous languages and the cultures based on those languages? Thank you.

Sjur Norstebo Moshagen: Thank you very much. Just a few seconds. Okay. The ADG from UNESCO would like to make some closing remarks. We might have time for one short question after that. We’ll take the closing remarks now. So please go ahead, Tawfik Jelassi.

Tawfik Jelassi: Good afternoon to all of you, Excellencies, distinguished panelists, esteemed participants. As we come to the close of this important session, I would like first to express my sincere gratitude to all the speakers and participants for the substantial and insightful inputs which we had this afternoon. And I would like also to extend special thanks to Mrs. Stenson, the Minister of Local Government and Regional Development of Norway, who has always shown us commitment, engagement and support, especially in the context of the international decade of indigenous languages. The commitment and leadership of Norway have been instrumental in advancing our shared goal to safeguard and revitalize indigenous languages in the digital age. Also I would like to express my deep gratitude to the members of the international decade of indigenous languages, members of the ADG, for their invaluable contributions, especially to the global survey which took place on indigenous languages. I’m also excited to see the survey’s findings and to explore how we can further collaborate with indigenous communities and nations worldwide to advance this vital work. The title of this afternoon’s session, It’s Not Just the Tech, reminds us of a fundamental truth. Technology alone cannot solve the challenges that we face. And yes, indigenous language technologies exist and AI holds transformative potential. However, if the systems in which these tools operate are not inclusive and do not respect cultural and linguistic rights, then the technology by itself is just another barrier instead of fully playing its role as a bridge between communities and cultures. I think we heard this clearly today. The barriers to meaningful uptake of indigenous language technologies are not technical. They are structural, they are political and they are ethical.
From the spread of proprietary platforms to restrictive data protection regimes to the persistent exclusion of indigenous peoples from digital policymaking, these are the conditions that determine whether indigenous languages can truly thrive in cyberspace. At UNESCO, we stand with indigenous peoples to affirm the right to fully participate, to also have equal footing in digital space in their own languages. Indigenous communities must not only benefit from these technologies, they must be central to its design, its development and its governance. Their knowledge systems, their worldviews and linguistic heritage are not just valuable, they are essential to shape an ethical and inclusive digital future. This is the vision behind the international decade of indigenous languages, not just preservation, but true empowerment. We are proud to support projects like the Mayan Language Preservation and Digitalization Project in partnership with Masterwords. This project has created new talking glossaries, localized websites and a universal Mayan keyboard, now empowering millions of speakers of this language across the Americas. Still many challenges remain. AI systems continue to reflect linguistic hierarchies, data remains scarce or inaccessible and indigenous women and girls face barriers in accessing and shaping these technologies. We must address these gaps by investing in open, community-driven innovation and in promoting gender-responsive digital inclusion. As a next step, UNESCO invites you all to contribute to the Roadmap for Language Technologies, a roadmap which is now online for public consultation. Your contribution will help us shape this global process. In closing, let me share the wise words of Nelson Mandela, who said, quote, If you talk to a man in a language he understands, that goes to his head. If you talk to him in his language, that goes to his heart, end of quote. 
Let’s work together to build a digital future that speaks not only to minds but to hearts through linguistic justice, cultural dignity and inclusive technology. Thank you.

Sjur Norstebo Moshagen: Thank you very much. And that’s the end of the panel discussion. Time is out. Thank you all participants and the audience. Thank you very much. Thank you.

MODERATOR:

Ole Henrik Bjorkmo Lifjell

Speech speed

119 words per minute

Speech length

355 words

Speech time

178 seconds

Indigenous communities face limited digital infrastructure and tools, with large tech companies not seeing indigenous languages as profitable markets

Explanation

Indigenous and minority communities encounter barriers due to insufficient digital infrastructure and tools supporting their languages. Large technology companies do not view indigenous languages as profitable markets, and most online content is dominated by a handful of global languages.

Evidence

Most online content is dominated by a handful of global languages, and many indigenous communities have oral traditions as cultural preservation with lack of written language making digitization complex

Major discussion point

Barriers to Indigenous Language Technology Access

Topics

Development | Sociocultural

Indigenous communities must have control and management of linguistic data collection that benefits their own communities, following human rights principles

Explanation

Policy development should use human rights principles and be supported by national laws that regulate and secure indigenous communities’ control over linguistic data collection. This ensures that data collection benefits the communities themselves rather than external entities.

Evidence

Need for national laws which will regulate and secure that indigenous communities have control and management of linguistic data collection, and AI-generated data innovations need to be used in a non-discriminative way

Major discussion point

Data Sovereignty and Community Control

Topics

Human rights | Legal and regulatory

Agreed with

– Aili Keskitalo
– Valts Ernstreits

Agreed on

Indigenous communities must have control and ownership over their linguistic data

National laws must regulate and secure indigenous community control over linguistic data collection

Explanation

There is a need for legal frameworks at the national level that will regulate and ensure indigenous communities maintain control and management over the collection of their linguistic data. This legal protection is essential to prevent exploitation and ensure community benefit.

Evidence

Policies that use human rights principles and take accountability through national laws which will regulate and secure that indigenous communities have control and management of linguistic data collection

Major discussion point

Regulatory and Legal Framework Challenges

Topics

Legal and regulatory | Human rights

Sjur Norstebo Moshagen

Speech speed

124 words per minute

Speech length

1401 words

Speech time

674 seconds

Platform owners make life difficult for most world languages through closed systems, often without realizing it due to ignorance or negligence

Explanation

Platform owners create barriers for the majority of the world’s languages through their closed systems and restrictive policies. This is typically not done with malicious intent but rather stems from ignorance or negligence about the needs of minority language communities.

Evidence

Aili Keskitalo

Speech speed

93 words per minute

Speech length

438 words

Speech time

281 seconds

Over 98% of the world’s languages lack basic digital tools, creating a threat of digital extinction rather than just a gap

Explanation

Agreed with

– Ole Henrik Bjorkmo Lifjell
– Valts Ernstreits

Agreed on

Indigenous communities must have control and ownership over their linguistic data

Disagreed with

– Lars Ailo Bongo

Disagreed on

Data collection approach and regulatory challenges

Lars Ailo Bongo

Speech speed

147 words per minute

Speech length

785 words

Speech time

319 seconds

AI has great potential to bridge equity gaps for indigenous people in fields like medicine and education where cultural and linguistic expertise is lacking

Explanation

Artificial intelligence could help address significant equity gaps that indigenous people face, particularly in areas like healthcare and education where there are very few experts with the necessary language and cultural knowledge. AI could provide services where currently nothing exists, such as culturally appropriate psychological tests or adaptive learning systems.

Evidence

In psychology, there are very few tests that are normed on indigenous minority languages, so the tests basically don’t work well for indigenous people. In education, AI has great potential to provide adaptive learning which is important for minority language speakers

Major discussion point

AI Development and Indigenous Language Inclusion

Topics

Development | Human rights

Disagreed with

– Kevin Chan

Disagreed on

Approach to AI development for indigenous languages

Indigenous people face a dilemma as data subjects requiring extra protection under GDPR, yet needing data collection to ensure AI works equitably for minorities

Explanation

There is a fundamental tension between data protection laws that classify indigenous people as a special category requiring extra strong protection, and the need to collect data from these communities to ensure AI systems work fairly for them. This creates a challenging situation where the very protections meant to help may hinder equitable AI development.

Evidence

Valts Ernstreits

Speech speed

121 words per minute

Speech length

926 words

Speech time

459 seconds

Technology currently caters mostly to the top 200 languages globally, leaving the majority of languages, especially indigenous ones, in secondary positions

Explanation

Current technology development focuses primarily on approximately 200 languages worldwide, while the vast majority of languages, particularly indigenous languages, receive little to no technological support. This creates a hierarchy where most of the world’s linguistic diversity is relegated to secondary status in the digital realm.

Evidence

Agreed with

– Ole Henrik Bjorkmo Lifjell
– Aili Keskitalo

Agreed on

Indigenous communities must have control and ownership over their linguistic data

Outi Kaarina Laiti

Speech speed

121 words per minute

Speech length

Human rights | Sociocultural

Agreed with

– Valts Ernstreits
– Aili Keskitalo

Agreed on

Need for fundamental mindset shift in technology development approach

Kevin Chan

Speech speed

137 words per minute

Speech length

1248 words

Speech time

545 seconds

Open source AI technologies can be valuable for indigenous communities as they allow refinement, fine-tuning, and community ownership of adapted models

Explanation

Open source AI models provide significant advantages for indigenous language communities because they can be freely accessed, modified, and customized to meet specific community needs. Unlike closed systems, open source technologies allow communities to take ownership of the adapted models and continue developing them independently.

Evidence

Meta’s No Language Left Behind translator covers 200 languages including many indigenous languages, and the Language Technology Partnership seeks 10 hours of speech recordings with transcriptions to build new open source speech technologies

Major discussion point

AI Development and Indigenous Language Inclusion

Topics

Development | Infrastructure

Disagreed with

– Lars Ailo Bongo

Disagreed on

Approach to AI development for indigenous languages

Meta has developed initiatives including Facebook translation to Inuktitut, No Language Left Behind translator for 200 languages, and Language Technology Partnership seeking community collaboration

Explanation

Meta has launched several specific initiatives to support indigenous languages, including translating Facebook into Inuktitut in collaboration with Nunavut Tungavik Incorporated, creating a 200-language translator, and establishing a partnership program that seeks community collaboration to develop new language technologies. These efforts represent concrete steps toward including indigenous languages in major technology platforms.

Evidence

The Inuktitut Facebook translation took five years of collaboration with the community, the No Language Left Behind translator is freely available on platforms like GitHub, and the Language Technology Partnership seeks partners who can provide speech recordings and text data

Major discussion point

Practical Implementation and Solutions

Topics

Development | Infrastructure

Audience

Speech speed

112 words per minute

Speech length

408 words

Speech time

217 seconds

New protocols like SOLID could enable data ownership by contributors while allowing authorized use by language model companies

Explanation

A new internet protocol called SOLID, created by Tim Berners-Lee, the inventor of the World Wide Web, could solve data ownership issues by allowing indigenous language contributors to maintain ownership of their data while authorizing its use by AI companies. This approach could prevent digital colonization while enabling language model development.

Evidence

SOLID protocol works with LingoAI to collect data where contributors control ownership and can authorize use by Meta, OpenAI, or other companies while maintaining full GDPR compliance and ownership rights

Major discussion point

Practical Implementation and Solutions

Topics

Infrastructure | Legal and regulatory
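
The ownership model described in this argument can be illustrated with a minimal sketch. It is not the SOLID protocol or any LingoAI interface: the class and method names (`ContributorStore`, `grant`, `revoke`, `read`) and the in-memory storage are hypothetical, chosen only to show the idea that a contributor keeps ownership and decides which organisations may read a recording.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Recording:
    """One contributed item, e.g. a transcribed speech sample (hypothetical type)."""
    owner: str
    content: str
    # Organisations the owner has explicitly authorised; empty by default.
    authorised: set[str] = field(default_factory=set)


class ContributorStore:
    """Hypothetical in-memory store illustrating owner-controlled access."""

    def __init__(self) -> None:
        self._items: dict[str, Recording] = {}

    def add(self, item_id: str, owner: str, content: str) -> None:
        # New items start with no authorised readers.
        self._items[item_id] = Recording(owner, content)

    def _owned_item(self, item_id: str, owner: str) -> Recording:
        item = self._items[item_id]
        if item.owner != owner:
            raise PermissionError("only the owner can change access")
        return item

    def grant(self, item_id: str, owner: str, org: str) -> None:
        self._owned_item(item_id, owner).authorised.add(org)

    def revoke(self, item_id: str, owner: str, org: str) -> None:
        self._owned_item(item_id, owner).authorised.discard(org)

    def read(self, item_id: str, org: str) -> str:
        item = self._items[item_id]
        if org not in item.authorised:
            raise PermissionError(f"{org} is not authorised to read {item_id}")
        return item.content
```

In this sketch, access is denied by default, only the owner can grant or withdraw it, and withdrawing it blocks later reads. A real deployment would also need identity verification, persistent storage and audit logging, none of which is shown here.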

Tawfik Jelassi

Speech speed

108 words per minute

Speech length

606 words

Speech time

336 seconds

Indigenous communities must be central to technology design, development and governance, with their knowledge systems essential for ethical digital futures

Explanation

Indigenous peoples should not merely benefit from digital technologies but must be at the center of how these technologies are designed, developed, and governed. Their knowledge systems, worldviews, and linguistic heritage are not just valuable additions but are essential components for creating ethical and inclusive digital futures.

Evidence

UNESCO supports projects like the Mayan Language Preservation and Digitalization Project which created talking glossaries, localized websites and a universal Mayan keyboard empowering millions of speakers

Major discussion point

Vision for Digital Language Equality

Topics

Human rights | Development

Building a digital future requires linguistic justice, cultural dignity and inclusive technology that speaks to hearts through indigenous languages

Explanation

Creating an equitable digital future necessitates more than just technical solutions – it requires linguistic justice, respect for cultural dignity, and truly inclusive technology development. Drawing on Nelson Mandela’s quote about speaking to people in their own language, the goal is to build technology that connects with people’s hearts and cultural identity, not just their minds.

Evidence

Nelson Mandela’s quote: ‘If you talk to a man in a language he understands, that goes to his head. If you talk to him in his language, that goes to his heart’

Major discussion point

Vision for Digital Language Equality

Topics

Human rights | Sociocultural

David Castillo Barra

Speech speed

163 words per minute

Speech length

118 words

Speech time

43 seconds

UNESCO’s International Decade of Indigenous Languages Secretariat supports multilingualism in cyberspace with focus on fostering linguistic diversity in digital spaces

Explanation

David Castillo Barra represents UNESCO’s Secretariat for the International Decade of Indigenous Languages and works as an international consultant specializing in multilingualism promotion. His role involves supporting initiatives related to UNESCO’s recommendation on multilingualism in cyberspace with a strong emphasis on fostering linguistic diversity in digital environments.

Evidence

He serves as a member of the Secretariat for the International Decade of Indigenous Languages at UNESCO and supports initiatives related to UNESCO’s recommendation on multilingualism in cyberspace

Major discussion point

Vision for Digital Language Equality

Topics

Human rights | Sociocultural

MODERATOR

Speech speed

5 words per minute

Speech length

3 words

Speech time

31 seconds

The session opens and closes the panel discussion on barriers to indigenous language technology and AI uptake

Explanation

The moderator provides structural support for the panel discussion by opening and closing the session with musical transitions. This represents the formal framework within which the substantive discussions about indigenous language technology barriers take place.

Evidence

Musical transitions at the beginning and end of the session

Major discussion point

Panel Structure and Format

Topics

Sociocultural

Agreements

Infrastructure | Development | Sociocultural

Human rights | Development | Sociocultural

Unexpected consensus

Arguments

AI has great potential to bridge equity gaps for indigenous people in fields like medicine and education where cultural and linguistic expertise is lacking

Open source AI technologies can be valuable for indigenous communities as they allow refinement, fine-tuning, and community ownership of adapted models

Summary

Lars Ailo focuses on building AI applications for high-risk areas like education and health services, while Kevin Chan emphasizes providing open source models that communities can adapt themselves. Lars Ailo is more concerned with practical applications, while Kevin Chan focuses on the foundational technology layer.

Topics

Development | Human rights

Data collection approach and regulatory challenges

Speakers

– Lars Ailo Bongo
– Aili Keskitalo

Arguments

Indigenous people face a dilemma as data subjects requiring extra protection under GDPR, yet needing data collection to ensure AI works equitably for minorities

The shift must be from seeking permission to entering true partnerships with indigenous peoples as co-creators, applying free, prior and informed consent principles

Summary

Lars Ailo emphasizes the technical and legal challenges of data collection under GDPR while advocating for regulatory sandboxes, whereas Aili Keskitalo focuses on fundamental partnership approaches and indigenous rights principles. They differ on whether the primary solution is regulatory reform or relationship restructuring.

Topics

Legal and regulatory | Human rights

Unexpected differences

Role of regulatory frameworks versus community partnerships

Speakers

– Lars Ailo Bongo
– Aili Keskitalo

Arguments

Regulatory sandboxes are needed to ensure ethical and safe data collection from indigenous communities for AI development

AI is not neutral and can replicate colonial logics if indigenous peoples are not involved from the beginning as rights holders, not just users

Explanation

This disagreement is unexpected because both speakers are indigenous advocates, yet they approach the solution differently. Lars Ailo, despite acknowledging historical racist research, still advocates for regulatory frameworks to enable data collection, while Aili Keskitalo emphasizes that AI can replicate colonial patterns and focuses on rights-based approaches. This reveals a tension within indigenous advocacy between pragmatic regulatory solutions and principled rights-based approaches.

Topics

Legal and regulatory | Human rights

Overall assessment

Summary

The main areas of disagreement center around approaches to AI development (application-focused vs. foundational technology), data collection methods (regulatory solutions vs. partnership principles), and the balance between technical pragmatism and rights-based approaches.

Disagreement level

The level of disagreement is moderate but significant. While all speakers share the common goal of advancing indigenous language technology, they differ substantially on implementation strategies. This disagreement reflects deeper tensions between technical feasibility, legal compliance, and indigenous rights principles. The implications are significant as these different approaches could lead to very different outcomes for indigenous communities – from regulatory sandboxes that enable data collection to partnership models that prioritize community control, to open source solutions that emphasize technical accessibility.

Partial agreements

Human rights | Development | Sociocultural

Takeaways

Key takeaways

Indigenous language technology barriers are primarily structural, political, and ethical rather than technical – the technology exists but cannot be delivered effectively due to platform restrictions and closed systems

Over 98% of the world’s languages lack basic digital tools, creating a threat of digital extinction, with technology currently serving only the top 200 languages globally

AI has transformative potential to bridge equity gaps in medicine, education, and other services for indigenous communities, but risks widening gaps if indigenous peoples are excluded from AI development

Open source AI technologies offer more promise than closed systems for indigenous language development as they allow community ownership, refinement, and adaptation

Data sovereignty is crucial – indigenous communities must control their linguistic data and be involved as co-creators and rights holders, not just users, throughout all stages of technology development

The mindset must shift from seeking permission to entering true partnerships with indigenous peoples, applying free, prior and informed consent principles

Language should be viewed as a human right on digital platforms rather than as a localization feature or liability

Regulatory frameworks like the EU AI Act and GDPR create both protections and challenges for indigenous language AI development, requiring innovative approaches like regulatory sandboxes

Resolutions and action items

UNESCO invites all participants to contribute to the Global Roadmap for Language Technologies, which is available online for public consultation

Meta’s Language Technology Partnership is seeking collaborators who can provide 10 hours of speech recordings with transcriptions or 200+ sentences of written text to build new open source speech technologies

Participants are encouraged to establish connections for further exchange on indigenous language technology issues within the Internet Governance Forum framework

Need to develop regulatory sandboxes to ensure ethical and safe data collection from indigenous communities for AI development

Requirement to follow up with policy development using human rights principles and national laws to regulate indigenous community control over linguistic data

Unresolved issues

How to implement community control over data in practice while meeting the technical demands of AI training, which typically requires large datasets

How to balance GDPR data protection requirements for indigenous peoples as ‘special category’ subjects with the need for data collection to ensure equitable AI performance

How to shift platform owners’ mindset from centralized control to allowing communities to manage their own languages without security concerns

How to ensure AI-generated content quality supports rather than harms indigenous languages and their speakers

How to address the fundamental conflict between data ownership principles and the centralized architecture of current internet platforms

How to scale solutions beyond pilot projects to achieve meaningful global impact for thousands of indigenous languages

How to ensure indigenous women and girls have equal access to and influence over language technology development

Suggested compromises

Use of open source AI models as a middle ground that allows community adaptation while leveraging existing technological infrastructure

Development of regulatory sandboxes that balance ethical data collection needs with legal protection requirements

Adoption of new protocols like SOLID that could enable data ownership by contributors while allowing authorized use by language model companies

Ernstreits identifies a critical gap between academic awareness and practical implementation, suggesting that even well-intentioned researchers and developers fail to understand indigenous perspectives. His emphasis on community involvement ‘in all stages’ goes beyond consultation to suggest genuine partnership and co-creation. The observation about academia being unaware despite their supposed expertise is particularly striking.

Impact

This comment reinforced the emerging theme about the need for fundamental mindset changes in how technology is developed. It supported and expanded on earlier points about community control and helped establish consensus among panelists about the inadequacy of current approaches, even in supposedly progressive academic settings.

Overall assessment

Explanation

This explores alternative technical architectures that could solve the conflict between data ownership and AI model training needs

Disclaimer: This is not an official session record. DiploAI generates these resources from audiovisual recordings, and they are presented as-is, including potential errors. Due to logistical challenges, such as discrepancies in audio/video or transcripts, names may be misspelled. We strive for accuracy to the best of our ability.