WS #119 AI for Multilingual Inclusion

16 Dec 2024 14:00h - 15:00h

Session at a Glance

Summary

This discussion focused on the role of AI in promoting multilingual inclusion and expanding internet access to diverse language communities. Participants explored challenges in developing AI systems for less common languages and strategies to address these issues.

Key points included the importance of data collection and documentation of local languages to train AI models effectively. Speakers emphasized that communities should actively create and share content in their native languages online to build robust datasets. The need for improved internet connectivity in underserved areas was highlighted as a crucial step in enabling diverse language representation online.

The discussion touched on efforts by organizations like the Internet Society and Pan-African Youth Ambassadors on Internet Governance to promote multilingualism through training programs and community networks. Speakers noted the importance of innovation and local solutions in developing AI tools tailored to specific language needs.

Challenges discussed included the dominance of major languages in AI development, the potential loss of minority languages, and the need for greater representation in tech fields. The conversation emphasized the role of governments, academia, and industry in collaborating to advance multilingual AI development.

Participants stressed the urgency of preserving and promoting linguistic diversity online, calling for active engagement from communities to document and digitize their languages. The discussion concluded by highlighting the power of individuals and communities in shaping the future of the internet and ensuring linguistic inclusivity in the digital age.

Keypoints

Major discussion points:

– The importance of developing AI models and tools in multiple languages beyond just English

– The need for more data and content in diverse languages to train AI systems

– The role of governments, academia, industry and communities in promoting multilingual AI development

– Challenges in preserving minority languages and including them in AI/technology

– The connection between language, culture, and digital inclusion

Overall purpose/goal:

The discussion aimed to explore how AI can be leveraged to promote multilingual inclusion and expand internet access/content in diverse languages, especially for underserved linguistic communities.

Tone:

The tone was largely informative and collaborative, with speakers sharing insights and experiences from different perspectives. There was an underlying sense of urgency about the need to act to preserve linguistic diversity in the digital age. The tone became more action-oriented towards the end, with calls for participants to actively document and promote their languages online.

Speakers

– Jesse Nathan Kalange: Moderator

– Athanase Bahizire: Internet Society alumnus, facilitator of the Pan-African Youth Ambassadors on Internet Governance, engineer

– Claire van Zwieten: Alumni specialist at Internet Society

– Ida Padikuor Na-Tei: From the East African region (did not speak in the transcript)

Additional speakers:

– Alejandra (no surname available): Mentioned as able to provide information about Internet Society empowerment programs

– Miriam (no surname available): From Kenya, ambassador in the PAYAIG Swahili cohort

– Abdul Rehman: From Lahore, Pakistan

– Grace Ngijoi: From Cameroon

– Kenli Kosa: From Mozambique

– Abineth Sentayo: From Ethiopia

– Vlad Ivanets: Youth Ambassador of Internet Society

Full session report

Expanded Summary of Discussion on AI and Multilingual Inclusion

Introduction

This discussion focused on the role of artificial intelligence (AI) in promoting multilingual inclusion and expanding internet access to diverse language communities. Participants, including Internet Society alumni and youth ambassadors, explored challenges in developing AI systems for less common languages and strategies to address these issues. The conversation featured key insights from Athanase Bahizire and Claire van Zwieten, with moderation by Jesse Nathan Kalange.

Key Themes and Arguments

1. AI Development and Multilingual Inclusion

The discussion emphasized the critical need for developing AI models and tools in multiple languages beyond English. Athanase Bahizire highlighted that AI models require diverse language data to be truly inclusive. He explained, “For us to have AI models, we need to have data. It’s just like a human being, for you to start speaking, you need to listen. After listening, okay, you understand, you learn, then you can speak, then you can deliver. It’s the same with AI, it has to learn from the data and then deliver.”

Claire van Zwieten noted that the Internet Society promotes multilingualism in its programmes, recognizing the importance of linguistic diversity in AI development. The Internet Society works in four main languages (Arabic, Spanish, French, and English) for their official trainings, while their chapters work in many more languages locally.

Audience members pointed out specific challenges with current AI tools, such as difficulties with Swahili greetings and a Punjabi resume-building project, underscoring the need for improvement in this area.

2. Data Collection and Language Documentation

A crucial point of agreement among speakers was the importance of data collection and documentation of local languages to train AI models effectively. Athanase Bahizire stressed that documenting local languages and content is vital for AI development and cultural preservation. Claire van Zwieten concurred, noting that AI can help preserve endangered languages if properly developed.

Claire provided a concrete example of the Navajo tribe’s efforts to preserve their language using AI, which inspired further discussion about practical applications of AI in preserving minority languages and cultural heritage.

3. Connectivity and Content Creation

Athanase Bahizire highlighted the crucial role of connectivity in enabling diverse language representation online and AI development. He emphasized the importance of community networks in improving connectivity and enabling content creation in local languages.

4. Challenges in Multilingual AI Development

Several challenges were identified in developing AI systems for multiple languages:

– Lack of quality data in many languages

– Technical challenges in accommodating non-Latin scripts

– Limited representation of diverse languages in AI development

These challenges underscore the need for innovation and local solutions, as emphasized by Athanase Bahizire.

5. Promoting Language Equity and Inclusion

The discussion touched on several strategies to promote language equity and inclusion:

– Encouraging learning and use of multiple languages

– Ensuring public services support multiple languages

– Increasing diversity in language representation

– Supporting local chapters working in their languages

– Documenting cultural heritage and traditional knowledge

– Leveraging grants and funding for language preservation projects

6. Collaboration for Multilingual AI Development

Jesse Nathan Kalange highlighted the need for a multi-stakeholder approach involving government, industry, and academia to advance multilingual AI development. Claire van Zwieten discussed the Internet Society’s role in connecting the unconnected, while Athanase Bahizire stressed the importance of local initiatives and innovation.

Claire van Zwieten also emphasized the need for more women in ICT and the importance of mentorship. She highlighted the Internet Society’s efforts in empowering youth to become future internet leaders, including the Pan-African Youth Ambassadors on Internet Governance program, which Athanase Bahizire described in detail.

Thought-Provoking Comments and Their Impact

Several comments sparked deeper discussions:

1. Athanase Bahizire provided historical context about AI, noting, “AI is something that is just coming now, but we have been having AI systems from long ago, and they are still developing.”

2. Claire van Zwieten highlighted a critical paradox in AI development, stating, “AI is great for the digital divide because it helps bring some people up but it also very deepens it.”

3. Athanase Bahizire emphasized the importance of local innovation, stating, “We need to build our own systems.”

Conclusion and Future Directions

The discussion concluded by highlighting the power of individuals and communities in shaping the future of the internet and ensuring linguistic inclusivity in the digital age. Athanase Bahizire’s closing remarks stressed the importance of innovation and being part of the solution.

Several follow-up questions were raised, indicating areas for future exploration:

1. How to encourage local communities to produce better quality text content in their languages

2. Ways to empower local communities to use AI systems in their own languages without fear

3. Methods for tailoring AI to support underserved or minority-language speakers

4. Concrete strategies for documenting languages

These questions underscore the ongoing challenges and opportunities in developing multilingual AI systems and promoting linguistic diversity in the digital realm.

The session concluded with the announcement of gifts for participants, highlighting the collaborative and engaging nature of the discussion.

Session Transcript

Jesse Nathan Kalange: Can they hear me? Yes, can you? I think so. Can you hear me? Yes, I can hear you and you can hear me. Perfect. We’re here. Great. Thank you for your patience; we are a little bit late, but, you know, technical issues. Okay, hello. We can kick this off. You have your mic. You’re going to be leading this session. I don’t think you need that if you’re talking. Yeah. Just try. Try. Hello. Hello. Perfect. Hello. Yeah. Okay, great. So the floor is yours. You start. And we also have people online, and everything is working. Okay. Ready? Perfect. Yeah. Hello. Yeah.
Good evening here in Riyadh. Good morning and good afternoon to everyone watching us. Thank you for joining this session. We’re going to have a very insightful session between the Pan-African Youth Ambassadors for Internet Governance, the Internet Society Foundation, and the ISOC alumni group. Today I have here Athanase Bahizire, who is from the DRC IGF. I also have Claire van Zwieten from the Internet Society Foundation. Sorry, NASA will be joining us. And I also have Ida, also from the East African region. We’re going to talk about AI for multilingual inclusion. Before we start: we’re going to have more of a discussion, which is practical, and we have some youth here whom we want to engage and include in this session.
To open this session: we’ve seen so many challenges in expanding Internet access and the availability of local content across many languages. While working towards advancing human rights and inclusion in the digital age, under AI for multilingual inclusion we will also discuss how the Internet can be expanded to more languages and to inclusion in all aspects. So, I’m going to talk a little bit about how we can make the internet accessible to everyone. So, let’s start with our work. Through the use of multilingual AI systems, we can engage digitally isolated populations, people who are isolated from the internet by language barriers, granting equal access to information. In this, we want to improve digital literacy and education efforts in making internet content available to everyone. So, we are working on that, in terms of five languages. It’s a great pleasure to welcome my speakers here. I will give them just one minute to introduce themselves. So, looking online, I have Claire here. Claire, if you can hear me, just take a minute to introduce yourself, then I will move to Athanase, then the rest will join us. All right. Thank you.

Claire van Zwieten: So, first, hi, everyone. My name is Claire. I am the alumni specialist at the Internet Society. I have the wonderful job of working with our alumni, two of whom are there on the stage today, and also Ida, who is also here. Thank you so much for coming to this session. I’m very excited for this talk about how we can create more internet access for people who do not speak the dominant languages of the internet. Thank you.

Jesse Nathan Kalange: All right. Thank you. Let me move to Athanase.

Athanase Bahizire: Thank you so much. I’m Athanase, an Internet Society alumnus, and I’m one of the facilitators of the Pan-African Youth Ambassadors on Internet Governance. I’m an engineer by profession, and when it comes to the IGF ecosystem, I coordinate the Youth IGF in the DRC. I’m very happy and looking forward to the discussion. Thank you. All right. Thank you.

Jesse Nathan Kalange: Thank you so much. Ida will join us very soon, and we can start with the remarks. I want to start with Athanase. You mentioned being a facilitator for the Pan-African Youth Ambassadors for Internet Governance. Can you give some brief information about that, and how multilingualism is a focus of the training that you are doing at the Pan-African Youth Ambassadors for Internet Governance? Then we’ll dive into the next question for Claire. Thank you.

Athanase Bahizire: Thank you so much, Selby. So basically, we have been seeing a rise in the participation of different actors in the Internet Governance Forum and other Internet governance-related activities, but we realized that there was a lack of meaningful participation from Africa. When we looked deeply into it, we realized that many African countries don’t speak English, and non-English-speaking countries tend not to be active. So we tried to play our part in the solution, and we came up with this initiative of the Pan-African Youth Ambassadors on Internet Governance. Basically, it has five cohorts in five different languages. The target is to train 1,000 young people per year across the five languages, so 200 per language. The five cohorts are the Arabic cohort, the Portuguese cohort, the Swahili cohort, the English cohort, and the French cohort. What is very unique in this program is that we have introduced some African languages that are only spoken in Africa, and some other languages that are not widely spoken, so that we build the capacity of the different participants, of the different African youth, so that they understand the stakes of Internet governance. And then we guide them through mentorship so they can join these discussions, be able to participate, and also contribute locally to different ideas in their different countries or regions. So briefly, that’s what the Pan-African Youth Ambassadors on Internet Governance is about.

Jesse Nathan Kalange: All right, thank you, Athanase, for highlighting that. Claire, I want to move to you. Before the fellows and ambassadors of the Internet Society come out as ambassadors, I know that they don’t just come out like that; there’s training. And I’ve seen a couple of trainings in different languages that the Internet Society is working on to close that kind of language barrier in Internet governance and other trainings. Can you also highlight what the Internet Society Foundation, and the Internet Society as a whole, is doing in terms of training and multilingualism around the world? Thank you.

Claire van Zwieten: That’s a great question. Thank you so much for asking. At the Internet Society, as a global organization, we are committed to making sure that the internet is for everyone. That really is what we are striving for. That is our goal; that’s how we think. And the internet cannot be for everyone unless everyone has access to it and can read it. And of course you cannot read the internet if you don’t speak the language it is written in. So at the Internet Society, we really do our best to include as much interpretation as possible when we communicate with our community. We do so in Spanish, French, and Arabic when we can, and we’re very committed to making sure that whenever we are speaking with our community or with people beyond our community, we are giving them the means to understand what we are saying in the language they are most comfortable in. So we really try to walk the walk when it comes to multilingualism and the internet by providing as many interpretation options as we can.

Jesse Nathan Kalange: Okay, all right, thank you. Alejandra is also here, but today this session is not going to be a panel discussion. It’s going to be a group discussion, as we are here. Before we go to the next discussion: we’ve seen that there are some AI language tools and AI systems coming out that are trained as large language models. I want to ask the room, so that we have the conversation with you before we come back to all that. In terms of language, have you tried communicating with some of the AI tools like Googlebot or ChatGPT, which is very common, in your local language, and what does it look like? Are you still limited to starting with English because it can only provide answers in English? Have you tried different languages with these AI tools? If you have that experience, you can share it with us. Yeah, please mention your name, your country, and the organization you are representing.

Audience: Thank you. Okay, hello everyone. My name is Miriam from Kenya. I’m an ambassador in PAYAIG, under the Swahili cohort. Personally, I’ve interacted with ChatGPT. I use it all the time, but for Swahili it’s a bit tricky, even with basics like greetings. If I ask you “habari yako”, the response is supposed to be “njema”. But the AI really hallucinates. It gives you wrong answers, and then you have to tell it, no, that is the wrong answer, and then the next time I ask “habari” it responds with “njema”. So for Swahili, it’s not really that good, but it’s doing fairly well. Maybe not for the common greetings, because greetings are different; in Kenya and Tanzania they are a bit different. But generally, for Swahili, it’s okay for me.

Jesse Nathan Kalange: Okay, all right. So

Audience: Okay. My name is Abdul Rehman and I’m from Lahore, Pakistan. We have worked with ChatGPT in Urdu and Punjabi. Urdu is fairly good, but in Punjabi it has some issues. We’re trying to build a platform for local daily wage earners, like plumbers, so they can make their resume by speaking in their Punjabi accent or the Punjabi language. That’s a bit hard for OpenAI right now. That’s my take. Okay. All right. Yeah.
Hello everyone. My name is Grace Ngijoi from Cameroon. Concerning local languages, it’s a little bit complicated in our country because we have more than 200 local languages. But there’s a young Cameroonian who’s working on it. He specialised in common languages with at least a sizeable number of speakers, like Ewondo, Bassa, and Bamileke, because we have more people who come from those groups. It’s not yet effective, but from the news we are getting from him, it’s something that can help us a lot in terms of communication with our local people.

Jesse Nathan Kalange: Okay. All right. Thank you. Do we have anyone else? Okay. We’ll come back with another question. Let me go to this. We’ve looked at how AI has been performing across local languages. You mentioned PAYAIG has five languages: Swahili, Arabic, Portuguese, English, and French. English is common; it is generally what most AI language models are trained on. Now, I want to put the same question to Claire, so that Claire can also explain the perspective from the ISOC side. We want to understand this digital language divide, because we see there’s a vast difference between English and the other languages. How can we ensure that there is equal access to AI technology for speakers of all languages?

Claire van Zwieten: It’s a great question. And I think it points to one of the most fundamental challenges that we face, which is having data to train on. So much of the content on the internet is in English, and when AI systems need data to train on, many of the data sources they use are in English. So having ample data sources in other languages will be instrumental to making sure that we’re able to use AI for multilingual inclusion on the internet. Once we’re able to gather enough data in every other language that we want, we’ll be able to use the internet far more accessibly, but the biggest hurdle is really getting LLMs trained on this data. So until we have more access and more content, we won’t be able to do it.

Jesse Nathan Kalange: Okay, all right, Athanase.

Athanase Bahizire: Thank you so much. Let me just give a bit of context. I told you I’m an IT engineer by profession, and there is some stuff we don’t understand about AI. When we say AI, artificial intelligence, basically it’s the ability of a machine to mimic what the human brain can do, the tasks we can do. And we have seen a wide hype around AI now with the LLMs, the language models. And we think that AI is something that is just coming now, but we have been having AI systems from long ago, and they are still developing. And something Claire just mentioned is very important. For us to have AI models, we need to have data. It’s just like a human being, for you to start speaking, you need to listen. After listening, okay, you understand, you learn, then you can speak, then you can deliver. It’s the same with AI, it has to learn from the data and then deliver. So when it comes to multiple languages, there is this divide we have: many of our languages are not documented. And AI systems can only deliver from the data they’ve got. There is an issue we now have in our different communities. When you publish content in your local language, you feel like many people won’t see it. So you can speak French, you can speak Swahili, you can speak Wolof, but you will go to English because you feel like that’s where you’ll get a wider audience. But then we create content in English, and that data is going to feed English models and help generate AI models in English. So one of the things I tell people is, if you feel like there is this disparity, the only way we can solve it now, since AI has not gone that far and the hype is still new, is to document our local perspectives on the internet. It’s about the data we put on the internet. So when you create content, yeah, create.
You can make videos in your local language, you can publish articles in your local language, and that will help feed these models. One other thing: we have seen use cases in countries like Rwanda in Africa, where they’ve managed to create a model that they can ask any question related to legislation, and it will give them answers and references, like: this is in this bill, in this law, and under which article. What happened is that they had all the legislation in their local languages, but it was paper-based. So what they tried to do is collect that data. They supported young entrepreneurs, actually young students and young startups, with hackathons and competitions, and told them: you have to build a data project where you take these local bills and you digitize them, first of all, because that’s the first point. You get them online, that’s the second point. And then, how can I say, within the data pipeline, you need to have the data. Then there is what they call data cleaning. Some of the terminology you see is not accurate. So they had to review that data, cleaning it to make sure it’s accurate and can go onto this platform. And after they’ve got the data on the internet, they can start training the models on that data. And in training, in machine learning, we talk about the accuracy of your model. You’re going to train your model, but sometimes it won’t respond accurately. Even ChatGPT, which is really big, sometimes can’t respond accurately. Someone was talking about this: you ask a question and the AI can’t reply correctly, but then you tell it the answer, and the next time you ask, it responds with the right answer. That is one of the three types of AI, I mean, machine learning. In the first one, you give the answer. You give a picture of a tomato and you also tell it, this is a tomato.
So when someone asks, it will say, this is a tomato. In the other model, you give a picture of a tomato and you don’t say anything, and it will tell you, okay, this is a tomato, based on what it has got on the internet. The other way is you give it a picture of a tomato and it tells you it’s an orange. You say, no, this is not an orange, this is a tomato. It’s going to save the new data so that next time you ask it will be more accurate. And this is actually the best way of learning. It’s the same with our children: you tell them, no, this is not good, so next time the child knows, if I go this way, I may fall, this is not good, and does it the proper way. That’s the same thing with AI. And when it comes to multiple languages, the only way we can build strong AI models in our languages is by training them with the data we have. So I really encourage us: in your usual life, when you want to do your work and you feel like you can do it in a non-English language, do it. It’s important. And we definitely will need this varied data at some point.
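
Editorial sketch: the correction loop described above (the system answers wrongly, a person supplies the right answer, and the next reply improves) can be illustrated with a toy example. This is only a minimal sketch under stated assumptions, not how ChatGPT or any production system works: the class name CorrectableResponder and its methods are hypothetical, and a simple lookup table stands in for what would really be an update to a trained model. The Swahili greeting pair is the one mentioned earlier in the session.

```python
from typing import Dict, Optional


class CorrectableResponder:
    """Toy responder that remembers human corrections (hypothetical, illustration only)."""

    def __init__(self, initial_answers: Optional[Dict[str, str]] = None) -> None:
        # Prompt -> answer pairs the system currently "knows".
        self.answers: Dict[str, str] = dict(initial_answers or {})

    @staticmethod
    def _normalize(prompt: str) -> str:
        # Light cleaning so "Habari yako?" and "habari yako" map to the same key.
        return prompt.strip().lower().rstrip("?!. ")

    def respond(self, prompt: str) -> str:
        # Return the stored answer, or a placeholder when nothing is known yet.
        return self.answers.get(self._normalize(prompt), "(no answer yet)")

    def correct(self, prompt: str, right_answer: str) -> None:
        # Store the human-supplied answer so the next reply uses it.
        self.answers[self._normalize(prompt)] = right_answer


# Start with a deliberately wrong answer, as in the audience example.
bot = CorrectableResponder({"habari yako": "hello"})
first = bot.respond("Habari yako?")    # wrong reply before any feedback
bot.correct("Habari yako?", "njema")   # a speaker supplies the expected reply
second = bot.respond("habari yako")    # the corrected reply is now returned
print(first, "->", second)
```

The point of the sketch is the shape of the loop rather than the storage mechanism: every correction from a speaker of the language becomes new data, which is why the speaker encourages communities to keep supplying content and feedback in their own languages.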

Jesse Nathan Kalange: All right. Thank you. Athanase, you made a very good point. I’m interested in some parts of it, and I will come back to you later. Let me see if Ida is online and can unmute. You see, the AI perspective also starts from the literacy aspect of AI. We can see that there are gender disparities within AI and also within multilingualism, whereby most women may speak and understand their local languages very well, but these tools cannot work with their languages so that they can navigate the use of AI. So it becomes very divisive when we are talking about AI. If Ida is there, maybe you can tell us: what do you think we can do to close the gender disparity when it comes to multilingualism and the use of AI? If Ida is not, maybe Claire can speak to that for us, because I know the Internet Society also promotes gender equality. Thank you.

Claire van Zwieten: Absolutely. I don’t think Ida is able to speak, so I’ll speak on her behalf. The Internet Society is deeply committed to bridging the digital divide, and including women in the Internet space is a huge part of that. So we really are committed to making sure that women have access to trainings and are able to gain all of the opportunities of the Internet the same way. A lot of that has to do with making sure that women have access to training and to mentorship. There are a lot of studies that show that women who go into technical fields but don’t have a mentor are less likely to complete whatever training they’re doing and less likely to succeed in that field. So I think it’s one of those issues that pile on each other. We need women in ICT so they can mentor the younger fellows who are coming in and need a woman to help guide them through that process, because it is different to be in a male-dominated field, as the technical community really is. And something else I want to mention, which you mentioned before, is that AI is great for the digital divide because it helps bring some people up but it also very deepens it. Because if you’re in an area where your language is not represented on the internet, largely because you don’t have very good access in your area, you’re not going to be able to use the benefits of AI either. So it compounds on itself very quickly, just as the issue of women in ICT does. And women do bring a lot of value and perspective to the field that is very necessary to keep it moving forward. And I look forward to even more organizations beyond just the Internet Society and PAYAIG working to bring more women into the field.

Jesse Nathan Kalange: Okay, all right, thank you very much. At PAYAIG, we also promote gender equality. So in selecting people to learn about these new language models and multilingualism, we try as much as possible to get more people. As was mentioned, Rwanda as a country has built a policy AI chat where people can get information. Now, as I said, I will get back to you on that, also as an engineer, because there is some technical stuff I need to ask you so that we all learn. But I want to go back to the room, because this is an engagement we are doing with them. What has your country been able to do to develop AI in terms of multilingual content? Has your government or your country been able to develop something that you think may be the future? Because I remember, on the Arabic side, hearing that there are some models that have been trained on data sets. But is that by the government? Can someone share insight from the perspective of their country? Then we can come back to the discussion and see what can be done, as Claire was mentioning, so that beyond PAYAIG and ISOC we can get other organizations on board. So if anyone has a contribution or question on what their government has done in terms of multilingualism, yeah.

Audience: Yeah, hi, everyone. My name is Vlad Ivanets. I’m a Youth Ambassador of the Internet Society this year. I’d like to share my experience, because I’m originally from Russia. We have a local company, which is quite big, a big tech company, Yandex, which is also working on an LLM. It is quite popular, I would say, probably not only in Russia but also beyond the country. They really work hard on creating a competitive system. But they have some problems they’re encountering right now. They said the internet lacks enough resources in the exotic languages, as they call them. And they’re really worried that they will not be able to create a sufficient AI tool based on these languages. Actually, they also say that they’re running out of Russian resources. This applies mostly to high-level resources, because, yes, you can find a lot of information online, but is it of high enough quality? Can you really use it to build an effective model? So they doubt this. And I think this is a question that should be addressed as well and discussed among us. How can we encourage local communities to produce better-quality text? And how can we empower them to use AI systems and not be afraid of using them in their own local language? Yeah.

Jesse Nathan Kalange: All right. Thank you very much. Very nice question, because that was the next question that was coming, to delve more into the culture, the dialects, and other aspects related to AI. As someone was saying, I greeted the AI in Swahili but the response was not what was expected. So we see that because the AI has not been trained in Swahili, the culture, the dialects, and the modalities within that kind of language are being changed. And my fellow ambassador was also speaking about the quality of data, because we believe that the communities have a lot of local content they can produce. But what I also see is a lot of disconnection between rural people and how they can bring that content online, because people are not connected. They don’t have internet access, they don’t have access to mobile. So content creation in local languages is also limited. And we can also look into that. Can I also have any other perspective on countries that are doing work like Rwanda is doing on the policy side, where the government is supporting AI multilingualism or promoting local content through AI? Are there innovations that the youth have been able to bring? Otherwise we can move to the next question for our speakers. Okay. Yeah, Claire has something to share. Yeah, all right, Claire.

Claire van Zwieten: I’m based in Amsterdam, but I don’t sound like it because I spent most of my life in the United States. And there has been some amazing work done by the Navajo tribe, which is a tribe indigenous to the United States. Their language is disappearing. As new generations are growing up, they’re not learning Navajo the same way their parents or grandparents did. So there has been a huge push among young people in the region to make sure that their language is protected through AI. They’re helping feed large language models with the documents and the text that they have, to make sure that people who are learning Navajo in school are able to use ChatGPT in Navajo when they want to ask about their assignments, or, if they want to know something about their cultural history, they’re able to ask it in their native language. And language is so intrinsically connected to culture. So when you lose so much of your language, you end up losing parts of your culture as well. So I give a lot of credit to the Navajo Nation and the work of their young folks in making sure that their language is protected as generations go on and the schools in that neighborhood no longer teach that language. So that is one example of a community that’s using AI to make sure their language is not only preserved but is usable and has effective use in the coming centuries, as it’s probably going to happen that their schools no longer teach it.

Jesse Nathan Kalange: Okay, all right, Claire. Let me come back to Athanase. You mentioned that even in software engineering you see that the data that we feed into AI is not that much in terms of multilingualism, and also the sources that we get that data from, and the quality of the data. In terms of developing local content on AI models, do you think that there is open source software that allows us to have that multilingualism in development, from your perspective?

Athanase Bahizire: Thank you so much. Very good question. Actually, I’ll be very quick. You know, these big AI models, we know they are developed in California. So how do you expect someone who is developing this system in California to put Pidgin in it? How? So the idea here is innovation, and the only way we can innovate is by building our own systems. Many of these basic source codes are open. So, like, OpenAI, at the basis, it was open source. There is, in the Emirates, in Dubai, one research university working on AI, if I recall, FA is like for us, I don’t recall the name in Arabic. Basically, what they’re trying to do, because at least they’ve got enough data in Arabic, is to build a strong AI model that is first based in Arabic. Now it has English. It has some other languages, but it was first based in Arabic. So you see, it’s promoting their own perspective. So what I can see is, if we want multilingualism, it’s all about innovation. We need to build our own systems. And we have many of these resources. Education in the past was all about going to big universities and, you know, landing all these big opportunities. But now, as Claire was saying, the internet is an enabling tool, and through the internet we can learn some of these things. Many of the software engineers you see today will tell you that 90% of their skills, they didn’t learn in university. So that is it. The internet is enabling us to do wonders. And so I can encourage you: if you feel like you’re interested in one of these fields, you’ll find resources on the internet. Google it, and you can start. There are many open source resources. Even with Gemini, Google has a level that is open source that you can take and build something on. 
We have a Congolese boy who is trying to build a certain model on Gemini because, okay, in our country, the traffic can be terrible. So he doesn’t really build the data set by himself, but is building on the open source AI, applying it to our own traffic to try to find solutions to that particular problem. And it can happen. I’m hopeful that within two or three years, he will be able to come up with a strong solution. So that is one example of people trying to leverage open source resources. Another thing I could say is that innovation also comes with this culture of loving what you do. I’ve seen a lady who studied economics in university and went on to an MBA, but at some point I was asking her for advice on code. She decided to start coding and now she’s very good. And at some point, learning a new language, I’m like, oh, she knows this very well. It’s not like she studied a tech curriculum, but she got access to, you know, different resources. And now she’s good at it. And she can do great. So I’m encouraging people: if you feel like you have the interest, do what you can, and make sure you document your local perspectives, because that will really help in this inclusion, in this diversification. There is one more thing, maybe, when we talk about multi-language. Technically we call it IDN. It’s non-Latin scripts on the internet. Just a quick example: when you get in your email inbox an email whose address starts with, let’s say, a name in Amharic, at, another script in Amharic, dot something, when you see it, you will definitely feel like it’s spam. The first thing you’ll think is, hmm, what script is this one? You’d be like, no, this is spam. But we are talking about multi-language. 
So we now have domain names that are in Arabic, in Russian. Even the Russian TLD, .ru, has a version in Russian. The same in many other countries. In Egypt, they have one in Arabic. We have another one in Chinese. So sometimes, when it comes to multi-language, it’s also about accommodating scripts that are non-English. And technically speaking, this is a challenge for developers. Many of the developers, if you don’t really increase the budget to accommodate this, feel like it’s extra work. So there is this spirit of wanting to go global. Say I have an e-governance platform I’m developing, and you want people to apply for visas in Guinea or in Djibouti. Maybe you speak English or you use Latin script, but someone else is using a non-Latin script that starts from right to left. So you see the email, instead of, let’s say, athanas at gmail.com, it would be .com, gmail at athanas. And technically it’s possible, they do exist. But then for me as a developer, I need an extra layer of work to accommodate this kind of email address on my platform. But I do it when I have an ambition to go global. As I was saying about e-governance, some people will come to apply for a visa to go to Djibouti, but your platform only accommodates scripts that are in English, what they call the Latin script. Then someone who is from Bangladesh and is coming with, I don’t know, Bengali script, how will he apply? His email is just in that specific script. How would he use your system? He can’t even log in, because he would need to have a different email. But when you want to go global, you will accommodate this kind of technology. You say, maybe now we don’t have many users who are not from our country, but we expect that in the next years we’ll have more. 
So you design your systems with this inclusion in mind, and that will really help us get to the level of multi-language that we want to reach. Back to you, Fifi.
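
The extra layer of work for non-Latin addresses described above is often small for the domain part. A minimal sketch in Python, assuming the platform only needs to split an address and convert its domain to the ASCII form that DNS uses. `normalize_address` is a hypothetical helper name, the sample addresses are placeholders, and Python's built-in `idna` codec follows the older IDNA 2003 rules, so this illustrates the idea and is not a complete validator:

```python
def normalize_address(address: str) -> tuple[str, str]:
    """Split an email address and return (local part, ASCII domain).

    Hypothetical helper for illustration only. The local part is kept
    as typed, so names in non-Latin scripts are not rejected.
    """
    # Split on the last "@" so the domain is everything after it.
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    # The built-in "idna" codec converts each non-ASCII label of the
    # domain into its "xn--" (Punycode) form; ASCII labels pass through.
    ascii_domain = domain.encode("idna").decode("ascii")
    return local, ascii_domain


if __name__ == "__main__":
    # Placeholder addresses, one Cyrillic and one plain ASCII.
    for sample in ("атанас@пример.рф", "athanase@gmail.com"):
        print(sample, "->", normalize_address(sample))
```

The local part is left untouched, so a sign-up form built this way does not turn away a name written in Bengali or Cyrillic. Actually delivering mail to such an address also depends on the mail servers involved supporting internationalized addresses.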

Jesse Nathan Kalange: Okay, all right. Thank you, Athanase. Before, you were saying, and Claire also mentioned, that without regulation from the respective governments, AI cannot work as it is supposed to in every country in terms of multilingual promotion. Let me come to my people. If anyone has a question so far, we’ll take it, then we’ll have the last questions and answers on AI regulation, and then we will wrap up the session. Do we have any questions online or in the room? Okay. Okay.

Audience: Well, I believe that improving language equity involves addressing different parts of the… representation of it. So, I noted some points here. Excuse me, can you mention your name and your… Sorry? Can you mention your name? I’m Kenli, Kenli Kosa from Mozambique. Okay, alright, thank you. So, to promote this language equity, I have some points here that could actually help to have all the cultural languages involved. First, promoting multilingualism, which is to encourage the learning and use of multiple languages in schools and communities. Second, increasing language access, which is to ensure that public services provide materials and support in multiple languages. One more is increasing representation, that is, promoting diversity in language and representation in literature, media, and academia. So, with these points, I believe we can bring language equity for all of us around the world.

Jesse Nathan Kalange: Alright, do you have any questions too? Yeah. Claire, do you want to say something?

Claire van Zwieten: Yes, I just want to share that I really like that point that the audience member posed, because representation is such a big part of this, and when you have a group of people who are building a model that is supposed to be for the whole world to use, it is still being created in the context of the model builder’s culture and language. So, when we don’t have people building models from their cultural and linguistic perspective, that means they’re always going to be adapting to the other side. So, as we’re talking about us adapting to different addresses, such as Amharic, and the way that they would be reading differently, it’s important to recognize how long they have been adjusting to us. So, I just want to take a second to highlight that there are many people in the global minority, or the global majority, who are not able to use these systems the way they would like because it wasn’t developed in their context. And that requires way more representation in academia, way more representation in the technical fields, so we have more cultures and more languages involved in creating these kinds of models. Thank you.

Jesse Nathan Kalange: Any questions again? Okay. We have two people there. So from Abineth, then we’ll go to Marion. Okay.

Audience: Thank you so much. My name is Abineth Sentayo from Ethiopia. The session title, AI for Multilingual Inclusion, is essential and inspirational for a country like Ethiopia, because Ethiopia has more than 80 nations and nationalities living together and more than 80 languages spoken. Among those more than 80 languages, there are minority languages and also majority languages. Within the context of internet governance, promoting digital equity means ensuring that different ethnic and linguistic groups are not left behind by technological advances. So my question is, how can AI be tailored to support underserved or minority-language speakers in particular? Thank you.

Jesse Nathan Kalange: Okay. So Claire, then Athanase, are you going to answer this one? Let me just add on top of it. It’s a very good question, because Claire mentioned that the Internet Society is currently focused on Spanish, Arabic, English, and French. When you take, let’s say, Central Africa, we see that we have about five or more countries that speak French. When you come to West Africa, let’s say my country Ghana, Nigeria, and other countries also speak English. When you go to North Africa, Egypt, Libya, Morocco, and others speak Arabic. So they are very okay with that. Within East Africa, Swahili is very dominant across Kenya, Uganda, Tanzania, Rwanda, and so on. But even in Uganda, what I have experienced is that it’s not that kind of basic language there. They also have their own languages. Now, Ethiopia is part of East Africa, but they also have their own languages that they speak. Now, we group Arabic, French, and Portuguese. Mozambique, Sao Tome, Angola, and Cabo Verde speak Portuguese, but sometimes it’s quite tricky and very different. These are grouped. So these are what he is classifying as major languages, because when you pick Spanish, you can get about five or more nationalities that speak Spanish. When we take French, you can see it in Europe, Africa, and other countries. When we take English, it’s a very universal, common language everywhere. So we are working on that. But there are some countries which have different languages that they speak, apart from English, which is also a national language that most of them speak, which means that we are not considering that as inclusion because it is a minor language. And most of the focus goes to languages that are spoken in about five to ten countries. So he’s asking, based on that inclusion that we are talking about, 
are we also going to look out for these minor languages? Maybe, let’s say, in one country about 2 million or 20 million people are speaking the same language, but it is a minor language because no other country speaks it. So he wants to understand that inclusion. So Athanase, you can go, and after that Claire will also give her thoughts on what the Internet Society is trying to do.

Athanase Bahizire: Thank you so much. Very good question, actually. I want to give you a perspective here. We have European countries like Slovenia, Romania and so on, who have a population of less than 5 million people, but their languages are on Google, on OpenAI and everything. And they have a strong foundation in their online presence. Conversely, there are some languages like Wolof, which is widely spoken in more than three African countries. We have Hausa, which is spoken in more than four countries. And we have around 200 million people speaking the language. So if you’re talking about minor languages in the sense of population, the population that speaks Wolof may be bigger than the population that speaks Romanian, and so on. So the idea here is that it’s not really about the language itself, but about how you document your own language. So, as I was saying, again I encourage you to document your languages. The other thing, after documentation, is connectivity. At some point Claire talked about it. There is the case of India. Like 10 years ago, they were at the same level as many of the African countries. But they’ve got a solution: they have what they call a network of connectivity, fiber and other alternatives. So the country is interconnected, they have the infrastructure in place. And we have seen that when the infrastructure is there and the connectivity is there, their e-commerce is rising very fast, and so is their digital literacy. When you go to YouTube, if you want to cook, among the ten videos you see, you find one. You go to online platforms that are not the big ones, and if you want a certain thing, you find one. And mobile banking and mobile money are widely used there. Why? Because the connectivity was there. 
And then definitely they will document, they will do business, they will try to do agricultural activity helped by the Internet, because the connectivity is there. So I believe connectivity is very important for us to get there, because these applications sit on top of connectivity. We need to have connectivity. And what is the work of the Internet Society? Actually, trying to empower communities and build community networks, whereby a community that feels marginalized, where the big ISPs don’t find business, can build its own access to the Internet on its own. And from there, they will be able to leverage all the benefits that come with connectivity. So when you have the connectivity, you definitely document your perspective. And when you document, we have the data and we can build AI models. We can build many more things.

Jesse Nathan Kalange: Okay. So, Claire, we are about to close, because we have four minutes. You can take one minute or one and a half for this. We’ve seen the Internet Society as an international organization that is present in various countries. Looking at collaboration, the Internet Society is doing its part. Now, can you give us an understanding? Because we have government, industry, and academia, who do research in terms of assets. And that’s why I was saying that there’s a country where they did their research and saw that, let’s say in India, we have to connect people first. And when we connect them, they can create their content in their local languages or even other languages. Now, government, industry, and academia have a role to play. And we all understand that we cannot do that without innovation. In terms of collaboration on multilingual AI language development, what do you think we can do? You can share your perspective on what all these multistakeholder organizations can do. And what is the Internet Society doing to support that? And what do you think others can learn from that? Thank you very much.

Claire van Zwieten: Thank you. That’s a great question. And I would like to start by saying that, while we only cover Arabic, Spanish, French, and English in our courses, we have over 100 chapters around the world that are working with their local communities to try to solve their internet challenges, whether that is access or regulation that is harmful to the internet. They are able to do all of their work in their local language because of that local component. So while, officially, for our trainings it’s only those four, in our broader community we really do communicate in a much broader variety of languages. And, of course, multilingualism and AI will make the internet much more accessible to everyone. You can’t have AI without the internet. So we have amazing projects at the Internet Society which aim to connect the unconnected. We know that the remaining 2.8 billion are going to be the hardest to connect. So we’re doing the best we can, working with amazing partners around the world to connect those communities to the internet and give them training so they can maintain their internet. And then beyond that, we’re able to give them the skills to communicate on the internet in their local language, so there is greater representation. So it’s a process. And as one organization, we can only do what we can, but we hope that our chapters, our lovely alumni like you two, people who are in our courses and our fellowships, and just our greater community are able to help support us in that mission of making sure that all communities are connected to the internet, so they have the ability to use their local content online.

Jesse Nathan Kalange: Okay, all right, thank you. Does anyone have a question? We are about to close, so the speakers will say their final words. And Athanase, in your final words, in just one and a half minutes, you can also highlight what we can do to improve AI development and research in terms of collaborating to advance AI in multilingualism. Then we’ll just, okay, yeah.

Audience: Okay, should I introduce myself again? This is Grace from Cameroon. As an ambassador for PAYAIG, I have a question. Athanase mentioned that we should document our languages. Can you explain how we can do it? Like, from Cameroon, how can I help the young people to understand that, okay, we need to gather our documents? What should they do concretely to gather those documents?

Athanase Bahizire: Okay, thank you so much. Very good. Quickly, in the past we’ve had libraries. Or, easily, what you can do, let’s say personally: do you know your grandfather? Do you know his father, and so on? If you reach out to some of your family, you will be able to gather specific data about your lineage going back, I don’t know, some generations. And you know, that is valuable data you can’t find anywhere else. So you’ll be able to generate information that nobody else in the world can find. And definitely your grandchildren will find it a resource. So there are other solutions too. We used to have our traditional music. Most of the time during weddings, you see people singing all this music, but now we are tending to forget it. But if you have it, you can say, I’m collecting, I have a collection of old music, and you can upload it on a certain streaming platform. That would be data that people will be able to leverage. And when people want to learn later on, they can build on it. So we don’t have time. I think if you need some more information from,

Audience: I’m sorry, I’m skipping, so we have some grants. And I can tell you that there are some projects that are digitization of libraries. So you can take a look, because you are not alone. There are many people working on this. And sometimes the issue is that you don’t know what to do, or you know what to do but you don’t have the funding to do it. So the Internet Society also helps you with this. So just for you to know, there are many people trying to do the same, and we can support you.

Athanase Bahizire: Okay. Yeah, thank you. So I was saying, if you want to learn more about PAYAIG, about the Internet Society, or some of the technical questions, we can definitely meet after this session and discuss informally, because we don’t have time now. So my parting remark, I would say, is that when we want to build inclusion and multi-language in AI, what we need is innovation, and innovation will come from creating our own solutions to our own problems, using digital technologies. Some of our countries are experiencing, I don’t know, floods or other things, like volcanoes in my country. So there is a perspective that no one else has experienced. So you can also create a solution that no one else has created, and that’s where innovation comes into place too. Innovation is very important, and so are content creation and data collection. So we need to document. The internet is already there, and we need to put things on it. So document your expertise, document your life in a way that can help the coming generation. Then there is one thing I was saying: there are challenges, but we need to be part of the solution. There have been challenges when it comes to AI, when it comes to connectivity and all other aspects. So the only thing I can advise is, let’s try to be part of the solution to include our languages, part of the solution to make ethical and inclusive AI solutions.

Jesse Nathan Kalange: All right, thank you so much. Thank you, Athanase, for that. And Claire, your final words, then we close.

Claire van Zwieten: My final words are, thank you so much to everyone for coming to this session. The internet is for everybody, but not everybody has access. So I think it’s important that we have conversations like this on how we can use new and innovative tools to extend the reach of the internet and extend the accessibility of the internet. And I am so thankful to Athanase and Ibrahim Fifi Selby for being here and helping guide us through this conversation. And if you would like to hear more about how you can get more involved, I really encourage you to go to Alejandra, who will raise her hand. There she is. And she can tell you more about our amazing empowerment programs, which will train you to be the internet leaders of tomorrow, just like the two brilliant men who are on that podium.

Jesse Nathan Kalange: Okay. All right. Thank you, Claire. And thank you, wonderful people, for joining. I have a gift for everyone, so no one should leave. We have a gift for everyone for joining this session. So our time is up and we thank you for joining. And we thank Alejandra for also supporting this program. If you have a final word, just 30 seconds, then we can all leave. And we appreciate you for joining. We are very happy for this conversation. Thank you. So just thank you.

Speaker 1: I want to say thank you to all of you. And this is an example of what we want to do at the Internet Society. We just give them tools, knowledge, some kind of program that runs from six months to a year. And that’s it. That’s what we do. The rest comes from them. They are the stars. So that’s what I’m telling you now. We need you. We are talking about multilingualism. We need people like you to go out there and understand that if we don’t move and we don’t act right now, it doesn’t matter that we have funding, we have programs, we have support. In the end, your languages are going to die if you are not supporting them. So think about it. You need to put all the things that you have online, defend your languages, and use all the tools that we have, like AI, to be sure that all these languages live and have a future for our kids. And as I said, everything always depends on us. We are the power here and the internet needs us. So thank you so much. And thank you to the great speakers that we had. Please, a round of applause. Thank you.

Jesse Nathan Kalange: Then we are done. So I’ll just have a seat, then I will just give you is it Yeah, we’ll have a picture. So

Athanase Bahizire

Speech speed: 147 words per minute

Speech length: 3332 words

Speech time: 1354 seconds

AI models need diverse language data to be inclusive

Explanation

AI systems require data to learn and deliver results. To have inclusive AI models that support multiple languages, there needs to be diverse language data available for training these models.

Evidence

Example of Rwanda creating an AI model for legislation by digitizing and cleaning local language bills.

Major Discussion Point

AI and Multilingual Inclusion

Agreed with

Claire van Zwieten

Audience

Agreed on

Need for diverse language data in AI development

Documenting local languages and content is crucial

Explanation

To create inclusive AI models, it’s important to document and digitize local languages and content. This provides the necessary data for training AI systems in diverse languages.

Evidence

Suggestion to document family histories, traditional music, and local perspectives as valuable data sources.

Major Discussion Point

AI and Multilingual Inclusion

Agreed with

Claire van Zwieten

Audience

Agreed on

Importance of documenting local languages and content

Differed with

Claire van Zwieten

Differed on

Approach to language documentation

Connectivity is key for communities to create online content

Explanation

For communities to document their languages and create online content, they need internet connectivity. This is crucial for building the necessary data for multilingual AI systems.

Evidence

Example of India’s progress in e-commerce and digital literacy due to improved connectivity.

Major Discussion Point

AI and Multilingual Inclusion

Agreed with

Claire van Zwieten

Agreed on

Importance of connectivity for multilingual content creation

Technical challenges in accommodating non-Latin scripts

Explanation

Developers face technical challenges when accommodating non-Latin scripts in their systems. This includes issues with email addresses and domain names in different scripts.

Evidence

Example of email addresses and domain names in Arabic, Russian, and other non-Latin scripts.

Major Discussion Point

Challenges in Developing Multilingual AI

Need for innovation and local solutions

Explanation

To achieve multilingual inclusion in AI, there is a need for innovation and local solutions. Communities should create their own systems and solutions to address their specific language needs.

Evidence

Example of a Congolese developer building on Gemini to solve local traffic problems.

Major Discussion Point

Collaboration for Multilingual AI Development

Importance of community networks for connectivity

Explanation

Community networks are crucial for providing internet access in areas where large ISPs don’t operate. This connectivity enables communities to document their languages and create online content.

Evidence

Mention of Internet Society’s work in empowering communities to build their own internet access.

Major Discussion Point

Challenges in Developing Multilingual AI

Document cultural heritage and traditional knowledge

Explanation

Preserving cultural heritage and traditional knowledge through documentation is important for language preservation and AI development. This creates valuable data that can be used to train AI models in local languages.

Evidence

Suggestions to document family histories, traditional music, and local perspectives.

Major Discussion Point

Promoting Language Equity and Inclusion

Claire van Zwieten

Speech speed: 179 words per minute

Speech length: 1655 words

Speech time: 553 seconds

Internet Society promotes multilingualism in its programs

Explanation

The Internet Society is committed to making the internet accessible to everyone by providing multilingual support. They offer interpretation in various languages for their communications and trainings.

Evidence

Mention of providing interpretation in Spanish, French, and Arabic when possible.

Major Discussion Point

AI and Multilingual Inclusion

AI can help preserve endangered languages

Explanation

AI technology can be used to preserve and protect endangered languages. This helps maintain cultural heritage and ensures language continuity for future generations.

Evidence

Example of the Navajo tribe using AI to preserve their language and cultural history.

Major Discussion Point

AI and Multilingual Inclusion

Agreed with

Athanase Bahizire

Audience

Agreed on

Importance of documenting local languages and content

Differed with

Athanase Bahizire

Differed on

Approach to language documentation

Limited representation of diverse languages in AI development

Explanation

There is a lack of representation of diverse languages and cultures in AI development. This leads to AI models that are not fully inclusive or representative of global linguistic diversity.

Major Discussion Point

Challenges in Developing Multilingual AI

Agreed with

Athanase Bahizire

Audience

Agreed on

Need for diverse language data in AI development

Support local chapters working in their languages

Explanation

Internet Society supports over 100 global chapters that work with local communities in their own languages. This helps address internet challenges and promote linguistic diversity online.

Evidence

Mention of chapters working on local internet challenges and regulations in their local languages.

Major Discussion Point

Promoting Language Equity and Inclusion

Internet Society’s role in connecting the unconnected

Explanation

The Internet Society works on connecting the remaining 2.8 billion unconnected people to the internet. This is crucial for enabling diverse communities to participate in the digital world and contribute their linguistic content.

Evidence

Mention of projects aimed at connecting unconnected communities and providing training for internet maintenance.

Major Discussion Point

Collaboration for Multilingual AI Development

Agreed with

Athanase Bahizire

Agreed on

Importance of connectivity for multilingual content creation

Need for more women in ICT and mentorship

Explanation

There is a need for more women in the ICT field and for mentorship programs to support them. This helps bring diverse perspectives to the field and promotes gender equality in technology development.

Evidence

Reference to studies showing the importance of mentorship for women in technical fields.

Major Discussion Point

Collaboration for Multilingual AI Development

Empowering youth to be future internet leaders

Explanation

The Internet Society focuses on empowering youth through training programs to become future internet leaders. This helps ensure diverse representation in internet governance and development.

Evidence

Mention of empowerment programs that train future internet leaders.

Major Discussion Point

Collaboration for Multilingual AI Development

Audience

Speech speed

137 words per minute

Speech length

1056 words

Speech time

460 seconds

Current AI tools struggle with many local languages

Explanation

Existing AI tools such as ChatGPT have difficulty accurately processing and responding in many local languages. This highlights the need for more diverse language data and improved AI models.

Evidence

Example of ChatGPT struggling with basic Swahili greetings and with Punjabi language processing.

Major Discussion Point

AI and Multilingual Inclusion

Lack of quality data in many languages

Explanation

There is a shortage of high-quality data in many languages, especially for less common or ‘exotic’ languages. This lack of data makes it difficult to create effective AI models for these languages.

Evidence

Example from Russia where a company is struggling to find sufficient high-quality resources in Russian and other languages.

Major Discussion Point

Challenges in Developing Multilingual AI

Agreed with

Athanase Bahizire

Claire van Zwieten

Agreed on

Need for diverse language data in AI development

Encourage learning and use of multiple languages

Explanation

Promoting multilingualism by encouraging the learning and use of multiple languages in schools and communities is important for language equity. This helps create a more linguistically diverse online environment.

Major Discussion Point

Promoting Language Equity and Inclusion

Agreed with

Athanase Bahizire

Claire van Zwieten

Agreed on

Importance of documenting local languages and content

Ensure public services support multiple languages

Explanation

Public services should provide materials and support in multiple languages to promote language equity. This ensures that all community members can access important information and services regardless of their primary language.

Major Discussion Point

Promoting Language Equity and Inclusion

Increase diversity in language representation

Explanation

Promoting diversity in language representation in literature, media, and academia is crucial for language equity. This helps ensure that all languages and cultures are represented in various domains of knowledge and entertainment.

Major Discussion Point

Promoting Language Equity and Inclusion

Leverage grants and funding for language preservation projects

Explanation

There are grants and funding available for projects focused on language preservation and digitization of libraries. These resources can be used to support efforts in documenting and preserving local languages.

Evidence

Mention of existing grants for digitization of libraries and language preservation projects.

Major Discussion Point

Promoting Language Equity and Inclusion

Jesse Nathan Kalange

Speech speed

0 words per minute

Speech length

0 words

Speech time

1 second

Multi-stakeholder approach involving government, industry, and academia

Explanation

A collaborative approach involving government, industry, and academia is necessary for developing multilingual AI. This ensures a comprehensive effort in addressing the challenges of language diversity in AI development.

Major Discussion Point

Collaboration for Multilingual AI Development

Agreements

Agreement Points

Importance of documenting local languages and content

Athanase Bahizire

Claire van Zwieten

Audience

Documenting local languages and content is crucial

AI can help preserve endangered languages

Encourage learning and use of multiple languages

All speakers emphasized the importance of documenting and preserving local languages and content to support multilingual AI development and cultural preservation.

Need for diverse language data in AI development

Athanase Bahizire

Claire van Zwieten

Audience

AI models need diverse language data to be inclusive

Limited representation of diverse languages in AI development

Lack of quality data in many languages

Speakers agreed that there is a significant need for diverse and high-quality language data to develop inclusive AI models that support multiple languages.

Importance of connectivity for multilingual content creation

Athanase Bahizire

Claire van Zwieten

Connectivity is key for communities to create online content

Internet Society’s role in connecting the unconnected

Both speakers highlighted the crucial role of internet connectivity in enabling communities to create and share content in their local languages.

Similar Viewpoints

Both speakers emphasized the importance of local initiatives and solutions in addressing language diversity challenges in AI and internet development.

Athanase Bahizire

Claire van Zwieten

Need for innovation and local solutions

Support local chapters working in their languages

Both Claire van Zwieten and audience members stressed the importance of empowering diverse groups, particularly youth, to participate in internet governance and development.

Claire van Zwieten

Audience

Empowering youth to be future internet leaders

Increase diversity in language representation

Unexpected Consensus

Technical challenges in accommodating non-Latin scripts

Athanase Bahizire

Audience

Technical challenges in accommodating non-Latin scripts

Current AI tools struggle with many local languages

There was an unexpected consensus on the specific technical challenges faced in accommodating non-Latin scripts and local languages in AI tools and internet systems, highlighting a shared understanding of the complexities involved in multilingual AI development.

Overall Assessment

Summary

The main areas of agreement centered around the importance of documenting and preserving local languages, the need for diverse language data in AI development, and the crucial role of connectivity in enabling multilingual content creation.

Consensus level

There was a high level of consensus among the speakers on the fundamental challenges and necessary steps for promoting multilingual inclusion in AI and internet development. This strong agreement suggests a shared understanding of the issues and potential solutions, which could facilitate collaborative efforts in addressing language diversity challenges in the digital space.

Differences

Different Viewpoints

Approach to language documentation

Athanase Bahizire

Claire van Zwieten

Documenting local languages and content is crucial

AI can help preserve endangered languages

While both speakers emphasize the importance of language preservation, Athanase focuses on community-driven documentation efforts, while Claire highlights the role of AI in language preservation.

Unexpected Differences

Overall Assessment

Summary

The main areas of disagreement were subtle and centered around the approach to language documentation and preservation, as well as the emphasis on local solutions versus institutional support.

Difference level

The level of disagreement among the speakers was relatively low. Most speakers shared similar goals and perspectives on the importance of multilingual inclusion in AI and internet governance. The differences were mainly in the specific approaches and areas of emphasis, which could potentially lead to complementary rather than conflicting strategies for addressing the challenges of multilingual AI development and internet inclusion.

Partial Agreements

Both speakers agree on the need for multilingual inclusion, but Athanase emphasizes local innovation and solutions, while Claire focuses on the Internet Society’s existing programs and support.

Athanase Bahizire

Claire van Zwieten

Need for innovation and local solutions

Internet Society promotes multilingualism in its programs

Takeaways

Key Takeaways

AI models need diverse language data to be truly inclusive and multilingual

Documenting and creating online content in local languages is crucial for AI development

Connectivity and internet access are fundamental for communities to create and share local language content

AI can help preserve endangered languages if properly developed

There is a need for more diversity and representation in AI development to address language inequities

Innovation and local solutions are key to developing multilingual AI systems

A multi-stakeholder approach involving government, industry, and academia is necessary for advancing multilingual AI

Resolutions and Action Items

Encourage people to document their local languages and cultural heritage online

Support and participate in Internet Society programs to become future internet leaders

Leverage grants and funding opportunities for language preservation projects

Promote the learning and use of multiple languages in schools and communities

Increase representation of diverse languages in literature, media, and academia

Unresolved Issues

How to effectively support minority languages with small speaker populations in AI development

Addressing the technical challenges of accommodating non-Latin scripts in AI systems

Balancing the focus between major languages and less widely spoken languages in AI development

How to ensure consistent quality of language data for AI training across different languages

Suggested Compromises

Utilize open-source AI models as a foundation for developing localized language models

Focus on documenting and digitizing existing cultural and linguistic resources as a starting point

Collaborate with local communities and leverage community networks to improve connectivity and content creation

Thought Provoking Comments

AI, basically we say AI artificial intelligence. Basically it’s the ability of the machine to mimic what the human brain can do, the task we can do. And we have seen a wide hype of AI now with the LLMs, the language models. And we think that AI is something that is just coming now, but we have been having AI systems from long ago, and they are still developing, they are still developing.

Speaker

Athanase Bahizire

Reason

This comment provides important context about AI, clarifying common misconceptions and grounding the discussion in a longer historical perspective.

Impact

It shifted the conversation from viewing AI as a new phenomenon to understanding it as an evolving field, setting the stage for a more nuanced discussion about AI’s role in multilingualism.

For us to have AI models, we need to have data. It’s just like a human being, for you to start speaking, you need to listen. After listening, okay, you understand, you learn, then you can speak, then you can deliver. It’s the same with AI, it has to learn from the data and then deliver.

Speaker

Athanase Bahizire

Reason

This analogy effectively explains the fundamental concept of how AI models work, making it accessible to a general audience.

Impact

It led to a deeper discussion about the importance of data in AI development, particularly in the context of multilingualism and local language preservation.

AI is great for the digital divide because it helps bring some people up but it also very deepens it. Because if you’re in an area where your language is not represented on the internet, largely because you don’t have like very good access in your area, you’re not going to be able to use the benefits of AI either.

Speaker

Claire van Zwieten

Reason

This comment highlights a critical paradox in AI development and its potential impact on linguistic diversity.

Impact

It sparked a more critical examination of the potential downsides of AI in language preservation and representation, leading to discussions about the need for inclusive AI development.

There has been some amazing work done by the Navajo tribe, which is a tribe indigenous to the United States. And their language is disappearing. As new generations are growing up, they’re not learning Navajo the same way their parents or grandparents did. So there has been a huge push among young people in the region to make sure that their language is protected through AI.

Speaker

Claire van Zwieten

Reason

This example provides a concrete case study of how AI can be used for language preservation, making the discussion more tangible and practical.

Impact

It inspired further discussion about practical applications of AI in preserving minority languages and cultural heritage.

We need to build our own systems. And we have many of these resources. Like education: in the past, it was all about going to big universities and, you know, landing all these big opportunities, but now through the internet, as Claire was saying, the internet is an enabling tool, and through the internet we can learn some of these things. Many of the software engineers you see today will tell you that 90% of their skills, they didn’t learn at university.

Speaker

Athanase Bahizire

Reason

This comment emphasizes the importance of local innovation and self-reliance in developing AI systems for multilingualism, while also highlighting the democratizing power of the internet for education and skill development.

Impact

It shifted the conversation towards discussing practical steps that individuals and communities can take to contribute to AI development for their languages, rather than relying solely on large tech companies or universities.

Overall Assessment

These key comments shaped the discussion by providing a comprehensive overview of AI’s role in multilingualism, from its basic principles to its potential impacts and practical applications. The conversation evolved from a general introduction to AI to a nuanced exploration of its challenges and opportunities in preserving linguistic diversity. The speakers effectively balanced theoretical concepts with practical examples, encouraging participants to consider both the global implications of AI in language and the local actions they can take to contribute to inclusive AI development. This approach fostered a rich, multifaceted discussion that addressed both the technical and social aspects of AI in multilingual contexts.

Follow-up Questions

How can we encourage local communities to produce better quality text content in their languages?

Speaker

Vlad Ivanets

Explanation

This is important for building effective AI language models for less common languages.

How can we empower local communities to use AI systems in their own languages without fear?

Speaker

Vlad Ivanets

Explanation

This is crucial for increasing adoption and usefulness of AI in diverse linguistic contexts.

How can AI be tailored to support underserved or minority-language speakers?

Speaker

Abineth Sentayo

Explanation

This is essential for ensuring digital equity and preventing linguistic minorities from being left behind in technological advances.

How can we concretely document our languages?

Speaker

Grace from Cameroon

Explanation

This is important for preserving linguistic heritage and providing data for AI language models.

Disclaimer: This is not an official record of the session. The DiploAI system automatically generates these resources from the audiovisual recording. Resources are presented in their original format, as provided by the AI (e.g. including any spelling mistakes). The accuracy of these resources cannot be guaranteed.