WS #323 New Data Governance Models for African NLP Ecosystems

26 Jun 2025 16:00h - 17:00h


Session at a glance

Summary

This panel discussion explored new data governance mechanisms for language data driving natural language processing (NLP) ecosystems in Africa, focusing on licensing frameworks that protect cultural sovereignty while enabling innovation. The session was moderated by Mark Irura from Mozilla Foundation and featured six experts from across Africa discussing the challenges of current open licensing models like CC0 that may inadvertently enable extractive practices with African language data.


Dr. Lilian Wanzare emphasized the need for community-centered approaches to data collection and licensing that balance open sharing with benefit sharing, noting that language embodies cultural identity and community aspirations. Dr. Melissa Omino highlighted the distinction between language communities (who preserve languages) and data communities (who create datasets), advocating for community ownership rather than just consent, and introduced the new Nwulite Obodo Open Data License as an alternative framework. She stressed that communities should define what benefits they want, which often involves sustainable, community-based returns rather than monetary compensation.


Deshni Govender pointed out that extractive practices occur within countries as well as across borders, suggesting policy protections should build on existing cultural and indigenous rights frameworks. She referenced the Nagoya Protocol from biodiversity as a potential model for linguistic resource sharing. Viola Ochola emphasized the need for robust legal frameworks, meaningful community engagement, and capacity building within African nations to support homegrown AI development.


Samuel Rutunda discussed how government AI strategies can raise awareness, create working frameworks, and foster collaborations, while Elikplim Sabblah shared Ghana’s experience in developing a national AI strategy through inclusive stakeholder consultations. The panelists agreed that effective governance requires collaborative partnerships between communities, governments, funders, and developers, moving beyond extractive models toward equitable benefit-sharing arrangements that respect cultural protocols while advancing technological innovation.


Key points

## Major Discussion Points:


– **Data Governance and Community Sovereignty**: The need to shift from treating African language communities as mere data sources to recognizing them as collective data stewards with inherent rights to govern their cultural and linguistic data, moving beyond individual consent to community-centered governance models.


– **Licensing and Benefit-Sharing Mechanisms**: Discussion of new licensing frameworks like the Nwulite Obodo Open Data License that move beyond traditional open licenses (like CC0) to ensure equitable benefit-sharing with language communities, where benefits are defined by communities themselves rather than imposed externally.


– **Anti-Extractive Practices and Cultural Protection**: Addressing how current AI development practices often extract value from African language communities without providing benefits, and the need for policies that protect cultural sovereignty while still enabling innovation and open collaboration.


– **Government Role and Policy Frameworks**: Exploration of how national AI strategies and procurement systems can support community-led governance, including the challenges of government understanding and funding AI/NLP projects, and the need for capacity building within government institutions.


– **Community Capacity Building and Skills Development**: The necessity of building technical literacy, digital rights awareness, and governance frameworks within language communities so they can effectively participate in and control the development of AI technologies using their languages.


## Overall Purpose:


The discussion aimed to explore practical solutions for creating more equitable data governance mechanisms for African language data used in Natural Language Processing (NLP) systems. The panel sought to address the power imbalances and extractive practices in current AI development while finding ways to protect cultural sovereignty without stifling innovation.


## Overall Tone:


The discussion maintained a collaborative and solution-oriented tone throughout, with participants building on each other’s ideas constructively. While there was acknowledgment of serious challenges around exploitation and power imbalances, the tone remained optimistic and focused on practical pathways forward. The conversation was academic yet accessible, with participants sharing both theoretical frameworks and real-world experiences from their work across different African countries.


Speakers

– **Mark Irura** – Moderator, works with Mozilla Foundation


– **Deshni Govender** – Dynamic force from South Africa working at the intersection of law, technology, and social impact; passionate about democratizing AI ecosystems; advisory board member of the South African AI Association; co-founder of the GIZ diverse women in tech network; working group member on AI strategy recommendations for South Africa; featured on the list of 100 brilliant women in AI ethics


– **Rutunda Samuel** – CTO and principal researcher at Digital Umuganda, a leading AI driven voice technology organization for African languages based in Kigali


– **Lilian Diana Awuor Wanzare** – Dr., lecturer at the Department of Computer Science at Maseno University; research interests in artificial intelligence, machine learning, and natural language processing for low resource languages; holds a PhD in Computational Linguistics and an MSc in Language Science and Technology from Saarland University in Germany


– **Melissa Omino** – Dr., Director of the Center for Intellectual Property and Information Technology (CIPIT) at Strathmore University; intellectual property expert; board member at Creative Commons


– **Elikplim Sabblah** – Technical advisor working for the Fair Forward program, a project within the Digital Transformation Center (DTC) Ghana within GIZ (German technical cooperation); focuses on AI policy advisory, open AI resource accessibility and capacity building


– **Ochola Viola** – Director of Access to Information; advocate of the High Court of Kenya; legal practitioner with experience in administrative law, commercial law, human rights and law reforms; holds an MBA in strategic management; Open Government Leadership Fellow


Additional speakers:


None identified beyond the provided speaker list.


Full session report

# Data Governance Mechanisms for African Language Technologies: A Panel Discussion Report


## Introduction and Context


This panel discussion, moderated by Mark Irura from Mozilla Foundation, brought together six distinguished experts from across Africa to explore new data governance mechanisms for language data driving natural language processing (NLP) ecosystems on the continent. Irura opened by providing context about Mozilla Common Voice, which has collected over 30,000 hours of voice data in more than 180 languages, highlighting both the scale of community contribution and the need for better governance frameworks.
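
To give a concrete sense of how developers typically consume such a corpus, the minimal sketch below streams a few Common Voice clips through the Hugging Face `datasets` library. The repository name and version tag are illustrative assumptions rather than details from the session, and Common Voice access requires accepting the corpus terms on the hub first.

```python
# Minimal sketch: streaming Common Voice clips for Kinyarwanda ("rw").
# Assumes the `datasets` library is installed, the user is logged in to
# Hugging Face, and the corpus terms have been accepted; the version tag
# below is illustrative, not taken from the session.
from datasets import load_dataset

cv_rw = load_dataset(
    "mozilla-foundation/common_voice_17_0",  # hypothetical version tag
    "rw",                                    # Kinyarwanda locale code
    split="train",
    streaming=True,                          # iterate without a full download
)

for clip in cv_rw.take(3):
    # Each record pairs an audio array with its prompted sentence and
    # metadata such as the locale.
    print(clip["locale"], "-", clip["sentence"])
```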


The panel featured Dr Lilian Diana Awuor Wanzare, a computational linguistics expert from Maseno University; Dr Melissa Omino, Director of the Centre for Intellectual Property and Information Technology at Strathmore University; Deshni Govender, a South African legal and technology expert working on AI democratisation; Rutunda Samuel, CTO of Digital Umuganda focusing on African language voice technology; Elikplim Sabblah, a technical advisor with Ghana’s Digital Transformation Centre; and Ochola Viola, Director of Access to Information and legal practitioner specialising in administrative and human rights law.


The session focused particularly on developing licensing frameworks that protect cultural sovereignty whilst enabling innovation, addressing the critical challenge of how current open licensing models like Creative Commons Zero (CC0) may inadvertently enable extractive practices with African language data.


## Core Challenges in Current Data Governance Models


### The Inadequacy of Traditional Licensing Frameworks


The discussion began with a fundamental critique of existing open licensing models. Dr Melissa Omino articulated a crucial distinction that shaped the conversation: “Ownership and consent are two completely different things. The traditional data sharing regime treats communities as sources rather than partners, and this extracts value whilst leaving these very communities with just the risks and harms.”


This observation highlighted how current frameworks, including widely-used Creative Commons licences like CC0, fail to address the power imbalances inherent in AI development. Dr Lilian Wanzare emphasised the need for “community centeredness” in approaches to data collection and licensing, noting that language embodies cultural identity and community aspirations.


### Extractive Practices: Beyond Simple North-South Dynamics


Deshni Govender introduced a particularly thought-provoking perspective that challenged conventional narratives about data exploitation: “I think it’s important also to point out that when we mention the concept of extractive practices, that it’s not always a foreign versus local context. And it’s not a cross-border issue, because I think that extractive practices often happen within countries and within the continent under the guise of the open collaboration concept.”


This insight reframed the discussion from simplistic North-South dynamics to a more nuanced understanding of power structures that can perpetuate extraction even within African contexts. Govender suggested that policy protections should build upon existing cultural and indigenous rights frameworks, referencing the Nagoya Protocol from biodiversity as a potential model for linguistic resource sharing.


### The Complexity of Oral Traditions


A critical technical challenge emerged through Govender’s analysis of African oral traditions: “The problem with having culture or language that is intended for oral knowledge, it means that it’s also shaped by tone, it’s shaped by cadence, it’s shaped by who is telling the story and what is that meaning that’s attached to it… And so it’s kind of hard to understand the asset that you’re working with if you’re not even sure how to put it into create an asset value or an asset form.”


This observation highlighted the fundamental difficulty of digitising oral knowledge systems without losing their cultural essence, presenting unique challenges for NLP development that go beyond simple text-based approaches.


## Community Ownership and Alternative Frameworks


### Moving Beyond Consent to Ownership


The panellists demonstrated broad agreement on the need to shift from treating African language communities as mere data sources to recognising them as collective data stewards with inherent rights. Dr Omino distinguished between language communities (who preserve languages) and data communities (who create datasets), advocating for community ownership rather than just consent.


Ochola Viola emphasised that “community ownership should be legally entrenched with operationalised mechanisms to reach remote communities,” highlighting the need for robust legal frameworks with stringent data collection rules to protect communities from exploitation. She stressed that local communities should control data from collection through usage with meaningful engagement throughout the process.


### New Licensing Approaches


Dr Omino introduced the Nwulite Obodo Open Data Licence as an alternative framework, though specific details about its mechanisms were not elaborated in the discussion. She also mentioned that Creative Commons has released preference-signalling work that complements existing CC licences, suggesting ongoing evolution in licensing approaches.
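
Neither framework’s text was quoted in the session, but both point towards machine-readable dataset metadata. The sketch below illustrates what a dataset card combining a licence identifier, community preference signals, and a benefit-sharing contact could look like; every field name and value is a hypothetical illustration, not drawn from the Nwulite Obodo licence or the Creative Commons preference-signalling work.

```python
# Hypothetical dataset-card metadata pairing a licence identifier with
# community preference signals and a benefit-sharing contact point.
# All identifiers, field names, and values here are invented for
# illustration; none are quoted from any actual licence or specification.
import json

dataset_card = {
    "name": "example-language-speech-v1",
    "license": "Nwulite-Obodo-Open-Data-1.0",  # assumed identifier form
    "language_community": "Example language community",
    "data_community": "Example university NLP lab",
    "preference_signals": {
        "ai_training": "allowed-with-benefit-sharing",
        "third_party_resale": "requires-community-permission",
        "attribution": "required",
    },
    "benefit_sharing": {
        "form": "community-defined",  # e.g. access to the resulting tools
        "contact": "data-stewards@example.org",
    },
}

print(json.dumps(dataset_card, indent=2))
```

A downstream user or crawler could then check these signals before training, which is the kind of negotiation entry point the panel described.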


Govender likewise cited the Nwulite Obodo licence, and pointed to the “Inkuba licence” as another emerging alternative, though again without detailed explanation of their specific features.


## Government Role and Policy Challenges


### Potential and Limitations of Government Involvement


The discussion revealed both the potential and limitations of government involvement in language data governance. Rutunda Samuel explained how government AI strategies can raise awareness, create working frameworks, add accountability, and help raise resources for language technology development. He pointed to Rwanda’s early bet on Common Voice as an example: six years on, some 30,000 hours have been collected and more than 10 African languages are represented.


However, Samuel also highlighted practical challenges: “Usually, I don’t know, I was talking to someone and say, government is run by accountants. And accountants, they want facts. They want, oh, what is this going to do? And then it’s still in the early stage of the language technology… So it’s very hard to show the facts.”


### National Strategy Development


Elikplim Sabblah shared Ghana’s experience in developing national AI strategy through inclusive stakeholder consultations, noting that Ghana’s draft strategy includes guidelines for data collectors on collection, storage, and sharing practices. He also mentioned the launch of an AI policy playbook at a UNESCO conference, indicating broader continental efforts at policy development.


However, Ochola Viola pointed out critical implementation gaps, particularly in procurement processes where “the procurement person does not, is not aware of AI, let alone even, you know, any other thing.”


## Capacity Building and Skills Development


A critical theme throughout the discussion was the need for comprehensive capacity building across multiple levels. Dr Wanzare emphasised that communities need understanding of AI model development, governance frameworks, and benefit structures to participate effectively in governance decisions.


Sabblah highlighted the importance of outreach programmes to help communities understand AI’s purpose and overcome fatigue from repeated data collection schemes. He also mentioned research on women-led SMEs that are using AI tools without realising it, pointing to the need for broader digital literacy.


The moderator’s introduction of Govender noted her place on the list of “100 brilliant women in AI ethics”, a reminder of the importance of diverse voices in AI governance discussions.


## Economic Considerations and Investment


The discussion touched on funding and investment strategies for African language technology development. Dr Omino advocated for greater local investment, arguing that “governments need to invest locally in NLP rather than looking externally, and challenge local investors to fund model development.”


Samuel, while agreeing on the need for government support, focused more on changing procurement mindsets and willingness to take risks with emerging technologies. The challenge of demonstrating concrete benefits from early-stage language technology investments emerged as a significant barrier to securing both government and private sector support.


## Technical and Infrastructure Challenges


Beyond governance and legal frameworks, the discussion acknowledged significant technical challenges specific to African language contexts. The predominantly oral nature of many African languages creates unique NLP design challenges that require specialised approaches to preserve cultural nuances and communal knowledge systems.


Infrastructure limitations also pose significant barriers to community participation in governance mechanisms. Viola emphasised that digital infrastructure must be available so remote communities can access benefits and engage with AI technology investors.


## Areas of Agreement and Ongoing Tensions


### Shared Principles


The panellists showed broad agreement on several key principles:


– Communities should have ownership and control over their language data rather than just providing consent


– Current licensing and governance frameworks are inadequate and need reform


– Capacity building is essential for effective community participation in AI governance


– Government procurement systems need updating to handle AI technologies appropriately


### Different Emphases


While not representing fundamental disagreements, speakers emphasised different aspects of the challenges:


– Dr Wanzare focused on community knowledge gaps as a primary barrier


– Viola emphasised government institutional capacity and legal framework inadequacies


– Omino stressed the need for complete shifts away from external funding dependency


– Samuel highlighted the practical challenges of demonstrating early-stage technology benefits


## Conclusion


This discussion revealed both the complexity of challenges facing African language data governance and the emerging consensus among experts on fundamental principles. While significant questions remain about implementation strategies and funding approaches, the shared commitment to community sovereignty and equitable benefit-sharing provides a foundation for future development.


The conversation demonstrated that protecting African language data and communities requires not just new licensing frameworks, but fundamental changes in how AI development is conceptualised, funded, and implemented. The panellists’ references to ongoing work on new licensing models and national AI strategies suggest that practical progress is being made alongside theoretical development.


The path forward demands continued dialogue, experimentation with new models, and commitment to centering community voices and values in all aspects of language technology development. As Irura noted in moderating the discussion, these conversations are part of an ongoing effort to ensure that the benefits of AI development reach the communities whose languages and knowledge make such technologies possible.


Session transcript

Mark Irura: Good evening. Good morning. Hi, everyone. Thank you for joining our session. My name is Mark Irura. I’ll be moderating this session. I think we will start with introductions. I’ll introduce the panel. We have three participants who are online and three who are here on stage. I will start with Deshni Govender on my right. Deshni Govender is a dynamic force hailing from South Africa. Her work intersects law, technology, and social impact. She’s passionate about democratizing AI ecosystems. Her skill sets range from prototyping new open source licenses with local language communities to scaling AI and data science through a boot camp for women that has been conducted across three African countries. She has also co-developed South Africa’s first AI maturity assessment framework. She has passionately worked on birthing the language AI hub for African NLP and co-created an AI policy playbook with Global South policymakers. Deshni describes herself as a bridge builder and a co-creator of AI in Africa. She’s an advisory board member of the South African AI Association, a co-founder of the GIZ diverse women in tech network, a working group member on AI strategy recommendations for South Africa, and was featured on the list of 100 brilliant women in AI ethics. Later we’ll introduce Samuel Rutunda, who is on my left. Samuel is the CTO and principal researcher at Digital Umuganda, a leading AI-driven voice technology organization for African languages. Digital Umuganda is a Kigali-based AI and open data organization on a mission to democratize access to information in African languages. Founded in 2018, the company builds large-scale voice and text datasets and develops voice AI tools to bridge the language divide and preserve linguistic diversity. With projects spanning 17 African languages, they’ve recorded thousands of hours of speech and digitized countless text samples, fueling models for local and global impact. Rooted in Rwanda’s tradition of Umuganda, community uplift through collective effort, Digital Umuganda unites community contributors, developers, governments and NGOs to build open source language infrastructure by Africans for the world. On my far right, I have Dr. Lilian Wanzare. Dr. Lilian is a lecturer at the Department of Computer Science at Maseno University. Her research interests are in artificial intelligence and machine learning, in particular natural language processing, you will hear the term NLP a lot in this panel, and building text processing tools for low-resource languages. She has served as the principal investigator for several research projects funded by BMGF, the Lacuna Fund, Canada’s International Development Research Centre (IDRC), AI4D, among others. She has pioneered the Kenya Corpus, or KenCorpus as we know it, which is a Kenyan language corpus for NLP and machine learning research, a project that looks at building datasets for training NLP tools for underserved languages, particularly those spoken in Kenya, with use cases geared towards agriculture, education, and health. She also works on sign language research, particularly Kenyan Sign Language, researching ways of bridging language barriers using virtual signing avatars. She holds a PhD in Computational Linguistics and an MSc in Language Science and Technology from Saarland University in Germany. Online, I will start with Dr. Melissa Omino.
Melissa is the Director of the Center for Intellectual Property and Information Technology (CIPIT) at Strathmore University, a leading Eastern African AI policy hub and data governance policy center. Her research direction is focused on utilizing an African lens and a human rights lens. Part of the research conducted under Dr. Omino’s leadership involved mapping AI applications in Africa as an initial step in answering the question of what determines African AI and the problems that African AI should aim to solve. Dr. Omino is an intellectual property expert and has served as an advisory board member in several projects that intersect between AI and IP. This also includes driving a national AI strategy process, and she has also led IP advisory for a global entity that is funding AI research in Africa. We also have Eli, Elikplim Sabblah. He is a technical advisor working for the Fair Forward program, a project within the Digital Transformation Center (DTC) Ghana, within GIZ, the German technical cooperation agency. In this role, Eli focuses on AI policy advisory, open AI resource accessibility and capacity building to foster inclusive and sustainable AI development in Ghana. Eli has worked on the development of Ghana’s national AI strategy, collaborating with the Ministry of Communication, Digital Technology and Innovation through the agency of the Data Protection Commission. With a strong background in data science, monitoring and evaluation, project management and stakeholder engagement, Eli is working towards enhancing AI accessibility, local innovation and responsible AI adoption in Ghana. And last but definitely not least, we have Miss Viola Ochola. She is the Director of Access to Information. She is an advocate of the High Court of Kenya and a legal practitioner with administrative law, commercial law, human rights and law reforms experience spanning over 15 years. She also holds an MBA in strategic management and has extensive experience both in the public and the private sector. She is the immediate former manager in the Complaints, Investigations and Legal Services Department at the Commission on Administrative Justice, Kenya. Viola is an Open Government Leadership Fellow and a member of the Technical Committee on the Open Government Partnership, the Kenya chapter, in her capacity as cluster lead for the Access to Information commitment. She is passionate about open governance and the empowerment of citizenry to access services and benefit from opportunities offered by government. The reason I’ve gone through the elaborate introductions is for you to understand and know who will be talking to us on this topic this evening, and also for you to look up the panelists, reach out via LinkedIn, ask questions, connect, and continue to engage on the topic. Our topic today is exploring new data governance mechanisms for language data driving NLP ecosystems in Africa. The issue of licensing of language has already come up in various workshops, and today we want to have a more practical discussion that looks at research that is currently going on in this area. Language is culture and culture is identity. Yet the digital identity of Africa is skewed, manipulated, misinterpreted or disproportionately commercialized. Language data collection is characterized by a significant disparity between large-scale publicly accessible resources and numerous smaller isolated projects.
The Mozilla Foundation seeks to positively impact the way in which local language data is viewed, collected, stored and utilized. Currently, Mozilla Common Voice is the world’s largest, most diverse crowdsourced multilingual open speech corpus, holding more than 30,000 hours in more than 180 different languages, and it is an example of a successful community initiative that is also a digital public good. It is a self-serve community platform as well as a lab for linguistic inclusion and for traversing data governance issues in NLP. But there has been an awakening and a sentiment change amongst the language communities, and this is what we will delve into today: speakers who crowdsource datasets, and some of the issues that have been raised, including inequitable investment, locally sensitive community control, and the dynamics around power that are impacting the use of the language to build language technology. So in this session, having introduced and set the background for the problem, we are going to highlight the unintended and intended ripple effects of the CC0 open public license on communities and language data, and we want to look at governance and policy, how they intersect, and what are some solutions that are being worked on to try and resolve the problem. I will go straight to you, Lilian, and I will begin with a question on how AI training data licenses can be adapted to protect cultural sovereignty and ensure equitable benefit, especially for those who have been marginalized.


Lilian Diana Awuor Wanzare: Thank you so much, Mark, for the introduction and for the question. I think we all know that when it comes to AI systems, data is core, and when it comes to NLP systems, that data is language. But what is language specifically? As Mark has mentioned here, it is really more than a group of words. It embodies really the aspirations of the communities from where it comes, and the cultural identity of the different communities. So if you look at that, and think about this data, how can it be licensed in a way that still promotes the cultural values from where it comes? I will think about it as, one, community centeredness. How do we go about collecting the data from the community themselves? How do we manage the use of this data along the journey as it’s being used in the NLP systems? And if you look at community centeredness, there are a lot of things that go into it. One of them is consent. As they are going to provide the data, are they properly informed? And this informed consent is a continuous process, an understanding of the journey of their data as it goes around being developed and moves across as it is being used for different NLP systems. Now, how do we balance, within the licensing, the issues of open sharing vis-à-vis benefit sharing? Those things should not be mutually exclusive. We can still have open sharing and still have benefit sharing. How can this be embodied within our existing licensing for us to be able to have both? In such a way that, yes, we still do open sharing to facilitate development of tools, development of systems that promote the language, but it is no longer extractive from where the data comes: they too have a benefit of whatever tools are going to be developed from it. And how does this move not just to the data community, but to the larger language community? Not just those who collected the data themselves, but the larger community who speak the particular languages. And if I think about the last bit as a close: in this licensing ecosystem, I know there are different bodies that look at licensing in general. In this ecosystem, how transparent is it, to enable different views and different combinations that support different requirements by different people? You’re able to really pull things together that are aligned to your values. And there’s no one size fits all. There can be different ways of combining licensing within the ecosystem that still support community centeredness with the communities, but still allow for open sharing and development across the journey. That would be my opening remark.


Mark Irura: Thank you so much, Lilian. I will come to you, Melissa. And Lilian has mentioned something to do with the different needs and the different requirements over the entire, let me call it, language or AI language life cycle, and she’s talked about values, and to it I want to throw in the benefit. I will not just make it economic, but when we think about benefit I would like you to help us unpack that in view of the question: how do we think about sovereignty while we also enable these things, based on the work that you’re currently undertaking? Okay, thank you so much Mark. I hope you can hear me. Yes. Excellent.


Melissa Omino: So I think when we think about the benefit, I don’t think that we here should be discussing the benefit without referencing the language community, because that’s where the benefit should flow. So part of the work that CIPIT is doing, in collaboration with the Data Science Law Lab at the University of Pretoria, is reaching out to these language communities, and I’m being very specific about this term language community, because there’s also a data community that exists and is made up of African AI data developers who actually collate languages into these particular datasets for natural language processing. We think that that community, the language community, should be able to speak for itself and say what type of benefit they would require for the particular use of that language dataset, and it has already been mentioned by Dr Lilian that different types of uses might require different types of benefits, or might actually require a different thought as to what a benefit would mean. A lot of resistance towards having these language communities speak about, quote-unquote, a benefit is that it is automatically assumed to be a monetary thing or a royalty-based thing, but essentially we are saying that it should be given up to the community to decide what that should be. Most of my discussions with various communities, including the Dholuo community via Maseno University, have been that they want something that is sustainable and that is community-based, meaning something that everyone in this language community can interact with and benefit from. And a monetary or royalty benefit doesn’t quite meet that mark. So, essentially, we need to think about the harmful dynamic that has been created with the current use of language datasets, and the fact that these language datasets being commodified in AI systems primarily serves dominant languages and wealthy corporations, while marginalized communities receive no benefits, no matter how you define them, and often have their cultural protocols, practices and values violated. So, a benefit could actually be the respect of the cultural knowledge that the language carries, or even a share of or access to the AI tool that has been built using the language data. So, can a licensing framework deliver this? I think that it can. That’s what the new dual license, the new Nwulite Obodo Open Data License, is meant to do. It’s supposed to provide an avenue from which this conversation about what type of benefit would flow to a community can start. And we are sort of trying to fit it into what currently governs the language dataset regime, which is copyright licensing. So, we came up with an alternative license that has elements of copyright, but also has elements of recognition of cultural knowledge, and gives a voice to the community to negotiate about what they would want as a benefit. And here, I also have to signal the Creative Commons community, where I am a board member, which just yesterday released publicly their work on preference signaling that would work hand in hand with Creative Commons licenses. And this essentially allows data stewards of particular datasets that are being utilized by AI to say what they would prefer that dataset to be used for or as. So this is actually signaling that this act of benefit recognition, benefit sharing, is something that is being worked on and needs to be worked on.
And maybe it’s not for us to determine, because it would just be us imposing our thoughts on these language communities, but bringing language communities to the forefront so they can speak for themselves as to what they would like. Thank you.


Mark Irura: Thanks, Melissa. I appreciate especially your comment on benefit. And with benefit, obviously, one of the things that is apparent in the continent is the issue of avoiding, you know, recolonization through language and through AI. And I think this is an important question, so I’ll come back to you and ask about policy. When we think about policy, and we think about policy frameworks, what are broad principles that we can incorporate to think about, you know, equity and anti-extractiveness, so that there is mutual benefit, we do not stifle innovation, as Lilian said, but we’re able to grow and advance, because we still need a commons to be able to move forward?


Melissa Omino: Thanks, Mark. I think that in order to have real equity, we are required to think about communities as having ownership, and not just as a group that would provide consent. Ownership and consent are two completely different things. The traditional data sharing regime treats communities as sources rather than partners, and this extracts value while leaving these very communities with just the risks and harms. So there has to be a shift where there is community data sovereignty, and I think Lilian has mentioned this or has alluded to this, and you have also alluded to that, Mark, where we legally recognize communities as collective data stewards with inherent rights to govern data about their members, territories, and cultural knowledge, which is where language would fall. So individual consent is not enough when data affects an entire community. We need graduated consent that requires community consultation before individual agreements. The community gets to weigh in on whether that serves their collective interests. They get to voice what their collective interests are, and this includes ongoing verification rather than one-time permission, complete transparency about who is benefiting and how, and also conferring community veto power over harmful applications. So if someone profits from community data, the community must benefit too, and this means a mandatory benefit sharing requirement where communities might get a percentage of profits if that’s what they want, or they might get capacity building investments in infrastructure and education, and priority access to products developed using their community data. And this is not coming from me; this is coming from consultations that I’ve had with specific community members. So in order to prevent exploitation, or to make this shift to this new utopia that I’m speaking about, we need strong anti-extractive safeguards: data should not be sold to third parties without going back to the community for permission, and communities must be able to reclaim their data and take it elsewhere if they feel like it, which requires regular audits with results shared publicly in accessible formats to ensure accountability. All these should be backed by penalties for violating community agreements. And I must admit here my bias as a lawyer; I’m really thinking about legal frameworks and structures. So that’s why I’m talking about accountability and I’m talking about enforcement mechanisms. But I think that really works currently in the language data sharing regime, because legal agreements, that is copyright licenses, are being used to govern the sharing of this data. So what I’m trying to highlight here is that it’s ultimately about power, and not just viewing data as a tool under a data governance regime, so not just about privacy. It’s really about where the wealth and power are concentrated, and how we can then distribute this in an equitable manner. So legal frameworks would be one of the policy considerations that I would think of, but I also think that governments, when coming up with their AI strategies and policies, a plethora of which have happened on the African continent, need to center culture as one of the main pillars of their strategy. I know the Kenyan strategy does that. It does mention that culture is an important factor. It does mention responsible and ethical AI, which would be a pillar that this conversation would fall under.
And it also talks about model development for problem solving on the continent. And you cannot talk about model development for problem solving if you do not think about language datasets. So I think that this is… essentially how we can get to a balance. It’s not about closing off the data, it’s about ensuring that it’s an equitable exchange between those who want to collect and use the data and the communities that have preserved and curated it. And again, I say that there are two communities that exist: the language community that has suffered historical disadvantages in curating and preserving the language, particularly in the context of Africa, but also the data community, which is the African AI developers who put in effort, who’ve used their skills and knowledge in creating these datasets, and who have an interaction with those who fund these activities. So there needs to be a balance, let’s say, between these three parties in this context: those who would like to use the dataset, those who have curated the languages and preserved them, and those who actually have created the dataset.


Mark Irura: Thanks. Thanks, Melissa. I’m looking at you, Deshni, now. Melissa has taken us to utopia, to Canaan, but we need to come back now to what exists today that we could latch on to. And even as Deshni gives her remarks, maybe I’ll ask you, Viola, to be on standby to give us a different perspective, if there’s anything that Deshni will have missed out. So over to you, Deshni.


Deshni Govender: Sure. I think it’s important also to point out that when we mention the concept of extractive practices, it’s not always a foreign versus local context, and it’s not only a cross-border issue, because I think that extractive practices often happen within countries and within the continent under the guise of the open collaboration concept. I do think that policy protections that cover digital work should also actually take their foundational basis from existing protections that are afforded to cultural and indigenous communities, which exist in a civil context. So, assuming those foundational building blocks exist, then policy protection can come into play in two ways, which is, A, as a source of human rights protection, because that’s really important for protecting labor rights and gig workers, who often do the unsexy work of labeling data and training algorithms, but also, B, as a counter-leverage point in the context of open source and digital public goods. And we’ve heard the speakers mention the concept of quid pro quo: if you take something, give something back. And I’ll just run through very quickly a few points. So, fair sharing is one way, and my co-panelist Melissa mentioned the Nwulite Obodo license, but there’s also the Inkuba license that was developed. Another way is, if a commercial actor has to cross-subsidize public maintenance for open source AI resources, what would that look like? Does it come with conditions? There is also the use of open grants or long-term partnerships that actually benefit the community. One example was a grant that google.org had given to Ghana NLP, which had really very minimal conditions attached, so that the community could use it as they saw fit. I think the other thing that AI policy could include, which doesn’t often happen and should, is, where there are foreign investors or foreign partners, including local partners as equal collaborators, because oftentimes localized partners come in as just consultants. And when you have an equal collaborator, you have co-ownership of the data corpora, and that could often be done by MOUs or just general contracts. And I think that policies should make AI developers accountable, and that accountability can look like impact reports or independent audits. I will mention very quickly something that I came across in my research before I hand over to Viola, something that’s called the Nagoya Protocol. This actually exists in the biodiversity space, and it basically requires fair and equitable sharing of benefits in the use of genetic resources, that’s plants, animals, microorganisms, etc. And I feel like we could learn from parallels like this. So establishing something like a linguistic protocol for the use of African languages in AI could be a great policy tool for regional principles or codes of conduct. I guess another policy tool could be the AI policy playbook that was recently launched at the UNESCO conference a few weeks ago. But I’ll stop here.


Mark Irura: Over to you, Viola. All right.


Ochola Viola: Thank you. Thank you, Mark. And speaking after Melissa and Deshni, I think they have sort of covered most of the policy requirements. But I would still emphasize the issue of data sovereignty and equitable data sharing that both Melissa and Deshni talked about. The local African communities should be able to control the data from the point of collection up to the point of usage of these AI technologies, so that they are able to be part of the process. So the whole process has to be inclusive. They should not just be there as information givers or data givers, but they should be involved in the whole process. And Melissa mentioned that, being a lawyer, she’ll be biased around the legal framework. So I’ll also speak to that; I’m a lawyer too. So definitely the legal framework around the collection of this data has to be very stringent, has to be very robust, so that the local communities are protected from possible exploitation by external actors, the big tech, so to speak, so that even at the point of usage, whatever way they may define these benefits, they are able to benefit, so that it’s not an issue where they feel they are being exploited. Quickly, the aspect of community ownership should not just be something that is entrenched in the law; there should be actual mechanisms that have been operationalized within the ecosystem, within the African countries, so that these local communities can be reached, because sometimes you’ll realize that some of these communities are in very remote areas of the African continent, and sometimes, even in terms of the digital infrastructure, they cannot access some of these benefits or some of these things that the external parties want to develop. So it would be important for the governments, at least the African governments, to make sure that the infrastructure is available, so that these communities can reach out to these investors who might want to develop these AI technologies using their languages. And with that, like I said, the engagements have to be very meaningful. It shouldn’t be, like one of my co-panelists said, something where you’re just called to give information or to give data. You have to be aware and understand what exactly you’re giving out and the possible repercussions. And finally, I’ll speak to another policy perspective, that of building the capacity and skills development of the African nations, because we realize sometimes the issue is the lack of skills and the lack of capacity to do this within the continent. So it’s important for the various policy frameworks to put in place possible training solutions or skills development strategies, so that some of these technologies are homegrown and home-owned, and so that you then develop a framework within which you can transfer the knowledge locally, beyond just waiting for the external parties to come in. And this does not necessarily have to be done within the country; you can also collaborate with the big tech to develop the skills within the continent, and then the skills will be developed from there. So I think I’ll stop there. Thank you.


Mark Irura: Thanks, Viola. And also thank you for such a broad response. You covered infrastructure, you covered capacity building, and this speaks to an ecosystem approach. You can’t just develop infrastructure only. You can’t just build capacity. You can’t just develop policy. So I will look at you, Sam, now, because when we are coming up with national AI strategies, and I know Rwanda has one, the goal is to think about an ecosystem, to think about where we want to go and what we want to achieve. And I will also ask you, Eli, to share experiences from Ghana, since you’ve gone through this cycle. I will start with you, Sam. And it’s an abstract question, but it’s also a simple question. Very simple. Can government support community-led governance? Could government partner? It’s always top-down. It’s always, this is what you need to do. How do you think these strategies can help to support the growth of the AI ecosystem that is coming up? Thank you.


Rutunda Samuel: Thank you, Mark. Yeah, I think, first, AI strategies or AI policies help within these three categories. First, they raise awareness. Usually, once something becomes a strategy or a policy, it makes people know about it. So with AI, once it’s implemented, people are now looking at all the components of AI, of which currently the major one is the language component. Second, it creates a working framework that governments and other entities can use as a guideline or as a framework to follow. And then it also adds some accountability, because they have to explain something, and this helps where, in the absence of that policy, there would not have been a way. And then, in terms of what it creates, it starts creating a discussion. It means now when you go to them, you have a base from which you can discuss; you have some place from where to start the discussion, and then they can look at it, and they can say, oh, we actually have a plan, we have a policy or a strategy, this is what it says. And then, language is cross-cutting and touches many aspects of everyday life, and so it starts creating synergies. So for example, someone in health can say, oh, actually we are thinking of using this tool, but they don’t know how to do it; but given that there is a policy, they have somewhere to ask. And then even us as a community start saying, oh, how about we work within health, for example medicinal plants, is that something we can capture within our languages? So it creates synergies and collaborations, and then ultimately the goal is to raise resources. So with these discussions, with these collaborations, as a country we can start streamlining how we raise resources, because there is a need to raise the resources. Yeah, I think that’s what I would say.


Mark Irura: I’ll invite you, Eli, to also contribute to that point, bearing in mind that we have a global audience, and also the ways we are trying to build these ecosystems so that others could learn from us.


Elikplim Sabblah: Right, thank you very much, Mark. And so I particularly would say that government should definitely support local communities to take ownership of, or lead on, data governance as far as language data is concerned. Government should actually empower local communities. I mean, thinking about the idea of national AI policies and AI strategies, even looking at the way they are developed and drafted, whichever approach is taken actually includes local communities and major stakeholders. And so just by that definition, through stakeholder consultations, ecosystem analysis and research, SWOT analysis, all that process should already include communities that are existing in the space. And so if that is the case, then it is, as a first step, a way of supporting the community to also take ownership of whatever comes out of there, of what data governance becomes in a particular country. Now, what I’ve learned in the process in Ghana is that currently we have a draft national AI strategy that is undergoing review. And throughout the review processes, we reach out to various groups, trying to understand their specific needs and what they would like to see in the reviewed document. And it has been consistently spoken of how they need to see representation in there, or how they would like to be empowered to be able to govern datasets that are generated within the same space. Now, to this, I would say that in the already existing draft, there is a pillar that actually speaks to this, Pillar 5, which says that the strategy seeks to provide data collectors with guidelines and principles for collecting data, storing and sharing it. I think this creates an avenue for the government to empower local communities to take the lead or ownership as far as data governance is concerned. If the strategy would actually pinpoint specific principles and guidelines that these communities need to take up, that would eventually influence the level of ownership they would be able to take of the data governance system in the country. So, I think a lot has been said already, and we also need to take a look at the adoption of alternative licensing models like the Nwulite Obodo license that has already been mentioned by Dr. Melissa in the session. And when we take this approach, I think it will all go well for the communities involved. Yeah.


Mark Irura: Thanks, Eli. I think this is something that also always comes up with me, and this morning in a session I had it. So I want to put an open question to the panel. You’ve talked about rules and regulations; I’ve not talked about money. Someone before this panel asked me a difficult question about money. And one of the challenges that came up earlier was procurement systems, because procurement provides an opportunity for these communities, the developer communities that Melissa mentioned. And even Viola talked about how people who are in remote areas cannot benefit because there’s no infrastructure, there’s no connectivity. So to this panel, and to anyone who might have a thought on it: the issue of public procurement and the ability to procure innovation, that conversation with government, not just in Africa but globally, because I think that’s also an issue. Do you have any reflections on it? We have representation from government, but we’ll not put her on the spot. But anyone who has a view: what could we do in this regard, so that even as we talk about governance, procurement is taken into account, and thinking about procuring innovation? Any thoughts?


Rutunda Samuel: Yeah, let me start. Usually, I don’t know, I was talking to someone who said government is run by accountants. And accountants, they want facts. They want, oh, what is this going to do? And then it’s still the early stage of the language technology, particularly within our domain, especially for low-resource languages. So it’s very hard to show the facts. It’s something to say, oh, I’m going to take a chance and then I will see. Yeah, but I think there is a need to take a chance. For example, when we worked in the beginning with Common Voice for Rwanda, there was no policy, there was no AI ecosystem, there was nothing. But then there was a leap forward to say, okay, let’s take a chance. And now, six years on, I think 30,000 hours have been collected, and at least last time I checked, more than 10 African languages have been done. So there is a need to take those chances. But then that requires us to talk to people, to convince, and to change mentalities, to say, okay, this is what happened. And then another thing: although we are talking about language, currently I’m also looking at the settings. For these technologies to be used, there is maybe access to the internet, or digital literacy, and others. So we’ll have to look at it globally, but there is a need for a change of mindset, to deploy some use cases and then learn from them, rather than first needing the proof before you can deploy.


Deshni Govender: I think I would come in for a bit. So one of the things we know, particularly about African language or African NLP, or just NLP for indigenous languages, is that a lot of the time it’s oral, and that’s particularly so for African languages but also for other cultures. And the problem with having culture or language that is intended for oral knowledge is that it’s also shaped by tone, it’s shaped by cadence, it’s shaped by who is telling the story and what is the meaning that’s attached to it, and also by communal use. And the problem with that is that it creates a little bit of an NLP design flaw, or rather a design challenge: how do you actually then codify knowledge, when it is not as easy as taking something that is a book and making it digital? And so the point I’m trying to make is that when we’re talking about procurement and talking about what it is that we need to do, we need to understand what asset we’re actually working with. And it’s kind of hard to understand the asset that you’re working with if you’re not even sure how to put it into an asset value or an asset form. You know it’s an asset, but you don’t exactly know how to make it tangible, and put it in a form where somebody says, oh, that’s actually interesting, I’m willing to invest in it, or I’m willing to do this or that. And so it’s the difficult part of trying to actually unpack that, and unpack it properly, in a way that you actually shape and preserve and protect the cultures and the nuances that come with trying to take this raw material that is an asset to the people, but then make it a tangible and international value, so that you could say, cool, as a country, we have this. Now let’s see how we can use this as a bargaining tool, to come in for infrastructure development, to come in for knowledge sharing, but still protect the people.


Melissa Omino: I’m going to ask you a very lawyerly question, which is: when you talk about procurement, are we also talking about funding? Because when you say money, I think about funding. And if you think about that in the local context, I really think that the challenge is on government to move away from looking to other people to save us. And I’m really stealing that sentiment from Dr. Albert Kahira, who was one of the keynote speakers at COSA, where he said, nobody’s coming to save us. We need to start thinking of ways where we can invest, locally invest, in natural language processing, so that we can then call the shots, or really have the terms, put down the terms of how the language data would be used. And I think this is something that government is very much aware of. A lot of conversation around the Kenya National AI Strategy has been about how it will be implemented. The Kenya government made the decision to keep the implementation plan away from public purview, but there is an implementation plan there. There are key performance indicators there, and there are key partners who have been identified to help with the implementation of that AI strategy. Because essentially, in this conversation that we’re having, we are right at the beginning of the cycle of natural language processing, and the experts in the room can say that. We are merely talking about data collection when we talk about language data. We need to get into the conversation of building models that will utilize this language data. And that’s why we are up in arms about having that language data open and free for all, because it will minimize the ability for local companies to invest in that language data and build models, because the market will thoroughly thrash them, if you’re talking about market economics, demand, supply, et cetera, which also as a lawyer I might not be very good at. That’s the end of my disclaimers. So I think when we talk about procurement, we need to think about funding. We need to also stop looking outside; we need to think about, locally on the African continent, how can we fund? At the Kigali AI Summit earlier this year, there was a conversation about infrastructure, there was a conversation about having data centres, which is very integral to how we control who can access and use the data, and there was a conversation about starting to have particular data centres in particular regions. And the question was, will they be accessible to African developers, or are we creating data centres for others to use on the continent in order to be compliant with data governance regimes? So I would say, for public procurement to make sense, we need to first think about funding. To think about funding, we must challenge local investors to put their money where their mouth is and invest locally, not just in data collection, but in the development of models, because as far as I know, nobody outside is actually funding the development of models in order for us to actually, truly have African AI.


Mark Irura: Thank you, Melissa. I’ll come to you, Viola. And if you’re online and have a question you’d like to pose, please put it in the chat. Over to you, Viola.


Ochola Viola: Thank you, Mark. Mine will be quick. Melissa has talked about the funding aspect, because you can’t talk about procurement without the funding bit, but there’s the other aspect of procurement, which is the process, and I believe that is where the challenge lies. The question is: does the procurement officer even understand what this is? In government, where I am, there’s always a process. For example, in Kenya there’s the Public Procurement Act that outlines the procurement process, and part of it is that you need to give specifications and say what the end product is. Now, sometimes the procurement person is not aware of AI, let alone anything beyond it. So it will be difficult for such a person to even appreciate where you’re coming from if you want to procure this. So maybe as a way forward, now that Kenya has developed the strategy, and it’s very fresh, it was launched in March, we may need to build the capacity of some of these key offices, for example the procurement arm of government, so that they’re able to appreciate that what we are looking at may not necessarily be a tangible item; it could be something else. That is number one. Number two, the laws: the laws as we have them now do not appreciate such things, so we may need to review them so that they capture these angles. And these laws should not be reviewed only by lawyers, because, as Melissa knows, you need technical capacity to frame them in a way that will deliver what you want at the end of the tunnel. So I think I’ll stop there with respect to procurement. Thank you.


Mark Irura: There’s a friend of mine who says that, for government, procuring a packet of milk and procuring an AI system is the same. It’s not supposed to be like that. I will come to you, Eli, and I will ask a question: what sort of skills would communities need to build in order to govern their own language technologies effectively?


Lilian Diana Awuor Wanzare: What skills would communities need to build in order to govern their own language technologies effectively? For the communities, there really isn’t a governance framework. If somebody wants to use our data, how do they come in? If I want to share my data, how does that work? If the media wants to share the data, how do they come in? All these data generators, how do they come in? What is their benefit structure? Again, you can see that it is because they really don’t know what comes together to develop these AI models; there is a real disjointedness between how the data comes in and what the model needs. Someone says: I want a model that’s able to help me with chemistry in Luo. I ask: do you even understand what you’re saying? First of all, do you have chemical terms in Luo? Before this model can start talking about chemistry in Luo, how do we get it there? You see, there is this utopia in which the technology is magical, but no understanding of how we get there, and of how all the stakeholders come together to get us there. So that really needs to be put in place. Thank you.


Mark Irura: Thanks. Eli, I’ll ask you to almost wrap it up, or to talk a little bit about anything to do with community work, since we are at this place where we’re thinking about governance of products that will be developed for and by these communities, and probably in collaboration with them.


Elikplim Sabblah: Thank you very much, Mark. Across the past few responses from the other panelists, one theme connects all of it: you hear a lot about outreach and community sensitization. I think we have to understand that some definite skills have to be built. We need people in the communities who understand digital rights, who understand the importance of data, and who have skills in linguistics, to be able to maximize the opportunity that this technology brings to their communities. Now, one of the things I’ve come to understand is that sometimes there is community fatigue regarding contributing to data collection schemes, and so sensitization would make people understand that there is a purpose for this: they may not see immediate benefits, in monetary terms or whatever support they may need in the immediate sense, but their contribution goes a long way toward something bigger that can actually benefit them and the nation as a whole. So it is important for us to understand the need for outreach programs that reach people in communities and let them understand the purpose of artificial intelligence. Recently we did research trying to understand how women-led SMEs and entrepreneurs are using AI and NLP tools to interact with their customers and partners, and we came to the understanding that most of them are probably using tools with AI algorithms working in them without even knowing it. Some of them also expressed a certain level of fatigue, as I already mentioned: they are tired of contributing to data collection schemes. But we actually need people with indigenous knowledge and indigenous experience to contribute to these things. One other thing I wanted to point out is the need for the models we’re developing on the African continent to represent African culture, and one element of that culture is the shared ownership of resources. When you talk about African culture and oral tradition, you’ll notice that proverbs, idiomatic expressions and stories don’t have proprietary ownership; they belong to the community. That should be reflected in the models we build and in our data collection activities, so that data and models are openly accessible to all. I think I mashed up a lot of things, but basically that’s what I wanted to end with. Thank you.


Mark Irura: Thanks, Eli. So, a question has come in. Wow, and we have just run out of time. Lilian, in 30 seconds: how do we bridge the gap between building capacity for local communities in AI beyond data collection and increasing the usage of AI models within those same communities? 30 seconds, please.


Lilian Diana Awuor Wanzare: collaboration. If we think about the whole model, we have the local ecosystem, the government, the funders, internal players. How do we come together collaboratively to be able to make this possible? It cannot be disjointed. It has to be a collaborative effort within all members of the ecosystem.


Mark Irura: Thank you. I don’t want to recap what has been said. I began the panel with an elaborate introduction of everyone, but maybe I didn’t introduce myself properly: I’m Mark, and I work with the Mozilla Foundation. You can follow each of us online. You can hit the subscribe button and like; no, you just follow us on LinkedIn. And I’m making a pact with each of the panelists: feel free to reach out and ask them about the work they’re doing and about this work. Thank you so much, and thank you for being part of this panel. We really appreciate it.


L

Lilian Diana Awuor Wanzare

Speech speed

151 words per minute

Speech length

836 words

Speech time

332 seconds

Community-centered data collection with informed consent and benefit sharing while maintaining open access

Explanation

Wanzare argues that language data licensing should embody community centeredness through proper informed consent processes and benefit sharing mechanisms. She emphasizes that open sharing and benefit sharing should not be mutually exclusive, allowing for both development of tools and non-extractive practices that benefit the originating communities.


Evidence

She mentions the need for transparency in licensing ecosystems that support different requirements and values, allowing for different combinations that are aligned to community values while still supporting open sharing and development.


Major discussion point

Data Governance and Licensing for African Language Data


Topics

Legal and regulatory | Human rights | Sociocultural


Agreed with

– Melissa Omino
– Deshni Govender

Agreed on

Current licensing and governance frameworks are inadequate and need fundamental reform


Communities need understanding of AI model development, governance frameworks, and benefit structures to effectively participate

Explanation

Wanzare argues that communities lack understanding of how AI models are developed and what governance frameworks should look like. She emphasizes that communities need to understand the entire process from data collection to model development to effectively govern their language technologies.


Evidence

She provides examples of communities asking for AI models for specific purposes without understanding the technical requirements, such as wanting ‘chemistry in Luo’ without having chemical terms in the language or understanding the development process.


Major discussion point

Community Capacity Building and Skills Development


Topics

Development | Sociocultural | Legal and regulatory


Agreed with

– Elikplim Sabblah
– Ochola Viola

Agreed on

Communities need capacity building and skills development to effectively participate in AI governance


Disagreed with

– Ochola Viola

Disagreed on

Primary barriers to effective language data governance


Bridging capacity gaps requires collaborative partnerships between local ecosystems, government, funders, and international players

Explanation

Wanzare emphasizes that building capacity for local communities in AI requires collaborative effort from all stakeholders in the ecosystem. She argues that the approach cannot be disjointed but must involve partnership between local communities, government, funders, and international players.


Major discussion point

Community Capacity Building and Skills Development


Topics

Development | Economic | Legal and regulatory


M

Melissa Omino

Speech speed

155 words per minute

Speech length

1892 words

Speech time

729 seconds

Language communities should define their own benefits rather than having monetary benefits imposed on them

Explanation

Omino argues that language communities should be able to speak for themselves and determine what type of benefit they want for the use of their language data. She emphasizes that benefits are often automatically assumed to be monetary, but communities typically want sustainable, community-based benefits that everyone can interact with and benefit from.


Evidence

She cites discussions with the Dholuo community via Maseno University, where communities expressed wanting sustainable and community-based benefits rather than monetary or royalty-based benefits. She also mentions Creative Commons’ recent work on preference signaling for AI data use.


Major discussion point

Data Governance and Licensing for African Language Data


Topics

Legal and regulatory | Human rights | Sociocultural


Communities should be recognized as collective data stewards with inherent rights, not just sources providing consent

Explanation

Omino argues for a fundamental shift from treating communities as data sources to recognizing them as partners with ownership rights. She emphasizes that individual consent is insufficient when data affects entire communities, requiring graduated consent with community consultation and veto power over harmful applications.


Evidence

She explains that the traditional data sharing regime extracts value while leaving communities with risks and harms, and provides examples of what community stewardship would include: verification rather than one-time permission, transparency about benefits, and community veto power.


Major discussion point

Community Sovereignty and Ownership


Topics

Human rights | Legal and regulatory | Sociocultural


Agreed with

– Ochola Viola
– Elikplim Sabblah

Agreed on

Communities should have ownership and control over their language data rather than just providing consent


Alternative licensing models like the Litiyabodo Open Data License can provide frameworks for community benefit negotiation

Explanation

Omino presents the Litiyabodo Open Data License as a solution that combines copyright elements with recognition of cultural knowledge and gives communities a voice in benefit negotiation. This license aims to address the harmful dynamics of current language dataset commodification that primarily serves dominant languages and wealthy corporations.


Evidence

She mentions that this work is being done in collaboration with the Data Science Law Lab at the University of Pretoria, and references Creative Commons’ preference-signaling work that would complement such licensing frameworks.


Major discussion point

Data Governance and Licensing for African Language Data


Topics

Legal and regulatory | Intellectual property rights | Sociocultural


Agreed with

– Deshni Govender
– Lilian Diana Awuor Wanzare

Agreed on

Current licensing and governance frameworks are inadequate and need fundamental reform


Governments need to invest locally in NLP rather than looking externally, and challenge local investors to fund model development

Explanation

Omino argues that governments must move away from expecting external salvation and instead focus on local investment in natural language processing. She emphasizes that local funding is crucial for building models that utilize language data, as this would allow African developers to set terms for language data use rather than being outcompeted by the market.


Evidence

She references Dr. Albert Kahira’s statement ‘nobody’s coming to save us’ from COSA, mentions discussions at the Kigali AI Summit about data centres and infrastructure, and notes that currently nobody outside Africa is funding model development for truly African AI.


Major discussion point

Government Role and Policy Frameworks


Topics

Economic | Development | Legal and regulatory


Disagreed with

– Rutunda Samuel

Disagreed on

Approach to funding and investment in African NLP development


D

Deshni Govender

Speech speed

169 words per minute

Speech length

867 words

Speech time

307 seconds

Extractive practices occur both within and across borders, requiring policy protections based on existing cultural and indigenous rights

Explanation

Govender argues that extractive practices in language data collection are not only foreign versus local issues but also occur within countries and continents under the guise of open collaboration. She suggests that policy protections for digital work should build upon existing protections for cultural and indigenous communities, serving both as human rights protection and counter-leverage in open source contexts.


Evidence

She mentions examples like Google.org’s grant to Ghana NLP with minimal conditions, the Inkuba license development, and the concept of cross-subsidization by commercial actors for public maintenance of open source AI resources.


Major discussion point

Data Governance and Licensing for African Language Data


Topics

Human rights | Legal and regulatory | Sociocultural


Agreed with

– Melissa Omino
– Lilian Diana Awuor Wanzare

Agreed on

Current licensing and governance frameworks are inadequate and need fundamental reform


African languages being primarily oral creates NLP design challenges in codifying knowledge that isn’t easily digitized

Explanation

Govender explains that African languages are often oral and shaped by tone, cadence, storytelling context, and communal use, which creates significant challenges for NLP development. This makes it difficult to understand and quantify the asset value of language data, as it’s not as straightforward as digitizing written books.


Evidence

She describes how oral knowledge is shaped by who tells the story and the meaning attached to it, and explains the difficulty in creating tangible asset forms that investors can understand and value appropriately.


Major discussion point

Technical and Infrastructure Challenges


Topics

Sociocultural | Infrastructure | Development


O

Ochola Viola

Speech speed

132 words per minute

Speech length

894 words

Speech time

403 seconds

Legal frameworks must be robust with stringent data collection rules to protect communities from exploitation

Explanation

Viola emphasizes the need for very robust legal frameworks around data collection to protect local African communities from possible exploitation by external big tech companies. She argues that these frameworks should ensure communities benefit from AI technologies regardless of how they define those benefits.


Major discussion point

Data Governance and Licensing for African Language Data


Topics

Legal and regulatory | Human rights | Consumer protection


Community ownership should be legally entrenched with operationalized mechanisms to reach remote communities

Explanation

Viola argues that community ownership should not just be legally established but should have actual operational mechanisms within African ecosystems. She emphasizes the need for digital infrastructure to reach remote communities and enable them to access benefits and engage with potential investors in AI technologies.


Evidence

She notes that many communities are in remote areas of the African continent and lack digital infrastructure to access benefits or engage with external parties wanting to develop AI technologies using their languages.


Major discussion point

Community Sovereignty and Ownership


Topics

Infrastructure | Development | Legal and regulatory


Agreed with

– Melissa Omino
– Elikplim Sabblah

Agreed on

Communities should have ownership and control over their language data rather than just providing consent


Local communities should control data from collection through usage with meaningful engagement throughout the process

Explanation

Viola argues for data sovereignty where local African communities control their data from the point of collection to the point of usage of AI technologies. She emphasizes that the entire process must be inclusive, with communities involved beyond just being data providers to participating in the whole development process.


Major discussion point

Community Sovereignty and Ownership


Topics

Human rights | Legal and regulatory | Development


Agreed with

– Lilian Diana Awuor Wanzare
– Elikplim Sabblah

Agreed on

Communities need capacity building and skills development to effectively participate in AI governance


Capacity building for procurement officers and legal framework updates are needed to handle AI procurement effectively

Explanation

Viola identifies a critical gap in government procurement processes where procurement officers lack understanding of AI technologies, making it difficult to appreciate and specify AI-related procurements. She argues for capacity building of key government offices and updating laws to capture AI procurement requirements with technical input beyond just lawyers.


Evidence

She references Kenya’s Public Procurement Act and explains how procurement officers struggle to understand intangible AI products, noting that current laws don’t appreciate such technologies and need review with technical capacity input.


Major discussion point

Government Role and Policy Frameworks


Topics

Legal and regulatory | Development | Economic


Agreed with

– Mark Irura
– Rutunda Samuel

Agreed on

Government procurement systems are inadequate for AI and language technology innovation


Disagreed with

– Lilian Diana Awuor Wanzare

Disagreed on

Primary barriers to effective language data governance


R

Rutunda Samuel

Speech speed

131 words per minute

Speech length

607 words

Speech time

276 seconds

AI strategies raise awareness, create working frameworks, add accountability, and help raise resources for language technology development

Explanation

Samuel argues that AI strategies and policies serve multiple important functions: they raise public awareness about AI components including language, create frameworks for governments and entities to follow, add accountability mechanisms, and facilitate resource mobilization. He emphasizes that these strategies create synergies and collaborations across sectors like health, leading to resource raising opportunities.


Evidence

He provides an example of how health sector professionals might want to use AI tools for medicinal plants and how having a policy framework enables them to know where to ask for help and creates collaboration opportunities.


Major discussion point

Government Role and Policy Frameworks


Topics

Legal and regulatory | Development | Economic


Government procurement requires mindset changes and willingness to take chances on emerging language technologies

Explanation

Samuel argues that government procurement faces challenges because governments are run by accountants who want concrete facts, while language technology for low-resource languages is still in early stages and difficult to prove with hard data. He emphasizes the need for governments to take calculated risks and change mentalities to deploy use cases and learn from them.


Evidence

He cites the example of Common Voice in Rwanda: despite there initially being no policy and no AI ecosystem, taking a chance led to the collection of 30,000 hours of data and to work on more than 10 African languages over six years.


Major discussion point

Technical and Infrastructure Challenges


Topics

Economic | Development | Legal and regulatory


Agreed with

– Mark Irura
– Ochola Viola

Agreed on

Government procurement systems are inadequate for AI and language technology innovation


Disagreed with

– Melissa Omino

Disagreed on

Approach to funding and investment in African NLP development


E

Elikplim Sabblah

Speech speed

157 words per minute

Speech length

855 words

Speech time

325 seconds

Government should empower local communities to take ownership of data governance through inclusive strategy development

Explanation

Sabblah argues that governments should support and empower local communities to lead data governance, particularly for language data. He emphasizes that the development of national AI strategies should include communities through stakeholder consultations, ecosystem analysis, and research, which inherently gives communities ownership of the resulting governance frameworks.


Evidence

He describes Ghana’s draft national AI strategy development process, which includes reaching out to various groups to understand their needs, and mentions Pillar 5 of the strategy that provides guidelines for data collectors on collecting, storing, and sharing data.


Major discussion point

Community Sovereignty and Ownership


Topics

Legal and regulatory | Development | Sociocultural


Agreed with

– Melissa Omino
– Ochola Viola

Agreed on

Communities should have ownership and control over their language data rather than just providing consent


Communities need people with digital rights knowledge, data importance understanding, and linguistics skills

Explanation

Sabblah argues that communities need specific skill sets to effectively govern their language technologies, including understanding of digital rights, data importance, and linguistics. He emphasizes the need for people with indigenous knowledge and experience to contribute to AI development while understanding the broader purpose and benefits.


Evidence

He mentions research on women-led SMEs using AI and NLP tools, finding that many use AI-powered tools without knowing it, and notes community fatigue regarding data collection schemes due to lack of understanding of the purpose and benefits.


Major discussion point

Community Capacity Building and Skills Development


Topics

Development | Human rights | Sociocultural


Agreed with

– Lilian Diana Awuor Wanzare
– Ochola Viola

Agreed on

Communities need capacity building and skills development to effectively participate in AI governance


Outreach programs are needed to help communities understand AI’s purpose and overcome fatigue from data collection schemes

Explanation

Sabblah identifies community fatigue and desensitization regarding data collection as a major challenge that requires targeted outreach programs. He argues that communities need to understand that while they may not see immediate monetary benefits, their contributions serve a larger purpose that can benefit them and the nation as a whole.


Evidence

He references research showing that women entrepreneurs are tired of contributing to data collection schemes, and emphasizes that African culture of shared ownership of resources like proverbs and stories should be reflected in the models and data collection activities.


Major discussion point

Community Capacity Building and Skills Development


Topics

Development | Sociocultural | Human rights


M

Mark Irura

Speech speed

125 words per minute

Speech length

2384 words

Speech time

1139 seconds

Language is culture and identity, yet Africa’s digital identity is skewed, manipulated, and disproportionately commercialized

Explanation

Irura argues that while language represents culture and cultural identity, the digital representation of Africa through language data is being distorted and exploited commercially. He emphasizes that this creates a fundamental problem where African digital identity is not authentically represented.


Evidence

He notes that language data collection is characterized by significant disparity between large-scale publicly accessible resources and numerous smaller isolated projects, and mentions the awakening sentiment change amongst language communities regarding data governance issues.


Major discussion point

Data Governance and Licensing for African Language Data


Topics

Sociocultural | Human rights | Legal and regulatory


There is an awakening and sentiment change among language communities regarding inequitable investment and power dynamics in language technology

Explanation

Irura identifies a growing awareness among African language communities about issues of inequitable investment, lack of locally sensitive community control, and problematic power dynamics affecting language technology development. This represents a shift in how communities view their participation in language data initiatives.


Evidence

He mentions issues raised by speakers who crowdsource datasets, including inequitable local investment, the lack of locally sensitive community control, and power dynamics affecting how language is used to build language technology.


Major discussion point

Community Sovereignty and Ownership


Topics

Human rights | Sociocultural | Economic


Government procurement systems present challenges for innovation, particularly for AI and language technology development

Explanation

Irura highlights that procurement systems create barriers for communities and developer communities to benefit from government investment in AI technologies. He suggests that procurement could provide opportunities but current systems are not designed to handle innovative technologies effectively.


Evidence

He mentions that procurement provides an opportunity for developer communities and notes that people in remote areas cannot benefit due to lack of infrastructure and connectivity, making procurement a critical issue for accessing innovation.


Major discussion point

Government Role and Policy Frameworks


Topics

Economic | Legal and regulatory | Development


For government, procuring traditional goods and AI systems is treated the same way, which creates fundamental procurement challenges

Explanation

Irura points out a critical flaw in government procurement processes where complex AI systems are treated with the same procedures as simple commodities. This approach fails to account for the unique requirements, specifications, and evaluation criteria needed for AI and language technology procurement.


Evidence

He references a friend’s observation that ‘for government, procuring a packet of milk and procuring an AI system is the same’ and states ‘It’s not supposed to be like that.’


Major discussion point

Government Role and Policy Frameworks


Topics

Legal and regulatory | Economic | Development


Agreed with

– Ochola Viola
– Rutunda Samuel

Agreed on

Government procurement systems are inadequate for AI and language technology innovation


Agreements

Agreement points

Communities should have ownership and control over their language data rather than just providing consent

Speakers

– Melissa Omino
– Ochola Viola
– Elikplim Sabblah

Arguments

Communities should be recognized as collective data stewards with inherent rights, not just sources providing consent


Community ownership should be legally entrenched with operationalized mechanisms to reach remote communities


Government should empower local communities to take ownership of data governance through inclusive strategy development


Summary

All three speakers strongly advocate for moving beyond traditional consent models to recognize communities as having inherent ownership rights over their language data, with legal frameworks and government support to operationalize this ownership.


Topics

Human rights | Legal and regulatory | Sociocultural


Communities need capacity building and skills development to effectively participate in AI governance

Speakers

– Lilian Diana Awuor Wanzare
– Elikplim Sabblah
– Ochola Viola

Arguments

Communities need understanding of AI model development, governance frameworks, and benefit structures to effectively participate


Communities need people with digital rights knowledge, data importance understanding, and linguistics skills


Local communities should control data from collection through usage with meaningful engagement throughout the process


Summary

There is strong consensus that communities currently lack the necessary knowledge and skills to effectively govern their language technologies, requiring targeted capacity building in technical understanding, digital rights, and governance frameworks.


Topics

Development | Sociocultural | Human rights


Current licensing and governance frameworks are inadequate and need fundamental reform

Speakers

– Melissa Omino
– Deshni Govender
– Lilian Diana Awuor Wanzare

Arguments

Alternative licensing models like the Litiyabodo Open Data License can provide frameworks for community benefit negotiation


Extractive practices occur both within and across borders, requiring policy protections based on existing cultural and indigenous rights


Community-centered data collection with informed consent and benefit sharing while maintaining open access


Summary

All speakers agree that existing licensing frameworks like CC0 are insufficient and that new models are needed that can balance open access with community rights and benefit sharing.


Topics

Legal and regulatory | Intellectual property rights | Human rights


Government procurement systems are inadequate for AI and language technology innovation

Speakers

– Mark Irura
– Ochola Viola
– Rutunda Samuel

Arguments

For government, procuring traditional goods and AI systems is treated the same way, which creates fundamental procurement challenges


Capacity building for procurement officers and legal framework updates are needed to handle AI procurement effectively


Government procurement requires mindset changes and willingness to take chances on emerging language technologies


Summary

There is clear agreement that current government procurement processes are not designed to handle AI technologies effectively, treating complex AI systems the same as simple commodities, requiring both capacity building and procedural reforms.


Topics

Legal and regulatory | Economic | Development


Similar viewpoints

Both speakers emphasize that communities should determine their own definition of benefits from language data use, rejecting the assumption that benefits must be monetary and advocating for community-defined, sustainable benefits.

Speakers

– Melissa Omino
– Lilian Diana Awuor Wanzare

Arguments

Language communities should define their own benefits rather than having monetary benefits imposed on them


Community-centered data collection with informed consent and benefit sharing while maintaining open access


Topics

Human rights | Sociocultural | Legal and regulatory


Both speakers advocate for strong local investment and robust legal protections to prevent exploitation by external actors, emphasizing the need for African-controlled AI development.

Speakers

– Melissa Omino
– Ochola Viola

Arguments

Governments need to invest locally in NLP rather than looking externally, and challenge local investors to fund model development


Legal frameworks must be robust with stringent data collection rules to protect communities from exploitation


Topics

Economic | Legal and regulatory | Development


Both speakers see national AI strategies as crucial tools for creating frameworks, raising awareness, and enabling community participation in governance, though they approach from different angles of implementation.

Speakers

– Rutunda Samuel
– Elikplim Sabblah

Arguments

AI strategies raise awareness, create working frameworks, add accountability, and help raise resources for language technology development


Government should empower local communities to take ownership of data governance through inclusive strategy development


Topics

Legal and regulatory | Development | Sociocultural


Unexpected consensus

The need for collaborative partnerships rather than top-down approaches

Speakers

– Lilian Diana Awuor Wanzare
– Elikplim Sabblah
– Melissa Omino

Arguments

Bridging capacity gaps requires collaborative partnerships between local ecosystems, government, funders, and international players


Government should empower local communities to take ownership of data governance through inclusive strategy development


Communities should be recognized as collective data stewards with inherent rights, not just sources providing consent


Explanation

Despite coming from different professional backgrounds (academic researcher, government advisor, and legal expert), there is unexpected consensus on rejecting traditional top-down approaches in favor of genuine partnership models that recognize community agency and expertise.


Topics

Development | Human rights | Legal and regulatory


The complexity of oral African languages creates unique technical challenges for AI development

Speakers

– Deshni Govender
– Lilian Diana Awuor Wanzare

Arguments

African languages being primarily oral creates NLP design challenges in codifying knowledge that isn’t easily digitized


Community-centered data collection with informed consent and benefit sharing while maintaining open access


Explanation

There is unexpected technical consensus between a policy expert and an academic researcher about the fundamental challenges that oral traditions pose for AI development, recognizing that African languages require different approaches than text-based systems.


Topics

Sociocultural | Infrastructure | Development


Overall assessment

Summary

The speakers demonstrate remarkable consensus across multiple critical areas: the inadequacy of current licensing frameworks, the need for community ownership and control over language data, the importance of capacity building, and the failure of existing government procurement systems to handle AI innovation effectively.


Consensus level

High level of consensus with strong implications for policy reform. The agreement spans technical, legal, and social dimensions, suggesting a mature understanding of the interconnected challenges facing African language data governance. This consensus provides a solid foundation for developing comprehensive solutions that address community rights, technical requirements, and governance frameworks simultaneously.


Differences

Different viewpoints

Approach to funding and investment in African NLP development

Speakers

– Melissa Omino
– Rutunda Samuel

Arguments

Governments need to invest locally in NLP rather than looking externally, and challenge local investors to fund model development


Government procurement requires mindset changes and willingness to take chances on emerging language technologies


Summary

Omino advocates for a complete shift away from external funding and emphasizes local investment as the solution, while Samuel focuses on government willingness to take risks and change procurement mindsets to support emerging technologies, regardless of funding source


Topics

Economic | Development | Legal and regulatory


Primary barriers to effective language data governance

Speakers

– Lilian Diana Awuor Wanzare
– Ochola Viola

Arguments

Communities need understanding of AI model development, governance frameworks, and benefit structures to effectively participate


Capacity building for procurement officers and legal framework updates are needed to handle AI procurement effectively


Summary

Wanzare identifies community knowledge gaps as the primary barrier, while Viola focuses on government institutional capacity and legal framework inadequacies as the main obstacles


Topics

Development | Legal and regulatory | Sociocultural


Unexpected differences

Role of external versus internal capacity building

Speakers

– Melissa Omino
– Elikplim Sabblah

Arguments

Governments need to invest locally in NLP rather than looking externally, and challenge local investors to fund model development


Communities need people with digital rights knowledge, data importance understanding, and linguistics skills


Explanation

While both speakers advocate for local empowerment, Omino strongly rejects external involvement and emphasizes complete local self-reliance, while Sabblah appears more open to external collaboration for capacity building. This disagreement is unexpected given their shared goal of community empowerment


Topics

Development | Economic | Human rights


Overall assessment

Summary

The speakers show remarkable consensus on fundamental goals – community sovereignty, equitable benefit sharing, and the need for better governance frameworks. However, they disagree significantly on implementation strategies, funding approaches, and the role of external actors


Disagreement level

Low to moderate disagreement level with high strategic implications. While speakers agree on problems and desired outcomes, their different approaches to solutions could lead to fragmented or competing initiatives. The disagreements reflect different professional backgrounds and regional experiences, suggesting need for integrated approaches that combine legal, technical, policy, and community perspectives




Takeaways

Key takeaways

Language data governance requires a shift from treating communities as data sources to recognizing them as collective data stewards with inherent ownership rights


Community-centered approaches must balance open sharing with equitable benefit distribution, allowing communities to define what benefits mean to them beyond just monetary compensation


Alternative licensing frameworks like the Litiyabodo Open Data License can provide mechanisms for community benefit negotiation while respecting cultural protocols


Government AI strategies should center culture as a main pillar and include communities as equal partners throughout the entire AI development lifecycle, not just at data collection stage


Local investment and funding in NLP model development is crucial for African countries to control their language technology destiny rather than relying on external actors


Capacity building is needed across multiple levels – from procurement officers understanding AI to communities understanding digital rights and data governance


The oral nature of African languages creates unique technical challenges for NLP that require specialized approaches to preserve cultural nuances and communal knowledge systems


Successful language technology governance requires collaborative partnerships between local ecosystems, governments, funders, and international players rather than siloed approaches


Resolutions and action items

Panelists committed to being available for follow-up engagement via LinkedIn for continued discussion on language data governance topics


Reference made to Creative Commons releasing preference signaling tools that work with CC licenses to allow data stewards to specify preferred uses


Ghana’s draft national AI strategy includes Pillar 5 providing guidelines for data collectors on collection, storage and sharing practices


Kenya’s AI strategy implementation plan exists with identified key partners and performance indicators, though kept from public view


Unresolved issues

How to effectively operationalize community data sovereignty mechanisms, especially for reaching remote communities with limited digital infrastructure


Specific implementation details for alternative licensing frameworks and how they would work in practice across different African contexts


How to reform government procurement processes to effectively handle AI and language technology acquisitions


Bridging the gap between data collection activities and actual model development/deployment that benefits local communities


How to address community fatigue from repeated data collection schemes while building sustainable engagement


Balancing the need for open data commons to drive innovation with community ownership and benefit-sharing requirements


How to quantify and preserve the intangible cultural assets embedded in oral language traditions within digital frameworks


Suggested compromises

Graduated consent models that require both individual consent and community consultation before data agreements


Dual licensing approaches that allow for both open sharing and community benefit requirements


Cross-subsidization models where commercial actors support public maintenance of open source AI resources


Equal collaboration partnerships with co-ownership structures rather than consultant relationships between foreign and local partners


Mandatory benefit sharing with flexible definitions allowing communities to choose between monetary compensation, capacity building, infrastructure investment, or priority access to developed products


Learning from existing frameworks like the Nagoya Protocol in biodiversity to create linguistic protocols for African language use in AI


Combining legal frameworks with cultural recognition elements to respect both copyright and indigenous knowledge systems


Thought provoking comments

I think it’s important also to point out that when we mention the concept of extractive practices, that it’s not always a foreign versus local context. And it’s not a cross-border issue, because I think that extractive practices often happen within countries and within the continent under the guise of the open collaboration concept.

Speaker

Deshni Govender


Reason

This comment challenged the common narrative that data exploitation is primarily a North-South or foreign-domestic issue. It introduced the uncomfortable reality that extractive practices can occur within African countries themselves, even when framed as collaborative efforts. This reframed the entire discussion from an ‘us vs. them’ mentality to a more nuanced understanding of power dynamics.


Impact

This shifted the conversation away from simplistic colonial framings and forced participants to consider more complex internal dynamics. It elevated the discussion to examine power structures more critically, regardless of geographic origin, and influenced subsequent speakers to focus on governance mechanisms rather than just external protection.


Ownership and consent are two completely different things. The traditional data sharing regime treats communities as sources rather than partners, and this extracts value while leaving these very communities with just the risks and harms.

Speaker

Melissa Omino


Reason

This distinction fundamentally challenged the prevailing approach to data governance in AI. Most frameworks focus on obtaining consent, but Melissa highlighted that consent without ownership still perpetuates extractive relationships. This was a paradigm-shifting observation that questioned the adequacy of current ethical AI practices.


Impact

This comment became a cornerstone for the rest of the discussion. Multiple speakers referenced this ownership vs. consent framework, and it directly influenced the conversation toward community data sovereignty and benefit-sharing mechanisms. It provided a theoretical foundation that other panelists built upon throughout the session.


The problem with culture or language that is intended for oral knowledge is that it is also shaped by tone, by cadence, by who is telling the story and the meaning attached to it… And it’s hard to understand the asset you’re working with if you’re not even sure how to put it into an asset form or assign it an asset value.

Speaker

Deshni Govender


Reason

This comment introduced a critical technical and cultural complexity that hadn’t been adequately addressed. It highlighted the fundamental challenge of digitizing oral traditions without losing their essence, and the difficulty of creating economic value from intangible cultural assets. This bridged technical NLP challenges with cultural preservation concerns.


Impact

This deepened the technical discussion and helped explain why traditional licensing and governance models are inadequate for African language data. It influenced the conversation toward more nuanced approaches that consider the unique characteristics of oral traditions, and helped other participants understand why simple digitization approaches fail.


I really think that the challenge is on government to move away from looking to other people to save us… We need to start thinking of ways where we can invest, locally invest in natural language processing so that we can then call the shots or really have the terms.

Speaker

Melissa Omino


Reason

This was a provocative call for self-reliance that challenged the dependency mindset often present in development discussions. It shifted focus from seeking external validation and funding to building internal capacity and control. The comment was particularly powerful because it connected funding, sovereignty, and strategic control.


Impact

This comment redirected the conversation from governance mechanisms to fundamental questions about economic independence and strategic autonomy. It influenced subsequent discussions about procurement, funding, and capacity building, with other speakers building on this theme of local ownership and investment.


Usually, I don’t know, I was talking to someone and say, government is run by accountants. And accountants, they want facts. They want, oh, what is this going to do? And then it’s still in the early stage of the language technology… So it’s very hard to show the facts. It’s something to say, oh, I’m going to take a chance and then I will see.

Speaker

Rutunda Samuel


Reason

This comment provided a refreshingly honest and practical perspective on the bureaucratic challenges of innovation in government. It humanized the procurement discussion by acknowledging the risk-averse nature of public administration and the inherent uncertainty in emerging technologies. The informal tone made complex policy issues more accessible.


Impact

This comment grounded the theoretical policy discussions in practical reality. It helped explain why good intentions often fail in implementation and influenced other speakers to consider the human and institutional barriers to their proposed solutions. It also led to Viola’s important points about capacity building for procurement officers.


For example, in Kenya there’s the Public Procurement Act that outlines the procurement process… Now, sometimes the procurement person is not aware of AI, let alone anything beyond it. So it will be difficult for such a person to even appreciate where you’re coming from if you want to procure this.

Speaker

Ochola Viola


Reason

This comment identified a critical but often overlooked implementation gap – the disconnect between policy aspirations and administrative capacity. It highlighted how existing legal frameworks and human capacity constraints can undermine even well-intentioned AI strategies. This was a practical insight that connected legal, technical, and human resource challenges.


Impact

This comment brought the discussion full circle from high-level policy to ground-level implementation challenges. It influenced the conversation toward practical capacity building needs and helped other participants understand why technical solutions alone are insufficient without corresponding institutional development.


Overall assessment

These key comments fundamentally shaped the discussion by challenging simplistic narratives and introducing crucial complexities. Deshni’s point about internal extractive practices prevented the conversation from falling into colonial binaries, while Melissa’s ownership vs. consent distinction provided a theoretical framework that anchored much of the subsequent discussion. The technical insights about oral traditions added necessary depth to understanding why standard approaches fail, while the practical observations about government bureaucracy and procurement grounded theoretical discussions in implementation reality. Together, these comments elevated the conversation from abstract policy discussions to a nuanced examination of power, culture, technology, and practical governance challenges. They created a more sophisticated understanding of the ecosystem needed to support equitable language technology development in Africa, moving beyond simple solutions to acknowledge the interconnected nature of technical, cultural, legal, and institutional challenges.


Follow-up questions

How can AI training data licenses be adapted to protect cultural sovereignty and ensure equitable benefit for marginalized communities?

Speaker

Mark Irura


Explanation

This is a fundamental question about developing new licensing frameworks that balance open sharing with community protection and benefit-sharing


How do we balance the issue of open sharing vis-Ă -vis benefit sharing within licensing frameworks?

Speaker

Lilian Diana Awuor Wanzare


Explanation

This addresses the core tension between making data openly available for development while ensuring communities benefit from their contributions


How transparent is the licensing ecosystem to enable different views and different combinations that support different requirements by different people?

Speaker

Lilian Diana Awuor Wanzare


Explanation

This explores the need for flexible, transparent licensing systems that can accommodate diverse community needs and values


What type of benefit should flow to language communities for the use of their language datasets?

Speaker

Melissa Omino


Explanation

This requires direct consultation with language communities to understand their preferences for benefits, which may not be monetary


Can a licensing framework deliver community-defined benefits and respect for cultural protocols?

Speaker

Melissa Omino


Explanation

This examines whether legal frameworks like the Litiyabodo Open Data License can effectively address community needs and cultural values


How can governments support community-led governance rather than top-down approaches?

Speaker

Mark Irura


Explanation

This explores mechanisms for governments to partner with and empower communities in governing their own language technologies


How can public procurement systems be adapted to support innovation in language technology for local communities?

Speaker

Mark Irura


Explanation

This addresses the challenge of government procurement processes that don’t understand or accommodate AI and language technology innovations


What skills would communities need to build in order to govern their own language technologies effectively?

Speaker

Mark Irura


Explanation

This identifies the capacity building needs for communities to meaningfully participate in governing their language data and technologies


How to bridge the gap between building capacity for local communities in AI beyond collection and increasing usage of AI models within those same communities?

Speaker

Audience member (via chat)


Explanation

This addresses the challenge of moving communities from data contributors to active users and beneficiaries of AI technologies


How do we establish something like a linguistic protocol for use of African languages in AI, similar to the Nagoya Protocol for biodiversity?

Speaker

Deshni Govender


Explanation

This explores applying existing international frameworks for genetic resources to language resources, requiring further research into legal parallels


How do we codify and create asset value from oral knowledge that is shaped by tone, cadence, and communal use?

Speaker

Deshni Govender


Explanation

This addresses the technical and conceptual challenge of preserving the full cultural context of oral languages in digital formats


How can local African investors be challenged to fund not just data collection but model development?

Speaker

Melissa Omino


Explanation

This explores the need for local investment in the full AI development pipeline to maintain control over African language technologies


Will African data centres be accessible to African developers or primarily serve external users?

Speaker

Melissa Omino


Explanation

This examines whether infrastructure development will truly benefit local developers or primarily serve compliance needs for external actors


Disclaimer: This is not an official session record. DiploAI generates these resources from audiovisual recordings, and they are presented as-is, including potential errors. Due to logistical challenges, such as discrepancies in audio/video or transcripts, names may be misspelled. We strive for accuracy to the best of our ability.