Digital Democracy: Leveraging the Bhashini Stack in the Parliament
20 Feb 2026 12:00h - 13:00h
Summary
The session focused on building an inclusive, open-source voice AI ecosystem for India, emphasizing the need to continuously adapt technologies to diverse languages, cultures and users [1-13]. Amitabh Nag highlighted that AI solutions have a short “shelf life” and must be regularly upgraded because there is no warranty for static systems, especially given the vast linguistic and cultural diversity across the region [5-8][9-13].
Ariane Ahildur introduced the newly released Policy Report and Developers Toolkit, describing them as a joint German-Indian effort that provides best-practice guidance and embodies a shared vision of digital inclusion through voice technology [24-38][42-44]. She stressed that voice interfaces are crucial for low-literacy populations and that responsible, multilingual voice AI can unlock access to public services, aligning with the Hamburg Declaration on AI for Sustainable Development Goals [36-41][49-52].
Harleen Kaur outlined a four-pillar policy framework: treating foundational data as public goods, institutionalising sustainable open-source infrastructure, building open and representative models, and strengthening responsible deployment [73-78]. The accompanying developer toolkit translates these principles into practice by focusing on representation planning, data-quality assurance, and embedding responsible AI throughout the development lifecycle [90-94][97-101].
In the panel, Nag described two main pathways for sustaining data creation: large-scale “brute” collection of diverse speech samples and the generation of improvement corpora from deployed products, including both open-domain and closed-domain sources [121-138]. Ghosh argued for a smarter, cost-effective approach that leverages intrinsic linguistic components rather than exhaustive data gathering, illustrating this with a Telugu project that covered four dialects by identifying common acoustic features and supplementing them with targeted data [154-168][174-184]. Kritika emphasized that industry adoption requires scalable, edge-ready infrastructure, domain-specific model fine-tuning, and compliance safeguards to ensure reliable deployment across sectors such as healthcare and manufacturing [190-199]. Thomas highlighted the intersecting legal challenges of privacy and copyright, urging robust documentation, privacy-enhancing techniques, and clear licensing from the outset to build a trusted ecosystem [205-224]. Ghosh warned that human transcription variability makes traditional word-error-rate metrics insufficient, proposing multi-layered, subjective-objective evaluation methods and downstream feedback loops [228-241]. Nag reinforced that ultimate acceptance of voice systems rests on audience perception rather than absolute rankings, suggesting that standards should be shaped by what end-users deem understandable and trustworthy [256-272].
The participants agreed on the need for a unified, nationally coordinated evaluation framework, potentially a single leaderboard, to drive continuous improvement while fostering collaborative competition [315-321]. The discussion concluded that aligning policy, technical, legal and evaluation efforts is essential to realize inclusive, responsible voice AI that serves India’s diverse population [24-38][73-78][205-224].
Keypoints
Major discussion points
– Dynamic, user-driven data ecosystems are essential for sustainable voice AI.
Amitabh Nag stresses that foundational speech datasets must be continuously created, enriched through user feedback, and treated as digital public goods to keep models improving over time [121-138]. Nihar Desai later summarizes this as “data sets need to be more of lived-in nature… built upon by users” [146-148].
– Inclusive language coverage requires smart, cost-effective collection strategies rather than brute-force data gathering.
Prasanta Ghosh explains that Indian linguistic diversity can be addressed by focusing on intrinsic language families (Indo-Aryan, Dravidian) and balancing data volume with coverage [155-168]. He illustrates the approach with the Telugu dialect project, showing how a “region-anchored” method reduces time and budget while preserving diversity [174-183].
– A four-pillar policy framework and a developer toolkit translate inclusive AI principles into practice.
Harleen Kaur outlines the policy pillars: treating foundational data as public goods, institutionalising sustainable open-source infrastructure, building open and representative models, and strengthening responsible deployment [73-78]. The accompanying toolkit operationalises these pillars through guidance on representation, data quality, and embedding responsible AI (RAI) throughout the development lifecycle [90-108].
– Legal and governance safeguards (copyright, privacy, documentation) are critical to protect trust in the ecosystem.
Thomas Vallianeth highlights the intersecting challenges of copyright and privacy, urging early-stage provenance checks, privacy-enhancing techniques, and robust documentation to enable safe downstream use [208-218][221-224]. He later notes that while the law can accommodate some subjectivity, clear evidence and trust-building measures are needed [286-298].
– Evaluation of voice models must move beyond single-metric, objective scores to a multi-layered, ecosystem-wide approach.
Ghosh points out the variability in human transcription and argues that word-error-rate alone is insufficient; instead, multi-output models, subjective human review, and downstream-application feedback should be incorporated [228-240]. Nag adds that ultimate acceptance hinges on audience perception rather than absolute rankings [256-273], and participants call for a national, collaborative benchmarking framework [315-319].
Overall purpose / goal
The session launched the Policy Report and Developers Toolkit “Building on Open and Responsible Voice Technology Ecosystem in India” and served to (1) showcase the Indo-German partnership that produced the report, (2) present a concrete policy framework and practical toolkit for inclusive voice AI, and (3) mobilise stakeholders (government, academia, industry, and civil society) to adopt open, responsible, and culturally diverse voice technologies that advance public services and sustainable development.
Overall tone and its evolution
– The discussion begins with a formal and optimistic tone, celebrating collaboration and the report’s release [24-34].
– It then shifts to a technical and problem-solving tone as participants detail challenges in data collection, linguistic diversity, and legal compliance [65-84][208-218].
– Mid-conversation the tone becomes reflective and candid, acknowledging the inherent uncertainties, subjectivity, and “no-warranty” nature of AI systems [8-15][256-273].
– The closing remarks adopt a constructive and forward-looking tone, urging continued workshops, benchmarking, and ecosystem-wide trust mechanisms [302-319][322-327].
Overall, the dialogue remains collaborative and solution-oriented, moving from celebration to deep analysis and finally to actionable next steps.
Speakers
– Ariane Ahildur – Dr.; Director General, Department for Global Health, Equality of Opportunity, Digital Technologies and Food Security, German Federal Ministry for Economic Cooperation and Development; expertise in global health policy, digital technologies, and food security. [S2]
– Nihar Desai – Head of JNI; Moderator of the panel discussion; expertise in moderation and digital initiatives. [S3]
– Moderator – Session moderator (unnamed); role: moderating the event.
– Kritika K.R. – Head Artificial Intelligence and Product Researcher, SanLogic; expertise in applied AI and product research. [S8]
– Prasanta Ghosh – Dr.; Associate Professor, Indian Institute of Science; expertise in speech technology research and academia. [S9]
– Thomas J. Vallianeth – Counsel, Trilegal; expertise in legal aspects of AI, copyright, and data governance. [S11]
– Harleen Kaur – Research Manager, Digital Futures Lab; expertise in policy research and developer-toolkit development. [S12]
– Amitabh Nag – CEO of DIBD (also referenced as CEO of Bhashini); expertise in AI ecosystem building and voice technology. [S13]
Additional speakers:
– Shailendra Pal Singh – Senior General Manager, Bhashini; role: felicitated the speakers at the event.
Opening Remarks – Amitabh Nag
Nag opened by stressing that any AI-driven voice solution must be scalable across regions such as Southeast Asia and Africa and continually refreshed, noting that a model’s “shelf-life” can be as short as three to six months [1-5]. He contrasted AI systems with static machines, pointing out that there is no warranty or guarantee for AI models and that diversity of people, languages and cultures makes inclusion a core design requirement rather than an afterthought [6-13]. Nag concluded that progress will be incremental, moving step-by-step toward higher levels of inclusion [17-19].
Keynote – Ariane Ahildur (Director General of the Department for Global Health, Equality of Opportunity, Digital Technologies and Food Security, German Federal Ministry for Economic Cooperation and Development) [24-26]
Ahildur launched the Policy Report and Developers Toolkit “Building on Open and Responsible Voice Technology Ecosystem in India.” She thanked Digital Futures Lab, Art Park, TriLegal, and NASSCOM as key partners [33-36]. The report, a product of a German-Indian partnership, offers best-practice guidance and hands-on advice for policymakers and the tech community [30-32]. Ahildur framed voice AI as a gateway for low-literacy populations to access public services, health care, education and economic participation, warning that failure to provide multilingual voice interfaces can reinforce exclusion [34-41]. She linked the initiative to the Hamburg Declaration on Responsible AI for the Sustainable Development Goals, underscoring that AI should serve people and the planet [49-52].
Report & Toolkit Presentation – Harleen Kaur (Research Manager, Digital Futures Lab) [73-78]
Kaur outlined the four-pillar policy framework:
1. Treat foundational datasets as public goods;
2. Institutionalise sustainable open-source infrastructure;
3. Build open and representative models;
4. Strengthen responsible deployment.
She explained that treating data as a public good means government funding and convening for languages that are not commercially viable [79-81]; institutionalisation involves standardised documentation, collaborative data-steward models and shared national compute resources [82-85]; the third pillar calls for locally curated benchmarks and representative models [86-88]; and the fourth stresses public-value sharing, community buy-in and literacy to prevent misuse [85-88].
The accompanying developer toolkit translates these pillars into practice, focusing on representation planning, data-quality assurance, and embedding Responsible AI (RAI) throughout the development lifecycle [90-108]. Practical recommendations include maintaining a diversity wish-list, using synthetic data, adopting a layered data-strategy (active, passive and synthetic sources), applying robust transcription standards, and implementing continuous post-deployment monitoring [97-111].
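The toolkit’s “diversity wish-list” idea can be made concrete with a small sketch. The following Python snippet is illustrative only (it is not from the toolkit itself, and the cell dimensions and target counts are hypothetical): it compares collected speech samples against per-cell targets and reports the remaining gaps, the kind of representation planning the toolkit recommends before and during collection.

```python
from collections import Counter

def coverage_gaps(wish_list: dict, collected: list) -> dict:
    """Report wish-list cells that are still under-collected.

    wish_list maps a (dialect, gender) cell to a target sample count;
    collected is a list of (dialect, gender) tuples, one per sample.
    A real plan would add more dimensions (age band, recording environment).
    """
    have = Counter(collected)  # missing cells count as zero
    return {cell: target - have[cell]
            for cell, target in wish_list.items()
            if have[cell] < target}

# Toy example: two target cells, one partially covered, one untouched.
targets = {("telangana", "female"): 2, ("rayalaseema", "male"): 1}
samples = [("telangana", "female")]
gaps = coverage_gaps(targets, samples)  # remaining samples needed per cell
```

A layered data strategy would then decide, per gap, whether active collection, passive sources, or synthetic data is the cheapest way to close it.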
Panel Moderation – Nihar Desai (Head, JNI) [122-124]
Desai moderated the discussion and opened with the question: Should foundational datasets be treated as digital public goods, and how can a data-flywheel be created to sustain them?
Data-Creation Strategies – Amitabh Nag [124-148]
Nag described two complementary pathways:
* Traditional “brute-force” field collection that captures diverse speech samples across regions and dialects;
* Product-derived corpora generated automatically from models, including open-domain sources (e.g., YouTube) and closed-domain feedback loops from enterprise or government applications.
He argued that a flywheel of data generation and feedback is essential because datasets must be “lived-in” rather than static [146-148].
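The flywheel Nag describes can be sketched minimally: user corrections captured from deployed products queue up for vetting, and approved items become an improvement corpus fed back into training. The names and structure below are illustrative assumptions, not Bhashini’s actual pipeline.

```python
from dataclasses import dataclass, field

@dataclass
class Correction:
    """A user-suggested fix captured from a deployed voice product."""
    audio_id: str
    model_output: str
    user_suggestion: str
    domain: str  # "open" (e.g. public video) or "closed" (enterprise/government app)

@dataclass
class ImprovementCorpus:
    """Vetted feedback that flows back into the next training round."""
    pending: list = field(default_factory=list)
    accepted: list = field(default_factory=list)

    def submit(self, c: Correction) -> None:
        self.pending.append(c)

    def vet(self, approve) -> None:
        # 'approve' stands in for the human annotation/labelling step.
        for c in self.pending:
            if approve(c):
                self.accepted.append((c.audio_id, c.user_suggestion))
        self.pending.clear()

corpus = ImprovementCorpus()
corpus.submit(Correction("a1", "summary A", "summary B", "closed"))
corpus.submit(Correction("a2", "same text", "same text", "open"))
# Keep only corrections that actually change the output.
corpus.vet(lambda c: c.model_output != c.user_suggestion)
```

The design point is the separation of capture from vetting: anyone can submit, but only labelled, approved items reach the training data.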
Linguistically-Informed Sampling – Prasanta Ghosh [155-184]
Ghosh proposed a cost-effective, language-family-first approach: start from the major families (Indo-Aryan and Dravidian), identify common acoustic components, and then target specific dialects. Using the ResPin Telugu project as an example, his team covered four dialects by first collecting data that captured shared acoustic features and then supplementing with targeted recordings, thereby reducing timeline and budget while preserving diversity [174-184]. This “region-anchored” strategy demonstrates how smart sampling can replace exhaustive data gathering [160-168].
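To see why the language-family-first approach saves effort, consider a toy budget calculation (the numbers and the notion of a per-unit recording cost are illustrative assumptions; the RESPIN project’s actual planning is far richer): collecting every acoustic unit separately for each dialect versus collecting the shared units once and topping up each dialect’s unique units.

```python
def recording_budget(shared: set, unique: dict, hours_per_unit: float = 1.0) -> dict:
    """Compare brute-force collection with a shared-components-first plan.

    shared: acoustic units common across dialects of a language family.
    unique: per-dialect units not covered by the shared set.
    """
    # Brute force: record the full inventory independently per dialect.
    brute = sum(len(shared | u) for u in unique.values()) * hours_per_unit
    # Region-anchored: record shared units once, then only per-dialect extras.
    anchored = (len(shared) + sum(len(u) for u in unique.values())) * hours_per_unit
    return {"brute_force_hours": brute,
            "region_anchored_hours": anchored,
            "savings": brute - anchored}

# Toy example: 4 dialects sharing 30 units, each with 5 unique units.
shared = {f"unit{i}" for i in range(30)}
unique = {d: {f"{d}_{i}" for i in range(5)}
          for d in ("telangana", "coastal", "rayalaseema", "north")}
plan = recording_budget(shared, unique)  # brute force 140h vs anchored 50h
```

The gap widens with more dialects, since brute force re-records the shared core every time while the anchored plan pays for it once.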
Industry Perspective – Kritika K.R. (Head of AI & Product Research, SanLogic) [190-214]
K.R. highlighted the need for scalable, edge-ready infrastructure and domain-specific model fine-tuning to enable reliable deployment in sectors such as healthcare, manufacturing and automotive. She stressed that model optimisation for device-level intelligence, combined with compliance safeguards, allows open-source models to be deployed on-premise, protecting sensitive data while supporting industry-specific vocabularies [200-207][208-214].
Legal & Governance – Thomas J. Vallianeth [208-224][289-301]
Vallianeth outlined three legal dimensions:
1. Copyright provenance & licensing – even publicly available voice datasets may be subject to copyright and require provenance checks and appropriate licences;
2. Privacy-enhancing techniques at the point of collection to avoid storing personal data;
3. Robust early-stage documentation to provide downstream users with trust and evidentiary support in any legal dispute.
He warned that subjectivity in AI outputs will increasingly surface in courts, and that pre-emptive safeguards and transparent processes can mitigate such flashpoints [289-301].
Evaluation Debate
* Ghosh noted that human transcribers rarely agree word-for-word, making word-error-rate (WER) insufficient; he advocated for multi-layered evaluation that includes multiple hypothesis outputs, subjective human review and downstream task performance [228-244].
* Nag complemented this by asserting that acceptability is determined by whether the end-user understands the output, and that different contexts (e.g., courts versus casual conversation) demand different levels of linguistic purity [256-279].
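Ghosh’s point about transcriber disagreement can be made concrete with a small word-error-rate sketch (illustrative, not the panel’s code): scoring the same ASR hypothesis against two human references yields different WERs, so a single number hides the spread.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Standard dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[-1][-1] / len(ref)

def multi_reference_wer(references: list, hypothesis: str) -> dict:
    """Score one ASR hypothesis against several human transcripts."""
    scores = [wer(r, hypothesis) for r in references]
    return {"best": min(scores), "worst": max(scores),
            "spread": max(scores) - min(scores)}

refs = ["the farmer sowed the field",   # annotator 1
        "the farmer sowed his field"]   # annotator 2
report = multi_reference_wer(refs, "the farmer sowed his field")
```

Here the same output scores 0.0 against one annotator and 0.2 against the other, which is the variability that motivates multi-layered, context-aware evaluation.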
The panel reached consensus on the need for a national, collaborative benchmarking system – a single leaderboard under “Varshini” – to drive competitive yet cooperative progress across languages and dialects [313-321].
Broad Consensus
Participants agreed that:
(i) Voice technology and speech datasets should be treated as public goods;
(ii) Continuous, feedback-driven data enrichment is essential;
(iii) Open-source governance and sustainable infrastructure must be institutionalised;
(iv) Evaluation must move beyond single-metric scores to multi-dimensional, context-aware frameworks; and
(v) Legal safeguards, documentation and privacy-by-design are prerequisites for trust [1-3][19][73-78][90-108][208-218][256-270][313-321].
Actionable Take-aways
– Adopt the four-pillar framework and publish the developer toolkit to embed RAI practices.
– Establish a continuous data-flywheel that combines field collection, product-derived improvement corpora, and a layered data strategy.
– Convene regular workshops to co-design a national, multi-layered evaluation framework and an annual leaderboard under Varshini.
– Implement early-stage documentation, licensing checks, and privacy-by-design measures to satisfy legal requirements.
– Encourage governments to act as ecosystem stewards, funding non-commercial language projects and maintaining open-source infrastructure [73-78][79-88][90-108][121-148][208-218][313-321].
Conclusion
The launch of the Policy Report and Developers Toolkit marks a concrete step toward an inclusive, open-source voice AI ecosystem for India that can be replicated globally. By aligning policy, technical, legal and evaluation efforts, participants underscored that continuous, community-driven data creation, responsible governance and user-centred evaluation are the pillars upon which sustainable, equitable voice technologies must be built [24-52][73-78][90-108][121-144][256-270][313-321].
including, you know, Southeast Asia as well as Africa and other places. So from that perspective, it is very important that we scale these solutions. We have policies, standards, toolkits which are developed which can be actually replicated. And frankly speaking, in this area, in this situation, nothing is static. You have a shelf life which is sometimes three months or six months or even less. Yes. So we have to continuously upgrade the things as we go by. You know, we can’t be saying that this is what we have done, unlike a machine which we have built up and it works for six years or five years. There is no guarantee, no warranty in these kind of systems which we are building in AI.
And the reason for this is diversity. You know, each person is different. Each language is different. Each culture is different. So there is huge amount of diversity and we have to live with the diversity, unlike the earlier digital systems which used to work on only standards. You know, they had standards and they would perhaps keep the outliers away. Here, inclusion is part of the design, diversity is part of the design. And we would perhaps have to go step by step to define those diversities so that they start becoming standards. Right. You know, it’s a very different kind of a setup which is there, and happy to be part of this journey, happy to acknowledge the help which is being provided.
And hopefully we are going to get across to the next level and higher steps in the journey as we go by in future. Thank you very much.
Thank you, Mr. Nag, for your insightful words and also for your incredible support throughout the last year over the course of the program. Right. Thank you. I will now invite Dr. Ariane Ahildur-Brandt, Director General of the Department for Global Health, Equality of Opportunity, Digital Technologies and Food Security of the German Federal Ministry for Economic Cooperation and Development, to deliver the keynote address. Thank you.
Dear Mr. Nag, dear partners, distinguished guests, it is a great pleasure to welcome you to this launch today. We present to you the Policy Report and Developers Toolkit Building on Open and Responsible Voice Technology Ecosystem in India. The report and the toolkit are the impressive result of a very productive partnership between Germany and India. And it is the result of a joint effort involving a group of distinguished partners and experts. This is why I would like to start by thanking you, Mr. Nag, and your colleagues from Bhashini, for the excellent cooperation. And I would like to thank the Digital Futures Lab, Art Park, TriLegal, and NASSCOM for their invaluable support. Dear guests, you will find that the report and toolkit that we are presenting today is full of best practices and lessons learned.
It will provide guidance and hands-on advice to policymakers and to the tech community alike. But for me, this report is more than useful and more than practical content. It also conveys a shared conviction, shared values, and a shared vision for digital inclusion. In fact, when it comes to inclusion, voice technology has a key role to play. For millions of people, voice is the most natural and powerful interface to the digital world, especially for those with limited literacy or access to digital devices. When voice AI works in local languages and dialects, it will become a gateway to public services, healthcare, education, and economic participation. When it does not, AI risks reinforcing existing divides and may even become an instrument for exclusion.
This is why responsible, inclusive voice AI is not just a technical issue. As I said, it is part of a shared vision, a shared vision between India and Germany. At a time when artificial intelligence is often framed as a global competition, this report offers a different narrative, and this is a narrative of cooperation. The Indo-German Partnership on AI, and particularly on language and voice technologies, shows what is possible when we join forces. Together with Bhashini and the Indian Institute of Science, our initiative Fair Forward has created open voice technologies for nine Indian languages. These language models can now be used by NGOs, state agencies and companies. For example, they can be integrated into voice assistants for health workers, which in turn can improve health care for women.
Or they can be used to advise farmers on crop management. This collaboration, based on the principles of openness, fairness and responsibility, is the foundation for AI that truly serves the common good. And it contradicts those who claim that only fierce competition can generate prosperity and innovation. Ladies and gentlemen, this approach closely aligns with the principles articulated in the Hamburg Declaration on Responsible AI for Sustainable Development Goals. This declaration, presented by BMZ, our ministry, and UNDP last year, has been endorsed by more than 50 stakeholders already, including governments, international organizations, NGOs, and companies. The declaration reminds us that AI should serve the people and the planet, strengthen inclusion, and support sustainable development.
And our report here is a very practical and relevant contribution to that agenda, translating shared principles into concrete guidance. So let us thus deepen cooperation, strengthen trust, and build voice technologies that truly speak to everyone. Thank you for your attention.
Thank you so much, Dr. Ahildur-Brandt. We shall now move on to the formal launch of the report and toolkit. I’ll invite all the representatives of the consortium from GIZ, TriLegal, Art Park, NASSCOM, Digital Futures Lab to please come on stage. And Mr. Nag to present the data. Thank you. Now that we’re done with the formal launch of the report and policy toolkit, just to give you a brief overview, I invite Ms. Harleen Kaur, Research Manager, Digital Futures Lab, to present the report.
Good morning, everyone, and thank you for being present. on a Friday morning for the launch of this report, as well as the developer toolkit. So I’ve linked the outputs in case you’d want to see them. If you can take a quick photo, and I’ll move towards discussing the high points of the findings that we had both for our policy report as well as developer’s toolkit. So when we began this work last year, we found that the challenges that are there in the voice tech arena, they are not limited to data collection alone. So the challenges are multi -layered that start right at the data collection stage and curation stage, but then move on to model development, where we see linguistic diversity gaps, lack of standards, uneven documentation, unclear data ownership and structures being a problem.
But then when we move on to the hosting and licensing aspect, long-term infrastructure costs, governance of open source assets, as well as sustainability of shared resources is something that we felt was a very important problem that needed to be solved in a certain manner. And the last is downstream deployment and impact, where bias, exclusion and lack of accountability for misuse become more visible. All of these are essentially starting at the data collection stage, but they move on to the life cycle of the voice technology ecosystem in India, specifically when you feel like supporting an open voice ecosystem in India. To lay down our approach for this project, we thought about how we can move on from the traditional government systems, where government has primarily acted as a regulator, it enforces rules, it corrects market failures, to a newer active role, and that we have seen with Bhashini.
We encourage governments across the world to adopt this framework where the government acts as a steward of public good, an ecosystem convener, as well as a standard setter, not just through licenses, but actually through practice as well. This is the overview of our policy framework. Based on this approach, we have structured our policy framework around the four pillars that you see on the screen. The first is treating foundational data sets as public goods. Second is institutionalizing sustainable open source infrastructure. Third is building open and representative models. And finally, strengthening responsible deployment. And what do we mean when we say this? When we say treat foundational data sets as public good, we are saying that government should be encouraging both funding and convening for public good functions.
For example, supporting languages that are not commercially viable as such. Institutionalizing governance frameworks to strengthen RAI practices, for example, through procurement, etc. On open representative models, we believe that local and contextually relevant benchmarks that are curated by government bodies, not just at the center, but at the relevant diversity ecosystem, whether it is state, district, etc., is important. Shared national compute infrastructure, preferential treatment to open source ecosystem is something that we propose. On open source infrastructure itself, standardization of documents and promoting collaborative data steward models is something that has already been written in the report. Strengthening responsible deployment, public value sharing is another aspect of the report. We believe that public value sharing comes not just from financial arrangements, but also a buy-in of communities into what kind of
uses of voice technology are there. And of course, supporting public literacy to protect against misuse and preventing harms is the policy side of our suggestion. Moving on to the developer’s toolkit. You know, policy intent alone does not ensure inclusive AI systems. So alongside the policy framework, we’ve developed a developer toolkit that translates some of these principles into practice for developers. So it focuses on three broad areas: representation being the foremost, through diversity planning, et cetera; second being data quality and evaluation; and the third one being embedding RAI practices throughout the lifecycle of development of open voice technology. I’ll just give you a brief overview of what we mean when we say this. So for developers, we have a toolkit that includes best practices that we’ve seen in industry.
And we have best practices that we’ve seen in India and outside on what it means to ensure adequate representation. Things like having a diversity wish list, making sure that you’re not collecting data from one source, applying linguistic expertise, using synthetic data, training the model for linguistic and environmental nuances, and also a layered data strategy. Which again means: don’t just use one source of data. Don’t do active or passive collection alone. Use a hybrid layered structure to make your models more diverse.
Once developers move on from data collection to curation, we suggest many, many ways, and this is just a bird’s-eye overview, in which data quality can be enhanced within the constraints that we operate in, in countries like India. And there are suggestions to make the applications inclusive and useful in practice, including robust transcription standards, contextual benchmarks, using data cards and model cards that are standardized, as well as continuous post-deployment monitoring. You can find more details in the report itself. And the last aspect of the developer’s toolkit is actually embedding RAI practices. We’ve taken another lifecycle framework within this, where we believe that RAI practices are not the domain of policy alone. At the enterprise, startup and developer level, ensuring a framework that serves to support them by providing them clarity on what it means when we say your output should be responsible.
So things like being mindful of engagement with the communities from whom you are taking data, how annotation is happening, consent protocols, privacy enhancing techniques. So this report essentially is compliance plus. It actually shares practices that we believe are useful to promote an open, responsible AI voice technology ecosystem. Please feel free to engage with the reports. We’ll be very happy to take your comments and suggestions. Thank you so much.
Thank you, Harleen. We shall now move on to a short panel discussion on voice technologies in India: unpacking the present and future of the voice AI application ecosystem for India and beyond. Joining us today, I will invite to the stage Mr. Amitabh Nag, CEO of DIBD; Dr. Prasanta Ghosh, Associate Professor at the Indian Institute of Science; Ms. Kritika K.R., Head Artificial Intelligence and Product Researcher, SanLogic; and Mr. Thomas Vallianeth, Counsel, Trilegal. And this discussion will be moderated by Mr. Nihar Desai, Head of JNI. Thank you.
Hello. Hello. Am I audible? Okay. Thanks, everybody, for joining. So, just delving right into it, my first question to you would be, Mr. Nag: as we saw in the toolkit, we were arguing that foundational data sets, speech data sets, must be treated as DPIs and DPGs and hence be available in general. From your experience in driving this ecosystem for about two years, since I’ve been a part at least, what does it take to continue creation, ongoing facilitation of such innovations being put up as a digital public good while ensuring trust and safety, right? And is there a way for us to have a flywheel of data of sorts, data goods of sorts?
Yeah, that’s a very important aspect of what we should be doing. That means continuing the creation of data sets, because it will then improve the models as we go by. Now, the continued creation of data sets is, I would say, going to happen in two or three ways, you know. One is the way which we have been doing, which is the brute data collection, which is going to the various fields and then picking up the data from there and then creating the diversity which is required to actually build the model. So that is one way of doing it, and that will continue. We will have to keep the focus with respect to saying that now I am doing for this particular area, this particular dialect, this particular language, while it will be for another language in some other way.
The second is to actually look at using the products which have been developed using these models and creating such open domain activities to create the digital data. So you are creating the digital data which you are speaking, automatically creating the parallel corpus and then finding a way to actually vet this out and annotate and label and saying that, okay, this is the improvement corpus. That is the second thing. So one, you are creating a primary corpus. Second is… you are creating an improvement corpus which can be again fed back to the model and say that this is what is to be used and that is a big area of work as we look at. Allied to that is a lot of also the digital data is getting created any which way in the open domain which we can actually use to build the corpus again.
So you know, YouTube videos, today the world is more digital than it was yesterday. But the conscious way of looking at it as a program is what is required. How do I look at it as a program that I will be creating a data corpus at various places, and this need not necessarily be an open domain. Open domain is kind of an easy way to work upon it. It can be a closed domain as well, that there is an application which is working in an enterprise or a government, and the people there are given an option to give suggestions to the translations or the answers or the things which you have gone in, and that can get into a vetting pipeline and you are able to create that.
So those applications related to this, when we look at the AI portfolio, not only languages but the AI portfolio generally, are very important for us to be on a continuous improvement journey. The most important aspect, then, is this: if a person is working on, for example, an enterprise mail system, and it derives a summary of a document, perhaps in a known language or not, and the summary differs from what he would produce manually, he should be able to record that somewhere, and that goes as feedback to the model. Currently this is a concept which may or may not exist; some enterprises would have done it, others would not.
So we should look at these kinds of interventions, run as a program in a conscious way, so that everybody is able to contribute his or her own corrections into the system and use them to improve the model or the AI systems, because these systems still require a lot of intervention from each and every person. The knowledge is still deficient. Thank you.
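The improvement-corpus loop described here (a user flags a model output, the correction enters a vetting pipeline, and only vetted pairs feed back into training) can be sketched minimally. This is an illustrative sketch only; the class and field names are assumptions, not part of any Bhashini or enterprise API.

```python
import time

class FeedbackStore:
    """Minimal sketch of an improvement-corpus pipeline: capture user
    corrections to model output, hold them for human vetting, and export
    only vetted pairs for retraining. Illustrative names, not a real API."""

    def __init__(self):
        self.pending = []   # corrections awaiting human review
        self.vetted = []    # approved records, safe to feed back to the model

    def submit(self, model_input, model_output, user_correction):
        """A user disagrees with the model's output and records a fix."""
        self.pending.append({
            "input": model_input,
            "output": model_output,
            "correction": user_correction,
            "ts": time.time(),
        })

    def vet(self, index, approve):
        """A human reviewer accepts or rejects a pending correction."""
        record = self.pending.pop(index)
        if approve:
            self.vetted.append(record)

    def export_corpus(self):
        """Parallel corpus of (source, corrected target) pairs for fine-tuning."""
        return [(r["input"], r["correction"]) for r in self.vetted]

store = FeedbackStore()
store.submit("mail_123.txt", "summary with an error", "corrected summary")
store.vet(0, approve=True)
print(store.export_corpus())  # [('mail_123.txt', 'corrected summary')]
```

The key design point is the separation of `pending` from `vetted`: raw user feedback is noisy, so only reviewed pairs become part of the improvement corpus.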
So what I'm taking away is that data sets need to be more lived-in in nature. They are not static; they have to be built upon by users and by others. And also the fact that the feedback itself could lead to better data quality, which is something that enterprises might be doing, but it could definitely be done more. Thank you for that input. But to his point on the first question on data set inclusivity, Prasanta, going back to your research activity, mostly on inclusive data sets: the toolkit also argues that inclusivity must be designed in at the foundational data layer, at the time of designing data sets. But we still find data sets which lack this aspect.
What's your take on the gaps at the research and academia level in terms of designing better, more inclusive data sets that could lead to better applications down the road?
That's a very deep and good question. To cover the diversity and become more inclusive, one approach would be to cover it in the data, right? But if we think about the diversity that exists in Indian languages, it is a function of culture, caste, local knowledge and everything. And while we see the diversity, these are not independent elements. There are certain commonalities and certain unique aspects in each of the languages, dialects and accents we talk about. So one important direction in modeling would be to think about the intrinsic basis components that finally lead to this diversity, instead of a brute-force way of covering data from all parts of the country.
So if you can discover, for example (just an example, I'm not an expert in linguistics), if you look at the Indian languages, there are two broad families, right? One is Indo-Aryan and the other is Dravidian. Now, while there are multiple languages within each of the streams, we may ask: to cater certain technologies to speakers of these languages, should we go ahead and collect a good amount of data in each of those? That may not be the only way to think about it. How do we balance and make a trade-off between the amount of data we collect, which we know is challenging and costly, and a novel modeling approach where we start from those intrinsic basis components and then manifest the individual diversities?
I think that may help us to jointly think about modeling and collection for catering to this diverse population.
Could you help the audience with one example of what you mean by balancing both aspects? Let's say we pick one of your initiatives, Syspin, ResPin or Vani, or any other data set. How did you manage or balance inclusivity versus model-building activities, versus other factors that might come in while designing specifications?
Yeah, so the aspect of modeling that I brought out is, I would say, not very well established at this moment. But from my experience in the project ResPin, I can give a concrete example. If you take Telugu as a language, we worked with four major dialectal variations: one in the region of Krishna-Guntur, another in Visakhapatnam (Vizag), another in Anantapur-Chittoor, and another in Nalgonda. Now, when you look at their intrinsic variations, we see that there are some commonalities, and then there are some unique aspects in each of those dialects. So think about a brute-force approach where I collect a thousand hours in each of them, versus collecting certain kinds of stimuli to cover the acoustic space of the speakers, maybe from one region, that will automatically cater to the other regions, and then collecting something that will complement it in each of the other regions. That way, our overall timeline, budget and cost will all go down. And there has to be novelty in terms of having a model that starts from the intrinsic components and then naturally diversifies itself to cater to those populations. So that became the region-anchored approach that we started later on.
I see, okay. Thanks for that input. Just to summarize, what I'm taking away is that instead of a brute-force approach, what we're essentially saying is: balance across the various parameters on the basis of which you would train a model, such as linguistic diversity and acoustic diversity, and then use a smart approach to dissect the target audience and the ways of collecting data, to maximize the output while maximizing bang for the buck. Thanks for that input. But this is also slightly from the perspective of academia. I would like to switch to Dr. Krithika. From the perspective of an applied AI researcher, you are also one of the people on this panel who has really deployed speech AI solutions. What is your take on the challenges you have faced with inclusivity, either at the data set layer or the application layer?
More towards the core enterprise applications: knowledge-repository integrations are coming up, be it in healthcare or even manufacturing and automobiles. Voice is becoming the go-to interface for different applications and for enabling the workforce across industries. In that case, again, as I said, consistency across the various user scenarios and, more specifically, specialized domain adaptation is required. That feedback loop is most important while the system is in practice or in progress, I would say. And the more critical aspect is providing scalable and sustainable infrastructure, which comes with more optimized models and also brings in edge deployments.
So that real adoption can be scaled across multiple industries and normal usage across various sectors. I'm talking more from the end-user perspective. Getting the data is one part of it, but making it reliable across the infrastructure, and providing the required scalable model at the device-intelligence level, is also important when it comes to real adoption of these AI models.
Thanks for the input. So I guess, after all, industry is also using feedback as a tool; that's a nice validation here. Maybe coming to Thomas, switching tracks to the slightly legal side. At least in the toolkit, we've argued that speech models and speech data sets sit at the intersection of copyright law, data governance, security, and so on. How do you propose balancing innovation versus caution on this front, especially for all the researchers and practitioners in the room?
Thanks, Nihal. That’s, again, a very helpful question. I think Harleen had articulated it quite well in the beginning when we have to consider the entire ecosystem as a whole. There is a common myth in India that anything that is public is freely available. I think what we have to think about is also that, you know, all data sets operate at the intersection of privacy law and copyright law. Under privacy law, most publicly available data sets are essentially freely available to be used under, you know, even the new legislation. But under copyright law, even if it is publicly available, somebody else may own the copyright on that. So there has to be careful thought put in place right from the beginning itself in terms of what data sets you’re collecting, what is the copyright provenance of it, are you able to defer to, you know, freely licensed and open source kind of material to compile it, compile that data set, and if not, are you able to obtain the licenses to do so?
So the thought process from the beginning in terms of how you’re structuring the way to get this and also how to reduce the surface area of the impact of some of these laws. So for instance, in relation to privacy laws, if you’re collecting somewhat more private data sets, if you can use privacy enhancing technologies or you’re able to extract data such that no personal data is ultimately captured or stored at the point of data collection, all of these are various ways in which you can put in place mechanisms right from the start of when the ecosystem begins to ensure that downstream use cases are also protected in that sense. The second big aspect is, of course, the documentation, right?
Now, the data collector, the data creator, is essentially the gateway to the entire ecosystem in some sense. The documentation has to be robust right from the beginning to enable everybody in the downstream chain to use this data and to ensure that a good, safe and trusted ecosystem is created with respect to that specific data set. So yes, there are flexibilities available under the law in terms of how you are able to use voice data sets, but at the same time there is some caution you have to put in place right from the beginning, and throughout the life cycle, in terms of figuring out how to use these data sets effectively.
Of course, the last, related aspect is to think about the various layers in which these legalities operate. You can think of the speech data set itself as being copyrighted, but equally, if speakers are reading out a book passage or performing a specific work, there may be separate rights allocated in relation to those tangential elements as well. All of these are to be accounted for from the very beginning of the ecosystem itself, such that downstream usage is not impacted in that sense. So I would say the report's argument is: think about it as a whole.
Don’t think of each action in isolation. Think about the entire impact downstream as well. And then account for both either enabling maneuvers under law in terms of documentation, privacy enhancing techniques and so on, or implement the appropriate cautionary mechanisms to ensure that downstream usage is also protected.
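The point about robust documentation and copyright provenance from the very first collection step can be made concrete with a small sketch. All field names, the license list, and the release rule below are illustrative assumptions, not a standard schema and not legal advice.

```python
from dataclasses import dataclass

@dataclass
class SpeechDatasetRecord:
    """Sketch of provenance documentation for one collected audio clip:
    record licensing, consent, and underlying-work rights up front so
    downstream users can rely on the data. Field names are illustrative."""
    clip_id: str
    source: str                   # e.g. "field recording", "YouTube", "read passage"
    license: str                  # e.g. "CC-BY-4.0"; "unknown" should block reuse
    speaker_consent: bool         # informed consent for redistribution
    contains_personal_data: bool  # True if PII survives at point of storage
    underlying_work: str = ""     # book/performance being read, if any

def cleared_for_open_release(rec: SpeechDatasetRecord) -> bool:
    """A clip is releasable only if consent exists, personal data was
    removed or never stored, and the license permits redistribution."""
    open_licenses = {"CC0-1.0", "CC-BY-4.0", "CC-BY-SA-4.0"}
    return (rec.speaker_consent
            and not rec.contains_personal_data
            and rec.license in open_licenses)

rec = SpeechDatasetRecord("clip_001", "read passage", "CC-BY-4.0",
                          speaker_consent=True, contains_personal_data=False,
                          underlying_work="public-domain text")
print(cleared_for_open_release(rec))  # True
```

The value of such a record is exactly the "documentation shows intent" argument made later in the session: each release decision is traceable to explicit consent and license fields rather than to an after-the-fact judgment.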
Yeah, at least in some of the hats that I wear, I am also collecting data sets, and those are important points that we keep in mind. Hopefully we'll be able to take the learnings from the toolkit and actually implement them in our processes. Switching tracks slightly to Dr. Prasanta here: without measurement, we don't really get anywhere in terms of implementing the right frameworks, the right legal processes, and measuring quality. You've also spoken about evaluations being broken as far as Indian contexts are concerned. Can you elaborate a little on what challenges we face on a day-to-day basis, where they come up, and how you foresee these challenges either getting resolved or getting amplified? I think this is an important area that all of us together should explore and contribute to.
So when we build something like an automatic speech recognition system that is used in many, many applications, think of it as yet another human who is listening to the audio and trying to spit out, in text, what was spoken. Now, if you go out in the real world, as we have realized multiple times and experienced through multiple projects, in ResPin as well as Vani and many others that I have done, if you give a piece of audio to two individuals, they never exactly agree on what they hear.
And I'm speaking from my experience, not about two different parts of the country; I'm talking about two people from the same district. In fact, there was an incident where we realized that two people were just three kilometers apart, but still they did not agree on how what they heard should be written down. What this tells us is that there is an inherent variability in the way an individual, as an Indian, perceives or likes to see the text. Now, if we accept that this exists today, we need to build our systems and system evaluation to cater to that variation.
So we need to account for that variability and be robust to it. As I said in the beginning, if we treat the system as a human too, it will also not agree with another human. So if we just go by a word-by-word comparison of how the system performs against some of the humans, certainly it will not be 100% accurate. In other words, we calculate what we call the word error rate, which is an objective way of evaluating, and a word-based comparison is probably not the right way to go at this point. Maybe the ASR system is doing pretty well, but just because it made a slight mistake in one of the words, we penalize it and say it's not doing well.
So now we have to think about how to solve this problem. One option is a multiple-evaluation system where we don't just use word error rate. Another is to build the ASR so that it gives not just one output but multiple outputs which could each potentially be right, and then evaluate not just objectively but also subjectively, through humans, because a human can absorb that error and say, yes, it's still okay. A third is to take it to the downstream application, which, depending on what you are using, could be an LLM or another Q&A system that can absorb that variability. So I think we need to break the entire evaluation system into multi-layered evaluations, and they are not really independent; we need to take feedback all the way from the final application back to the ASR, and so on. So I guess individuals from the application areas, individuals from linguistic backgrounds, and engineers all have to come together.
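One concrete version of "multiple evaluations instead of a single word error rate" is to score a hypothesis against several human transcriptions and keep the best match, so that legitimate annotator disagreement is not counted as a system error. A minimal sketch follows, using a standard Levenshtein-based WER; the example strings are made up for illustration.

```python
def wer(ref, hyp):
    """Word error rate: word-level edit distance divided by reference length."""
    r, h = ref.split(), hyp.split()
    # dp[i][j] = edits needed to turn the first i ref words into the first j hyp words
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i
    for j in range(len(h) + 1):
        dp[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[len(r)][len(h)] / max(len(r), 1)

def multi_reference_wer(references, hypothesis):
    """Score against every human transcription and keep the best match,
    so the system is not penalised when annotators themselves disagree."""
    return min(wer(ref, hypothesis) for ref in references)

# Two annotators wrote the same utterance down differently.
refs = ["naaku telusu andi", "naku telusu andi"]
hyp = "naku telusu andi"
print(multi_reference_wer(refs, hyp))  # 0.0: matches one valid transcription
```

Single-reference WER would score this hypothesis at 1/3 against the first annotator; taking the minimum over references treats both spellings as acceptable, which is a small step toward the subjective-objective evaluation the panel calls for.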
So what I am hearing is that solving this is more of an ecosystem-level challenge. And maybe before our ecosystem champion here, Mr. Nag, comes in on this, I would like one industry perspective from Dr. Krithika. How do you solve this from an application standpoint? Prasanta explained this challenge from more of an academic or foundational research standpoint, but how does evaluation play a role at your daily application layer?
Yeah, so as I said, the applications are varied. Adoption is now at the conversational level, right from bringing analytics out of the data. Then it is more about the voice interface and multilingual conversation, and with speech-to-speech translation those things are now prevalent. Coming to the industry aspect of it: adapting these models to custom data sets is one way, along with the right pick of data sourcing from the available open sources, so that the model becomes more specialized to the particular tasks and the work it is supposed to do. Now, with LLMs, these models are more adaptable to industry jargon and even core industry workflows.
When enabling ASR models together with LLMs, you have various methods from the data-creation perspective: leveraging open-source data, and custom-tuning the data to the various industry use cases, definitely with the required compliance. These open-source models also enable on-prem deployment, which addresses the security aspect when it comes to building models for different core industry applications, so that the models can be much more fine-tuned or trained for the domain while keeping the compliance and security aspects intact.
So, having heard both of these perspectives, Mr. Nag, from your experience, how do we approach resolving this conflict? All of us concur that we need a better framework for evaluation, but in some ways it is also nobody's problem at the moment. Is there a way to break this?
So let's step back and evaluate our conversation itself. Is there a framework by which we can say who has spoken the better language? It is as good as other people understand it. If the audience is able to understand what I am speaking and what I intend to say, that is what is going to be the final evaluation, by any measure. What we actually have to look at is reaching a level at which it is acceptable to the people sitting in front of me. I don't think we will ever reach a situation where we can say this is the best, this is the second best, this is the third best.
Ultimately, the audience decides whether they are in a position to accept it. We are looking at a few use cases where we have actually deployed these technologies, and we go through various evaluations. One of them, incidentally, is grievance redressal, and when we gave it to the person who actually owns the system, acceptance was supposed to be taken up by various ministries. One ministry would say this model is better; another ministry would perhaps disagree. It's a question of perception, and ultimately the audience decides. Some would like the tone of speaking, some the modality, some the pronunciation.
So it's all based on the person's perception. Now, is there a common way in which we can say this is the acceptable thing? Even then we will have differences. Many public figures, for example, when they speak Hindi or English or whatever language, there are gaps in the language, but still they are understood; they are able to connect with people. So we have a difficult challenge. Rather than looking at it only from the perspective of application or academics, we have to look at it from the perspective of the audience. But then we also have situations which require accurate and perfect transcriptions. For example, if I'm arguing a case in a court, I can't have variations in the language. If I am in a meeting and saying something on record, again I cannot have variations. But for that also, we will perhaps have to take two steps back and look at the purity of language with respect to acceptance, because most of our language has become impure: we are code-mixing most of the time, especially in cosmopolitan areas, and in other areas, even where the native language holds, dialects are taking over.
So it's a very complex problem, not an easy one to solve. At this point in time, when we are looking at how to take it forward, I would say we should look at what is acceptable to the audience and then work back to define an acceptable way in which the models can go out into the market.
Yeah, that's an important point. So far we've been looking at it mostly, at least I have, through the lens of application versus academia, but maybe we need to take a what-works point of view and not just the traditional ranking point of view. But Thomas, and this might be a curveball since we've not talked about it: in a world where evaluation is slightly subjective and no longer objective, how does the law see this? How do you make decisions for procurement? How do you resolve differences between two opinions, especially in cases where both might be right and it's a gray area? Do you foresee these sorts of scenarios coming in, especially with Gen AI?
To be fair, I think the legal principles on this are somewhat clearer, at least the more privacy-facing or copyright-facing ones; they apply much before outputs are produced or any of these methodologies are implemented, and we have a body of law that has existed for many years in India. It's just a question of how you lead evidence in relation to these matters. So if it ever comes to the question of whether a specific output is right, or what a specific output implies, I think where we haven't caught up as a country is in how to evaluate the evidentiary standard in relation to that. The principles are fairly well laid out in terms of how you would decide it, but what you would show the court as evidence is still evolving. It also brings me to a larger point, which I think we make in the report as well: there is a measure of trust that needs to be put in place in the ecosystem as a whole, right?
Irrespective of what the outcome of evaluation may be, there are measures you can put in place right from the get-go. One example is in relation to harmful content. If there is a debate about whether content is harmful or not, and it is a subjective determination, you can avoid that question to some degree by putting in place the necessary rails and safeguards from the beginning itself, so that trust is engineered into the process as opposed to having to face that choice downstream. But yes, to your point, if we come to a place where we need to face that question, I think the principles exist, but how you lead evidence, how you show the court that one interpretation prevails over the other, is still developing and very, very subjective.
I think some of the cases that the prominent AI players have in the country will go a long way toward developing some of those standards, but at least as of now the court system is still trying to catch up to some of these principles. Documentation goes a long way to show intent. Methodologies you have implemented that show you adopted reasonably high safeguards and reasonably high principles also go a long way to show intent. So the subjectivity, I think, is far reduced if you put in place some of these measures that bring trust to the entire ecosystem. That single flashpoint of failure is perhaps tough for the courts to look at as well.
But if you look at it from an ecosystem perspective, I think a lot of that may reduce those flashpoints of failure, or those flashpoints of evaluation, at least from a legal perspective.
I see. Thanks for that summarization: the law as such is at a stage where it can accommodate some amount of subjectivity, but there needs to be dialogue and more policy decisions to make it crisper, and of course to follow through into the application of the law. Thanks for that input. The last question leaves the floor open for any inputs. The topic at hand is challenges and best practices for speech models and data sets at the ecosystem level. From your experiences, are there any open points, arguments, or call-outs you would like to make to the ecosystem?

The call-out I would make is that many of the things which were indeterministic or unknown a few days back have started reaching a point where we are able to crystallize them. So I think we need more workshops and more discussions, to take up more use cases and study them in detail, to figure out a framework by which acceptability and evaluations are properly benchmarked.
That's a good point. Go ahead, Thomas.

I have a point to add here: there is a certain affinity in this ecosystem towards open-source data sets and open models. I would be more thoughtful about how and when these are suitable. Are there particular safeguards you need to put in place for open-source data sets? Are there end-use considerations that need to be tailored? A good example: I've seen a case where somebody is training a model to detect hate speech. Now, the safeguards you would put in place for a hate-speech-detection data set and model are different from those for a data set and model you would develop for regular speech-to-speech translation.
So the decision as to which licensing and documentation frameworks apply needs to be informed by the end use case you are pursuing, the unique attributes that arise from the specific data sets and applications you are considering, and finally the downstream users you are expecting. The choice needs to be made in a little more conscious a fashion.

Prasanta, you wanted to say something?

Sure. Your question actually stimulates me to think about English, I mean, the models that were built on American English. There has always been standardization of evaluation there; in fact, if you look at the NIST evaluations, there have been various protocols and a call-out every year for who beats the best baseline achieved so far. I believe we have to do this in our country, in India, at least for Indian languages, and it's very diverse, as we just discussed. So first of all, think about how to evaluate, and then create a national-level framework for evaluation.
And every year, let’s assess ourselves, all these stakeholders, right?
It could be general evaluation or application-specific, in each language or dialect. And then we really have a leaderboard. Of course, there are many individual leaderboards across the country, but let's have only one, under Bhashini, say, and it should be elaborate enough to cater to all languages and dialects. Maybe that's not the right way, but think it through and make sure every year we make progress in each of those. I think that has to be brought into the system to bring competitiveness, in a collaborative way of course. Overall, that can help improve voice technology in Indian languages. And the reason I'm saying it
is mostly from my understanding of and experience with English, where this has happened in the past.

Yeah, interesting points, Prasanta. I hear you speak passionately about evaluation, and now you're taking it one step further: how do we create a unified framework for evaluation, competitive yet collaborative, for the ecosystem, housed under a central, impartial entity like Bhashini? This is a great point. I hope the audience found some of these points helpful and enriching. Thank you so much for making time in what is sure to be a very busy event, and I hope you have a good rest of the day. Thank you. I now invite Mr. Shailendra Pal Singh, Senior General Manager, Bhashini, to felicitate the speakers.
Thank you, Mr. Amitabh Nag, Dr. Prasanta Ghosh, Dr. Krithika K.I., Mr. Thomas Salenat and Ms. Harleen Kaur. Thank you to all our speakers for walking us through this rich tapestry of voice technologies and their life cycle in the Indian context. We hope you read our report and the toolkit and find them useful. Thank you so much to the audience for staying with us patiently throughout this entire hour. Thank you.
Dimitrios Kalogeropoulos: Yeah, hello, everyone. Forgive me, but I will read. So the title for me today is Building Bridges for Tomorrow’s Population. I’m going to sort of delve into AI in healthcare …
EventDevine Salese Agbeti: Thank you. Firstly, we have to align AI with international human rights standards. In that, for example, currently the Cyber Security Authority is working with the Data Protectio…
EventRicardo Israel Robles Pelayo: Thank you very much. Good afternoon, everyone. It is an honor to be here and share a reflection on a topic that is crucial to our present and above all. Our future. Artif…
Event### Community-Led Development Abhishek Singh: One part is that, of course, the way the technology is evolving, there is IP-driven solutions and there are open-source solutions. So what we need to emp…
EventThis implies that active engagement and participation from individuals are key factors in driving meaningful discussions around data and its applications. To build a strong and sustainable data ecosys…
EventImplement layered data strategies using multiple sources (active collection, passive collection, synthetic data) rather than relying on single approaches Use hybrid approaches combining brute force d…
EventMoving beyond the initial 22 constitutional languages to serve broader linguistic diversity requires scalable data collection methods
Event“Diversity of people, languages and cultures makes inclusion a core design requirement rather than an after‑thought”
The knowledge base stresses that diversity of languages, cultures and people is essential for inclusive AI systems, as noted in [S10] and reinforced by Yann LeCun’s comment on the need for multilingual training in [S101].
“Voice AI is a gateway for low‑literacy populations to access public services, health care, education and economic participation, and failure to provide multilingual voice interfaces can reinforce exclusion”
Multiple sources describe multilingual voice AI as a way to bridge digital exclusion and serve low-resource users, e.g., the discussion on multilingual AI bridging gaps in [S73] and the emphasis on voice-driven multilingual interfaces for equity in [S113].
“The initiative is linked to the Hamburg Declaration on Responsible AI for the Sustainable Development Goals”
The Hamburg Declaration on Responsible AI for the SDGs is documented in [S17], confirming the report’s reference to this framework.
“The policy report and developers toolkit are a product of a German‑Indian partnership”
Broader context on Indo-German AI collaboration is provided in [S111] and the German-Asian AI partnership overview in [S108], which illustrate the existence of such bilateral initiatives.
“Institutionalising sustainable open‑source infrastructure is a pillar of the policy framework”
The importance of open-source solutions for governments in the Global South is highlighted in [S104], adding nuance to the report’s emphasis on open-source infrastructure.
There is strong consensus that inclusive voice AI must be treated as a public good, that data and models require continuous, feedback‑driven enrichment, that open‑source governance and robust documentation are essential, and that evaluation metrics need to evolve beyond simple error rates to multi‑layered, context‑aware frameworks. Participants also agree on the need for scalable, replicable policies and toolkits to extend impact globally.
High consensus across government, academia, industry and legal stakeholders, indicating a solid foundation for coordinated policy action, standard‑setting and investment in sustainable voice AI ecosystems.
The discussion revealed three principal fault lines: (1) how to evaluate voice AI—whether through audience perception, multi‑layered/context‑aware metrics, or standardized national benchmarks; (2) the optimal data‑collection strategy—brute‑force field work plus product feedback versus linguistically‑informed modeling to cut costs; (3) the balance between an open‑source public‑good mindset and the legal safeguards required for copyright and privacy. While participants share common goals of inclusivity, continuous improvement, and multi‑stakeholder collaboration, they diverge on concrete pathways to achieve these goals.
Moderate to high. The disagreements are substantive enough to affect policy design, funding allocations, and implementation road‑maps, requiring coordinated effort to reconcile technical, legal, and evaluation perspectives for a coherent voice AI ecosystem.
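The disagreement over evaluation can be made concrete. A single-reference word error rate (WER) penalises a hypothesis that matches one valid human transcription but not another; scoring against multiple references is one simple multi-layered remedy. The snippet below is an illustrative sketch of that effect, not a metric any panellist proposed.

```python
# Illustrative sketch: word error rate (WER) via word-level edit distance,
# showing how human transcription variability inflates single-reference scores.

def wer(reference: str, hypothesis: str) -> float:
    r, h = reference.split(), hypothesis.split()
    # Standard Levenshtein dynamic programme over words.
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(r)][len(h)] / len(r)

# Two equally valid human transcriptions of the same utterance.
references = ["the cat sat on the mat", "the cat sat on a mat"]
hypothesis = "the cat sat on a mat"

single_ref = wer(references[0], hypothesis)              # 1/6, one "error"
multi_ref = min(wer(r, hypothesis) for r in references)  # 0.0, no error
```

A perfect recognition of the second annotator's reading scores one-sixth "wrong" against the first, which is the variability problem the panel raised; multi-reference or downstream, task-level evaluation absorbs it.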
The discussion was shaped by a series of pivotal insights that moved it from a generic launch event to a deep, interdisciplinary exploration of voice AI in India. Amitabh’s opening remark about AI’s fleeting shelf‑life framed the need for continuous, inclusive data pipelines, which Harleen then codified into a four‑pillar policy. Prasanta’s linguistic‑family approach and critique of word‑error‑rate evaluation introduced strategic efficiency and methodological rigor, prompting the panel to rethink data collection and performance metrics. Ariane’s emphasis on inclusion and cooperation set a moral compass, while Thomas’s legal analysis anchored the conversation in compliance and trust‑by‑design. Together, these comments redirected the dialogue toward user‑centric evaluation, sustainable open‑source ecosystems, and proactive legal safeguards, culminating in a consensus that future progress will require coordinated workshops, national evaluation standards, and a holistic, trust‑engineered approach.
Disclaimer: This is not an official session record. DiploAI generates these resources from audiovisual recordings, and they are presented as-is, including potential errors. Due to logistical challenges, such as discrepancies in audio/video or transcripts, names may be misspelled. We strive for accuracy to the best of our ability.