Ensuring Safe AI: Monitoring Agents to Bridge the Global Assurance Gap
Summary
The closing session of the India AI Impact Summit focused on building a trustworthy, responsible AI ecosystem through “AI assurance,” a framework for measuring and communicating the safety and reliability of AI systems [1-3][5]. Organisers highlighted the recent Delhi Declaration as a catalyst for accountability and policy work, and introduced two new papers, one on strengthening the AI assurance ecosystem and another on closing the global assurance divide, to seed further development and implementation [2][7-10][14-23][25-33]. Participants were reminded that robust national AI strategies must pair industrial ambitions with comprehensive assurance measures, especially as the declaration calls for clearer usage-data sharing and multilingual evaluation standards [15-18][24-30][31-34].
Singapore’s Minister Josephine Teo noted that agentic AI has moved from obscurity to widespread deployment, offering productivity gains but also introducing autonomy-related risks that can amplify harm when systems malfunction or oversight erodes [46-55][58-66]. She argued for a shift from reactive regulation to proactive governance, citing Singapore’s sandbox partnership with Google and a living model-governance framework that emphasizes testing, standards, and independent third-party assurance [69-76][97-109][110-112]. Teo concluded that building confidence in agentic systems requires continuous collaboration with industry and global partners to refine these safeguards [111-112].
Moderator Madhu Srikumar defined AI assurance as the independent verification of trustworthiness, likening it to a safety inspection and stressing its relevance to the Delhi Declaration’s multilingual and contextual evaluation commitment [124-131][132-138]. Frederic Werner highlighted the difficulty of translating high-potential AI use cases across regions, emphasizing that standards must embed human rights, inclusivity, and local relevance to avoid a Global South gap [145-166][167-176]. Vukosi Marivate added that limited data collection, annotation, and policy capacity in many Global South countries demand locally driven evaluation frameworks and capacity-building rather than top-down mandates [231-240][242-247].
Owen Larter described agentic systems as autonomous tools that will permeate everyday tasks and argued that interoperable technical protocols, such as Google’s agents-to-agents and universal commerce protocols, are essential for safe interaction and scalability [186-204][205-216]. He warned that connecting autonomous agents to sensitive accounts raises security concerns, and noted ongoing work with VirusTotal and internal safety frameworks to scan for malware and assess risks before deployment [222-223]. Larter also stressed the need for affordable, low-compute models to enable widespread testing and third-party assurance, especially for resource-constrained regions [351-354][355-357].
Stephanie Ifayemi outlined PAI’s two papers, which identify six challenge areas, including language diversity, risk-profile differences, and infrastructure bottlenecks, and propose incentives, professionalisation, and a tiered assurance approach to bridge the global divide [255-267][268-276][277-285][286-295][296-301]. She emphasized that north-south collaboration on standards for agents, such as emerging work by NIST and the Center for AI Standards and Innovation (CAISI), is crucial to ensure that Global South perspectives are not excluded from future attribution and testing frameworks [292-300].
Closing remarks from Natasha Crampton and Chris Meserole reinforced that AI assurance must become an operational, continuous-monitoring discipline, shared across borders and sectors, and called on all stakeholders to contribute to building the necessary infrastructure and standards to realise trustworthy agentic AI worldwide [411-420][425-433][444-452][456-464][465].
Key points
Major discussion points
– Building an AI-assurance ecosystem for agentic systems – The panel repeatedly stressed that trustworthy agentic AI requires a three-part foundation: rigorous testing of technical robustness, the creation of clear standards, and independent third-party assurance providers. Josephine Teo outlined these pillars and argued they are essential for “building confidence” and for differentiating safe products in the market [96-110].
– Closing the global assurance divide – Participants highlighted that current assurance practices are uneven, with major gaps in multilingual evaluation, infrastructure, and risk-profile understanding for the Global South. Stephanie Ifayemi listed six challenge areas (language diversity, risk profiles, infrastructure, etc.) and noted the need for “north-south collaboration” to avoid exclusion [260-269][276-283][292-300]; Vukosi Marivate emphasized limited data-annotation capacity and policy expertise in many low-resource regions [231-240]; Natasha Crampton warned that without deliberate action the shift to agents will “make that divide even worse” [415-418].
– Technical standards and interoperability for agents – Industry representatives described concrete work on protocols that let agents communicate with each other and with services (e.g., “agents-to-agents protocol”, “universal commerce protocol”) and stressed that standards are a prerequisite for safe deployment at scale. Owen Larter explained Google DeepMind’s efforts to define such standards and to embed security checks (e.g., malware scanning of downloaded skills) [198-205][222-224]; he also called for affordable, low-compute models to broaden access [351-354].
– Collaborative, shared-responsibility model – The discussion repeatedly called for multilateral, public-private partnerships and a professionalised assurance community. Rebecca Finlay framed the Delhi Declaration as a catalyst for “accountability work” [2]; Josephine Teo described Singapore’s “sandbox” with Google and a “live” governance framework [68-76]; Frederic Werner highlighted AI for Good’s inclusive network of UN agencies and NGOs [307-315]; Chris Meserole summed up the need for “shared responsibility” and urged everyone to get involved [452-456].
Overall purpose / goal
The session was convened to launch and contextualise two new Partnership on AI papers on AI assurance, to align the conversation with the newly adopted Delhi Declaration, and to mobilise a global, inclusive effort that equips policymakers, industry, and civil society, especially in the Global South, to develop, test, and govern trustworthy agentic AI systems.
Tone of the discussion
– Opening (0-15 min): Formal and forward-looking, emphasizing the significance of the Delhi Declaration and the need for a robust assurance ecosystem.
– Middle (15-35 min): Becomes more technical and urgent, with detailed descriptions of testing, standards, and the risks of autonomous agents, while simultaneously stressing inclusivity and the challenges faced by low-resource regions.
– Closing (35-56 min): Shifts to a collaborative, hopeful tone, featuring calls to action, acknowledgment of shared responsibility, and a rallying message to “download the reports, get involved, and roll up our sleeves” to grow the assurance “seed.”
Overall, the conversation moves from setting the agenda, through deep-dive problem-solving, to a unifying call for collective, cross-border effort.
Speakers
– Chris Meserole – CEO of the Frontier Model Forum (FMF), focusing on frontier AI safety and security [S3].
– Vukosi Marivate – AI researcher and co-founder of Lelapa AI; leads African-language NLP initiatives such as Masakhane, building AI for Africans by Africans.
– Frederic Werner – Chief of Strategic Engagement, International Telecommunication Union (ITU); works on AI governance, standards and AI-for-Good initiatives [S5].
– Josephine Teo – Minister for Communications and Information, Singapore; leads Singapore’s AI assurance strategy and government-industry collaborations on agentic AI [S7].
– Natasha Crampton – Chief Responsible AI Officer, Microsoft; advocates for operational AI assurance across borders, languages and cultures [S9].
– Stephanie Ifayemi – Senior researcher at the Partnership on AI (PAI); co-author of reports on closing the global AI assurance divide [S12].
– Madhu Srikumar – Moderator of the panel; senior leader at the Partnership on AI, involved in AI safety and policy coordination [S14].
– Rebecca Finlay – Representative of the Partnership on AI; focuses on AI assurance ecosystems and policy frameworks [S17].
– Owen Larter – Senior staff, Google DeepMind (elsewhere noted as responsible-AI public policy lead at Microsoft) [S19]; works on agentic AI standards, protocols and safety research.
Additional speakers:
– Rameca – Mentioned by Chris Meserole in closing remarks; no further role or title identified in the transcript.
Rebecca Finlay opened the closing session by reminding participants that the India AI Impact Summit brings together more than a dozen countries to “unlock innovation through trustworthy, responsible, beneficial AI” [1] and that the recent Delhi Declaration – adopted the day before – provides a timely catalyst for “accountability work” and the development of scientific evidence for policy [2]. She announced that the Partnership on AI (PAI) will launch two new papers that originated at the Paris Action Summit: one on “Strengthening the AI Assurance Ecosystem” and another on “Closing the Global Assurance Divide” [7-10][14-23]. QR codes for the papers will be displayed immediately after her remarks so attendees can download them on the spot [24-33][25-30].
Madhu Srikumar then defined AI assurance as “the process of measuring, evaluating, and communicating whether AI systems are trustworthy… a safety inspection, but for AI” [124-130]. She linked this definition to the first Delhi Declaration commitment, which calls on frontier AI companies to share usage data, a recommendation PAI made in its 2025 progress report and has been tracking since [46-55], and to the second commitment that urges “multilingual and contextual evaluations” to ensure AI works across languages, cultures and real-world conditions [132-138].
Singapore’s Minister Josephine Teo explained that “agentic systems have taken off” since the Paris summit, offering productivity gains but also introducing “new risk” because autonomy can amplify harm when systems malfunction and human oversight erodes [58-66]. She advocated a shift from “reactive regulation” to “proactive preparation”, describing Singapore’s sandbox as a place where the government “eats its own dog food” by testing agents in partnership with Google [69-73]. Alongside the sandbox, Singapore has published a model governance framework for agentic AI, kept as a “live” document open to feedback [74-78]. Minister Teo outlined three assurance pillars (testing, standards, and independent third-party assurance) as essential for building confidence and as a market differentiator for companies [85-89][97-109][110-112].
Vukosi Marivate (Masakhane) reinforced the Global South perspective, observing that “there is likely not as much collection … as in Europe or North America” and that limited data-annotation capacity makes assurance feel “far away” from developers [231-235]. He argued that effective assurance requires “local understanding” and “capacity and capabilities of policymakers”, rejecting a purely top-down approach [236-240][242-247].
Frederic Werner (AI for Good) highlighted the difficulty of translating high-potential AI use cases across regions, noting that “trust is the biggest challenge” and that standards must embed “human-rights, inclusivity” and be adaptable to local contexts [145-166][167-176]. He warned that without such safeguards the promise of AI for Good could falter, especially for the 2.6 billion people still offline [173-176]. Werner also described AI for Good itself, rather than the summit, as the “Davos of AI”, but “extremely inclusive” rather than exclusive [145-166].
Owen Larter (Google DeepMind) described agentic AI as autonomous tools that can achieve goals on behalf of users – for example, arranging a dry-cleaning service without step-by-step instructions [186-190]. He announced concrete technical work on an “agents-to-agents protocol” and a “universal commerce protocol” to enable safe, interoperable communication, likening them to early internet standards such as HTTP [202-209][205-208]. Larter noted that the U.S. government, through its Center for AI Standards and Innovation (CAISI, rendered “US KC” in the transcript), is launching an agent-standards initiative this week [210-214]. He warned of security risks when agents access sensitive accounts and detailed collaborations with VirusTotal to scan downloaded skills for malware [222-224]. To broaden access, he highlighted the development of low-compute “Flash” models that are “relatively cheap, quite efficient, very, very quick”, intended to lower testing costs for resource-constrained settings [351-357].
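Larter’s handshake description (an agent presenting its ID, capabilities, and goal, and downloaded skills being scanned before installation) can be made concrete with a short sketch. The following Python is illustrative only: the `AgentCard` fields, the `scan_for_malware` helper, and the hash blocklist are assumptions for exposition, not the published agents-to-agents or commerce protocol schemas, and the scan merely stands in for the kind of VirusTotal-style check described above.

```python
from dataclasses import dataclass, field
from hashlib import sha256

@dataclass
class AgentCard:
    """Hypothetical identity/capability message an agent might present
    when contacting another agent or a website (fields are illustrative,
    not a published protocol schema)."""
    agent_id: str
    capabilities: list[str] = field(default_factory=list)
    goal: str = ""

def scan_for_malware(skill_bytes: bytes) -> bool:
    """Stand-in for a VirusTotal-style lookup: hash the downloaded skill
    and check it against a (here, hard-coded) set of known-bad hashes."""
    known_bad = {"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"}
    return sha256(skill_bytes).hexdigest() not in known_bad

def install_skill(card: AgentCard, skill_bytes: bytes) -> bool:
    """Gate installation on the scan, so risk is assessed before deployment."""
    if not scan_for_malware(skill_bytes):
        print(f"{card.agent_id}: skill rejected, failed malware scan")
        return False
    print(f"{card.agent_id}: skill installed")
    return True

# Handshake example: the agent announces who it is, what it can do, and its goal.
card = AgentCard("dry-clean-bot-01", ["web_browse", "payments"],
                 goal="dry-clean suit, pickup by Friday")
install_skill(card, b"...skill package bytes...")
```

The ordering is the point of the sketch: identity is declared and the scan passes before the skill can act, mirroring the “assess risks before deployment” framing above.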
Stephanie Ifayemi (PAI) summarised the two papers, identifying six challenge areas that keep the assurance divide open: language diversity, differing risk profiles, infrastructure bottlenecks, incentive structures, professionalisation of assurance practitioners, and the need for a tiered assurance approach [255-262][268-276][277-285][286-295][296-301]. She gave an example of risk-profile priorities: Pacific Island nations focus on environmental impacts, whereas other regions may prioritise privacy or fairness [166-176]. She stressed that “north-south collaboration” is vital so that Global South countries are not left out of emerging standards on agent attribution and identity [292-300]; NIST’s Center for AI Standards and Innovation (CAISI) has released an opportunity to comment on a paper around agent attribution and identity [292-300]. The paper also calls for incentives such as insurance products and accreditation schemes, citing the UK AI Safety Institute’s $100 million inaugural fund as an example [363-376]. Additionally, Ifayemi referenced a PAI paper on real-time failure detection and monitoring of agents, which proposes tying assurance intensity to the stakes, reversibility, and affordances of a deployment [420-423].
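Ifayemi’s tiered approach, elaborated later in the session in terms of stakes, reversibility, and affordances, amounts to a decision rule. A minimal sketch follows, assuming hypothetical tier names and thresholds; the papers do not prescribe this exact mapping.

```python
from enum import IntEnum

class AssuranceTier(IntEnum):
    SELF_ATTESTATION = 1   # developer's own testing and documentation
    INDEPENDENT_EVAL = 2   # third-party evaluation before deployment
    CONTINUOUS_AUDIT = 3   # ongoing third-party monitoring in production

def required_tier(high_stakes: bool, reversible: bool, autonomy: int) -> AssuranceTier:
    """Map the three factors raised on the panel (stakes of the use case,
    reversibility of actions, degree of autonomy/affordances, here 0-2)
    to an assurance tier. Thresholds are illustrative assumptions."""
    if high_stakes and (not reversible or autonomy >= 2):
        return AssuranceTier.CONTINUOUS_AUDIT
    if high_stakes or not reversible or autonomy >= 2:
        return AssuranceTier.INDEPENDENT_EVAL
    return AssuranceTier.SELF_ATTESTATION

# A medical-decision agent taking irreversible actions lands in the top tier;
# a low-stakes scheduler with reversible actions in the lowest.
print(required_tier(high_stakes=True, reversible=False, autonomy=2).name)   # CONTINUOUS_AUDIT
print(required_tier(high_stakes=False, reversible=True, autonomy=0).name)   # SELF_ATTESTATION
```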
All speakers agreed that a comprehensive, global AI-assurance ecosystem, combining rigorous testing, clear standards, and independent verification, embedded from the start of system design, is essential. They also concurred on the importance of multilingual and contextual evaluation to make AI trustworthy across diverse languages and cultures [133-136][24-26][262-267][429-433][452-455]. Finally, they emphasized that global collaboration, among multilateral bodies, governments, industry, and civil society, is required to avoid exclusion of the Global South [140-144][307-314][291-300][433-438][452-455].
Disagreements emerged around implementation. Minister Teo framed assurance as a “strategic competitive advantage” that companies can leverage, whereas Ifayemi and Natasha Crampton argued that assurance should be treated as shared public infrastructure, requiring incentives such as insurance and professional accreditation [85-89][433-438][363-376]. Larter advocated top-down, universal technical standards (agents-to-agents, universal commerce) to ensure interoperability, while Marivate warned that such standards risk missing local values unless capacity-building is prioritised [202-209][236-240]. On compute resources, Ifayemi highlighted the massive GPU-hour requirements of current evaluation pipelines [280-282], whereas Larter suggested that the new low-cost Flash models could mitigate these barriers [351-357]; the tension reflects differing views on whether technology alone can close the infrastructure gap.
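To make the compute tension concrete, a back-of-envelope reading of the benchmark figure cited in the discussion: the GPU-hour total is from the transcript, while the hourly rate below is a hypothetical illustration, not a quoted price.

```python
# One full HELM-style evaluation run, per the figure cited on the panel.
gpu_hours = 19_500
assumed_usd_per_gpu_hour = 2.50  # assumption for illustration only
print(f"~${gpu_hours * assumed_usd_per_gpu_hour:,.0f} per full run")  # ~$48,750
```

At that scale, even a single comprehensive run is out of reach for many institutions, which is why the low-compute-model argument matters for third-party testing.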
Key take-aways included:
* Minister Teo’s proactive sandbox approach, positioning the government as an early-adopter and credibility builder [69-73];
* Werner’s reminder that “trust is the biggest challenge” and that standards must embed human rights;
* Larter’s analogy of agent protocols to early internet standards, providing a concrete roadmap for interoperability [202-209];
* Marivate’s emphasis on local data and policy capacity, underscoring the risk of unsuitable top-down frameworks [231-240];
* Ifayemi’s systematic breakdown of six challenge areas, offering a clear agenda for closing the assurance divide [255-262][292-300];
* Crampton’s assertion that assurance must become an “operational discipline” built into the development lifecycle, with continuous post-deployment monitoring [425-433][420-422].
The panel produced concrete actions and identified unresolved issues. The two PAI papers will be released via QR codes for immediate download and community feedback [14-23]. Singapore will continue operating its agentic-AI sandbox, keeping its governance framework “live” for iterative improvement [69-76]. The Delhi Declaration’s commitment to multilingual evaluation provides a policy anchor for future standards work [25-30]. Google DeepMind will advance interoperable agent protocols, make low-compute Flash models publicly available, and support the CAISI agent-standards initiative [202-209][210-214][351-357]. The ITU and other multilateral bodies were urged to facilitate inclusive standards development, capacity-building, and the creation of shared evaluation infrastructure [140-144][307-314]. Open questions remain on designing scalable multilingual benchmarks, ensuring equitable compute access, operationalising tiered assurance that matches risk profiles, governing real-time monitoring and accountability for autonomous agents, and expanding the pool of independent third-party auditors, especially in low-resource regions [262-267][280-282][351-357][420-423][107-109].
In the closing keynote, Natasha Crampton stressed that AI assurance must move from theory to an “operational discipline” embedded throughout the system development lifecycle, with “continuous monitoring, real-time detection and clear accountabilities” for agentic systems [411-422][425-433]. She called for shared evaluation infrastructure, common taxonomies, and investment in Global South capacity, framing assurance as the foundational infrastructure that will enable trust and adoption of autonomous agents [435-438][440-443]. Chris Meserole echoed this sentiment, summarising three core themes – evolving assurance understanding, global collaboration, and shared responsibility – and issued a final call to download the reports, join the collaborative effort, and treat assurance as core infrastructure [444-452][458-465].
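Crampton’s “continuous monitoring, real-time detection and clear accountabilities” implies an always-on loop rather than a one-off audit. A minimal sketch, assuming a hypothetical policy `check` and alert hook; a real deployment would wire this into production telemetry and an escalation path.

```python
import time
from typing import Callable, Iterator

def monitor_agent(actions: Iterator[dict],
                  check: Callable[[dict], bool],
                  alert: Callable[[dict], None]) -> None:
    """Screen an agent's action stream against an assurance policy as it
    runs, alerting (and leaving an accountable record) on each failure."""
    for action in actions:
        if not check(action):
            alert(action)  # real-time detection feeding an accountability trail

# Illustrative wiring: flag any irreversible action above an assumed risk score.
sample_stream = iter([
    {"id": 1, "risk_score": 0.2, "reversible": True},
    {"id": 2, "risk_score": 0.9, "reversible": False},
])
monitor_agent(
    sample_stream,
    check=lambda a: a["reversible"] or a["risk_score"] < 0.5,
    alert=lambda a: print(f"[{time.strftime('%H:%M:%S')}] escalate action {a['id']}"),
)
```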
Overall, the session linked the Delhi Declaration’s policy momentum with concrete technical and governance proposals, highlighted the urgency of closing the global assurance divide, and produced a clear roadmap of collaborative actions required to build a trustworthy, inclusive AI ecosystem for the emerging era of autonomous agents.
Transcript

in 19-ish countries, and we’re all focused on what does it mean to unlock innovation through trustworthy, responsible, beneficial AI. And so, of course, no surprise, gatherings like the one that we’ve had this week are really crucial for the work we do, and with the Delhi Declaration adopted yesterday, this is an even more important moment to build on where we have come from, to lean in, and to really get to work around some of the questions of the accountability work that needs to be done, the scientific evidence that we need to build around frameworks and good policy moving forward. And, of course, it’s extraordinarily important that this is happening in India, that it’s bringing a whole set of voices and perspectives and leadership that is not optional.
At PAI, we believe… We believe that that is fundamental to building a global community committed to this work, and it’s great… to see it in action this week. So thank you all for being here with us. So today we’re going to give you an opportunity to see two of our latest papers. These are papers that were begun out of the Paris Action Summit. And at that time, as we were thinking about moving into action and innovation, we felt that work needed to happen with a good sense of what the assurance ecosystem looked like. So we’ve had working groups underway developing these two new resources. They’ll be up on the screen at some point. You’ll be able to get a QR code and download them.
Feel free to talk to any of us. The first one is Strengthening the AI Assurance Ecosystem. It really looks at telling and helping national policymakers: if you’re building a robust industrial AI strategy, you better have a comprehensive AI assurance strategy as well. And you need to be able to do that. And so we’re going to be talking about that. We need to think about all those actors and what they look like. We’re going to hear from one of the experts in this, of course, as soon as the minister comes to join us. The second piece, which is really important, we think, for this conversation is: what does it mean to do AI assurance globally, around the world?
How do we close the divide that exists? What is different about the challenges faced by countries in the Global South versus others? So we’re really hoping that these resources not only are good, substantive contributions to the work that needs to be done, but the idea is to just catalyze, you know, sort of plant a number of seeds across a number of ways in which assurance works, so that those can grow and really come to life out of this. And just two quick comments on that. Now that we have the declaration, we can, as opposed to earlier in the week, start to articulate it, really leaning in with regard to the commitments: in commitment one, around clarity on usage data, really trying to give some empirical grounding to this work.
In 2025, in our progress report around foundation model impact, we made exactly this recommendation. We directly called for frontier AI companies to share usage data. We’ve been tracking progress, and there has been some progress in that regard. So we are delighted to see this particular commitment come about and to start to see some standards about how that usage data is going to be shared. So we’re very pleased to see that work. We’re also very pleased to see the second commitment around strengthening multilingual and use case evaluations. And you’ll see, if you do download the report on the global assurance divide, that that is clearly a key piece of work that needs to happen. So this afternoon, we are going to give you an extraordinarily expert panel that brings a real diversity of perspectives to this work.
And so we want to take the assurance question and apply it to agents. Because that’s where the world is going. We’re all seeing them in the news every day. We’re seeing them integrated into foundation model systems. So what does it mean to take what we know about assurance and think about the applications that agents will add to the complexity of that work? So let me begin by introducing our first speaker. She’s probably been one of the most visible ministers this week because of the extraordinary leadership that Singapore has taken when we think about AI assurance. I know you’re going to talk a little bit about that. Such a pleasure to welcome you, Minister Josephine Teo.
She’s going to come and say some words for us before the panel begins. Thank you.
Thank you very much, Rebecca, and also very much appreciate Partnership on AI for the invitation. When this series of summits first began in Bletchley, AI agents were not a thing. Nobody was talking about them, even just 12 months ago. When we had the AI Action Summit in Paris, it had barely crept into the conversation; at the time, the preoccupation was all around DeepSeek and what it told us about the capabilities that are emerging out of China. But today, as Rebecca correctly identified, agentic systems have taken off. They are increasingly being used and we need to have a better grasp on how to deal with this issue, because agentic AI certainly offers transformative possibilities in how we delegate and orchestrate work when deployed strategically.
Agents function as invaluable teammates, unlocking productivity gains and time savings, which we all want more of. However, I should also add that the very nature of how agents can be helpful to us is autonomy. This autonomy also introduces new risk. The potential for harm increases when systems malfunction and human oversight is minimized: we are no longer present, or our presence is at least diminished to a very large extent. The implications may be complex and not fully predictable. So the way my colleagues and I have been thinking about this is that there needs to be a shift. There needs to be a shift from relying on reactive regulation to a different kind of stance, which is proactive preparation.
And in Singapore, that’s what we’ve been trying to do. We’ve tried to be proactive about governing the new risks in the era of agentic AI. And I think it starts with the government itself being a leader and not a laggard in using agentic AI. We need to test it. We need to look at how the solutions can not only enhance public service delivery, but we also need to be able to put in place more controls. Government is high risk because the touch points with citizens are very sensitive. No citizen and no government wants to make serious mistakes when they interact with their citizens, telling them things about their health, their social security, their benefits that are not accurate, and having those things not just told but acted upon.
So this need to ensure that we know what we’re doing is a very high one. And the way we are also thinking about it is to try and work with industry. So, for example, between Google and the Singapore government, we have a sandbox on agentic AI. It’s one of the ways we think we can, in a way, eat our own dog food. Try it. You know, does it taste all right? Does it hurt us in a very significant way? Because if we were not able to do so, I don’t think we would have a lot of credibility in terms of how we want to govern agentic AI. But we can’t wait, you know, for the dog food to materialize in its consequences for ourselves.
In the meantime, my colleagues have put together a model governance framework for agentic AI. It is meant to provide practical support to enterprises so that they can also deploy autonomous agents responsibly and mitigate the risk. We know that this is not a complete solution and this document that we put out has to be a live document. We very much encourage feedback as a way for us to keep improving the guidance to enterprises. Can I also just add, as we do this work, what is the meaning and what is the purpose behind it? Ultimately, it is to build confidence in the use of agentic AI systems. And we think that at many levels, this confidence has to be presented, has to be demonstrated to boards of organizations, to customers, to other stakeholders.
And how do we demonstrate that the risks have been managed well? And that is where the assurance ecosystem that Rebecca talks about comes in. It is an absolutely essential part of building trust over the medium to longer term so that there is a way, a foundation upon which agentic AI systems can be made more readily adopted and available. I should also say that for companies that are thinking about it, and I see Microsoft here, and I’m sure that there are other companies represented. If we are to trust these agentic systems, the safety aspects should not be downplayed. And I would venture to say that a company that is able to give a high assurance on safety will find itself being differentiated from their competitor.
It’s more likely to translate into stronger interest in a product and service. So rather than think of it as something that you are unhappy to comply with, think of it as a strategic competitive advantage. And that is a way I think that will give us the confidence to put it forward. The question, however, is: are we completely without experience in this regard? And the answer is no. In aviation and healthcare, there are a lot of measures being put in place to give assurance to passengers and patients. When we board a plane, we usually expect to arrive. When we visit the hospital, we generally expect to be treated, except for disease conditions that are not yet well understood.
But the trust in these systems has to be built over time, and it doesn’t come without some assurance being put in place. The question for AI, and specifically agentic AI, is: what would be the components? What leads to an assurance ecosystem that would be robust enough? We think that there are at least three components. The first is that there must be testing. We need some way of making sure that there are technical assessments of the system, to make sure that the systems are robust, reliable, and safe. And a lot more work needs to be done in this space: developing the testing methodology, building the testing datasets, and also making sure that the testing of agentic systems takes into account the nature of these systems.
These systems are going to be much more complex, multi-agent systems for example, and it’s not just the output, but the in-between steps, how the reasoning takes place, and what orchestration is being built into the agentic systems. So that’s the first: testing. Second is that eventually we will need standards. We cannot just define what is good enough; we also need to assure the users that it has met expectations in safety and reliability, and these are still very early days. Thirdly, we think that this ecosystem cannot do without third-party assurance providers. It’s one thing to claim that your agentic AI system is safe. It’s another thing to have someone attest to the safety of it.
So these could be technical testers, auditors, and they provide independence, augment in-house capabilities, and also help to identify the blind spots, and it’s necessary for us to strengthen this pool as well. So I’m going to stop here. I want to conclude my remarks by saying that Singapore is actively building these components, and we welcome conversations with partners and colleagues because we know that we cannot do this alone. So we look forward to discussions in the three panels on how we can meaningfully collaborate on assurance for agentic AI. Thank you very much once again, Rebecca.
Thank you. Thank you. We’re all here. It’s the end of the conference, and we’re all intact. Thank you so much, everyone, for joining us. Thank you, Minister Teo, for the keynote. One quick note before we dive in. Our panelist, Fred, has a flight to catch, so he’ll need to slip away a few minutes early, but, Fred, we’ll make sure we get your best insights before you escape. No pressure. So we are the last session, so we are standing between you and whatever you have planned right after. So I promise we’ll make this worth it. We have an incredible panel and a lot of ground to cover. So before we get started, what do we mean by AI assurance?
Because you’re going to keep hearing that term quite a bit here. So really, put simply, AI assurance is the process of measuring, evaluating, and communicating whether AI systems are trustworthy. Are they safe? Do they work as intended? Can the public actually trust them? So really think of it like a safety inspection, but for AI. You’d want an independent inspector checking a building, not just the builder saying, trust me, it’s fine. So really, AI assurance is about independent verification, as Minister Teo went over. And why this panel? Why now? So the summit unveiled the New Delhi Frontier AI commitments just yesterday. And the second of those commitments is about strengthening multilingual and contextual evaluations.
So really making sure AI systems work across languages, cultures, and real world conditions. And really, that’s the assurance challenge in a nutshell. And our panel today is about whether we are actually equipped to deliver on that promise globally and not just in a handful of countries. So really, our panelists span the ITU, Google DeepMind, the University of Pretoria, and PAI. So we have the range to actually wrestle with this question. So with that, I’m going to get into our first question for today. Fred, that’s going to be you. ITU has been convening on AI governance through AI for Good and working on standards across borders. So really, when we talk about AI assurance, what does it mean to you, ensuring that these systems are safe and trusted?
And how do we think about assurance when 2.6 billion people remain offline and may be excluded from the frameworks being designed?
Yeah, thanks for that great question, and thanks for having me here. So I think it’s safe to say there’s no huge shortage of high-potential AI for Good use cases, everything from affordable health care to education for all, food security, disaster response, and also looking at more applications in the physical manifestations of AI that you see in robotics, embodied AI, brain-computer interface technologies. The best part of my job at AI for Good is I see these use cases coming across my desk every day. And I can tell you when we started AI for Good in 2017, it was mainly in PowerPoint slides. They didn’t really exist. But as we got into, say, 2023 with GenAI, last year, the unofficial theme of AI for Good was the rise of the AI agents, a bit scary, Terminator-like, but that’s what people were talking about.
And we’re really going from sort of the promise to the pilots to the use cases and now scaling. Now, when you’re looking at these use cases, I think one big challenge is trust. How do you trust them? I mean, there’s always the good intention, right? But is that trust there? And also, are they replicable and scalable? And I’ve yet to see, you know, a high-potential use case developed in Brussels work equally well in Johannesburg and Shenzhen and maybe Panama. Like, it’s just, we haven’t really reached that yet. And if you look at these sort of fast-emerging governance frameworks around the world, whether you’re in the U.S. or EU or China or everything in between, I think there’s a lot of good intentions, a lot of good thinking.
But how do you turn those ambitious words and principles into actions? Because the devil is in the details, and I think standards have details. So when you’re thinking about how you do that, especially when you start to get into AI agents and that trust element is becoming ever more critical, how can you bake in a lot of the common sense things that we’ve been talking about all week or even for the past years at AI for Good? Are they trustworthy? Are they verifiable? Are they secure? Are they safe? Are they designed with human rights principles in mind? Are they inclusive? Are people from the global south participating? Are they at the table when we’re drafting and developing these standards?
So these are not always natural reflexes, and at the same time, it’s hard to turn words into action. So one of the tools, I’m not saying it’s the only tool, but I think as these solutions start to scale and businesses start to interact nationally or even internationally, at one point you’re going to need standards, and it’s within those standards that you can kind of bake in those common sense principles that we’ve been all talking about. And I forget the last part of your question. It was really a question about… Oh, connectivity. That was it, yes. …2.6 billion people who remain offline, yeah. Yeah. Yeah, so, you know, ITU’s mission is connecting the world, and a third of the world is still offline.
And, you know, large parts of the world actually have connectivity, but there’s actually no incentive to connect. So if there’s no content in your local language or dialect, or no access to government services or useful applications that are fit for purpose where you live, why would you connect? So I think AI can actually help to remove that friction where you have a lot of bottlenecks, for example literacy, disabilities, again, content in your own language or dialect. So I think one thing is closing the connectivity gap, but the other thing is actually using AI to remove that friction. And the last thing I would say is, I think sometimes there’s a comparison where, if you take East Africa for example, you have the mobile payment miracle or revolution with M-Pesa, right? You effectively leapfrog decades of legacy infrastructure. And there may be a kind of optimism that the same thing could happen with AI in the global south. Maybe, but I don’t think we can take it for granted that if that happens it goes in the right direction. It’s not a guarantee that just by putting the tool in the hands of the people, they’re going to create value, they’re going to use it responsibly, they’re going to use it to solve local challenges, build more cohesion and community. Those aren’t for granted.
So I think that whole AI skilling angle of really educating people from grade school to grad school to diplomats and everyone in between, if you don’t address that literacy piece, then it’s just going to be a crapshoot. We’re not sure
Great. I mean, it’s a good transition. Speaking of standards, Owen, Google DeepMind recently deepened its partnership with the UK AI Security Institute on safety research, so including work on monitoring chain of thought and evaluations. So really from an industry perspective, you know, what does robust AI assurance look like? Where do you think the gaps and opportunities are between what Frontier Labs kind of do internally and what’s needed for broader public trust?
Yeah, thank you, Madhu. And thank you to Rebecca and Partnership on AI for convening this really important conversation. And a big congratulations to our Indian hosts for a fantastic week at the summit. Maybe I’ll start by talking a little bit about what agents are; we’re increasingly excited about them at Google DeepMind. They’re essentially more autonomous systems that instead of just following basic instructions can actually achieve goals. So let’s say I want to get my suit dry cleaned on Thursday. Instead of taking an AI system and saying, find a website for a dry cleaning company, see if it’s open on Thursday, see what the hours are, see if it’s within my budget, you can just say to your agentic system, go find a way to dry clean my suit, make sure it’s being picked up by Friday, and it will go and interact with those different websites and try and find a way to meet your goals.
All kinds of fantastic applications already that we’re seeing right across the economy. We’re using increasingly agentic coding systems at Google and Google DeepMind to do a lot of our coding. So we have our Antigravity framework, which is fantastic. You can interact with it in normal, natural language and say, build me a website, build me a tracking system to follow a particular bill that I’m interested in, and it will really help you achieve these goals. I think you’ll increasingly see agents used right across the economy as well. I think we’re just in the early years of a new AI-enabled agentic economy. I think you will have very normal interactions with agents on a regular basis that will pop up on your phone screen and say, hey, it’s been a few weeks since you bought toothpaste.
Would you like me to go and take care of that and get some more toothpaste for you? You mentioned standards, which I think is going to be a critical part of getting all of this right. There’s a couple of dimensions to the standards. So firstly, we need to create the sort of technical protocols to actually underpin this agentic economy. So we’ve been trying to contribute to this conversation. There is the Agent2Agent (agents-to-agents) protocol that Google has launched. There’s the universal commerce protocol. This is basically a way of helping agents talk to each other and agents talk to websites so that you have standardized sets of information. An agent will basically come to an agent or an agent will come to a website and say, this is my ID.
These are my capabilities. These are what I’m trying to do. I think in the same way that we developed protocols and standards in the early 90s to underpin the internet like HTTP, like URL, we’re going to have to build these out. There are then also assurance standards, which are related, but I think very important as well. We need to make sure that we’re understanding the capabilities of these systems. We need to keep making progress on how we can test for the risks that they may pose and then work right across society to come up with ways to mitigate that. I think the work that the safety and security institutes are doing around the world is absolutely critical.
So Minister Teo mentioned some of the work that we’re doing in Singapore. The UK AI Security Institute has been world-leading on this. I think this is an area where we’re going to see more from the AISIs and CAISIs right across the world. The US government also, through their CAISI, is launching an agent standards initiative this week as well.
Great. And if you don’t mind a follow-up question. That’s a really important point that you pointed out, that we currently need interoperability. We need agents to flourish. We need to find a new way to kind of imagine this paradigm. But I’m curious if there’s a safety challenge when it comes to agents that keeps you up at night.
Yeah, I think there are definitely risks to be mindful of. I think agent security is something that we should all be thinking a lot about. If we’re connecting increasingly autonomous systems into different accounts, different email accounts, different bank accounts, I think we want to be pretty careful about how we do that and come up with superior security protocols that can be helpful there. We’ve actually been doing some work with VirusTotal, which is part of the Google security operations team at Google, to make sure that when certain agentic systems are downloading skills or downloading apps from agentic websites, they’re being scanned for malware, and vulnerabilities are being detected so that they can be addressed before people put them onto their computer. I think there’s also a concern that these agentic systems could create new capabilities that could be misused, across the cyber security domain, for example. I think some of the frameworks that we have already at Google DeepMind will be helpful here. So we have our Frontier Safety Framework, which we use to test models before we put them out into the real world.
We think about how those models are going to interact with systems, how they might be parts of agents as we’re doing that work.
All right. Just speaking for myself, I can’t wait to use agents. I feel like it’s a lot of developer communities that have, you know, started playing around with these systems. But I imagine it’s reaching lay consumers very soon. So, Vukosi, you have built Masakhane for African-language NLP, really building AI for Africans by Africans. When assurance frameworks are designed in the U.S., U.K., or Singapore, how well do they translate to contexts where the data, the languages, the deployment conditions are completely different? What do we think we’re missing?
that we do get to understand that it’s a very different thing. My experience has been that there’s likely not as much collection in Europe or North America or annotation as much as is happening now in the global south. But then that also means that it feels like it’s further away, right? It’s not where the developers are. And that then requires more of this conversation in one place. So that, again, there must be kind of a local understanding. The last piece to that is going to be the capacity and the capabilities of then the policymakers in those countries to be able to understand that part. It will not be top-down. I don’t believe that. It will be them understanding whether it’s labor laws, it’s data governance, it’s just monitoring of systems once they’re on.
If there is not that capacity or capability to actually do those things, again, it moves in a more automated direction that is not necessarily aligned with what the values of those people actually are.
Those are important words right at the end of the conference, knowing just how much we have to get done here. So Steph, over to you. PAI just released work on closing the global assurance divide, a lot of what Vukosi just mentioned. What are the concrete gaps you’re identifying? Is it capacity to conduct third-party evaluations, as Minister Teo mentioned? Is it access to the models being tested, or is it something else? What would it take to really close those gaps?
Awesome. Thanks so much, Madhu. And as one of the PAI folks, thanks for being here, everyone. It’s great to see you all. I know it’s a Friday evening, so we’re in between you and cocktails or whatever you have planned, so we very much appreciate it in the last session of the day. So I think it’s such a good question, and I think your question talks about some things that recognize that those challenges aren’t actually just Global South challenges. I just want to start with the fact that we’ve released two papers. One is on closing the assurance divide, and the other is how we strengthen the global assurance ecosystem generally. And the question of access is one that impacts us all, actually.
In the UK, for example, the Department for Science, Innovation and Technology, I believe that’s what DSIT stands for, has made access to models as a means to support assurance a priority for 2026. And so I think that there are a few shared challenges, and I’ll come back to the point around north-south collaboration in a second. But just thinking about closing the AI assurance divide, we released this paper, and in it we talk about around six challenge areas, from infrastructure to skills. We talk about languages and risk profiles, so the things that you’ve heard about from Vukosi and a lot of the other speakers. So I’ll give you a sense of some of the examples that we have.
So on language, we’re at the India Summit, of course, and India has over, I believe, 120 languages and 19,500 dialects. When we think about Africa, we have about 1,500 to 3,000 spoken languages in itself. So when we think about benchmarking and evals, and designing evals that think about how those systems are deployed in these various contexts, it’s so important to think about languages, and that just generally, I think, demonstrates the complexity of designing evals to meet the needs of this kind of diverse language ecosystem. Rebecca mentioned at the start that we had the declaration, of course, yesterday, and the commitment in the declaration to multilingual evals is therefore really critical. Of course, there’s still a lot of work to determine how we actually do that in practice in the most effective way, accounting for that complex and wide language diversity, but that’s one area that we talk about.
The second in terms of closing the assurance divide that we need to account for is risk profiles, interestingly. In this paper, we actually interviewed a lot of assurance and safety experts internationally. And one of the things that they mentioned was differences in what they might prioritize when you think about assurance. So when you think about the Pacific Island nations, for example, they would be thinking about assuring for environmental impacts differently than maybe environmental impacts would be considered in the US at the moment, for example. Last year, we published a paper on post-deployment monitoring. And in that paper, we talk about sharing kind of data from companies. And one of the points that we talk about is environmental impacts.
And so it’s really interesting that, in terms of closing the divide, the starting point or what you put emphasis on might vary. And that’s important to note as we’re designing things like documentation, description, and so on. And so I think it’s really interesting to see what we’ve kind of focused on. The third I’ll just quickly mention is, of course, infrastructure. I think we’ve probably all heard a lot about this throughout the summit and this idea of what it means to be sovereign and which parts of the stack to prioritize. And that is really, really important. But there are tradeoffs. So in terms of importance, I was looking at a stat that Stanford’s HELM evaluations used over 12 billion tokens and required 19,500 GPU hours alone.
And so when you think about those kinds of infrastructural needs, it creates barriers for a lot of countries in the global south. But I was at an interesting roundtable, actually, that Carnegie was convening. And we were talking about the fact that how do you balance assurance needs? Where do you start from across the value chain? So at the moment, a lot of the discussion is kind of upstream, right? We need to have that infrastructure in place. That’s the point that we need to start with. But how do you do that in parallel, and how much of that resource should be put into other foundational tools for assurance, such as documentation artifacts, which is another area that we focus on a lot at PAI?
And so I think there will be a lot of questions around how do you weigh up all these challenges, again, knowing that even among the G7 countries, the UK AI Safety Institute started with an inaugural $100 million alone. So that prioritization and balancing is going to be important. The last thing I’ll say, coming back to agents, and I will talk about this a bit more, is the north-south collaboration is a real opportunity as we think about agents. And it’s important that global south countries aren’t always playing catch-up. I think that’s a point that has come through for me from the summit, which is that NIST, or CAISI, the Center for AI Standards and Innovation. And this is almost like a test for me of kind of saying
And this is almost like a test for me of kind of saying. These names of these institutions through this panel. But they just announced a few days ago that they’re going to be working on standardizing work around agents, including that they’ve released an opportunity to comment on a paper around agent attribution and agent identity, I believe, which is really interesting. And there’s, of course, a lot of push for countries to collaborate. And you see a lot of the safety institutes collaborating on questions around assuring agents in the global north. But how do we ensure that global south countries aren’t missing from that? That will have implications for how we attribute agents, how we test agents.
And whilst those upstream points and infrastructure are important, we should make sure, in parallel, that global south countries are ultimately part of these kinds of thinking-ahead questions and frameworks.
Great. So I’m going to take the moderator’s prerogative and have us do a rapid fire. And by rapid fire, I mean every answer is a minute and 30 seconds, which, let’s be honest, is fairly rapid for AI policy. I’m going to start with Fred because I’m more nervous about your flight than perhaps you are. So a minute and 30 seconds. What role should multilateral institutions like ITU play in making globally inclusive AI assurance happen?
Yes, I think AI for Good has a pretty ambitious goal, right? Simply put, it’s to unlock AI’s potential to serve humanity. Pretty big. But we can’t do it alone and no one can. It’s not one country and not one institution, not one NGO. That’s why we have 50-plus UN sister agencies as part of AI for Good, but also making great efforts to bring as many diverse voices to the table from the global south, from NGOs, from civil society. It’s always been extremely open. I like to think of it as the Davos of AI, but instead of being very exclusive, it’s extremely inclusive, right? So I think that’s a bit of a philosophy behind AI for Good.
You know, I think the AI, it’s just moving so quick. So the focus has always been on practical applications, practical solutions. But in doing that, you can tease out the next generation of standards, of policy recommendations, of collaboration and partnerships around the world. So I like to think that in the doing, you have the learning, right? And it’s not just about talking. And that’s what AI for Good has always been all about.
Thank you. That was incredible. You have 56 seconds left. So, yeah, I’m going to move us ahead to Vukosi. So Singapore’s aim is test once and comply globally. So from a Global South perspective, what would make that interoperability real rather than a form of exclusion?
Yeah, that’s a hard one. I think, going back to it, the other thing that’s come out of a lot of the sessions here has been on the evaluations and how evaluations are used. And I think that’s a really important thing, because on one side it’s going to take you a lot of resources to put up an evaluation that is so all-encompassing, and on the other side, to run it is going to be a lot. But then when it comes down to the user, which I think was our second panel that I was in this week, and you’re trying to think about personalization, if you’re going down to an individual, what experience do they actually have, and how do you get to there?
There will be some more high-level safety things that will likely come out, and people will be working on that, and maybe that’s what I’m thinking Singapore is trying to go for. But then when we’re getting to what the individual experience is, given that you have these stochastic systems, you don’t know what is going to happen necessarily. I know we’re trying to do that, but we don’t really know what’s going to happen at the individual experience, and we can’t model all of that. It’s going to require that, again, you do have, closer to where the user might be, things on what that experience actually was. So one of the hats I wear is I’m a co-founder of Lelapa AI, an AI startup.
And there you will be doing more testing towards, hey, we are serving this client. We’re serving them in this way. And then you’re trying to then go in and say, where is your data coming from? What are the use cases? What are we testing for in terms of their operational kind of requirements? It would not necessarily be just one. But, yes, what you might want is…
Yeah, that’s a great point. Assurance needs to be globally decentralized. Owen, given everything we have discussed, what’s one commitment Frontier Labs should make on assurance that would actually move the needle?
Yeah, good question. I think there’s a question of access to the technology, which is important here. I think it’s one of the big themes of this conference, certainly one of the things that I’ll be taking away. So the multilingual part of this is really important, understanding and respecting local cultures; that’s important if you’re going to have a good product and if it’s going to be used broadly. We’ve been investing in Gemini for some time now to make it better and more representative across different languages. We have partnerships that we’re doing here in India, including with IIT Bombay, to help improve performance across various different Indic languages. It’s also really important on the safety and security front to have benchmarks that are available in different languages; fantastic work that MLCommons are doing on this front that we’re pleased to support. The other bit of access that I think is really important is having things that are quick and cheap enough for everyone to use. One of the things about agentic systems is that they’re actually pretty compute-intensive to use. We have a range of models that we have developed and are bringing to market at Google DeepMind, including our very quick Flash models, which are relatively cheap, quite efficient, very, very quick.
We think these can play a really important role in powering agentic systems. It’s also going to be really important if we’re going to do effective and rigorous testing of these systems, because that could be very compute-intensive as well. So thinking about that access piece is something we all need to keep doing. And it’s not an easy question, really, to do it safely while ensuring that third-party assurance providers consider the security questions at hand. And it’s an open question.
So, Stephanie, no bias at all since we’re both at PAI, but I wanted to give you the final word. What concrete outcomes do you think we want to see from the global AI assurance work in the next 12 months? What would success look like?
So, Owen, now that you’ve said your one point, by the way, we can hold you accountable for delivering on the access question. In the two papers, we talk about the need to build a robust assurance ecosystem, and one piece of that is changing incentives. Funny enough, in another session this week there was a question about whether the ways we’ve talked about safety over the last few years still diverge or whether we’ve converged. There are a few themes we’ve actually converged on, which is nice, and I think assurance is one of them. And this week, a lot of the discussions we’ve had are in some of those incentive areas, like insurance to support assurance.
So what does that look like? How do we drive new incentives, or put some of these structures in place, to build a more mature and robust ecosystem? I think that’s going to be really important. The second is professionalization. There are a lot of questions around how you trust the assurer. So how do we ensure that we’re thinking about the skills? What does accreditation look like for assurance organizations or individuals? That will also help, I think, with questions around access. So that’s the second piece. And because this is also about agents, I think some of those foundational questions haven’t yet been resolved.
And so I’m hoping we can move the dial and start thinking about how to apply that to some of these future questions. Just to shout you out, Madhu: Madhu is the brains behind our safety work, and she wrote a paper on real-time failure detection and monitoring of agents. What I really like about that paper is that it talks about a tiered approach to assurance as well. When you think about agent deployments, do you need to be thinking about assurance based on the risks or the stakes at hand? Is it in the financial services sector? Is it making medical decisions? How do you tie assurance as closely as possible to the use case and the risks?
And that also needs to be linked to reversibility: can actions be reversed, and what are the consequences if they can’t? Then third, we have affordances. What affordances do you give the agents? How much autonomy do they have? So how do you design an assurance ecosystem with all of these components in mind and a tiered approach? The more we can advise the USKC and the many policymakers who are clearly trying to make decisions in this area, I think that’s what success would look like for us.
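To make the tiered idea concrete, here is a minimal sketch of how stakes, reversibility, and affordances might combine into an assurance tier. The tier names, thresholds, and fields are illustrative assumptions, not the framework from the paper Stephanie references.

```python
# Illustrative tiering: assurance intensity derived from the stakes of the
# use case, the reversibility of actions, and the autonomy (affordances)
# granted to the agent. All names and cut-offs are hypothetical.
from dataclasses import dataclass
from enum import IntEnum

class Tier(IntEnum):
    BASIC = 1       # spot checks and periodic review
    ENHANCED = 2    # pre-deployment evals plus logging
    CONTINUOUS = 3  # real-time monitoring with human sign-off

@dataclass
class AgentDeployment:
    high_stakes: bool   # e.g. financial services or medical decisions
    irreversible: bool  # actions that cannot easily be undone
    autonomy: int       # 0 = suggest-only ... 3 = acts unsupervised

def assurance_tier(d: AgentDeployment) -> Tier:
    """Map deployment risk factors to a required assurance tier."""
    if d.high_stakes and (d.irreversible or d.autonomy >= 2):
        return Tier.CONTINUOUS
    if d.high_stakes or d.irreversible or d.autonomy >= 2:
        return Tier.ENHANCED
    return Tier.BASIC

# A fund-transfer agent acting unsupervised lands in the top tier;
# a draft-only writing assistant lands in the basic tier.
print(assurance_tier(AgentDeployment(True, True, 3)).name)    # CONTINUOUS
print(assurance_tier(AgentDeployment(False, False, 0)).name)  # BASIC
```

The design point is the one made on the panel: assurance effort should be proportional to use-case risk rather than uniform across all deployments.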
This was totally not planned, Steph plugging our work here, but I can’t imagine a better note to end on. It’s a field-wide challenge, but I just want to emphasize the field-wide opportunity: no single organization can get this right alone. So hopefully that’s a helpful reminder as we close this summit and move on to the next iteration. Thank you, everyone; hope you have a great, safe flight back home. Fred, that’s tonight for you. For the closing keynote, I’m going to welcome Natasha Crampton, who’s the Chief Responsible AI Officer at Microsoft. And after that, we’ll hear from Chris, who’s the CEO of FMF. Thanks, everyone. Do you want to give it?
Okay, so we’re going to get mementos. Sorry, you might want to come back. You don’t want to miss this. Thank you very much.
Thanks so much, Madhu, and to all of our panellists for what was, I think, a very rich and grounded and at times humorous discussion. Thank you. One of the things that came across clearly for me today is that AI assurance can no longer be just a theoretical exercise; we actually need to build it into an operational discipline. And that’s a discipline that really needs to work across borders, across languages and cultures, and, I think, increasingly across agentic systems, systems that don’t just generate outputs but actually take action. I heard this panel focus on the fact that assurance is pretty uneven today. It’s often strongest where there’s access to compute, data, and evaluation infrastructure, and weakest where those things are scarce.
And as several of our panelists emphasized, if we don’t address that gap deliberately, the shift towards AI agents is only going to make that divide worse rather than closing it. When I think about the nature of assurance with agentic systems, I think its emphasis does need to change somewhat. Pre-deployment testing has always been necessary for all types of systems, and so too has post-deployment testing, of course. But post-deployment testing in an agentic world takes on an even greater level of importance, in my view. When systems can plan, chain actions, interact with tools, and adapt over time, assurance really has to move towards continuous monitoring, real-time detection, and clear accountabilities for when interventions need to take place.
That can be quite a hard technical problem, but it’s also a governance challenge. I know that PAI is known for convening communities of not just thinkers but also doers, so I wanted to leave everyone with a couple of implications that follow from the insights we heard today. The first is that we need to build assurance into systems as part of the development lifecycle, not just bolt it on at the end. That means designing systems so that they can be observed, audited, and constrained in practice, not just in policy documents. Second, assurance has to be interoperable.
We heard Prime Minister Modi speak yesterday about building in India and delivering to the world. That, I think, is absolutely an aspiration we should strive towards. But it can only work if we have evidence: evaluation methods, documentation, and signals of risk that are usable across regions and adaptable to local languages, cultures, and deployment realities. Third, assurance has to be shared. No single company, government, or institution can do this alone, and that’s especially true for agents, given how pervasive they are expected to become across the economy. We need shared evaluation infrastructure, shared taxonomies, and shared investment in capacity, particularly in the Global South. For me, this is why organizations like the Partnership on AI, the many collaborators who have come together at this week’s India AI Impact Summit, and open engagement across the community are so important to getting this right.
It’s a really foundational area for collaboration for all of us. Now, my view is that if we do get assurance right, and by right I mean global, inclusive, and dynamic, then it really does become an enabler of trust and adoption, as Minister Teo said, not a brake on progress. One of the key things we need to do as a community is to treat assurance as infrastructure: infrastructure that we build together and put into practice together. Thanks very much.
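As an illustrative aside, the continuous-monitoring loop described in the keynote, observe each agent action, flag anomalies in real time, and route flagged actions to an accountable human, might be sketched as below. The risk scorer, thresholds, and escalation hook are all hypothetical assumptions, not a specific product or the method from any paper discussed at the summit.

```python
# Minimal sketch of real-time monitoring for agent actions: every proposed
# action is scored and logged for audit; high-risk actions are held for
# human sign-off, mid-risk actions proceed but notify a human.
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent-assurance")

def monitor(agent_actions, risk_scorer, escalate,
            block_threshold=0.9, review_threshold=0.6):
    """Score each proposed action, keep an audit trail, escalate by risk."""
    approved = []
    for action in agent_actions:
        score = risk_scorer(action)                     # assumed callable
        log.info("action=%s risk=%.2f", action, score)  # audit trail
        if score >= block_threshold:
            escalate(action, reason="blocked")          # held for human sign-off
        else:
            if score >= review_threshold:
                escalate(action, reason="flagged")      # proceeds, human notified
            approved.append(action)
    return approved

# Example wiring with stub components:
actions = ["send_email", "transfer_funds"]
safe = monitor(actions,
               risk_scorer=lambda a: 0.95 if a == "transfer_funds" else 0.1,
               escalate=lambda a, reason: log.warning("%s -> %s", reason, a))
print(safe)  # ['send_email']
```

The sketch also illustrates the observability point: because every decision is logged before execution, the system can be audited and constrained in practice rather than only in policy documents.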
Well, what a phenomenal session, from the opening and closing keynotes to a really rich and dynamic panel. I cannot think of a better way to close out what has been an extraordinarily rich and dynamic summit as well. I have the impossible task of trying to summarize everything that was just said here, so if you’ll bear with me, I’ll offer three core themes that jumped out at me. One is that we need to evolve and mature our understanding of assurance. There was a lot of reference to agents here, and to the coming prospect of multi-agent environments as well. From evals to mitigations, we need a better, evolving understanding of how to do assurance.
Second, and probably more importantly, we also heard a lot about assurance as a global effort. Here I loved Steph’s point about the need for greater north-south collaboration. There was a lot of discussion from Fred and others about the need for global standards, and about harmonizing those standards and making them interoperable. There was also a lot of reference to the new institutions that have evolved to enable that global dialogue, whether it’s the global network announced literally an hour before this session or the international network of ACs that has also been revitalized recently. And the last point that really jumped out at me was assurance as a shared responsibility.
And, Vukosi, I loved the point about assurance as a bottom-up effort. I think it’s one where we all have a role to play, regardless of which sector you’re in and regardless of which aspect of assurance you’re taking part in. So with that, I’m going to leave you with one final call to action: get involved. If we want this technology to be safe, secure, and trusted, we all have a role to play. So download the great reports that have just come out on this topic, and get involved.
Look at the work that PAI and others are doing, and become a part of the conversation about how we’re going to take this amazing technology and make sure it’s safe, secure, and worthy of our trust. In the opening remarks, Rameca used this great metaphor of a seed: one of the goals of the reports they put out, and of the conversation in this panel, was to plant that seed and watch assurance grow. So the parting thought I’d leave you with is this: let’s all roll up our sleeves, get to work, and make sure that the seed grows.
So with that, thank you, and thank you as well to our panelists and speakers. Thank you.
“The India AI Impact Summit brings together more than a dozen countries to “unlock innovation through trustworthy, responsible, beneficial AI”.”
The knowledge base notes that the summit involves “19-ish countries” and emphasizes unlocking innovation through trustworthy, responsible, beneficial AI, confirming the claim about a multi-country gathering and the stated mission [S13].
“The Delhi Declaration was adopted the day before the closing session.”
S13 explicitly states that the Delhi Declaration was adopted “yesterday,” matching the report’s timing reference.
“The second Delhi Declaration commitment urges “multilingual and contextual evaluations” to ensure AI works across languages, cultures and real‑world conditions.”
S77 highlights a commitment to strengthen multilingual and contextual evaluations, especially for Global South contexts, providing additional detail on the focus of that commitment. S23 also discusses multilingual AI as a bridge to inclusive access, adding nuance to the commitment’s intent.
“QR‑codes for the papers will be displayed immediately after remarks so attendees can download them on the spot.”
S75 describes the use of QR codes on presentation slides at the summit to allow participants to scan and obtain more information, confirming that QR codes are employed for on‑site content access.
The panel displayed a strong, cross‑sectoral consensus that AI assurance must be comprehensive, standards‑driven, and globally inclusive, with particular emphasis on multilingual evaluation, continuous monitoring of agentic systems, and the creation of incentive structures that reward high‑assurance performance.
High consensus: most speakers, from government, industry, academia, and multilateral organizations, reiterated overlapping themes, indicating a shared commitment to building a robust, interoperable, and inclusive AI assurance ecosystem. This broad agreement suggests that forthcoming policy initiatives and technical work are likely to receive coordinated support across stakeholder groups.
The panel shows strong consensus on the necessity of AI assurance, multilingual evaluation, and inclusive governance. However, substantive disagreements emerge around the primary mechanism to achieve assurance—market‑driven incentives versus shared public infrastructure, top‑down technical standardisation versus locally driven capacity building, and whether technological shortcuts (cheap models) can offset deep compute inequities. These divergences reflect differing priorities among government, industry, and civil‑society actors.
Moderate to high divergence: while participants align on goals, the varied strategic approaches (competitive advantage versus shared infrastructure, global standards versus local capacity, technology-centric solutions versus institution building) indicate potential friction in policy coordination and implementation, especially between high-resource actors and Global South stakeholders.
The discussion was driven forward by a series of pivotal remarks that moved the conversation from high‑level declarations to concrete, actionable pathways. Josephine Teo’s call for proactive, government‑led experimentation introduced a new regulatory paradigm that anchored later talks on standards and testing. Frederic Werner’s emphasis on trust, standards, and the pitfalls of assuming technology will automatically benefit the Global South highlighted the assurance divide, prompting Vukosi and Stephanie to surface concrete gaps in language, data, and capacity. Owen Larter’s proposal of interoperable agent protocols supplied a technical blueprint that linked directly to safety and assurance concerns. Stephanie’s systematic breakdown of six challenge areas and her articulation of north‑south collaboration provided a clear agenda, which Natasha and Chris later reframed as an operational infrastructure and shared responsibility. Collectively, these comments reshaped the panel’s tone—from abstract policy to a focused, collaborative roadmap—ensuring that the dialogue culminated in concrete next steps and a shared call to action.
Disclaimer: This is not an official session record. DiploAI generates these resources from audiovisual recordings, and they are presented as-is, including potential errors. Due to logistical challenges, such as discrepancies in audio/video or transcripts, names may be misspelled. We strive for accuracy to the best of our ability.
