Ensuring Safe AI: Monitoring Agents to Bridge the Global Assurance Gap

20 Feb 2026 14:00h - 15:00h


Session at a glance: Summary, keypoints, and speakers overview

Summary

The summit opened with Rebecca Finlay emphasizing that the newly adopted Delhi Declaration provides a pivotal moment to advance trustworthy, responsible, and beneficial AI, stressing the need for diverse Indian voices in shaping accountability and policy frameworks [1-3][5]. She introduced two new Partnership on AI resources-“Strengthening the AI Assurance Ecosystem” and a paper on global AI assurance-designed to help national policymakers embed robust assurance strategies alongside industrial AI plans and to bridge the assurance gap between the Global North and South [14-16][21-23][24-26][30-33]. Early commitments to share usage data and to enhance multilingual and use-case evaluations were highlighted as concrete steps linked to the forthcoming 2025 foundation-model impact report [25-33][27-30].


Minister Josephine Teo warned that the rapid rise of autonomous agentic systems introduces new risks such as malfunction and diminished human oversight, calling for a shift from reactive regulation to proactive governance [50-58][59-62]. Singapore is piloting a sandbox partnership with Google to test agentic AI and has released a living model-governance framework that invites industry feedback to ensure safe public-service deployment [68-71][74-78]. She outlined three pillars of an assurance ecosystem-rigorous testing, development of standards, and independent third-party verification-to build confidence and give companies a strategic advantage [97-109][110-112].


Moderator Madhu Srikumar defined AI assurance as the measurement, evaluation, and communication of trustworthiness, likening it to an independent safety inspection and tying it to the Delhi commitments on multilingual and contextual evaluations [124-132]. Frederic Werner stressed that trust is a major barrier for scaling AI-for-Good use cases worldwide and highlighted the need for standards that embed safety, human-rights, and inclusivity, especially given the 2.6 billion people still offline [145-166][170-176]. Owen Larter described agentic AI as increasingly autonomous “goal-achieving” systems, announced Google DeepMind’s agents-to-agents and universal commerce protocols, and warned that security risks require robust testing, malware scanning, and third-party assurance [186-204][222-223]. Vukosi Marivate pointed out that language diversity and limited local capacity in the Global South mean assurance frameworks must be locally understood and supported, lest they impose top-down solutions misaligned with regional values [231-240][241-242]. Stephanie Ifayemi presented PAI’s two papers, identifying six challenge areas-including infrastructure, skills, language coverage, and differing risk profiles-and argued that north-south collaboration and tiered, use-case-specific assurance are essential to close the global divide [254-280][286-291].


Natasha Crampton called for AI assurance to become an operational discipline embedded throughout the system lifecycle, emphasizing continuous post-deployment monitoring, interoperability of evaluation signals across languages, and shared infrastructure to prevent widening gaps as agents proliferate [412-421][428-437]. Chris Meserole concluded that evolving assurance practices, fostering global standards, and treating assurance as a shared responsibility are critical next steps, urging participants to download the new reports and actively contribute to building a robust, inclusive assurance ecosystem [448-456][462-465].


Keypoints


Major discussion points


Building an AI-assurance ecosystem for agentic systems – The panel repeatedly stressed that trustworthy autonomous agents require a dedicated assurance framework that includes rigorous testing, clear standards, and independent third-party verification. Josephine Teo outlined the three pillars (testing, standards, third-party attestations) [97-109] and the opening remarks framed the need to “think about all those actors” and apply assurance to agents [35-40].


Closing the global assurance divide – Participants highlighted that current assurance practices are uneven, especially for the Global South, where language diversity, infrastructure gaps, and limited expertise hinder effective evaluation. Madhu pointed to the 2.6 billion people offline and the need to use AI to bridge that gap [173-176]; Vukosi described the massive multilingual landscape and limited policy capacity in many countries [231-240]; Stephanie listed six challenge areas (infrastructure, skills, languages, risk profiles, etc.) that keep the divide open [259-277]; Natasha emphasized that without deliberate action the shift to agents will widen the gap [413-418].


Proactive, collaborative governance between government and industry – Singapore’s approach was presented as a model of “test-first” regulation: a government-led sandbox with Google, a living model-governance framework for agents, and ongoing feedback loops with industry partners. The sandbox arrangement with Google [69-73] and the “model governance framework… a live document” [74-78] illustrate this partnership-driven, forward-looking stance.


Standards, interoperability, and shared responsibility – Multiple speakers called for common technical protocols and global standards to enable agents to interact safely across borders, and for multilateral institutions to coordinate these efforts. Owen described the “agents-to-agents” and “universal commerce” protocols and the need for assurance standards [187-205][198-210]; Madhu’s rapid-fire question asked what role the ITU should play [302-306]; Frederic stressed inclusive, multi-stakeholder collaboration through AI for Good [307-314]; Chris summarised the three themes of evolving assurance, global effort, and shared responsibility [447-455].


Overall purpose / goal


The discussion aimed to catalyze concrete action on AI assurance-especially for emerging autonomous agents-by presenting new PAI resources, diagnosing gaps (technical, linguistic, infrastructural, and governance), and rallying a diverse set of stakeholders (governments, industry, standards bodies, and civil society) to co-create a global, inclusive assurance ecosystem that can keep pace with rapid AI advances.


Overall tone and its evolution


– The session opened with a formal and optimistic tone, celebrating the Delhi Declaration and the launch of new papers [1-8][14-18].


– As the conversation moved to agentic AI, the tone became cautiously urgent, highlighting novel risks and the need for proactive regulation [50-58][59-66].


– When addressing the Global South and the assurance divide, the tone shifted to empathetic and problem-solving, acknowledging disparities and calling for capacity-building [173-176][231-240][259-277].


– The latter part of the panel adopted a collaborative and motivational tone, focusing on concrete standards, partnerships, and a call-to-action for all participants to “get involved” and build assurance as shared infrastructure [302-306][447-455][440-443].


Overall, the discussion moved from introductory enthusiasm, through a sober assessment of risks and inequities, to a forward-looking, collective commitment to develop and operationalize trustworthy AI assurance worldwide.


Speakers


Madhu Srikumar – Role: Moderator (panel moderator); Expertise: AI assurance, policy discussion.


Frederic Werner – Role: Chief of Strategic Engagement Department, International Telecommunication Union (ITU) [S4]; Expertise: AI for Good, AI governance, standards development.


Chris Meserole – Role: Executive Director, Frontier Model Forum (FMF) [S7][S8]; Expertise: Frontier AI safety and security, policy coordination.


Rebecca Finlay – Role: Representative, Partnership on AI (PAI); Expertise: AI assurance ecosystem, policy, responsible AI.


Owen Larter – Role: Representative, Google DeepMind (AI research & safety); Expertise: Agentic AI, standards, safety research.


Stephanie Ifayemi – Role: Staff, Partnership on AI (PAI); Expertise: Global AI assurance divide, AI insurance, assurance frameworks.


Vukosi Marivate – Role: Co-founder, Masakhane (African-language NLP community); AI researcher focused on African language technologies; Expertise: AI assurance in the Global South, multilingual NLP.


Natasha Crampton – Role: Chief Responsible AI Officer, Microsoft [S17]; Expertise: Responsible AI, AI assurance, agentic systems.


Josephine Teo – Role: Minister, Singapore (Minister for Communications and Information); Expertise: Government AI policy, AI assurance, governance of agentic AI.


Additional speakers:


Rameca – No role or area of expertise identified in the transcript.


Full session report: Comprehensive analysis and detailed insights

The session opened with Rebecca Finlay emphasizing that the newly adopted Delhi Declaration represents a “pivotal moment” for trustworthy, responsible and beneficial AI, bringing together “a whole set of voices and perspectives and leadership that is not optional” in India [1-4]. She announced that the Partnership on AI (PAI) has released two new resources: Strengthening the AI Assurance Ecosystem and a paper on global AI assurance [14-18][21-23]. Both papers are available via QR codes for participants to download and discuss with the authors [12-13][14]. Finlay linked the Declaration’s first two commitments to concrete actions: a 2025 foundation-model impact report that will require frontier-AI firms to share usage data, and a new commitment to “strengthening multilingual and use-case evaluations” that will guide future policy [25-33][27-30].


The moderator then introduced the panel and shifted the focus to autonomous “agentic” AI, noting that the assurance question must now be applied to agents because “that’s where the world is going” [35-40].


Josephine Teo, Minister for Communications and Information, described the rapid emergence of agentic AI-from “not a thing” a year ago to a driver of productivity gains-while warning that their autonomy “introduces new risk” and can erode human oversight [50-58]. She called for a move from “reactive regulation” to “proactive preparation” [59-63] and highlighted Singapore’s sandbox partnership with Google, which lets the government “eat our own dog food” and build credibility before wider deployment [71-73]. Teo presented a “living” model-governance framework for agentic AI that invites industry feedback and aims to build confidence among boards, customers and other stakeholders [74-81]. Central to her vision is a three-pillar assurance ecosystem-rigorous technical testing, enforceable standards, and independent third-party verification-drawn as an analogy to safety regimes in aviation and healthcare [97-109][110-112].


Madhu Srikumar, the moderator, then defined AI assurance as “the process of measuring, evaluating, and communicating whether AI systems are trustworthy”, likening it to an independent safety inspection that goes beyond the builder’s own assurances [124-132]. She connected this definition to the Delhi Declaration’s commitment to multilingual and contextual evaluations, framing the panel’s purpose as assessing whether the global community is equipped to deliver on that promise [133-138][141-144].


Frederic Werner (AI for Good) highlighted that trust is a major barrier to scaling high-impact use cases such as affordable health care, education and disaster response [153-158]. He stressed that standards must embed “common-sense things” – safety, human rights and inclusivity – especially because 2.6 billion people remain offline, and AI could help remove language and literacy frictions only if accompanied by skilling and locally relevant content [170-176][145-166].


Owen Larter (Google DeepMind) described agentic AI as “more autonomous systems that … achieve goals”, giving examples such as a suit-dry-cleaning agent [186-190]. He announced the development of technical protocols-the “agents-to-agents” protocol and a “universal commerce” protocol-to enable interoperable communication between agents and web services, likening them to early internet standards like HTTP [200-208][202-208]. Larter warned of security challenges, noting collaborations with VirusTotal to scan downloaded skills for malware and the need for “cheap, efficient models” (e.g., flash models) to support both deployment and rigorous testing [222-227][351-354].


Vukosi Marivate drew attention to the Global South, pointing out that India alone has over 120 languages and 19 500 dialects, while Africa has thousands more [262-264]. He argued that assurance frameworks must be “locally understood” and that policymakers need the capacity to monitor systems; otherwise, “top-down” solutions risk misaligning with regional values [237-242][231-240]. Marivate warned that without such capacity, “the last piece … will be the capacity and the capabilities of the policymakers” [237-241].


Stephanie Ifayemi summarized the two PAI papers and identified six challenge areas that keep the assurance divide open: infrastructure, skills, language coverage, divergent risk profiles, documentation, and incentive mechanisms [259-267][274-280]. She cited the Stanford HELM evaluation resource requirement-12 billion tokens and 19 500 GPU-hours-as an illustration of the infrastructure barrier [291-299]. She also noted the UK AI Safety Institute’s inaugural $100 million fund as an example of incentive mechanisms [300-306]. Ifayemi highlighted the multilingual evaluation commitment in the Delhi Declaration and called for “north-south collaboration” to ensure Global South countries are not left out of emerging standards on agents [291-299]. She advocated a tiered assurance approach, matching the level of scrutiny to the stakes of a use-case (e.g., finance versus health) and linking assurance to insurance products and professional accreditation [363-376][384-394].


Rapid-fire segment

The moderator posed quick questions to each panelist:


* Frederic Werner reiterated the role of multilateral bodies-ITU, AI for Good, and others-in fostering inclusive assurance frameworks [307-312].


* Vukosi Marivate critiqued Singapore’s “test-once-and-comply-globally” model, warning that it could overlook local linguistic and policy capacities, and emphasized the challenge of scaling evaluations and providing user-level personalization [324-341].


* Owen Larter suggested establishing a “Frontier Labs” initiative to improve global access to multilingual, low-cost models and to ensure third-party security review of agentic skills [350-357].


* Stephanie Ifayemi outlined concrete outcomes for the next 12 months: changing incentive structures, creating professional accreditation pathways, and implementing tiered assurance linked to insurance [363-376][384-394].


Closing remarks

Natasha Crampton reinforced that AI assurance must become an “operational discipline” embedded throughout the system development lifecycle, not merely a post-hoc check [425-428]. For agentic systems, she stressed that “post-deployment testing … takes on an even greater level of importance”, requiring continuous monitoring, real-time failure detection, and clear accountability [419-422]. Crampton called for interoperable evaluation signals that work across languages and cultures, and for shared infrastructure-including taxonomies and capacity-building investments-to prevent the agentic shift from widening existing gaps [428-437][430-434].


Chris Meserole synthesized three overarching themes: (1) the need to evolve assurance practices for multi-agent environments, (2) the necessity of a truly global, collaborative effort, and (3) the imperative that assurance be a shared responsibility across governments, industry and civil society [447-455]. He urged participants to “download the reports” and join ongoing initiatives, framing the earlier “seed-planting” metaphor as a call to “roll up our sleeves and get to work” [462-465].


Consensus and points of tension

Across the discussion, participants agreed that a robust AI-assurance ecosystem rests on three pillars-rigorous testing, clear standards, and independent third-party verification-and that assurance should be defined as an independent trustworthiness audit [97-109][124-132]. They also concurred that multilingual evaluation is a critical challenge, with the Delhi Declaration highlighting the need to address “120 languages and 19 500 dialects” in India and “1 500-3 000 spoken languages” in Africa [262-267][231-241].


Disagreements emerged around implementation pathways:


* Top-down vs. local capacity – Vukosi questioned Singapore’s “test-once-and-comply-globally” model, warning it could ignore local linguistic and policy capacities [324-341].


* Standards development locus – Frederic advocated for multilateral bodies to lead inclusive global assurance, whereas Owen emphasized industry-driven protocols such as agents-to-agents [307-312][202-208].


* Pre-deployment vs. continuous monitoring – Teo’s three-pillar model emphasized testing and standards, while Crampton argued that “continuous monitoring … is even more important” for autonomous agents [97-109][419-422].


Key take-aways

1. The Delhi Declaration establishes concrete commitments on usage-data sharing and multilingual evaluation.


2. AI assurance is an independent, systematic verification of safety, reliability and trustworthiness.


3. Agentic AI raises heightened risk, demanding proactive sandboxes, living governance frameworks and continuous monitoring.


4. A functional assurance ecosystem must combine rigorous testing, enforceable standards and independent third-party auditors.


5. Global inclusion requires addressing language diversity, building local capacity and avoiding top-down imposition.


6. Technical interoperability (agents-to-agents, universal commerce) and security (malware scanning, low-cost models) are essential for a safe agentic economy.


7. Closing the assurance divide involves tackling infrastructure, skills, language, risk-profile, documentation and incentive gaps.


8. Collaboration across governments, multilateral institutions, industry and civil society is required, treating assurance as shared infrastructure built into the AI lifecycle [363-376][425-434].


Unresolved issues

* Designing scalable multilingual evaluation methodologies.


* Funding and providing compute resources for assurance in low-resource settings.


* Establishing third-party assurance providers and accreditation pathways in the Global South.


* Defining mechanisms for real-time post-deployment monitoring of agents.


* Balancing proactive government sandboxes with industry-led self-assessment.


Suggested compromises include a tiered assurance model that aligns scrutiny with risk, combining sandbox experimentation with independent audits, and developing modular standards that allow regions to adopt core safety components while adding local language and risk extensions [384-394].


Overall, the panel moved from an optimistic opening about the Delhi Declaration, through a sober appraisal of the novel risks posed by agentic AI, to a collaborative call-to-action emphasizing concrete standards, capacity-building and shared responsibility. The convergence of viewpoints provides a solid foundation for next steps: disseminating the two PAI papers, expanding Singapore’s sandbox experience, advancing open technical protocols, and mobilizing multilateral bodies such as the ITU to ensure that AI assurance becomes a globally inclusive, interoperable infrastructure that enables trust and adoption rather than hindering innovation.


Session transcript: Complete transcript of the session
Rebecca Finlay

in 19-ish countries, and we’re all focused on what does it mean to unlock innovation through trustworthy, responsible, beneficial AI. And so, of course, no surprise, gatherings like the one that we’ve had this week are really crucial for the work we do, and with the Delhi Declaration adopted yesterday, this is an even more important moment to build on where we have come from, to lean in, and to really get to work around some of the questions of the accountability work that needs to be done, the scientific evidence that we need to build around frameworks and good policy moving forward. And, of course, it’s extraordinarily important that this is happening in India, that it’s bringing a whole set of voices and perspectives and leadership that is not optional.

At PAI, we believe… We believe that that is fundamental to building a global community committed to this work, and it’s great… to see it in action this week. So thank you all for being here with us. So today we’re going to give you an opportunity to see two of our latest papers. These are papers that were begun out of the Paris Action Summit. And at that time, as we were thinking about moving into action and innovation, we felt that work needed to happen with a good sense of what the assurance ecosystem looked like. So we’ve had working groups underway developing these two new resources. They’ll be up on the screen at some point. You’ll be able to get a QR code and download them.

Feel free to talk to any of us. The first one is Strengthening the AI Assurance Ecosystem. It really looks at telling and helping national policymakers, if you’re building a robust industrial AI strategy, you better have a comprehensive AI assurance strategy as well. And you need to be able to do that. And so we’re going to be talking about that. We need to think about all those actors and what they look like. We’re going to hear about one of the experts, of course, in this as soon as the minister comes to join us. The second piece, which is really important, we think, for this conversation is what does it mean to do AI assurance? globally around the world?

How do we close the divide that exists? What is different about the challenges faced by countries in the Global South versus others? So we’re really hoping that these resources not only are good, substantive contributions to the work that needs to be done, but the idea is to just catalyze, you know, sort of plant a number of seeds across a number of ways in which assurance works so that those can grow and really come to life out of this. And just two quick comments on that. Now that we have half the declaration, and so now we can, as opposed to earlier in the week, start to articulate it, really leaning in with regard to the commitments around, in commitment one, around usage, clarity around usage data, really trying to give some empirical grounding to this work.

In 2025, in our progress report around foundation model impact, we made exactly this recommendation. We directly called for Frontier AI companies to share usage data. We’ve been tracking progress, and there has been some progress in that regard. So we are delighted to see this particular commitment come about and to start to see some standards about how that usage data is going to be shared. So we’re very pleased to see that work. We’re also very pleased to see the second commitment around strengthening multilingual and use case evaluations. And you’ll see, if you do download the report on the global assurance divide, that that is clearly a key piece of work that needs to happen. So this afternoon, we are going to give you an extraordinarily expert panel that brings a real diversity of perspectives to this work.

And so we want to take the assurance question and apply it to agents. Because that’s where the world is going. We’re all seeing them in the news every day. We’re seeing them integrated into foundation model systems. So what does it mean? to take what we know about assurance and think about the applications that agents will add to the complexity of that work. So let me begin by introducing our first speaker. She’s probably been one of the most visible ministers this week because of the extraordinary leadership that Singapore has taken when we think about AI assurance. I know you’re going to talk a little bit about that. Such a pleasure to welcome you, Minister Josephine Teo.

She’s going to come and say some words for us before the panel begins. Thank you.

Josephine Teo

Thank you very much, Rebecca, and also very much appreciate Partnership on AI for the invitation. When this series of summits first began in Bletchley, AI agents were not a thing. Nobody was talking about them, even just 12 months ago. When we had the AI Action Summit in Paris, it had barely crept into the conversation. At the time, the preoccupation was all around DeepSeek and what it told us about the capabilities that are emerging out of China. But today, as Rebecca correctly identified, agentic systems have taken off. They are increasingly being used and we need to have a better grasp on how to deal with this issue because agentic AI certainly offers transformative possibilities in how we delegate and orchestrate work when deployed strategically.

Agents function as invaluable teammates, unlocking productivity gains and time savings, which we all want more of. However, I should also add that the very nature of how agents can be helpful to us is autonomy. This autonomy also introduces new risk. The potential for harm increases when systems malfunction and human oversight is diminished. We are no longer present, or at least present to a very much smaller extent. The implications may be complex and not fully predictable. So the way my colleagues and I have been thinking about this is that there needs to be a shift. There needs to be a shift in terms of how we might want to rely on reactive regulation to a different kind of stance, which is proactive preparation.

And in Singapore, that’s what we’ve been trying to do. We’ve tried to be proactive about governing the new risks in the era of agentic AI. And I think it starts with the government itself being a leader and not a laggard in using agentic AI. We need to test it. We need to look at how the solutions can not only enhance public service delivery, but we also need to be able to put in place more controls. Government is high risk because the touch points with citizens are very sensitive. No citizen and no government wants to make serious mistakes when they interact with their citizens, telling them things about their health, telling them things about their social security, telling them about things to do with their benefits that are not accurate, and having them not just be told but acted upon.

So this need to ensure that we know what we’re doing is a very high one. And the way we are also thinking about it is to try and work with industry. So, for example, between Google and Singapore government, we have a sandbox on agentic AI. It’s one of the ways we think we can, in a way, eat our own dog food. Try it. You know, does it taste all right? Does it hurt us in a very significant way? Because if we were not able to do so, I don’t think we have a lot of credibility in terms of how we want to govern agentic AI. But we can’t wait, you know, for the dog food to materialize in its consequences for ourselves.

In the meantime, my colleagues have put together a model governance framework for agentic AI. It is meant to provide practical support to enterprises so that they can also deploy autonomous agents responsibly and to mitigate the risk. We know that this is not a complete solution and this document that we put out has to be a live document. We very much encourage feedback as a way for us to keep improving the guidance to enterprises. Can I also just add that as we do this work, what is the meaning and what is the purpose behind it? Ultimately, it is to build confidence in the use of agentic AI systems. And we think that at many levels, this confidence has to be presented, has to be demonstrated to boards of organizations, to customers, to other stakeholders.

And how do we demonstrate that the risks have been managed well? And that is where the assurance ecosystem that Rebecca talks about comes in. It is an absolutely essential part of building trust over the medium to longer term so that there is a way, a foundation upon which agentic AI systems can be made more readily adopted and available. I should also say that for companies that are thinking about it, and I see Microsoft here, and I’m sure that there are other companies represented. If we are to trust these agentic systems, the safety aspects should not be downplayed. And I would venture to say that a company that is able to give a high assurance on safety will find itself being differentiated from their competitor.

It’s more likely to translate into stronger interest in a product and service. So rather than think of it as something that you are unhappy to comply with, think of it as a strategic competitive advantage. And that is a way I think that will give us the confidence to put it forward. The question, however, is that are we completely without experience in this regard? And the answer is no. In aviation and healthcare, there are a lot of measures being put in place to give assurance to passengers. When we board a plane, we usually expect to arrive. When we visit the hospital, we generally expect to be treated, except for disease conditions that are not yet well understood.

But the trust in these systems has to be built over time, and it doesn’t come without some assurance being put in place. The question is for AI, and specifically agentic AI, what would be the components? What leads to an assurance ecosystem that would be robust enough? We think that there are at least three components. The first is that there must be testing. We need some way of making sure that there are technical assessments of the system to make sure that the systems are robust, they are reliable, and they’re safe. And a lot more work needs to be done in this space, developing the testing methodology, building the testing datasets, and also making sure that the testing of agentic systems takes into account the nature of these systems.

These systems are going to be much more complex, multi-agent for example, and it’s not just the output, but the in-between steps, how the reasoning takes place, and what is the orchestration that is being built into the agentic systems. So that’s the first, testing. Second is that eventually we will need standards. We cannot just define what is good enough. We also need to assure the users that it has met expectations in safety and reliability, and so these are still very early days. Thirdly, we think that this ecosystem cannot do without third-party assurance providers. It’s one thing to claim that your agentic AI system is safe. It’s another thing to have someone attest to the safety of it.

So these could be technical testers, auditors, and they provide independence, augment in-house capabilities, and also help to identify the blind spots, and it’s necessary for us to strengthen this pool as well. So I’m going to stop here. I want to conclude my remarks to say that Singapore is actively building these components, and we welcome conversations with partners and colleagues because we know that we cannot do this alone. So we look forward to discussions in the three panels on how we can meaningfully collaborate on assurance for agentic AI. Thank you very much once again, Rebecca.

Madhu Srikumar

Thank you. Thank you. We’re all here. It’s the end of the conference, and we’re all intact. Thank you so much, everyone, for joining us. Thank you, Minister Teo, for the keynote. One quick note before we dive in. Our panelist, Fred, has a flight to catch, so he’ll need to slip away a few minutes early, but, Fred, we’ll make sure we get your best insights before you escape. No pressure. So we are the last session, so we are standing between you and whatever you have planned right after. So I promise we’ll make this worth it. We have an incredible panel and a lot of ground to cover. So before we get started, what do we mean by AI assurance?

Because you're going to keep hearing that term quite a bit here. So really, put simply, AI assurance is the process of measuring, evaluating, and communicating whether AI systems are trustworthy. Are they safe? Do they work as intended? Can the public actually trust them? So really, think of it like a safety inspection, but for AI. You'd want an independent inspector checking a building, not just the builder saying, trust me, it's fine. So really, AI assurance is about independent verification, as Minister Teo went over. And why this panel? Why now? The summit unveiled the New Delhi Frontier AI commitments just yesterday. And the second of those commitments is about strengthening multilingual and contextual evaluations.

So really making sure AI systems work across languages, cultures, and real world conditions. And really, that’s the assurance challenge in a nutshell. And our panel today is about whether we are actually equipped to deliver on that promise globally and not just in a handful of countries. So really, our panelists span the ITU, Google DeepMind, the University of Pretoria, and PAI. So we have the range to actually wrestle with this question. So with that, I’m going to get into our first question for today. Fred, that’s going to be you. ITU has been convening on AI governance through AI for Good and working on standards across borders. So really, when we talk about AI assurance, what does it mean to you, ensuring that these systems are safe and trusted?

And how do we think about assurance when 2.6 billion people remain offline and may be excluded from the frameworks being designed?

Frederic Werner

Yeah, thanks for that great question, and thanks for having me here. So I think it's safe to say there is no shortage of high-potential AI for Good use cases, everything from affordable health care to education for all, food security, disaster response, and also applications in the physical manifestations of AI that you see in robotics, embodied AI, and brain-computer interface technologies. The best part of my job at AI for Good is I see these use cases coming across my desk every day. And I can tell you, when we started AI for Good in 2017, it was mainly in PowerPoint slides. They didn't really exist. But as we got into 2023 with GenAI, and then last year, the unofficial theme of AI for Good was the rise of the AI agents, a bit scary, Terminator-like, but that's what people were talking about.

And we're really going from the promise to the pilots to the use cases, and now scaling. Now, when you're looking at these use cases, I think one big challenge is trust. How do you trust them? I mean, there's always the good intention, right? But is that trust there? And also, are they replicable and scalable? I've yet to see a high-potential use case developed in Brussels work equally well in Johannesburg and Shenzhen and maybe Panama. We haven't really reached that yet. And if you look at the fast-emerging governance frameworks around the world, whether you're in the U.S. or EU or China or everything in between, I think there's a lot of good intentions, a lot of good thinking.

But how do you turn those ambitious words and principles into actions? Because the devil is in the details, and I think standards have details. So when you're thinking about how, especially when you start to get into AI agents, where that trust element is becoming ever more critical, how can you bake in a lot of the common-sense things that we've been talking about all week, or even for the past years at AI for Good? Are they trustworthy? Are they verifiable? Are they secure? Are they safe? Are they designed with human rights principles in mind? Are they inclusive? Are people from the Global South at the table? Are they able to participate when we're drafting and developing these standards?

So these are not always natural reflexes, and at the same time, it's hard to turn words into action. So one of the tools, and I'm not saying it's the only tool, but as these solutions start to scale and businesses start to interact regionally or even internationally, at some point you're going to need standards, and it's within those standards that you can bake in those common-sense principles we've all been talking about. And I forget the last part of your question. It was really a question about… Oh, connectivity. That was it, yes. The 2.6 billion people who remain offline, yeah. So, you know, ITU's mission is connecting the world, and a third of the world is still offline.

And, you know, large parts of the world actually have connectivity, but there's actually no incentive to connect. If there's no content in your local language or dialect, no access to government services or useful applications that are fit for purpose where you live, why would you connect? So I think AI can actually help to remove that friction, where you have a lot of bottlenecks, for example literacy, disabilities, or, again, content in your own language or dialect. So one thing is closing the connectivity gap, but the other thing is actually using AI to remove that friction. And the last thing I would say is, I think sometimes there's a comparison where, if you take East Africa, for example, you have the mobile payment miracle or revolution with M-Pesa, right? You effectively leapfrogged decades of legacy infrastructure. And there may be a kind of optimism that the same thing could happen with AI in the Global South. Maybe, but I don't think we can take it for granted that if that happens, it goes in the right direction. It's not a guarantee that just by putting the tool in the hands of the people, they're going to create value, use it responsibly, use it to solve local challenges, and build more cohesion and community. Those aren't for granted.

So I think that whole AI skilling angle, of really educating people from grade school to grad school to diplomats and everyone in between, matters. If you don't address that literacy piece, then it's just going to be a crapshoot.

Madhu Srikumar

Great, and a good transition. Speaking of standards: Owen, Google DeepMind recently deepened its partnership with the UK AI Security Institute on safety research, including work on monitoring chain of thought and evaluations. So from an industry perspective, what does robust AI assurance look like? Where do you think the gaps and opportunities are between what frontier labs do internally and what's needed for broader public trust?

Owen Larter

Yeah, thank you, Madhu. And thank you to Rebecca and Partnership on AI for convening this really important conversation. And a big congratulations to our Indian hosts for a fantastic week at the summit. Maybe I'll start by talking a little bit about what agents are. We're increasingly excited about them at Google DeepMind. They're essentially more autonomous systems that, instead of just following basic instructions, can actually achieve goals. So let's say I want to get my suit dry-cleaned on Thursday. Instead of taking an AI system and saying, find a website for a dry-cleaning company, see if it's open on Thursday, see what the hours are, see if it's within my budget, you can just say to your agentic system, go find a way to dry-clean my suit and make sure it's picked up by Friday, and it will go and interact with those different websites and try to find a way to meet your goals.

There are all kinds of fantastic applications already that we're seeing right across the economy. We're using increasingly agentic coding systems at Google and Google DeepMind to do a lot of our coding. So we have our Antigravity framework, which is fantastic. You can interact with it in normal, natural language and say, build me a website, build me a tracking system to follow a particular bill that I'm interested in, and it will really help you achieve these goals. I think you'll increasingly see agents used right across the economy as well. I think we're just in the early years of a new AI-enabled agentic economy. I think you will have very normal interactions with agents on a regular basis; one will pop up on your phone screen and say, hey, it's been a few weeks since you bought toothpaste.

Would you like me to go and take care of that and get some more toothpaste for you? You mentioned standards, which I think is going to be a critical part of getting all of this right. There are a couple of dimensions to the standards. Firstly, we need to create the technical protocols to actually underpin this agentic economy. We've been trying to contribute to this conversation: there is the agent-to-agent protocol that Google has launched, and there's the universal commerce protocol. This is basically a way of helping agents talk to each other, and agents talk to websites, so that you have standardized sets of information. An agent will basically come to another agent, or to a website, and say, this is my ID.

These are my capabilities. This is what I'm trying to do. I think in the same way that we developed protocols and standards in the early 90s to underpin the internet, like HTTP and the URL, we're going to have to build these out. There are then also assurance standards, which are related, but I think very important as well. We need to make sure that we're understanding the capabilities of these systems. We need to keep making progress on how we can test for the risks that they may pose, and then work right across society to come up with ways to mitigate them. I think the work that the safety and security institutes are doing around the world is absolutely critical.
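The standardized exchange Owen describes, an agent announcing its identity, capabilities, and goal before interacting with another agent or a website, can be sketched in a few lines. This is illustrative only: the field names and the `negotiate` check are invented for this example and are not taken from Google's published protocols.

```python
from dataclasses import dataclass, field

@dataclass
class AgentCard:
    """Hypothetical self-description an agent presents before interacting."""
    agent_id: str                                           # "this is my ID"
    capabilities: list[str] = field(default_factory=list)   # "these are my capabilities"
    goal: str = ""                                          # "this is what I'm trying to do"

def negotiate(card: AgentCard, accepted_capabilities: set[str]) -> bool:
    """A service accepts the agent only if every advertised capability
    is one the service recognizes and allows."""
    return all(c in accepted_capabilities for c in card.capabilities)

card = AgentCard("agent-123", ["browse", "schedule_pickup"], "dry-clean a suit by Friday")
negotiate(card, {"browse", "schedule_pickup", "pay"})  # → True
```

The point is the shape of the handshake, a machine-readable card plus a policy check on the receiving side, not these particular fields.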

So Minister Teo mentioned some of the work that we're doing in Singapore. The UK institute has been world-leading on this. I think this is an area where we're going to see more from the AI safety and security institutes right across the world. The US government also, through CAISI, launched an agent standards initiative this week as well.

Madhu Srikumar

Great. And if you don't mind a follow-up question: that's a really important point you made, that we need interoperability, we need agents to flourish, we need to imagine this new paradigm. But I'm curious, is there a safety challenge when it comes to agents that keeps you up at night?

Owen Larter

Yeah, I think there are definitely risks to be mindful of. I think agent security is something we should all be thinking a lot about. If we're connecting increasingly autonomous systems into different accounts, different email accounts, different bank accounts, I think we want to be pretty careful about how we do that and come up with secure protocols that can be helpful there. We've actually been doing some work with VirusTotal, which is part of the Google security operations team, to make sure that when certain agentic systems are downloading skills or apps from agentic websites, they're being scanned, so that malware or vulnerabilities are detected and can be addressed before people put them onto their computers. I think there's also a concern that these agentic systems could create new capabilities that could be misused, in the cybersecurity domain, for example. I think some of the frameworks that we already have at Google DeepMind will be helpful here. We have our Frontier Safety Framework, which we use to test models before we put them out into the real world.

We think about how those models are going to interact with systems, how they might be parts of agents as we’re doing that work.
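The scanning pattern Owen describes, checking a downloaded agent "skill" before it is installed, might look something like the following sketch. The hash-based check and all names here are invented for illustration; a real service like VirusTotal does far more than signature lookup.

```python
import hashlib

# Stand-in for a real malware-signature feed; contents are illustrative.
KNOWN_BAD_HASHES: set[str] = set()

def fingerprint(skill_bytes: bytes) -> str:
    """Content hash used as a stand-in for a real scan verdict."""
    return hashlib.sha256(skill_bytes).hexdigest()

def install_skill(skill_bytes: bytes, registry: dict) -> bool:
    """Refuse installation if the skill matches a known-bad signature;
    otherwise record the vetted artifact and allow it."""
    digest = fingerprint(skill_bytes)
    if digest in KNOWN_BAD_HASHES:
        return False  # blocked: flagged by the scan
    registry[digest] = skill_bytes
    return True
```

A production pipeline would add static and behavioral analysis, but the gate itself, scan before install, is the design point.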

Madhu Srikumar

All right. Just speaking for myself, I can't wait to use agents. I feel like it's mostly developer communities that have started playing around with these systems, but I imagine it's reaching lay consumers very soon. So, Vukosi, you have built Masakhane for African-language NLP, really building AI for Africans by Africans. When assurance frameworks are designed in the U.S., U.K., or Singapore, how well do they translate to contexts where the data, the languages, and the deployment conditions are completely different? What do you think we're missing?

Vukosi Marivate

I think the first thing is that we do get to understand that it's a very different context. My experience has been that there's likely not as much data collection or annotation happening in Europe or North America as is happening now in the Global South. But that also means it feels further away, right? It's not where the developers are. And that then requires more of this conversation in one place, so that, again, there is a kind of local understanding. The last piece is going to be the capacity and capabilities of the policymakers in those countries to understand that part. It will not be top-down; I don't believe that. It will be them understanding, whether it's labor laws, data governance, or just monitoring of systems once they're on.

If there is not that capacity or capability to actually do those things, then again, it gets pushed in a more automated, top-down direction that is not necessarily aligned with what the values of those people actually are.

Madhu Srikumar

Those are important words right at the end of the conference, knowing just how much we have to get done here. So Steph, over to you. PAI just released work on closing the global assurance divide, a lot of what Vukosi just mentioned. What are the concrete gaps you're identifying? Is it capacity to conduct third-party evaluations, as Minister Teo mentioned? Is it access to the models being tested, or is it something else? What would it take to really close those gaps?

Stephanie Ifayemi

Awesome. Thanks so much, Madhu. And as one of the PAI folks, thanks for being here, everyone. It's great to see you all. I know it's a Friday evening, so we're in between you and cocktails or whatever you have planned, so we very much appreciate it in the last session of the day. I think it's such a good question, and your question recognizes that those challenges aren't actually just Global South challenges. I just want to start with the fact that we've released two papers. One is on closing the assurance divide, and the other is on how we strengthen the global assurance ecosystem generally. And the question of access is one that impacts us all, actually.

In the UK, for example, the Department for Science, Innovation and Technology, DSIT, has made access to models as a means to support assurance a priority for 2026. And so I think there are a few shared challenges, and I'll come back to the point around north-south collaboration in a second. But just thinking about closing the AI assurance divide: we released this paper, and in it we talk about around six challenge areas, from infrastructure to skills. We talk about languages and risk profiles, the things that you've heard about from Vukosi and a lot of the other speakers. So I'll give you a sense of some of the examples that we have.

So on language: we're at the India summit, of course, and India has, I believe, over 120 languages and 19,500 dialects. When we think about Africa, we have somewhere between 1,500 and 3,000 spoken languages. So when we think about benchmarking and designing evals that account for how these systems are deployed in these various contexts, it's so important to think about languages, and that generally demonstrates the complexity of designing evals to meet the needs of this diverse language ecosystem. Rebecca mentioned at the start that we had the declaration yesterday, and the commitment in the declaration to multilingual evals is really critical. Of course, there's still a lot of work to determine how we actually do that in practice in the most effective way, accounting for that complex and wide language diversity, but that's one area we talk about.
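One way to make the multilingual-evaluation commitment concrete is to run the same evaluation items per language and compare pass rates, so gaps between high- and low-resource languages become visible. A minimal sketch, with an invented `model` callable and toy eval items standing in for real benchmarks:

```python
def pass_rate(model, items):
    """Fraction of (prompt, expected) pairs the model answers correctly."""
    results = [model(prompt) == expected for prompt, expected in items]
    return sum(results) / len(results)

def multilingual_report(model, suites):
    """suites maps a language code to its eval items;
    returns a per-language score so disparities are directly comparable."""
    return {lang: pass_rate(model, items) for lang, items in suites.items()}
```

A real harness would add statistical significance testing and per-dialect breakdowns, but the per-language scoreboard is the core idea.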

The second area we need to account for in closing the assurance divide is, interestingly, risk profile. In this paper, we actually interviewed a lot of assurance and safety experts internationally, and one of the things they mentioned was differences in what they might prioritize when thinking about assurance. The Pacific Island nations, for example, would be thinking about assuring for environmental impacts differently than environmental impacts might be weighted in the US at the moment. Last year, we published a paper on post-deployment monitoring, and in that paper we talk about companies sharing data, and one of the points we discuss is environmental impacts.

And so it's really interesting that, in terms of closing the divide, the starting point, or what you put emphasis on, might vary. That's important to note as we're designing things like documentation, description, and so on. The third area I'll just quickly mention is, of course, infrastructure. I think we've probably all heard a lot about this throughout the summit, and this idea of what it means to be sovereign and which parts of the stack to prioritize. That is really, really important, but there are tradeoffs. To give a sense of scale, I was looking at a stat that Stanford's HELM evaluations used over 12 billion tokens and required 19,500 GPU hours alone.

And so when you think about those infrastructural needs, it creates barriers for a lot of countries in the Global South. But I was at an interesting roundtable that Carnegie was convening, and we were talking about how you balance assurance needs: where do you start from across the value chain? At the moment, a lot of the discussion is upstream, right? We need to have that infrastructure in place; that's the point we need to start with. But how do you do that in parallel, and how much of that resource should be put into other foundational tools for assurance, such as documentation artifacts, which is another area that we focus on a lot at PAI?

And so I think there will be a lot of questions around how you weigh up all these challenges, knowing that even in G7 countries, the UK AI Safety Institute started with inaugural funding of £100 million alone. So that prioritization and balancing is going to be important. The last thing I'll say, coming back to agents, and I will talk about this a bit more, is that north-south collaboration is a real opportunity as we think about agents. And it's important that Global South countries aren't always playing catch-up. A point that has come through for me from the summit concerns NIST, or CAISI, the Center for AI Standards and Innovation.

And this is almost like a test for me, saying the names of these institutions through this panel. But they just announced a few days ago that they're going to be working on standardization around agents, including releasing an opportunity to comment on a paper around agent attribution and agent identity, I believe, which is really interesting. And there's, of course, a lot of push for countries to collaborate, and you see a lot of the safety institutes collaborating on questions around assuring agents in the Global North. But how do we ensure that Global South countries aren't missing from that? That will have implications for how we attribute agents and how we test agents.

And whilst those upstream points and infrastructure are important, we shouldn't just assume the rest follows; in parallel, Global South countries should ultimately be part of these thinking-ahead questions and frameworks.

Madhu Srikumar

Great. So I'm going to take the moderator's prerogative and have us do a rapid fire. And by rapid fire, I mean every answer is a minute and 30 seconds, which, let's be honest, is fairly rapid for AI policy. I'm going to start with Fred, because I'm more nervous about your flight than perhaps you are. So, a minute and 30 seconds: what role should multilateral institutions like the ITU play in making globally inclusive AI assurance happen?

Frederic Werner

Yes, I think AI for Good has a pretty ambitious goal, right? Simply put, it's to unlock AI's potential to serve humanity. Pretty big. But we can't do it alone, and no one can. It's not one country, not one institution, not one NGO. That's why we have 50-plus UN sister agencies as part of AI for Good, but we're also making great efforts to bring as many diverse voices to the table from the Global South, from NGOs, from civil society. It's always been extremely open. I like to think of it as the Davos of AI, but instead of being very exclusive, it's extremely inclusive, right? So I think that's a bit of the philosophy behind AI for Good.

You know, AI is just moving so quickly. So the focus has always been on practical applications, practical solutions. But in doing that, you can tease out the next generation of standards, of policy recommendations, of collaboration and partnerships around the world. So I like to think that in the doing, you have the learning, right? It's not just about talking. And that's what AI for Good has always been about.

Madhu Srikumar

Thank you. That was incredible. You have 56 seconds left. So I'm going to move us ahead to Vukosi. Singapore's aim is to test once and comply globally. From a Global South perspective, what would make that interoperability real rather than a form of exclusion?

Vukosi Marivate

Yeah, that's a hard one. Going back to it, I think the other thing that's come out of a lot of the sessions here has been the evaluations and how evaluations are used, and I think that's a really important thing. Because on one side, it's going to take a lot of resources to put up an evaluation that is so all-encompassing, and on the other side, running it is going to be a lot. But then, when it comes down to the user, which I think was the second panel I was in this week, and you're trying to think about personalization, if you're going down to an individual, what experience do they actually have, and how do you get to there?

There will be some more high-level safety things that will likely come out, and people will be working on that, and maybe that's what I'm thinking Singapore is trying to go for. But then, when we're getting to what the individual experience is, given that you have these stochastic systems, you don't know what is going to happen necessarily. I know we're trying to do that, but we don't really know what's going to happen at the level of individual experience, and we can't model all of that. It's going to require that, again, you do have things closer to where the user might be, on what that experience actually was. So one of the hats I wear is as a co-founder of Lelapa AI, an AI startup.

And there you will be doing more testing towards: hey, we are serving this client, we're serving them in this way. And then you go in and ask, where is your data coming from? What are the use cases? What are we testing for in terms of their operational requirements? It would not necessarily be just one. But, yes, what you might want is…

Madhu Srikumar

Yeah, that's a great point: assurance needs to be globally decentralized. Owen, given everything we have discussed, what's one commitment frontier labs should make on assurance that would actually move the needle?

Owen Larter

Yeah, good question. I think there's a question of access to the technology, which is important here. I think it's one of the big themes of this conference, certainly one of the things that I'll be taking away. The multilingual part of this is really important: understanding and respecting local cultures. That's important if you're going to have a good product and if it's going to be used broadly. We've been investing in Gemini for some time now to make it better and more representative across different languages. We have partnerships here in India, including with IIT Bombay, to help improve performance across various Indic languages. It's also really important on the safety and security front to have benchmarks that are available in different languages; there's fantastic work that MLCommons is doing on this front that we're pleased to support. The other bit of access that I think is really important is having things that are quick and cheap enough for everyone to use. One feature of agentic systems is that they're actually pretty compute-intensive to use. We have a range of models that we have developed and are bringing to market at Google DeepMind, including our very quick Flash models, which are relatively cheap, quite efficient, and very, very quick.

We think these can play a really important role in powering agentic systems. It's also going to be really important if we're going to do effective and rigorous testing of these systems, because that could be very compute-intensive as well. So thinking about that access piece is something we all need to keep doing. And it's not an easy question, really, to do it safely while ensuring that third-party assurance providers consider the security questions at hand. It's an open question.

Madhu Srikumar

So, Stephanie, no bias at all since we’re both at PAI, but I wanted to give you the final word. What concrete outcomes do you think we want to see from the global AI assurance work in the next 12 months? What would success look like?

Stephanie Ifayemi

So, Owen, now that you've said your one point, by the way, we can hold you accountable for delivering on the access question. But I think in the two papers we talk about the need to build a robust assurance ecosystem, and one of those things is changing incentives. So, funny enough, in another session this week there was a question about whether we have differences in the way we've been talking about safety over the last few years, whether we still have those divergences or whether we've converged. And there are a few themes that we've actually converged on, which is nice, and I think assurance is one of them. And this week, a lot of the discussions we've had are in some of those incentive areas, like insurance to support assurance.

And so what does that look like? How do we drive new incentives or put some of these structures in place to drive a more mature and robust ecosystem? I think that's going to be really important. The second is professionalization. There are a lot of questions around how you trust the assurer. So how do we ensure that we're thinking about the skills? What does accreditation look like for assurance organizations or individuals? And that will help, I think, with questions around access. So that's the second piece. And because this is also about agents: I think some of those foundational questions haven't yet been resolved.

And so I'm hoping that we can move the dial to start thinking about how you apply that to some of these future questions. So, just to shout you out, Madhu: Madhu is the brains behind our safety work, and she came up with a paper on real-time failure detection and monitoring of agents. What I really like about that paper is that it talks about a tiered approach to assurance. So when you think about agent deployments, you need to be thinking about assurance based on the risks or the stakes at hand. Is it in the financial services sector? Is it making medical decisions? So how do you tie it as close as possible to the use case and the risks?

And that needs to be linked to reversibility: what is the possibility of reversing actions, and what are the consequences? And then third, we have affordances: what affordances do you give to the agents, how much autonomy do they have? So how do you design an assurance ecosystem with all of these different components in mind and a tiered approach? And the more that we can advise CAISI and the many policymakers who are clearly trying to make decisions in this area, I think that's what success would look like for us.
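The three dimensions Stephanie names, stakes of the use case, reversibility of actions, and affordances or autonomy granted, could in principle be combined into a tiered assurance requirement. The scoring and thresholds below are entirely invented, a toy illustration of the idea rather than PAI's actual framework:

```python
def assurance_tier(high_stakes: bool, reversible: bool, autonomy: int) -> str:
    """Map the three dimensions to an assurance tier.
    autonomy: rough 0-3 scale of how freely the agent can act.
    Weights and cutoffs are illustrative placeholders."""
    score = (2 if high_stakes else 0) + (0 if reversible else 2) + min(autonomy, 3)
    if score >= 5:
        return "tier-3: independent third-party assurance + continuous monitoring"
    if score >= 3:
        return "tier-2: pre-deployment testing + post-deployment monitoring"
    return "tier-1: self-assessment with documentation"
```

A real framework would ground each dimension in evidence (sector rules, incident data) rather than a flat score, but the shape, requirements scaling with risk, is the point.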

Madhu Srikumar

This was totally not planned, Steph plugging our work here, but I can't imagine a better note to end on. It's a field-wide challenge, but I just want to emphasize the field-wide opportunity. No one single organization can get this right. So hopefully that's a helpful reminder as we end this summit and move on to the next iteration. Thank you, everyone. Hope you have a great, safe flight back home. Fred, that's tonight for you. And for a closing keynote, I'm going to welcome Natasha Crampton, who is the Chief Responsible AI Officer at Microsoft. And after that, we'll hear from Chris, who is the CEO of FMF. Thanks, everyone. Do you want to give it?

Okay, so we’re going to get mementos. Sorry, you might want to come back. You don’t want to miss this. Thank you very much.

Natasha Crampton

Thanks so much, Madhu, and to all of our panellists for what was, I think, a very rich and grounded and at times humorous discussion. Thank you. One of the things that came across clearly for me today is that AI assurance can no longer be just a theoretical exercise; we actually need to build it into an operational discipline. And that's a discipline that needs to work across borders, across languages and cultures, and, I think, increasingly across agentic systems, systems that don't just generate outputs but actually take action. I heard the panelists focus on the fact that assurance is pretty uneven today. It's often strongest where there's access to compute and data and evaluation infrastructure, and weakest where those things are scarce.

And as several of our panelists emphasized, if we don’t address that gap deliberately, the shift towards AI agents is only going to make that divide even worse. Rather than closing it. When I think about the nature of assurance, I think with agentic systems, it does need to change in its emphasis somewhat. Pre -deployment testing has always been necessary for all types of systems, and so too has post -deployment testing, of course. But post -deployment testing in an agentic world takes on an even greater level of importance, in my view. When systems can plan and they can chain actions, they can interact with tools, they can adapt over time, assurance really has to move towards continuous monitoring, real -time detection, and clear accountabilities for when interventions need to take place.
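The shift Natasha describes, from one-off pre-deployment checks to continuous monitoring with clear accountabilities for intervention, can be sketched as a runtime gate: every action an agent proposes is logged for audit and checked against an intervention policy before it executes. All class and method names here are invented for illustration:

```python
class ActionBlocked(Exception):
    """Raised when an agent action requires human intervention."""

class AgentMonitor:
    def __init__(self, blocked_actions: set[str]):
        self.blocked = blocked_actions
        self.log = []  # audit trail: systems must be observable, not just tested

    def check(self, action: str, target: str) -> None:
        """Log the proposed action, then block it if policy demands escalation."""
        self.log.append((action, target))
        if action in self.blocked:
            raise ActionBlocked(f"intervention required: {action} on {target}")

monitor = AgentMonitor({"transfer_funds"})
monitor.check("read_calendar", "user@example.com")  # allowed, but recorded
```

Note that the action is logged before the policy check, so even blocked attempts leave an auditable trace, one concrete sense in which assurance is designed in rather than bolted on.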

That can be quite a hard technical problem, but it’s also a governance challenge. So I know that PAI is known for convening communities of not just thinkers, but also doers. And so I wanted to leave everyone with a couple of ideas of implications that really follow from some of the insights that we heard today. The first is that it’s really important that we build assurance into systems as part of the system development lifecycle. And we don’t just seek to bolt it on at the end. So that means that we need to design systems so that they can be observed and audited and constrained in practice, not just in policy documents. Second, assurance has to be interoperable.

We heard Prime Minister Modi speak yesterday about building in India and delivering to the world. That, I think, is absolutely an aspiration that we should strive towards. But that can only work if we have evidence, evaluation methods, documentation, and signals of risk that are usable across regions and adaptable to local languages, cultures, and deployment realities. Third, assurance has to be shared. No single company or government or institution can do this alone. And that’s especially true for agents, given how pervasive they are expected to become across the economy. We need shared evaluation infrastructure, shared taxonomies, and shared investment in capacity, particularly in the Global South. So for me, this is why organizations like the Partnership on AI, the many collaborators that have come together at this week’s India AI Impact Summit, and open engagement across the community are so important to making sure that we get this right.

It’s a really foundational area for collaboration for all of us. Now, my view is that if we do get assurance right, and by right I mean global and inclusive and also dynamic, I think it really does become an enabler of trust and adoption, as Minister Teo said, not a brake on progress. One of the key things that I think we need to do as a community is really to treat assurance as infrastructure, infrastructure that we need to build together and put into practice together. Thanks very much.

Chris Meserole

Well, what a phenomenal session, from the opening and closing keynotes to a really rich and dynamic panel. I cannot think of a better way to close out what has been an extraordinarily rich and dynamic summit as well. I have the impossible task of trying to summarize everything that was just said here. So if you’ll bear with me, I’ll just offer three core themes that seemed to jump out to me. One is that we need to evolve and mature our understanding of assurance. There was a lot of reference to agents here, and to the coming prospect of multi-agent environments as well. From evals to mitigations, we need an evolving understanding of how to do assurance.

Second, and probably more importantly, we also heard a lot about assurance as a global effort. Here I loved Steph’s point about the need for greater north-south collaboration. There was a lot of discussion from Fred and others about the need for global standards, harmonizing those standards, and making them interoperable. And there was also a lot of reference to some of the new institutions that have evolved to enable that global dialogue to happen, whether it’s the institution for the global network that was announced literally an hour ago, just before this session, or the international network of ACs that has also been revitalized recently as well. And then the last point that really jumped out at me was assurance as a shared responsibility.

And, Vukosi, I love the point about assurance as a bottom-up effort, and I think it’s one where we all have a role to play, regardless of which sector you are in; regardless of what aspect of assurance you’re taking part in, there’s a role for all of us. So with that, I’m going to leave you with just one final call to action, and that is to get involved. If we want this technology to be safe and secure and trusted, we all have a role to play. So download the great reports that have just come out on this topic. Get involved.

Look at the work that PAI and others are doing as well, and become a part of the conversation about how we’re going to take this amazing technology and really make sure that it’s safe and secure and that we have a way to trust it. In the opening remarks, Rebecca used this great metaphor of the seed: one of the goals of the reports they put out, and of the conversation in this panel, was to plant that seed and watch assurance grow. So the parting thought I would give you is to say let’s all roll up our sleeves, get to work, and make sure that the seed grows.

So with that, thank you. And thank you as well to our panelists and speakers.

Related Resources — Knowledge base sources related to the discussion topics (31)
Factual Notes — Claims verified against the Diplo knowledge base (5)
Confirmed (high)

“Both papers are available via QR codes for participants to download and discuss with the authors”

The knowledge base notes that QR codes and PDF downloads were provided to participants for accessing materials [S102].

Confirmed (high)

“AI agents were not a thing a year ago, now they are emerging rapidly”

A source explicitly states that AI agents were not being discussed 12 months ago, matching the claim about their recent emergence [S23].

Confirmed (high)

“Singapore’s sandbox partnership with Google lets the government ‘eat our own dog food’ and build credibility before wider deployment”

Singapore’s Ministry of Communications and Information partnered with Google Cloud on an AI initiative (AI Trailblazers), which functions as a sandbox for testing AI solutions [S76].

Additional Context (medium)

“The discussion highlighted the need to apply AI assurance to autonomous “agentic” AI as the world moves in that direction”

Other sources discuss the growing adoption of agentic AI and the associated risks, underscoring the relevance of assurance for such systems [S54] and note that up to 90% of public-sector agencies plan to explore or implement agentic AI within two years [S110].

Additional Context (medium)

“Partnership on AI (PAI) has released two new resources: “Strengthening the AI Assurance Ecosystem” and a paper on global AI assurance”

The knowledge base confirms that the Partnership on AI is expanding and launching new initiatives related to AI challenges, though it does not list the exact titles mentioned in the report [S101].

External Sources (111)
S1
Ensuring Safe AI_ Monitoring Agents to Bridge the Global Assurance Gap — – Natasha Crampton- Madhu Srikumar- Chris Meserole- Stephanie Ifayemi
S2
https://dig.watch/event/india-ai-impact-summit-2026/transforming-health-systems-with-ai-from-lab-to-last-mile — Last we saw was in G20. Hopefully, it brings back memories. Yes. Happy ones. I’d like to keep it that way. She has had e…
S3
https://dig.watch/event/india-ai-impact-summit-2026/inclusive-ai-starts-with-people-not-just-algorithms — AI in turn is changing IT. It’s changing IT in ways that we never believed. It was even possible. And I think that so we…
S4
AI for Good Technology That Empowers People — -Frederick Werner- Chief of Strategic Engagement Department at ITU (International Telecommunication Union)
S5
Closing remarks — – **Frederic Werner**: Event coordinator/organizer (coordinates with Secretary General, manages event logistics and anno…
S6
https://dig.watch/event/india-ai-impact-summit-2026/setting-the-rules_-global-ai-standards-for-growth-and-governance — And I think… similar with some of the controls that might need to be kind of used to manage some of the risks if there…
S7
Setting the Rules_ Global AI Standards for Growth and Governance — I’m Chris Meserole,. I’m the executive director of the Frontier Model Forum. Our mission is to advance Frontier AI safet…
S8
Ensuring Safe AI_ Monitoring Agents to Bridge the Global Assurance Gap — -Chris Meserole- CEO of FMF (organization not fully specified in transcript)
S9
Ensuring Safe AI_ Monitoring Agents to Bridge the Global Assurance Gap — – Natasha Crampton- Rebecca Finlay- Frederic Werner
S10
Making Climate Tech Count — – Nassir: No role or title mentioned – Rebecca Anderson: Moderator Rebecca Anderson: Good. Catherine, we talked abou…
S11
The reality of science fiction: Behind the scenes of race and technology — ‘Every desireis an endand every endis a desirethenthe end of the worldis a desire of the worldwhat type of end do you de…
S12
Evolving AI, evolving governance: from principles to action | IGF 2023 WS #196 — Owen Later:Fantastic. Hello, is this on? Can people hear me? Excellent, thank you. Good morning, everyone. My name is Ow…
S13
Policy Network on Artificial Intelligence | IGF 2023 — Moderator – Prateek:Good morning, everyone. To those who have made it early in the morning, after long days and long kar…
S14
DC-Sustainability Data, Access & Transparency: A Trifecta for Sustainable News | IGF 2023 — Owen Larter:Fantastic. I can jump in and give some thoughts and agree with a lot of what Gabriela said as well. I think …
S15
Ensuring Safe AI_ Monitoring Agents to Bridge the Global Assurance Gap — – Natasha Crampton- Stephanie Ifayemi – Stephanie Ifayemi- Vukosi Marivate
S16
S17
Multi-stakeholder Discussion on issues about Generative AI — Natasha Crampton:So, I’m Natasha Crankjian from Microsoft. I’m incredibly optimistic about AI’s potential to help us hav…
S18
Towards a Safer South Launching the Global South AI Safety Research Network — – Mr. Abhishek Singh- Ms. Natasha Crampton- Ms. Chenai Chair – Ms. Natasha Crampton- Dr. Rachel Sibande
S19
Democratizing AI Building Trustworthy Systems for Everyone — – Dr. Saurabh Garg- Natasha Crampton – Dr. Saurabh Garg- Natasha Crampton- Justin Carsten – Natasha Crampton- Particip…
S20
AI Impact Summit 2026: Global Ministerial Discussions on Inclusive AI Development — -Josephine Teo- Role/title not specified (represents Singapore)
S22
S23
https://dig.watch/event/india-ai-impact-summit-2026/ensuring-safe-ai_-monitoring-agents-to-bridge-the-global-assurance-gap — In 2025, in our progress report around foundation model, impact. We made exactly this recommendation. We directly called…
S24
G20 New Delhi Declaration, main takeaways — On 9 September, G20 leaders adopted the New Delhi Declaration. India’s diplomacy made a major success by fostering conse…
S25
High-Level Session 1: Navigating the Misinformation Maze: Strategic Cooperation For A Trusted Digital Future — Natalia Gherman: Thank you. I believe that one way governments, tech companies, media and civil society can work togethe…
S26
Who Watches the Watchers Building Trust in AI Governance — Independent evaluation. Independent evaluation is essential given that we are all using AI systems for all different sit…
S27
WORKING PAPER — The current global landscape is marked by an array of disparate data regula7ons, a situa7on that presents substan7al imp…
S28
Singapore opens global sandbox to test AI responsibly — Singapore has launched aglobal AI assurance sandboxled by IMDA and AI Verify Foundation. Minister Josephine Teo framed t…
S29
Sandboxes for Data Governance: Global Responsible Innovation | IGF 2023 WS #279 — Awarded on European level as a way to improve public governance By working together, these stakeholders can collaborati…
S30
Internet standards and human rights | IGF 2023 WS #460 — Addressing the underrepresentation of the Global South and considering the needs of every demographic are essential to a…
S31
Main Session on Artificial Intelligence | IGF 2023 — In today’s world, Artificial Intelligence (AI) plays a pivotal role in transforming industries and daily life. By emulat…
S32
How can Artificial Intelligence (AI) improve digital accessibility for persons with disabilities? — Ambassador Francisca Mendez:And good afternoon, everybody. Thank you so much, Excellency, Australia, Ethiopia, dear coll…
S33
How to ensure cultural and linguistic diversity in the digital and AI worlds? — Xianhong Hu:Thank you very much Mr. Ambassador. Good morning everyone. First of all please allow me, I’d like to be able…
S34
https://dig.watch/event/india-ai-impact-summit-2026/announcement-of-new-delhi-frontier-ai-commitments — The third is strengthening multilingual and contextual evaluations and real -world use cases. The fourth is strengthenin…
S35
WS #283 AI Agents: Ensuring Responsible Deployment — The speakers demonstrated strong consensus on fundamental challenges including the need for clear definitions, robust se…
S36
Leaders TalkX: ICT application to unlock the full potential of digital – Part I — 2.6 billion people remain offline globally, representing this dignity gap.
S37
Searching for Standards: The Global Competition to Govern AI | IGF 2023 — The impact of jurisdiction size on regulation was also discussed. The example of Singapore’s small jurisdiction size pot…
S38
Aligning AI Governance Across the Tech Stack ITI C-Suite Panel — The discussion maintained a collaborative and constructive tone throughout, with panelists generally agreeing on core pr…
S39
WS #257 Emerging Norms for Digital Public Infrastructure — 2. Interoperability: The need for open standards and cross-border compatibility was emphasized by several speakers.
S40
Global AI Policy Framework: International Cooperation and Historical Perspectives — So global principles are very important, but implementation must account for national contexts and capacities, as you we…
S41
Law, Tech, Humanity, and Trust — Joelle Rizk: Thank you again for giving us the floor. Thank you very much. And this definitely speaks to the coordinatio…
S42
Enhancing CSO participation in global digital policy processes: Roles, structures, and accountability — The International Telecommunication Union (ITU), recognised as the United Nations Specialised Agency for Information and…
S43
Opening — The overall tone was formal yet optimistic. Speakers acknowledged the serious challenges posed by rapid technological ch…
S44
Opening of the session — The tone was generally constructive and collaborative, with delegates emphasizing the need for cooperation and shared co…
S45
Opening of the session — The tone began very positively and constructively, with the Chair commending delegations for focused, specific intervent…
S46
Opening Remarks (50th IFDT) — The overall tone was formal yet warm and celebratory. Speakers expressed pride in the IFDT’s accomplishments and gratitu…
S47
Unpacking the High-Level Panel’s Report on Digital Cooperation: Geneva policy experts propose action plan — Capacity development in general, and the help desk in particular, should be closely related to local social dynamics, in…
S48
What’s new with cybersecurity negotiations? The informal OEWG consultations on CBMs — Something we’ve heard over and over again is that capacity building must be needs-driven and adjusted to local contexts….
S49
Report on WSIS+20 Open Consultations – 29 July 2025 (Test to be deleted) — Localised and context-driven capacity building:Recommended that capacity building needs to be localised, context-driven,…
S50
Ensuring Safe AI_ Monitoring Agents to Bridge the Global Assurance Gap — Marivate argues that Singapore’s ‘test once, comply globally’ vision requires significant localization for individual us…
S51
International multistakeholder cooperation for AI standards | IGF 2023 WS #465 — Additionally, it provides e-learning materials to enhance understanding of AI standards. Moreover, the AI Standards Hub …
S52
What is it about AI that we need to regulate? — The Role of International Institutions in Setting Norms for Advanced TechnologiesThe discussions across IGF 2025 session…
S53
Setting the Rules_ Global AI Standards for Growth and Governance — I’m happy to add to this. So I think there’s been a theme that has come across in this panel a couple of times, which is…
S54
Agentic AI in Focus Opportunities Risks and Governance — “We want standards.”[2]. “So we’re talking about standards.”[4]. “We’re talking about technical benchmarks.”[31]. “Don’t…
S55
Addressing Disputes in Electronic Commerce: — No new entity is created; instead, professional independent auditors would audit ODR providers to ensure compliance with…
S56
Who Watches the Watchers Building Trust in AI Governance — “But it would be not easy to persuade corporate executives to use the independent audit without clear economic incentive…
S57
Resolutions — – vocational education seeks to meet international standards. 15. In order to ensure quality, responsible national autho…
S58
Can we test for trust? The verification challenge in AI — A central theme was the need for more inclusive and globally representative approaches to AI testing and standards devel…
S59
Meeting REPORT — In summation, the analysis concludes that strategy planning should indeed precede performance measurement. When organisa…
S60
How to make AI governance fit for purpose? — AI governance must address various risks brought by AI technology, including data leakage, model hallucinations, AI acti…
S61
Delegated decisions, amplified risks: Charting a secure future for agentic AI — ## Introduction and Context ## Key Technical Insights ## Proposed Solutions and Recommendations Meredith Whittaker: W…
S62
Announcement of New Delhi Frontier AI Commitments — “First, advancing understanding of real‑world AI usage through anonymized and aggregated insights to support evidence‑ba…
S63
Towards a Safer South Launching the Global South AI Safety Research Network — -Need for multilingual and multicultural evaluation systems: The discussion emphasized developing benchmarks beyond Engl…
S64
Interdisciplinary approaches — AI-related issues are being discussed in various international spaces. In addition to the EU, OECD, and UNESCO, organisa…
S65
Artificial Intelligence & Emerging Tech — In conclusion, the meeting underscored the importance of AI in societal development and how it can address various chall…
S66
Press Conference: Closing the AI Access Gap — Countries need robust data strategies that include sharing frameworks and data protection measures. These strategies are…
S67
Ensuring Safe AI_ Monitoring Agents to Bridge the Global Assurance Gap — – Josephine Teo- Owen Larter Real-time failure detection and tiered assurance approaches are needed based on risk level…
S68
U.S. AI Standards Shaping the Future of Trustworthy Artificial Intelligence — – Owen Lauder- Wifredo Fernandez- Austin Marin Just as cars have standardized fuel economy ratings and crash test resul…
S69
WS #283 AI Agents: Ensuring Responsible Deployment — Third-party assessment and verification are increasingly demanded by markets as tools for building trust and ensuring ac…
S70
https://dig.watch/event/india-ai-impact-summit-2026/ensuring-safe-ai_-monitoring-agents-to-bridge-the-global-assurance-gap — And how do we demonstrate that the risks have been managed well? And that is where the assurance ecosystem that Rebecca …
S71
Informal Stakeholder Consultation Session — -Digital Divides and Inclusion: Extensive discussion on bridging connectivity gaps, with emphasis on moving beyond basic…
S72
Leaders TalkX: ICT application to unlock the full potential of digital – Part I — 2.6 billion people remain offline globally, representing this dignity gap.
S73
Searching for Standards: The Global Competition to Govern AI | IGF 2023 — The impact of jurisdiction size on regulation was also discussed. The example of Singapore’s small jurisdiction size pot…
S74
Sandboxes for Data Governance: Global Responsible Innovation | IGF 2023 WS #279 — Advocates for a harmonised approach to regulation and policy-making believe that this method can yield positive outcomes…
S75
WS #55 Future of Governance in Africa — Effective digital governance requires collaboration between government and industry stakeholders. This approach ensures …
S76
Singapore and Google Cloud launch initiative to foster AI solutions — Singapore’s Ministry of Communications and Information (MCI), Digital Industry Singapore (DISG), Smart Nation and Digita…
S77
WS #257 Emerging Norms for Digital Public Infrastructure — 2. Interoperability: The need for open standards and cross-border compatibility was emphasized by several speakers.
S78
How IS3C is going to make the Internet more secure and safer | IGF 2023 — Such standards are considered to promote transparency, collaboration, and interoperability.
S79
International multistakeholder cooperation for AI standards | IGF 2023 WS #465 — Aligning with standards allows companies to enter new markets and enhance competitiveness. Interoperability ensures seam…
S80
Opening — The overall tone was formal yet optimistic. Speakers acknowledged the serious challenges posed by rapid technological ch…
S81
Opening Remarks (50th IFDT) — The overall tone was formal yet warm and celebratory. Speakers expressed pride in the IFDT’s accomplishments and gratitu…
S82
Opening Ceremony — The tone is consistently formal, diplomatic, and optimistic yet cautionary. Speakers maintain a celebratory atmosphere a…
S83
Summit Opening Session — The tone throughout is consistently formal, diplomatic, and collaborative. Speakers maintain an optimistic and forward-l…
S84
Opening of the session — Referenced the wide sense of commitment and political will among member states and the promising, balanced nature of REV…
S85
Delegated decisions, amplified risks: Charting a secure future for agentic AI — The tone was consistently critical and cautionary throughout, with Whittaker maintaining a technically informed but acce…
S86
Defying Cognitive Atrophy in the Age of AI: A World Economic Forum Stakeholder Dialogue — The discussion began with a cautiously optimistic tone, acknowledging both opportunities and risks. However, the tone be…
S87
AI and Digital Developments Forecast for 2026 — The tone begins as analytical and educational but becomes increasingly cautionary and urgent throughout the conversation…
S88
Comprehensive Summary: AI Governance and Societal Transformation – A Keynote Discussion — The tone begins confrontational and personal as Hunter-Torricke distances himself from his tech industry past, then shif…
S89
AI and Human Connection: Navigating Trust and Reality in a Fragmented World — The tone began optimistically with audience engagement but became increasingly concerned and urgent as panelists reveale…
S90
Emerging Markets: Resilience, Innovation, and the Future of Global Development — The tone was notably optimistic and forward-looking throughout the conversation. Panelists consistently emphasized oppor…
S91
Resilient infrastructure for a sustainable world — The tone was professional and collaborative throughout, with speakers building on each other’s points constructively. Th…
S92
Open Forum #13 Bridging the Digital Divide Focus on the Global South — The discussion maintained a consistently collaborative and solution-oriented tone throughout. Speakers acknowledged seri…
S93
WS #302 Upgrading Digital Governance at the Local Level — The discussion maintained a consistently professional and collaborative tone throughout. It began with formal introducti…
S94
Safeguarding Children with Responsible AI — The discussion maintained a tone of “measured optimism” throughout. It began with urgency and concern (particularly in B…
S95
AI Infrastructure and Future Development: A Panel Discussion — The tone was overwhelmingly optimistic and bullish throughout, with panelists consistently emphasizing the “limitless” p…
S96
Building Population-Scale Digital Public Infrastructure for AI — The tone is optimistic and collaborative throughout, with speakers sharing concrete examples of successful implementatio…
S97
Panel 4 – Resilient Subsea Infrastructure for Underserved Regions  — The discussion maintained a professional, collaborative tone throughout, with panelists building on each other’s insight…
S98
Opening of the session — Advancement in collective understanding of emerging technologies was promoted, echoing the meeting’s ethos of collaborat…
S99
Global leaders pledge for responsible AI at the 2023 GPAI Summit in New Delhi — The 2023Global Partnership on Artificial Intelligence(GPAI) Summit in New Delhi brought together diverse stakeholdersaim…
S100
Day 0 Event #173 Building Ethical AI: Policy Tool for Human Centric and Responsible AI Governance — Alaa Abdulaal: So hello, everyone. I think I was honored to join the session. And I have seen a lot of amazing conver…
S101
Partnership on AI expands and launches initiatives focused on AI challenges and opportunities — The Partnership on AI, founded in September 2016 by Amazon, DeepMind/Google, Facebook, IBM, and Microsoft with the aim t…
S102
Information Society in Times of Risk — Looking toward future policy development, Kremers advocated for incorporating information society requirements into the …
S103
Day 0 Event #261 Navigating Ethical Dilemmas in AI-Generated Content — The central presentation focused on the Harlem Declaration, described as an international commitment to promote ethical …
S104
Networking Session #127 The Internet Society Community Discusses WSIS+20 and Beyond — Utilize online forum and QR codes provided to submit feedback on specific sections of the Elements Paper
S105
Open Forum #30 High Level Review of AI Governance Including the Discussion — Several concrete commitments emerged from the discussion:
S107
AI Policy Summit Opening Remarks: Discussion Report — The discussion identified several concrete commitments:
S108
AI for equality: Bridging the innovation gap — The discussion generated several concrete commitments:
S109
What policy levers can bridge the AI divide? — – **LJ Rich**: Moderator/Host (introduced the panel at the beginning)
S110
Agentic AI gains ground as GenAI maturity grows in public sector — Public sector organisations around the world are rapidly moving beyondexperimentation with generative AI (GenAI), with u…
S111
WS #35 Unlocking sandboxes for people and the planet — 2. Africa: Morine Amutorine shared insights on sandboxes in Africa, noting the prevalence of fintech sandboxes and the c…
Speakers Analysis
Detailed breakdown of each speaker’s arguments and positions
R
Rebecca Finlay
1 argument · 166 words per minute · 801 words · 289 seconds
Argument 1
Delhi Declaration drives accountability and usage‑data sharing (Rebecca)
EXPLANATION
Rebecca highlights that the newly adopted Delhi Declaration creates concrete obligations for AI developers to share usage data and strengthens accountability mechanisms. She notes that this commitment builds on earlier progress reports and aligns with the partnership’s broader push for transparent AI governance.
EVIDENCE
She references the Delhi Declaration adopted the previous day and explains that it includes a commitment for Frontier AI companies to share usage data, noting that this was recommended in the 2025 progress report and that some progress has already been observed [25-31].
EXTERNAL EVIDENCE (KNOWLEDGE BASE)
The G20 New Delhi Declaration, adopted on 9 September, includes commitments on AI accountability and data sharing [S24]; a 2025 progress report explicitly recommended that Frontier AI companies share usage data and notes early progress on this recommendation [S23].
MAJOR DISCUSSION POINT
AI Assurance Ecosystem & Policy Commitments
AGREED WITH
Owen Larter, Stephanie Ifayemi, Vukosi Marivate
M
Madhu Srikumar
1 argument · 146 words per minute · 1068 words · 436 seconds
Argument 1
AI assurance defined as independent trustworthiness verification (Madhu)
EXPLANATION
Madhu defines AI assurance as the systematic process of measuring, evaluating, and communicating the trustworthiness of AI systems. She likens it to a safety inspection that requires independent verification rather than reliance on the system’s creator.
EVIDENCE
She explains that AI assurance involves assessing safety, intended functionality, and public trust, comparing it to an independent building inspector rather than the builder’s claim, and emphasizes its role as independent verification as described by the minister [124-132].
EXTERNAL EVIDENCE (KNOWLEDGE BASE)
Independent evaluation as a core element of AI assurance is highlighted as essential for building trust in AI systems [S26] and reinforced in discussions on who monitors the watchers [S1].
MAJOR DISCUSSION POINT
AI Assurance Ecosystem & Policy Commitments
AGREED WITH
Josephine Teo, Stephanie Ifayemi, Natasha Crampton
J
Josephine Teo
2 arguments · 148 words per minute · 1271 words · 513 seconds
Argument 1
Proactive government sandbox and model‑governance framework for agents (Josephine)
EXPLANATION
Josephine argues that Singapore is taking a proactive stance by creating a sandbox partnership with industry to test agentic AI and by publishing a living model‑governance framework. This approach aims to build internal expertise and credibility before broader deployment.
EVIDENCE
She describes the sandbox collaboration with Google, the concept of “eating our own dog food” to test agents safely, and the release of a model-governance framework that is intended to evolve with feedback [68-73].
EXTERNAL EVIDENCE (KNOWLEDGE BASE)
Singapore’s launch of a global AI assurance sandbox, led by IMDA and the AI Verify Foundation, exemplifies the proactive sandbox approach described [S28]; the same session also outlines sandbox components such as testing, standards and third-party assurance [S1].
MAJOR DISCUSSION POINT
Governance & Risk Management of Agentic AI
DISAGREED WITH
Vukosi Marivate
Argument 2
Assurance requires testing, standards, and third‑party auditors (Josephine)
EXPLANATION
Josephine outlines three essential components for a robust AI assurance ecosystem: rigorous technical testing, the development of clear standards, and independent third‑party assurance providers to validate safety claims. She stresses that these elements are necessary to manage the heightened risks of autonomous agents.
EVIDENCE
She enumerates testing (technical assessments, datasets, reasoning steps), standards (defining “good enough” and meeting safety expectations), and third-party auditors (technical testers, auditors providing independence) as the three pillars of assurance [97-109].
EXTERNAL EVIDENCE (KNOWLEDGE BASE)
The three essential pillars for a robust AI assurance ecosystem-technical testing, standards development, and independent third-party auditors-are enumerated in the session notes [S1].
MAJOR DISCUSSION POINT
Governance & Risk Management of Agentic AI
AGREED WITH
Stephanie Ifayemi, Natasha Crampton, Madhu Srikumar
DISAGREED WITH
Natasha Crampton
F
Frederic Werner
2 arguments · 180 words per minute · 1021 words · 339 seconds
Argument 1
Global standards must embed common‑sense principles and be inclusive (Frederic)
EXPLANATION
Frederic stresses that emerging AI standards should incorporate practical, common‑sense safeguards and be designed to include voices from the Global South. He sees multilateral platforms like AI for Good as crucial for translating ambitious principles into actionable standards.
EVIDENCE
He notes that AI for Good convenes diverse stakeholders, emphasizes turning principles into actions, and highlights the need for inclusive standards that reflect varied regional contexts [160-170].
EXTERNAL EVIDENCE (KNOWLEDGE BASE)
AI for Good emphasizes turning high-level principles into practical, inclusive standards and highlights the need for common-sense safeguards [S4]; internet-standard discussions stress inclusion of the Global South and diverse stakeholders [S30]; OECD guidance underlines coordinated, inclusive standard-setting [S22].
MAJOR DISCUSSION POINT
Global Inclusion – Multilingual & South‑North Divide
AGREED WITH
Rebecca Finlay, Stephanie Ifayemi, Vukosi Marivate
Argument 2
Multilateral bodies (ITU, AI for Good) should drive inclusive global assurance (Frederic)
EXPLANATION
Frederic argues that institutions such as the ITU and AI for Good are uniquely positioned to coordinate inclusive, worldwide AI assurance efforts. Their broad membership and collaborative ethos can help translate global standards into practice across regions.
EVIDENCE
He describes AI for Good’s network of over 50 UN agencies, its inclusive philosophy likened to a “Davos of AI” that welcomes diverse voices, and its role in generating practical solutions that feed into standards and policy recommendations [307-320].
EXTERNAL EVIDENCE (KNOWLEDGE BASE)
AI for Good’s network of over 50 UN agencies and its inclusive philosophy position it as a hub for global assurance efforts [S4]; the ITU is cited as a key multilateral platform for equitable standard development [S30]; OECD’s call for coordinated transparency and incident reporting supports multilateral leadership in assurance [S22].
MAJOR DISCUSSION POINT
Collaborative, Shared Responsibility & Call to Action
AGREED WITH
Madhu Srikumar, Stephanie Ifayemi, Natasha Crampton, Chris Meserole
DISAGREED WITH
Owen Larter
Vukosi Marivate
1 argument · 178 words per minute · 562 words · 189 seconds
Argument 1
Assurance must handle diverse languages and build local capacity, not be top‑down (Vukosi)
EXPLANATION
Vukosi points out that assurance frameworks need to reflect the linguistic diversity and local policy capacities of Global South countries. He argues that a top‑down approach would miss critical contextual nuances, and capacity building among local policymakers is essential.
EVIDENCE
He mentions the large number of languages in India and Africa, the need for local understanding, and stresses that policymakers must be equipped to interpret labor laws, data governance, and monitoring, otherwise automated decisions may not align with local values [231-241].
EXTERNAL EVIDENCE (KNOWLEDGE BASE)
Vukosi highlighted the massive linguistic diversity in India and Africa and the need for local capacity building in assurance frameworks [S1]; further emphasis on strengthening multilingual and contextual evaluations is noted [S34]; broader discussions on multilingual challenges in global assurance are present [S23].
MAJOR DISCUSSION POINT
Global Inclusion – Multilingual & South‑North Divide
AGREED WITH
Rebecca Finlay, Owen Larter, Stephanie Ifayemi
DISAGREED WITH
Josephine Teo
Stephanie Ifayemi
3 arguments · 177 words per minute · 1576 words · 532 seconds
Argument 1
Multilingual evaluation commitment highlights language‑centric challenges (Stephanie)
EXPLANATION
Stephanie explains that the Delhi Declaration’s commitment to multilingual evaluation underscores the complexity of assessing AI across thousands of languages and dialects. She notes that designing effective benchmarks for such diversity is a key challenge for assurance work.
EVIDENCE
She cites the numbers of languages in India (≈120) and Africa (≈1,500-3,000), and describes how these linguistic variations complicate benchmarking and evaluation design, linking this to the declaration’s multilingual commitment [262-267].
EXTERNAL EVIDENCE (KNOWLEDGE BASE)
The Delhi Declaration’s explicit commitment to multilingual evaluation underscores the complexity of benchmarking across thousands of languages, as discussed in the multilingual evaluation strengthening notes [S34]; the declaration’s multilingual focus is also referenced in the G20 summary [S24].
MAJOR DISCUSSION POINT
Global Inclusion – Multilingual & South‑North Divide
AGREED WITH
Rebecca Finlay, Vukosi Marivate, Frederic Werner
Argument 2
Six challenge areas (infrastructure, skills, languages, risk profiles, documentation, etc.) identified (Stephanie)
EXPLANATION
Stephanie outlines six major challenge domains that must be addressed to close the AI assurance divide: infrastructure, skills, language diversity, risk‑profile differences, documentation, and related factors. She argues that each area requires targeted interventions to enable equitable assurance.
EVIDENCE
She references the paper that enumerates these six challenge areas, giving examples such as GPU-intensive evaluation infrastructure, language diversity, and varying risk priorities across regions [259-267].
EXTERNAL EVIDENCE (KNOWLEDGE BASE)
A paper presented in the session enumerates six major challenge domains (infrastructure, skills, language diversity, risk-profile differences, documentation, and related factors) that must be addressed to close the AI assurance divide [S1].
MAJOR DISCUSSION POINT
Infrastructure, Incentives & Professionalisation to Close the Assurance Divide
AGREED WITH
Rebecca Finlay, Owen Larter, Vukosi Marivate
Argument 3
Need for new incentives, insurance mechanisms and professional accreditation for assurers (Stephanie)
EXPLANATION
Stephanie calls for the creation of incentives—such as insurance products—and professional accreditation schemes to motivate and standardise the work of assurance providers. She believes these mechanisms will strengthen the ecosystem and improve trust in AI systems.
EVIDENCE
She discusses converging themes around assurance, the role of insurance to support assurance, the need for professionalisation, skills development, and accreditation for assurance organisations or individuals [363-376].
EXTERNAL EVIDENCE (KNOWLEDGE BASE)
The discussion calls for new incentives such as insurance products and professional accreditation schemes to motivate and standardise assurance providers, highlighting the role of insurance and professionalisation in ecosystem maturity [S1]; similar points about balancing challenges and incentives are raised in broader assurance debates [S23].
MAJOR DISCUSSION POINT
Infrastructure, Incentives & Professionalisation to Close the Assurance Divide
AGREED WITH
Josephine Teo, Natasha Crampton, Madhu Srikumar
DISAGREED WITH
Josephine Teo
Owen Larter
2 arguments · 201 words per minute · 1152 words · 342 seconds
Argument 1
Development of agents‑to‑agents and universal commerce protocols for safe interaction (Owen)
EXPLANATION
Owen describes technical work at Google DeepMind to create standardized protocols that enable agents to communicate with each other and with web services securely. These protocols aim to provide the same foundational interoperability that early internet standards like HTTP delivered.
EVIDENCE
He mentions the agents-to-agents protocol, the universal commerce protocol, and compares them to early internet standards such as HTTP and URLs, explaining how they convey identity, capabilities, and intent between agents and websites [202-208].
MAJOR DISCUSSION POINT
Technical Standards, Interoperability & Security for Agents
DISAGREED WITH
Frederic Werner
Argument 2
Security scanning of agentic downloads and provision of cheap, efficient models (Owen)
EXPLANATION
Owen highlights efforts to mitigate security risks by scanning agentic downloads for malware and by offering low‑cost, high‑efficiency models that make testing and deployment more accessible. These steps aim to reduce barriers for widespread, safe adoption of agentic AI.
EVIDENCE
He describes collaboration with VirusTotal to scan downloaded skills/apps for vulnerabilities, and notes the development of “flash” models that are inexpensive, fast, and suitable for compute-intensive agentic workloads [222-227].
MAJOR DISCUSSION POINT
Technical Standards, Interoperability & Security for Agents
AGREED WITH
Rebecca Finlay, Stephanie Ifayemi, Vukosi Marivate
Natasha Crampton
1 argument · 136 words per minute · 637 words · 279 seconds
Argument 1
Assurance must be built into the development lifecycle, be interoperable and shared (Natasha)
EXPLANATION
Natasha argues that AI assurance should be integrated from the start of system design, ensuring interoperability across regions and shared resources. She stresses that without such integration, the shift to agentic systems could widen existing assurance gaps.
EVIDENCE
She calls for assurance to be embedded in the development lifecycle, to be interoperable across languages and cultures, and to be shared through common evaluation infrastructure and capacity building, especially for the Global South [425-434].
MAJOR DISCUSSION POINT
Collaborative, Shared Responsibility & Call to Action
AGREED WITH
Josephine Teo, Stephanie Ifayemi, Madhu Srikumar
DISAGREED WITH
Josephine Teo
Chris Meserole
1 argument · 135 words per minute · 534 words · 236 seconds
Argument 1
All stakeholders must participate; download reports, join initiatives (Chris)
EXPLANATION
Chris issues a call to action, urging everyone—governments, industry, civil society—to engage with the newly released reports, contribute to standards development, and actively participate in assurance initiatives. He frames participation as essential to advancing safe and trustworthy AI.
EVIDENCE
He summarises three core themes, emphasises the need for global standards, mentions the recent establishment of new institutions, and explicitly asks listeners to download the reports and get involved in the conversation [447-462].
MAJOR DISCUSSION POINT
Collaborative, Shared Responsibility & Call to Action
AGREED WITH
Frederic Werner, Madhu Srikumar, Stephanie Ifayemi, Natasha Crampton
Agreements
Agreement Points
AI assurance requires rigorous testing, clear standards, and independent third‑party verification
Speakers: Josephine Teo, Stephanie Ifayemi, Natasha Crampton, Madhu Srikumar
Assurance requires testing, standards, and third‑party auditors (Josephine)
Need for new incentives, insurance mechanisms and professional accreditation for assurers (Stephanie)
Assurance must be built into the development lifecycle, be interoperable and shared (Natasha)
AI assurance defined as independent trustworthiness verification (Madhu)
All speakers agree that a robust AI assurance ecosystem hinges on technical testing, the creation of standards, and independent third-party assessment, and that assurance should be embedded from design through deployment as an independent verification process. [97-109][363-376][425-434][124-132]
POLICY CONTEXT (KNOWLEDGE BASE)
This aligns with calls for independent auditors to verify compliance, as highlighted in discussions on professional auditing frameworks [S55], and reflects concerns about the lack of financial incentives for such audits [S56]. It also echoes the need for inclusive, globally representative testing standards noted in recent AI governance analyses [S58].
Multilingual evaluation and language diversity are critical challenges for global AI assurance
Speakers: Rebecca Finlay, Stephanie Ifayemi, Vukosi Marivate, Frederic Werner
Delhi Declaration drives accountability and usage‑data sharing (Rebecca)
Multilingual evaluation commitment highlights language‑centric challenges (Stephanie)
Assurance must handle diverse languages and build local capacity, not be top‑down (Vukosi)
Global standards must embed common‑sense principles and be inclusive (Frederic)
Speakers concur that the multilingual commitment in the Delhi Declaration underscores the complexity of assuring AI across thousands of languages, requiring local capacity building and inclusive, language-aware standards. [25-31][262-267][231-241][160-170]
POLICY CONTEXT (KNOWLEDGE BASE)
The importance of multilingual and culturally contextual evaluation has been emphasized in the New Delhi Frontier AI commitments [S62] and the Global South AI Safety Research Network, which calls for benchmarks beyond English-language models [S63]. Earlier capacity-building reports also stress the need for local language inclusion [S49].
Multilateral and multistakeholder collaboration is essential for inclusive AI assurance
Speakers: Frederic Werner, Madhu Srikumar, Stephanie Ifayemi, Natasha Crampton, Chris Meserole
Multilateral bodies (ITU, AI for Good) should drive inclusive global assurance (Frederic)
What role should multilateral institutions like ITU play in making globally inclusive AI assurance happen? (Madhu)
North‑South collaboration is a real opportunity (Stephanie)
Assurance must be shared; no single entity can do it alone (Natasha)
All stakeholders must participate; download reports, join initiatives (Chris)
There is broad consensus that global AI assurance requires coordinated action by multilateral institutions, multistakeholder platforms, and north-south partnerships, with shared resources and collective participation. [307-320][302-306][291-299][435-438][447-462]
POLICY CONTEXT (KNOWLEDGE BASE)
Multistakeholder cooperation is advocated by the AI Standards Hub and IGF sessions promoting inclusive access to AI standards [S51], while the role of international institutions in setting norms for advanced technologies is underscored in IGF deliberations [S52]. Broader interdisciplinary coordination among UN agencies and regional bodies further supports this view [S64][S65].
Accessible, affordable tools and infrastructure are needed to close the AI assurance divide
Speakers: Rebecca Finlay, Owen Larter, Stephanie Ifayemi, Vukosi Marivate
Delhi Declaration drives accountability and usage‑data sharing (Rebecca)
Security scanning of agentic downloads and provision of cheap, efficient models (Owen)
Six challenge areas (infrastructure, skills, languages, risk profiles, documentation, etc.) identified (Stephanie)
Assurance must handle diverse languages and build local capacity, not be top‑down (Vukosi)
All agree that high-cost compute and infrastructure barriers hinder assurance, and that low-cost models, scalable infrastructure, and capacity building are essential to enable equitable AI assurance worldwide. [25-31][351-354][277-282][231-241]
POLICY CONTEXT (KNOWLEDGE BASE)
Capacity-development recommendations call for affordable, locally relevant tools and infrastructure, emphasizing context-driven approaches [S47][S48][S49]. Technical access and compute efficiency are identified as primary barriers to safe AI assurance [S50], and closing the AI access gap is highlighted in recent policy statements urging robust data strategies and government leadership [S66].
Agentic AI introduces new risks that demand continuous monitoring and assurance
Speakers: Josephine Teo, Natasha Crampton, Stephanie Ifayemi, Owen Larter
Autonomy also introduces new risk (Josephine)
Post‑deployment testing in an agentic world takes on an even greater level of importance (Natasha)
Tiered assurance for agents based on risk and stakes (Stephanie)
Security scanning of agentic downloads and focus on agent security (Owen)
Speakers concur that autonomous agents heighten risk, requiring ongoing, real-time monitoring, tiered assurance frameworks, and robust security measures throughout the lifecycle. [53-58][419-422][384-394][222-227]
POLICY CONTEXT (KNOWLEDGE BASE)
Stakeholders have urged the development of concrete technical benchmarks and standards specifically for agentic AI systems [S54], and recent reports call for comprehensive monitoring and control mechanisms to manage emerging risks [S60][S61].
Similar Viewpoints
Both emphasize proactive testing environments and technical safeguards (sandbox, security scanning) as essential for safe deployment of agentic AI. [68-73][222-227]
Speakers: Josephine Teo, Owen Larter
Proactive government sandbox and model‑governance framework for agents (Josephine)
Security scanning of agentic downloads and provision of cheap, efficient models (Owen)
Both stress that assurance frameworks must be inclusive of the Global South, reflecting linguistic diversity and local policy capacity. [160-170][231-241]
Speakers: Frederic Werner, Vukosi Marivate
Global standards must embed common‑sense principles and be inclusive (Frederic)
Assurance must handle diverse languages and build local capacity, not be top‑down (Vukosi)
Both call for concrete incentives and broad stakeholder engagement to mature the assurance ecosystem. [363-376][447-462]
Speakers: Stephanie Ifayemi, Chris Meserole
Need for new incentives, insurance mechanisms and professional accreditation for assurers (Stephanie)
All stakeholders must participate; download reports, join initiatives (Chris)
Unexpected Consensus
Assurance can be framed as a strategic competitive advantage for companies
Speakers: Josephine Teo, Stephanie Ifayemi
Think of it as a strategic competitive advantage (Josephine)
Need for new incentives, insurance mechanisms and professional accreditation for assurers (Stephanie)
While a government minister typically emphasizes public safety, Josephine explicitly positions high assurance as a market differentiator, and Stephanie similarly highlights incentives (including competitive advantage) for firms, showing unexpected alignment on viewing assurance as a business advantage. [86-88][363-376]
Overall Assessment

The panel displayed strong consensus on the necessity of a robust, inclusive AI assurance ecosystem that incorporates rigorous testing, standards, third‑party verification, multilingual considerations, and shared multilateral effort. There is agreement that accessible tools, capacity building, and continuous monitoring for agentic AI are essential. The convergence across government, industry, and civil‑society voices signals a solid foundation for coordinated action.

High consensus across most speakers, indicating a shared understanding of the core pillars needed for trustworthy AI and suggesting that forthcoming policy and technical initiatives are likely to receive broad support.

Differences
Different Viewpoints
Centralised "test‑once‑and‑comply‑globally" approach versus the need for locally‑driven capacity building and avoidance of top‑down frameworks
Speakers: Josephine Teo, Vukosi Marivate
Proactive government sandbox and model‑governance framework for agents (Josephine)
Assurance must handle diverse languages and build local capacity, not be top‑down (Vukosi)
Singapore’s policy of testing AI agents once and then applying the results globally is presented as a way to streamline assurance [324]. Vukosi counters that assurance frameworks must reflect linguistic diversity and local policy capacity, warning that a top-down model would miss critical contextual nuances and could lead to exclusion [231-241][326-340].
Who should lead the development of inclusive global AI assurance standards – multilateral institutions versus industry‑driven technical protocols
Speakers: Frederic Werner, Owen Larter
Multilateral bodies (ITU, AI for Good) should drive inclusive global assurance (Frederic)
Development of agents‑to‑agents and universal commerce protocols for safe interaction (Owen)
Frederic argues that bodies like the ITU and AI for Good are uniquely positioned to coordinate inclusive, worldwide assurance efforts [307-320]. Owen emphasizes industry initiatives at Google DeepMind, such as agents-to-agents and universal commerce protocols, as the primary means to achieve safe interoperability [202-208].
POLICY CONTEXT (KNOWLEDGE BASE)
Debates on governance highlight the central role of international institutions in norm-setting for advanced technologies [S52], while concerns about who contributes to consensus standards point to tensions between multilateral bodies and industry-led protocols [S53]. Multistakeholder initiatives further illustrate the push for broader participation [S51].
Use of financial incentives and professional accreditation versus a focus on technical testing, standards and third‑party auditors
Speakers: Stephanie Ifayemi, Josephine Teo
Need for new incentives, insurance mechanisms and professional accreditation for assurers (Stephanie)
Assurance requires testing, standards, and third‑party auditors (Josephine)
Stephanie calls for the creation of insurance products and accreditation schemes to motivate and professionalise assurance providers [363-376]. Josephine, by contrast, frames a robust assurance ecosystem around technical testing, the development of standards, and independent third-party verification, without mentioning financial incentives [97-109].
POLICY CONTEXT (KNOWLEDGE BASE)
Proposals for professional independent auditors to ensure compliance are discussed alongside challenges of providing clear economic incentives for their adoption [S55][S56].
Emphasis on continuous post‑deployment monitoring versus a primary focus on pre‑deployment testing
Speakers: Natasha Crampton, Josephine Teo
Assurance must be built into the development lifecycle, be interoperable and shared (Natasha)
Assurance requires testing, standards, and third‑party auditors (Josephine)
Natasha stresses that, for agentic systems, assurance must move toward continuous monitoring, real-time detection and clear accountability after deployment [420-422]. Josephine’s three-pillar model centres on pre-deployment technical testing, standards creation and third-party attestation, with less explicit attention to ongoing monitoring [97-109].
POLICY CONTEXT (KNOWLEDGE BASE)
AI governance frameworks stress the need for ongoing monitoring and control mechanisms after deployment [S60][S61], contrasting with earlier emphasis on pre-deployment testing concentrated in resource-rich settings [S58].
Unexpected Differences
Industry‑centric protocol development versus the need for broad, inclusive multilateral coordination
Speakers: Owen Larter, Frederic Werner
Development of agents‑to‑agents and universal commerce protocols for safe interaction (Owen)
Multilateral bodies (ITU, AI for Good) should drive inclusive global assurance (Frederic)
While industry often leads technical standardisation, Frederic’s emphasis on multilateral, inclusive bodies was not anticipated given the heavy focus on private-sector protocol work earlier in the session. This reveals a tension between proprietary technical solutions and the desire for globally coordinated, inclusive standards [202-208][307-320].
POLICY CONTEXT (KNOWLEDGE BASE)
Concerns about exclusive industry-driven standards are raised in analyses of who contributes to consensus and the call for inclusive, globally representative approaches [S53][S58], while multistakeholder coordination is advocated as essential for equitable AI assurance [S51][S52].
Overall Assessment

The panel broadly agrees on the necessity of a robust, trustworthy AI assurance ecosystem, but diverges on where responsibility should lie (national sandbox vs multilateral coordination), the balance between centralized standards and local capacity, the role of financial incentives, and the emphasis on continuous post‑deployment monitoring.

Moderate to high disagreement: while there is consensus on the goal, the differing viewpoints on governance structures, incentive mechanisms and operational focus indicate significant strategic gaps that could impede coordinated action unless reconciled.

Partial Agreements
All speakers concur that a robust AI assurance ecosystem is essential and that the Delhi Declaration’s commitments, independent verification, proactive governance, multilingual evaluation and inclusive standards are needed. However, they diverge on the primary mechanisms: Rebecca focuses on policy commitments, Madhu on definition, Josephine on sandbox testing, Stephanie on multilingual benchmarks, and Frederic on multilateral standard‑setting [25-31][124-132][68-73][262-267][307-320].
Speakers: Rebecca Finlay, Madhu Srikumar, Josephine Teo, Stephanie Ifayemi, Frederic Werner
Delhi Declaration drives accountability and usage‑data sharing (Rebecca)
AI assurance defined as independent trustworthiness verification (Madhu)
Proactive government sandbox and model‑governance framework for agents (Josephine)
Multilingual evaluation commitment highlights language‑centric challenges (Stephanie)
Global standards must embed common‑sense principles and be inclusive (Frederic)
Both agree that technical infrastructure and standards are critical, but Owen concentrates on specific protocol development, while Stephanie maps a broader set of systemic challenges that must be addressed before such protocols can be effective [202-208][259-267].
Speakers: Owen Larter, Stephanie Ifayemi
Development of agents‑to‑agents and universal commerce protocols for safe interaction (Owen)
Six challenge areas (infrastructure, skills, languages, risk profiles, documentation, etc.) identified (Stephanie)
Takeaways
Key takeaways
The Delhi Declaration establishes new commitments on usage‑data sharing and multilingual, contextual AI evaluations, reinforcing the need for robust AI assurance.
AI assurance is defined as an independent, systematic verification of AI systems’ safety, reliability, and trustworthiness.
Agentic AI introduces heightened risks due to autonomy; effective governance requires proactive government sandboxes, model‑governance frameworks, and continuous monitoring.
A functional AI assurance ecosystem must include three pillars: rigorous testing, enforceable standards, and third‑party auditors.
Global inclusion is essential: assurance frameworks must handle the vast linguistic diversity of the Global South and build local capacity rather than imposing top‑down solutions.
Technical interoperability (agents‑to‑agents protocols, universal commerce protocols) and security (malware scanning, efficient low‑cost models) are critical for safe deployment of autonomous agents.
Closing the assurance divide involves six challenge areas – infrastructure, skills, language coverage, risk‑profile alignment, documentation, and incentives such as insurance and professional accreditation.
Collaboration across governments, multilateral bodies (ITU, AI for Good), industry, and civil society is required; assurance should be treated as shared infrastructure built into the AI development lifecycle.
Resolutions and action items
Release and disseminate two new PAI papers – ‘Strengthening the AI Assurance Ecosystem’ and ‘Closing the Global Assurance Divide’ – via QR codes for download.
Singapore to continue operating its agentic‑AI sandbox with industry partners (e.g., Google) and to update its model‑governance framework as a living document.
Google DeepMind to advance and open‑source agents‑to‑agents and universal commerce protocols, and to provide low‑cost, high‑efficiency models for broader testing.
ITU to incorporate multilingual and contextual evaluation considerations into its standard‑setting work and to promote inclusive global assurance processes.
PAI to pursue professionalisation of assurance (accreditation schemes) and to explore insurance‑based incentives for trustworthy AI deployment.
All participants encouraged to download the reports, contribute feedback, and join ongoing collaborative initiatives (e.g., AI for Good, NIST/Casey AI Standards network).
Unresolved issues
Concrete methodology for multilingual, contextual evaluations across thousands of languages and dialects remains undefined.
How to fund and scale the heavy compute and data infrastructure needed for large‑scale assurance testing in low‑resource regions.
Specific mechanisms for third‑party assurance provision in the Global South, including capacity‑building and certification pathways.
Details of continuous post‑deployment monitoring and real‑time failure detection for agentic systems were discussed but not finalized.
The exact balance between proactive government regulation and industry‑led self‑assessment frameworks is still open.
How to ensure that interoperability standards for agents do not become exclusionary for jurisdictions lacking technical resources.
Suggested compromises
Adopt a tiered assurance approach that matches the level of risk and stakes of a use‑case, allowing lighter assessments for low‑impact applications while reserving full audits for high‑risk domains.
Combine proactive government sandboxes with industry feedback loops, positioning regulators as early adopters (“eating our own dog food”) while still requiring independent third‑party validation.
Balance upstream infrastructure investments (e.g., compute resources) with parallel development of lightweight documentation and tooling to lower entry barriers for emerging economies.
Encourage global standards development that is modular, allowing regions to adopt core safety components while adding locally‑relevant language and risk‑profile extensions.
Thought Provoking Comments
We need to shift from reactive regulation to a proactive preparation stance, with the government itself being a leader and testing agentic AI in a sandbox with industry partners.
This reframes AI governance from a lag‑behind model to an experimental, learning‑by‑doing approach, highlighting the importance of government‑industry collaboration and real‑world testing rather than waiting for incidents.
Her remark redirected the conversation from abstract policy to concrete experimentation, prompting later speakers (e.g., Owen Larter) to discuss technical standards and sandbox mechanisms, and set the tone for viewing assurance as an enabler of innovation rather than a barrier.
Speaker: Josephine Teo
The assurance ecosystem cannot be robust without third‑party assurance providers; it is one thing to claim safety, another to have an independent party attest to it.
She introduced the idea that trust requires external validation, echoing practices from aviation and healthcare, and positioned third‑party auditors as essential for credibility.
This sparked the panel’s focus on standards and external verification, leading Owen Larter to mention collaborations with security teams (VirusTotal) and Stephanie Ifayemi to discuss capacity gaps for third‑party evaluations in the Global South.
Speaker: Josephine Teo
There are 2.6 billion people offline; AI can help remove friction (e.g., language barriers, literacy disabilities) but we cannot assume that simply providing tools will automatically create value or responsible use.
He broadened the discussion to include connectivity and digital literacy, emphasizing that technology alone won’t close the assurance divide without education and local relevance.
His point shifted the panel toward the socioeconomic dimensions of assurance, prompting Vukosi Marivate to stress local capacity and Stephanie Ifayemi to enumerate language‑related challenges.
Speaker: Frederic Werner
We need technical protocols like the agents‑to‑agents protocol and universal commerce protocol to enable interoperability, similar to how HTTP and URLs underpinned the early internet.
He introduced a concrete, infrastructure‑level solution for the emerging agentic economy, linking standards directly to the ability of agents to communicate and transact safely.
This concrete proposal moved the dialogue from high‑level policy to actionable engineering work, influencing subsequent remarks about the need for cheap, accessible models and prompting the rapid‑fire discussion on multilateral roles.
Speaker: Owen Larter
Assurance must be built on three pillars: rigorous testing (including the reasoning steps of agents), standards that define “good enough,” and independent third‑party assurance providers.
She distilled the assurance challenge into a clear framework, providing a roadmap that participants could reference throughout the session.
The three‑pillar model became a reference point for later speakers, especially Stephanie’s discussion of challenge areas and Natasha’s call to embed assurance throughout the system lifecycle.
Speaker: Josephine Teo
Closing the global assurance divide involves six challenge areas—language diversity, risk‑profile differences, infrastructure, documentation, incentives, and professionalisation—each requiring tailored solutions.
She moved the conversation from abstract gaps to a structured taxonomy, making the problem tractable and highlighting where resources are most needed.
Her taxonomy guided the rapid‑fire segment, informing Vukosi’s focus on local evaluation capacity and Owen’s emphasis on affordable compute, and it anchored the summit’s concluding recommendations.
Speaker: Stephanie Ifayemi
Assurance should be treated as infrastructure: it must be built into the development lifecycle, be interoperable across regions and languages, and be shared among governments, industry, and civil society.
She synthesized the panel’s insights into a strategic vision, positioning assurance not as an after‑thought but as foundational infrastructure that enables trust and adoption.
Her framing reinforced the earlier calls for continuous monitoring and standardisation, giving the closing remarks a unifying narrative that tied together the technical, policy, and capacity themes discussed earlier.
Speaker: Natasha Crampton
We need to think about assurance in a tiered way, matching the level of scrutiny to the risk and stakes of the specific use case (e.g., finance vs. healthcare).
She introduced a nuanced, risk‑based approach, acknowledging that a one‑size‑fits‑all assurance model is impractical, especially across diverse global contexts.
This prompted participants to consider differentiated regulatory pathways and influenced the discussion on how third‑party auditors can focus on high‑impact domains first, shaping the panel’s concluding recommendations.
Speaker: Stephanie Ifayemi
Overall Assessment

The discussion was shaped by a series of pivotal comments that moved the conversation from high‑level declarations to concrete, actionable frameworks. Josephine Teo’s shift toward proactive, sandbox‑based governance and her three‑pillar model set the conceptual foundation. Frederic Werner broadened the scope by highlighting connectivity and digital‑literacy gaps, prompting a focus on capacity building in the Global South. Owen Larter supplied tangible technical standards for agent interoperability, while Stephanie Ifayemi provided a structured taxonomy of assurance challenges and a tiered risk‑based approach. Natasha Crampton’s closing synthesis framed assurance as essential infrastructure, tying together the technical, policy, and equity strands. Collectively, these insights redirected the dialogue toward practical standards, inclusive capacity building, and a shared‑responsibility mindset, steering the panel toward concrete next steps rather than remaining in abstract debate.

Follow-up Questions
What are the essential components of a robust AI assurance ecosystem for agentic AI?
Identifies testing, standards, and third‑party assurance as the components needed to ensure the safety, reliability, and trustworthiness of autonomous agents.
Speaker: Josephine Teo
How should AI assurance be defined and operationalized, especially considering the 2.6 billion people who remain offline and may be excluded from current frameworks?
Seeks an inclusive definition and mechanisms so that assurance practices do not leave large offline populations behind.
Speaker: Madhu Srikumar (to Frederic Werner)
What specific safety and security challenges do autonomous agents pose, particularly when they interact with personal accounts, email, banking, and can download skills or apps?
Highlights the need to address malware, misuse, and secure protocols for agents that act on sensitive user data.
Speaker: Madhu Srikumar (to Owen Larter)
How well do assurance frameworks designed in the US, UK, or Singapore translate to contexts with different languages, data, and deployment conditions, and what is missing?
Calls for assessment of the applicability of existing frameworks to local realities in the Global South.
Speaker: Madhu Srikumar (to Vukosi Marivate)
What are the concrete gaps in closing the global AI assurance divide (e.g., capacity for third‑party evaluations, access to models, infrastructure, skills), and what would be required to close them?
Requests a detailed inventory of obstacles and actionable steps to achieve equitable assurance capabilities worldwide.
Speaker: Madhu Srikumar (to Stephanie Ifayemi)
What role should multilateral institutions like the ITU play in making globally inclusive AI assurance happen?
Seeks clarification on how intergovernmental bodies can coordinate standards, capacity‑building, and inclusive participation.
Speaker: Madhu Srikumar (to Frederic Werner)
From a Global South perspective, what would make interoperability of AI assurance standards real rather than a form of exclusion?
Looks for mechanisms that ensure standards are accessible, affordable, and adaptable for low‑resource environments.
Speaker: Madhu Srikumar (to Vukosi Marivate)
What single commitment should frontier labs make on assurance that would actually move the needle?
Requests a concrete, measurable pledge from industry leaders to advance assurance practice.
Speaker: Madhu Srikumar (to Owen Larter)
What concrete outcomes should the global AI assurance community achieve in the next 12 months, and what would success look like?
Aims to define short‑term milestones and success metrics for the emerging assurance ecosystem.
Speaker: Madhu Srikumar (to Stephanie Ifayemi)
Develop robust testing methodologies and datasets for evaluating safety, reliability, and reasoning processes of agentic AI systems.
The current lack of standardized tests hampers the ability to certify complex autonomous agents.
Speaker: Josephine Teo
Create standardized technical protocols (e.g., agents‑to‑agents, universal commerce) to enable interoperability among autonomous agents.
Interoperability is essential for a thriving agentic economy and for consistent assurance across platforms.
Speaker: Owen Larter
Establish third‑party assurance providers and accreditation mechanisms to independently verify agentic AI safety.
Independent verification builds trust and helps identify blind spots beyond in‑house testing.
Speaker: Josephine Teo; Stephanie Ifayemi
Design multilingual and culturally aware evaluation frameworks to assess AI systems across thousands of languages and dialects.
Language diversity is a major barrier; evaluations must reflect local linguistic realities.
Speaker: Stephanie Ifayemi; Vukosi Marivate; Frederic Werner
Build affordable compute and infrastructure resources for assurance activities in low‑resource settings.
High GPU and token costs create a barrier for many countries to conduct rigorous evaluations.
Speaker: Stephanie Ifayemi
Understand region‑specific risk profiles (e.g., environmental impacts for Pacific Island nations) to tailor assurance priorities.
Different locales prioritize different risks; assurance must be context‑sensitive.
Speaker: Stephanie Ifayemi
Investigate AI literacy and skilling pathways in the Global South to enable effective use and governance of AI agents.
Without widespread AI literacy, deployment of agents may not deliver the intended benefits or safety outcomes.
Speaker: Frederic Werner
Explore how AI can be leveraged to reduce the digital connectivity gap, especially through language‑specific content and services.
AI could help overcome bottlenecks (e.g., language barriers) that keep populations offline.
Speaker: Frederic Werner
Develop real‑time monitoring, failure detection, and reversible‑action mechanisms for deployed agentic systems.
Continuous post‑deployment assurance is crucial as agents can act autonomously over time.
Speaker: Natasha Crampton; Stephanie Ifayemi
Create incentive structures (e.g., insurance models) that encourage organizations to invest in robust AI assurance.
Aligning economic incentives can accelerate adoption of thorough assurance practices.
Speaker: Stephanie Ifayemi
Professionalize AI assurance through accreditation, skill standards, and career pathways for assurance practitioners.
Trust in assessors depends on recognized qualifications and standards.
Speaker: Stephanie Ifayemi
Evaluate the effectiveness of sandbox approaches (e.g., Singapore‑Google sandbox) for testing agentic AI in government contexts.
Sandbox pilots provide practical insights but need systematic evaluation to scale.
Speaker: Josephine Teo
Assess the impact of agentic AI on existing regulatory frameworks and determine needed regulatory adaptations from reactive to proactive models.
Current regulations may be insufficient for autonomous agents; proactive governance is required.
Speaker: Josephine Teo

Disclaimer: This is not an official session record. DiploAI generates these resources from audiovisual recordings, and they are presented as-is, including potential errors. Due to logistical challenges, such as discrepancies in audio/video or transcripts, names may be misspelled. We strive for accuracy to the best of our ability.