Ensuring Safe AI: Monitoring Agents to Bridge the Global Assurance Gap
20 Feb 2026 14:00h - 15:00h
Summary
The summit opened with Rebecca Finlay emphasizing that the newly adopted Delhi Declaration provides a pivotal moment to advance trustworthy, responsible, and beneficial AI, stressing the need for diverse Indian voices in shaping accountability and policy frameworks [1-3][5]. She introduced two new Partnership on AI resources, “Strengthening the AI Assurance Ecosystem” and a paper on global AI assurance, designed to help national policymakers embed robust assurance strategies alongside industrial AI plans and to bridge the assurance gap between the Global North and South [14-16][21-23][24-26][30-33]. New commitments to share usage data and to strengthen multilingual and use-case evaluations were highlighted as concrete steps building on the recommendations of PAI’s 2025 foundation-model impact report [25-33][27-30].
Minister Josephine Teo warned that the rapid rise of autonomous agentic systems introduces new risks such as malfunction and diminished human oversight, calling for a shift from reactive regulation to proactive governance [50-58][59-62]. Singapore is piloting a sandbox partnership with Google to test agentic AI and has released a living model-governance framework that invites industry feedback to ensure safe public-service deployment [68-71][74-78]. She outlined three pillars of an assurance ecosystem (rigorous testing, development of standards, and independent third-party verification) to build confidence and give companies a strategic advantage [97-109][110-112].
Moderator Madhu Srikumar defined AI assurance as the measurement, evaluation, and communication of trustworthiness, likening it to an independent safety inspection and tying it to the Delhi commitments on multilingual and contextual evaluations [124-132]. Frederic Werner stressed that trust is a major barrier to scaling AI-for-Good use cases worldwide and highlighted the need for standards that embed safety, human rights, and inclusivity, especially given the 2.6 billion people still offline [145-166][170-176]. Owen Larter described agentic AI as increasingly autonomous “goal-achieving” systems, announced Google DeepMind’s agents-to-agents and universal commerce protocols, and warned that security risks require robust testing, malware scanning, and third-party assurance [186-204][222-223]. Vukosi Marivate pointed out that language diversity and limited local capacity in the Global South mean assurance frameworks must be locally understood and supported, lest they impose top-down solutions misaligned with regional values [231-240][241-242]. Stephanie Ifayemi presented PAI’s two papers, identified six challenge areas (including infrastructure, skills, language coverage, and differing risk profiles), and argued that north-south collaboration and tiered, use-case-specific assurance are essential to close the global divide [254-280][286-291].
Natasha Crampton called for AI assurance to become an operational discipline embedded throughout the system lifecycle, emphasizing continuous post-deployment monitoring, interoperability of evaluation signals across languages, and shared infrastructure to prevent widening gaps as agents proliferate [412-421][428-437]. Chris Meserole concluded that evolving assurance practices, fostering global standards, and treating assurance as a shared responsibility are critical next steps, urging participants to download the new reports and actively contribute to building a robust, inclusive assurance ecosystem [448-456][462-465].
Keypoints
Major discussion points
– Building an AI-assurance ecosystem for agentic systems – The panel repeatedly stressed that trustworthy autonomous agents require a dedicated assurance framework that includes rigorous testing, clear standards, and independent third-party verification. Josephine Teo outlined the three pillars (testing, standards, third-party attestations) [97-109] and the opening remarks framed the need to “think about all those actors” and apply assurance to agents [35-40].
– Closing the global assurance divide – Participants highlighted that current assurance practices are uneven, especially for the Global South, where language diversity, infrastructure gaps, and limited expertise hinder effective evaluation. Madhu pointed to the 2.6 billion people offline and the need to use AI to bridge that gap [173-176]; Vukosi described the massive multilingual landscape and limited policy capacity in many countries [231-240]; Stephanie listed six challenge areas (infrastructure, skills, languages, risk profiles, etc.) that keep the divide open [259-277]; Natasha emphasised that without deliberate action the shift to agents will widen the gap [413-418].
– Proactive, collaborative governance between government and industry – Singapore’s approach was presented as a model of “test-first” regulation: a government-led sandbox with Google, a living model-governance framework for agents, and ongoing feedback loops with industry partners. The sandbox arrangement with Google [69-73] and the “model governance framework… a live document” [74-78] illustrate this partnership-driven, forward-looking stance.
– Standards, interoperability, and shared responsibility – Multiple speakers called for common technical protocols and global standards to enable agents to interact safely across borders, and for multilateral institutions to coordinate these efforts. Owen described the “agents-to-agents” and “universal commerce” protocols and the need for assurance standards [187-205][198-210]; Madhu’s rapid-fire question asked what role the ITU should play [302-306]; Frederic stressed inclusive, multi-stakeholder collaboration through AI for Good [307-314]; Chris summarised the three themes of evolving assurance, global effort, and shared responsibility [447-455].
Overall purpose / goal
The discussion aimed to catalyse concrete action on AI assurance, especially for emerging autonomous agents, by presenting new PAI resources, diagnosing gaps (technical, linguistic, infrastructural, and governance), and rallying a diverse set of stakeholders (governments, industry, standards bodies, and civil society) to co-create a global, inclusive assurance ecosystem that can keep pace with rapid AI advances.
Overall tone and its evolution
– The session opened with a formal and optimistic tone, celebrating the Delhi Declaration and the launch of new papers [1-8][14-18].
– As the conversation moved to agentic AI, the tone became cautiously urgent, highlighting novel risks and the need for proactive regulation [50-58][59-66].
– When addressing the Global South and the assurance divide, the tone shifted to empathetic and problem-solving, acknowledging disparities and calling for capacity-building [173-176][231-240][259-277].
– The latter part of the panel adopted a collaborative and motivational tone, focusing on concrete standards, partnerships, and a call-to-action for all participants to “get involved” and build assurance as shared infrastructure [302-306][447-455][440-443].
Overall, the discussion moved from introductory enthusiasm, through a sober assessment of risks and inequities, to a forward-looking, collective commitment to develop and operationalise trustworthy AI assurance worldwide.
Speakers
– Madhu Srikumar – Role: Moderator (panel moderator); Expertise: AI assurance, policy discussion.
– Frederic Werner – Role: Chief of Strategic Engagement Department, International Telecommunication Union (ITU) [S4]; Expertise: AI for Good, AI governance, standards development.
– Chris Meserole – Role: Executive Director, Frontier Model Forum (FMF) [S7][S8]; Expertise: Frontier AI safety and security, policy coordination.
– Rebecca Finlay – Role: Representative, Partnership on AI (PAI); Expertise: AI assurance ecosystem, policy, responsible AI.
– Owen Larter – Role: Representative, Google DeepMind (AI research & safety); Expertise: Agentic AI, standards, safety research.
– Stephanie Ifayemi – Role: Staff, Partnership on AI (PAI); Expertise: Global AI assurance divide, AI insurance, assurance frameworks.
– Vukosi Marivate – Role: Founder, Masakhane (African-language NLP); AI researcher focused on African language technologies; Expertise: AI assurance in the Global South, multilingual NLP.
– Natasha Crampton – Role: Chief Responsible AI Officer, Microsoft [S17]; Expertise: Responsible AI, AI assurance, agentic systems.
– Josephine Teo – Role: Minister, Singapore (Minister for Communications and Information); Expertise: Government AI policy, AI assurance, governance of agentic AI.
Additional speakers:
– Rameca – No role or area of expertise identified in the transcript.
The session opened with Rebecca Finlay emphasizing that the newly adopted Delhi Declaration represents a “pivotal moment” for trustworthy, responsible and beneficial AI, bringing together “a whole set of voices and perspectives and leadership that is not optional” in India [1-4]. She announced that the Partnership on AI (PAI) has released two new resources: Strengthening the AI Assurance Ecosystem and a paper on global AI assurance [14-18][21-23]. Both papers are available via QR codes for participants to download and discuss with the authors [12-13][14]. Finlay linked the Declaration’s first two commitments to concrete actions: a 2025 foundation-model impact report that will require frontier-AI firms to share usage data, and a new commitment to “strengthening multilingual and use-case evaluations” that will guide future policy [25-33][27-30].
The moderator then introduced the panel and shifted the focus to autonomous “agentic” AI, noting that the assurance question must now be applied to agents because “that’s where the world is going” [35-40].
Josephine Teo, Minister for Communications and Information, described the rapid emergence of agentic AI, from “not a thing” a year ago to a driver of productivity gains, while warning that their autonomy “introduces new risk” and can erode human oversight [50-58]. She called for a move from “reactive regulation” to “proactive preparation” [59-63] and highlighted Singapore’s sandbox partnership with Google, which lets the government “eat our own dog food” and build credibility before wider deployment [71-73]. Teo presented a “living” model-governance framework for agentic AI that invites industry feedback and aims to build confidence among boards, customers and other stakeholders [74-81]. Central to her vision is a three-pillar assurance ecosystem of rigorous technical testing, enforceable standards, and independent third-party verification, drawn as an analogy to safety regimes in aviation and healthcare [97-109][110-112].
Madhu Srikumar, the moderator, then defined AI assurance as “the process of measuring, evaluating, and communicating whether AI systems are trustworthy”, likening it to an independent safety inspection that goes beyond the builder’s own assurances [124-132]. She connected this definition to the Delhi Declaration’s commitment to multilingual and contextual evaluations, framing the panel’s purpose as assessing whether the global community is equipped to deliver on that promise [133-138][141-144].
Frederic Werner (AI for Good) highlighted that trust is a major barrier to scaling high-impact use cases such as affordable health care, education and disaster response [153-158]. He stressed that standards must embed “common-sense things” – safety, human rights and inclusivity – especially because 2.6 billion people remain offline, and AI could help remove language and literacy frictions only if accompanied by skilling and locally relevant content [170-176][145-166].
Owen Larter (Google DeepMind) described agentic AI as “more autonomous systems that … achieve goals”, giving examples such as a suit-dry-cleaning agent [186-190]. He announced the development of technical protocols, the “agents-to-agents” protocol and a “universal commerce” protocol, to enable interoperable communication between agents and web services, likening them to early internet standards like HTTP [200-208][202-208]. Larter warned of security challenges, noting collaborations with VirusTotal to scan downloaded skills for malware and the need for “cheap, efficient models” (e.g., flash models) to support both deployment and rigorous testing [222-227][351-354].
Vukosi Marivate drew attention to the Global South, pointing out that India alone has over 120 languages and 19,500 dialects, while Africa has thousands more [262-264]. He argued that assurance frameworks must be “locally understood” and that policymakers need the capacity to monitor systems; otherwise, “top-down” solutions risk being misaligned with regional values [237-242][231-240]. Marivate warned that without such capacity, “the last piece … will be the capacity and the capabilities of the policymakers” [237-241].
Stephanie Ifayemi summarized the two PAI papers and identified six challenge areas that keep the assurance divide open: infrastructure, skills, language coverage, divergent risk profiles, documentation, and incentive mechanisms [259-267][274-280]. She cited the resource requirements of Stanford’s HELM evaluation suite (12 billion tokens and 19,500 GPU-hours) as an illustration of the infrastructure barrier [291-299]. She also noted the UK AI Safety Institute’s inaugural $100 million fund as an example of incentive mechanisms [300-306]. Ifayemi highlighted the multilingual evaluation commitment in the Delhi Declaration and called for “north-south collaboration” to ensure Global South countries are not left out of emerging standards on agents [291-299]. She advocated a tiered assurance approach, matching the level of scrutiny to the stakes of a use case (e.g., finance versus health) and linking assurance to insurance products and professional accreditation [363-376][384-394].
Rapid-fire segment
The moderator posed quick questions to each panelist:
* Frederic Werner reiterated the role of multilateral bodies-ITU, AI for Good, and others-in fostering inclusive assurance frameworks [307-312].
* Vukosi Marivate critiqued Singapore’s “test-once-comply-globally” model, warning that it could overlook local linguistic and policy capacities, and emphasized the challenge of scaling evaluations and providing user-level personalization [324-341].
* Owen Larter suggested establishing a “Frontier Labs” initiative to improve global access to multilingual, low-cost models and to ensure third-party security review of agentic skills [350-357].
* Stephanie Ifayemi outlined concrete outcomes for the next 12 months: changing incentive structures, creating professional accreditation pathways, and implementing tiered assurance linked to insurance [363-376][384-394].
Closing remarks
Natasha Crampton reinforced that AI assurance must become an “operational discipline” embedded throughout the system development lifecycle, not merely a post-hoc check [425-428]. For agentic systems, she stressed that “post-deployment testing … takes on an even greater level of importance”, requiring continuous monitoring, real-time failure detection, and clear accountability [419-422]. Crampton called for interoperable evaluation signals that work across languages and cultures, and for shared infrastructure, including taxonomies and capacity-building investments, to prevent the agentic shift from widening existing gaps [428-437][430-434].
Chris Meserole synthesized three overarching themes: (1) the need to evolve assurance practices for multi-agent environments, (2) the necessity of a truly global, collaborative effort, and (3) the imperative that assurance be a shared responsibility across governments, industry and civil society [447-455]. He urged participants to “download the reports” and join ongoing initiatives, framing the earlier “seed-planting” metaphor as a call to “roll up our sleeves and get to work” [462-465].
Consensus and points of tension
Across the discussion, participants agreed that a robust AI-assurance ecosystem rests on three pillars of rigorous testing, clear standards, and independent third-party verification, and that assurance should be defined as an independent trustworthiness audit [97-109][124-132]. They also concurred that multilingual evaluation is a critical challenge, with panelists highlighting the need to address “120 languages and 19,500 dialects” in India and “1,500-3,000 spoken languages” in Africa [262-267][231-241].
Disagreements emerged around implementation pathways:
* Top-down vs. local capacity – Vukosi questioned Singapore’s “test-once-and-comply-globally” model, warning it could ignore local linguistic and policy capacities [324-341].
* Standards development locus – Frederic advocated for multilateral bodies to lead inclusive global assurance, whereas Owen emphasized industry-driven protocols such as agents-to-agents [307-312][202-208].
* Pre-deployment vs. continuous monitoring – Teo’s three-pillar model emphasized testing and standards, while Crampton argued that “continuous monitoring … is even more important” for autonomous agents [97-109][419-422].
Key take-aways
1. The Delhi Declaration establishes concrete commitments on usage-data sharing and multilingual evaluation.
2. AI assurance is an independent, systematic verification of safety, reliability and trustworthiness.
3. Agentic AI raises heightened risk, demanding proactive sandboxes, living governance frameworks and continuous monitoring.
4. A functional assurance ecosystem must combine rigorous testing, enforceable standards and independent third-party auditors.
5. Global inclusion requires addressing language diversity, building local capacity and avoiding top-down imposition.
6. Technical interoperability (agents-to-agents, universal commerce) and security (malware scanning, low-cost models) are essential for a safe agentic economy.
7. Closing the assurance divide involves tackling infrastructure, skills, language, risk-profile, documentation and incentive gaps.
8. Collaboration across governments, multilateral institutions, industry and civil society is required, treating assurance as shared infrastructure built into the AI lifecycle [363-376][425-434].
Unresolved issues
* Designing scalable multilingual evaluation methodologies.
* Funding and providing compute resources for assurance in low-resource settings.
* Establishing third-party assurance providers and accreditation pathways in the Global South.
* Defining mechanisms for real-time post-deployment monitoring of agents.
* Balancing proactive government sandboxes with industry-led self-assessment.
Suggested compromises include a tiered assurance model that aligns scrutiny with risk, combining sandbox experimentation with independent audits, and developing modular standards that allow regions to adopt core safety components while adding local language and risk extensions [384-394].
Overall, the panel moved from an optimistic opening about the Delhi Declaration, through a sober appraisal of the novel risks posed by agentic AI, to a collaborative call-to-action emphasizing concrete standards, capacity-building and shared responsibility. The convergence of viewpoints provides a solid foundation for next steps: disseminating the two PAI papers, expanding Singapore’s sandbox experience, advancing open technical protocols, and mobilising multilateral bodies such as the ITU to ensure that AI assurance becomes a globally inclusive, interoperable infrastructure that enables trust and adoption rather than hindering innovation.
in 19-ish countries, and we’re all focused on what does it mean to unlock innovation through trustworthy, responsible, beneficial AI. And so, of course, no surprise, gatherings like the one that we’ve had this week are really crucial for the work we do, and with the Delhi Declaration adopted yesterday, this is an even more important moment to build on where we have come from, to lean in, and to really get to work around some of the questions of the accountability work that needs to be done, the scientific evidence that we need to build around frameworks and good policy moving forward. And, of course, it’s extraordinarily important that this is happening in India, that it’s bringing a whole set of voices and perspectives and leadership that is not optional.
At PAI, we believe… We believe that that is fundamental to building a global community committed to this work, and it’s great to see it in action this week. So thank you all for being here with us. So today we’re going to give you an opportunity to see two of our latest papers. These are papers that were begun out of the Paris Action Summit. And at that time, as we were thinking about moving into action and innovation, we felt that work needed to happen with a good sense of what the assurance ecosystem looked like. So we’ve had working groups underway developing these two new resources. They’ll be up on the screen at some point. You’ll be able to get a QR code and download them.
Feel free to talk to any of us. The first one is Strengthening the AI Assurance Ecosystem. It really looks at telling and helping national policymakers: if you’re building a robust industrial AI strategy, you had better have a comprehensive AI assurance strategy as well. And you need to be able to do that. And so we’re going to be talking about that. We need to think about all those actors and what they look like. We’re going to hear from one of the experts, of course, in this as soon as the minister comes to join us. The second piece, which is really important, we think, for this conversation is: what does it mean to do AI assurance globally, around the world?
How do we close the divide that exists? What is different about the challenges faced by countries in the Global South versus others? So we’re really hoping that these resources not only are good, substantive contributions to the work that needs to be done, but the idea is to just catalyze, you know, sort of plant a number of seeds across a number of ways in which assurance works so that those can grow and really come to life out of this. And just two quick comments on that. Now that we have the declaration, we can, as opposed to earlier in the week, start to articulate it, really leaning in with regard to the commitments: in commitment one, around clarity on usage data, really trying to give some empirical grounding to this work.
In 2025, in our progress report on foundation model impact, we made exactly this recommendation. We directly called for frontier AI companies to share usage data. We’ve been tracking progress, and there has been some progress in that regard. So we are delighted to see this particular commitment come about and to start to see some standards about how that usage data is going to be shared. So we’re very pleased to see that work. We’re also very pleased to see the second commitment around strengthening multilingual and use case evaluations. And you’ll see, if you do download the report on the global assurance divide, that that is clearly a key piece of work that needs to happen. So this afternoon, we are going to give you an extraordinarily expert panel that brings a real diversity of perspectives to this work.
And so we want to take the assurance question and apply it to agents. Because that’s where the world is going. We’re all seeing them in the news every day. We’re seeing them integrated into foundation model systems. So what does it mean to take what we know about assurance and think about the applications that agents will add to the complexity of that work? So let me begin by introducing our first speaker. She’s probably been one of the most visible ministers this week because of the extraordinary leadership that Singapore has taken when we think about AI assurance. I know you’re going to talk a little bit about that. Such a pleasure to welcome you, Minister Josephine Teo.
She’s going to come and say some words for us before the panel begins. Thank you.
Thank you very much, Rebecca, and also very much appreciate Partnership on AI for the invitation. When this series of summits first began in Bletchley, AI agents were not a thing. Nobody was talking about them, even just 12 months ago. When we had the AI Action Summit in Paris, it had barely crept into the conversation. At the time, the preoccupation was all around DeepSeek and what it told us about the capabilities that are emerging out of China. But today, as Rebecca correctly identified, agentic systems have taken off. They are increasingly being used and we need to have a better grasp on how to deal with this issue because agentic AI certainly offers transformative possibilities in how we delegate and orchestrate work when deployed strategically.
Agents function as invaluable teammates, unlocking productivity gains and time savings, which we all want more of. However, I should also add that the very nature of how agents can be helpful to us is autonomy, and this autonomy also introduces new risk. The potential for harm increases when systems malfunction and human oversight is minimized: we are no longer present, or at least our presence is diminished to a very large extent. The implications may be complex and not fully predictable. So the way my colleagues and I have been thinking about this is that there needs to be a shift. There needs to be a shift from relying on reactive regulation to a different kind of stance, which is proactive preparation.
And in Singapore, that’s what we’ve been trying to do. We’ve tried to be proactive about governing the new risks in the era of agentic AI. And I think it starts with the government itself being a leader and not a laggard in using agentic AI. We need to test it. We need to look at how the solutions can not only enhance public service delivery, but we also need to be able to put in place more controls. Government is high risk because the touch points with citizens are very sensitive. No citizen and no government wants to make serious mistakes when they interact with their citizens: telling them things about their health, about their social security, about their benefits that are not accurate, and having them not just be told but acted upon.
So this need to ensure that we know what we’re doing is a very high one. And the way we are also thinking about it is to try and work with industry. So, for example, between Google and the Singapore government, we have a sandbox on agentic AI. It’s one of the ways we think we can, in a way, eat our own dog food. Try it. You know, does it taste all right? Does it hurt us in a very significant way? Because if we were not able to do so, I don’t think we have a lot of credibility in terms of how we want to govern agentic AI. But we can’t wait, you know, for the dog food to materialize in its consequences for ourselves.
In the meantime, my colleagues have put together a model governance framework for agentic AI. It is meant to provide practical support to enterprises so that they can also deploy autonomous agents responsibly and mitigate the risk. We know that this is not a complete solution and this document that we put out has to be a live document. We very much encourage feedback as a way for us to keep improving the guidance to enterprises. Can I also just add that as we do this work, what is the meaning and what is the purpose behind it? Ultimately, it is to build confidence in the use of agentic AI systems. And we think that at many levels, this confidence has to be presented, has to be demonstrated to boards of organizations, to customers, to other stakeholders.
And how do we demonstrate that the risks have been managed well? And that is where the assurance ecosystem that Rebecca talks about comes in. It is an absolutely essential part of building trust over the medium to longer term so that there is a way, a foundation upon which agentic AI systems can be made more readily adopted and available. I should also say that for companies that are thinking about it, and I see Microsoft here, and I’m sure that there are other companies represented. If we are to trust these agentic systems, the safety aspects should not be downplayed. And I would venture to say that a company that is able to give a high assurance on safety will find itself being differentiated from their competitor.
It’s more likely to translate into stronger interest in a product and service. So rather than think of it as something that you are unhappy to comply with, think of it as a strategic competitive advantage. And that is a way I think that will give us the confidence to put it forward. The question, however, is: are we completely without experience in this regard? And the answer is no. In aviation and healthcare, there are a lot of measures being put in place to give assurance to passengers. When we board a plane, we usually expect to arrive. When we visit the hospital, we generally expect to be treated, except for disease conditions that are not yet well understood.
But the trust in these systems has to be built over time, and it doesn’t come without some assurance being put in place. The question is for AI, and specifically agentic AI, what would be the components? What leads to an assurance ecosystem that would be robust enough? We think that there are at least three components. The first is that there must be testing. We need some way of making sure that there are technical assessments of the system to make sure that the systems are robust, they are reliable, and they’re safe. And a lot more work needs to be done in this space, developing the testing methodology, building the testing datasets, and also making sure that the testing of agentic systems takes into account their complexity.
These systems are going to be much more complex, multi-agent, for example, and it’s not just the output, but the in-between steps, how the reasoning takes place, and what is the orchestration that is being built into the agentic systems. So that’s the first, testing. Second is that eventually we will need standards. We cannot just each define what is good enough; we also need to assure the users that it has met expectations in safety and reliability, and so these are still very early days. Thirdly, we think that this ecosystem cannot do without third-party assurance providers. It’s one thing to claim that your agentic AI system is safe. It’s another thing to have someone attest to the safety of it.
So these could be technical testers, auditors, and they provide independence, augment in-house capabilities, and also help to identify the blind spots, and it’s necessary for us to strengthen this pool as well. So I’m going to stop here. I want to conclude my remarks by saying that Singapore is actively building these components, and we welcome conversations with partners and colleagues because we know that we cannot do this alone. So we look forward to discussions in the three panels on how we can meaningfully collaborate on assurance for agentic AI. Thank you very much once again, Rebecca.
Thank you. Thank you. We’re all here. It’s the end of the conference, and we’re all intact. Thank you so much, everyone, for joining us. Thank you, Minister Teo, for the keynote. One quick note before we dive in. Our panelist, Fred, has a flight to catch, so he’ll need to slip away a few minutes early, but, Fred, we’ll make sure we get your best insights before you escape. No pressure. So we are the last session, so we are standing between you and whatever you have planned right after. So I promise we’ll make this worth it. We have an incredible panel and a lot of ground to cover. So before we get started, what do we mean by AI assurance?
Because you’re going to keep hearing that term quite a bit here. So really put simply, AI assurance is the process of measuring, evaluating, and communicating whether AI systems are trustworthy. Are they safe? Do they work as intended? Can the public actually trust them? So really think of it like a safety inspection, but for AI. You’d want an independent inspector checking a building, not just the builder saying, trust me, it’s fine. So really, AI assurance is about independent verification, as Minister Teo went over. And why this panel? Why now? So the summit unveiled the New Delhi Frontier AI commitments just yesterday. And the second of those commitments is about strengthening multilingual and contextual evaluations.
So really making sure AI systems work across languages, cultures, and real world conditions. And really, that’s the assurance challenge in a nutshell. And our panel today is about whether we are actually equipped to deliver on that promise globally and not just in a handful of countries. So really, our panelists span the ITU, Google DeepMind, the University of Pretoria, and PAI. So we have the range to actually wrestle with this question. So with that, I’m going to get into our first question for today. Fred, that’s going to be you. ITU has been convening on AI governance through AI for Good and working on standards across borders. So really, when we talk about AI assurance, what does it mean to you, ensuring that these systems are safe and trusted?
And how do we think about assurance when 2.6 billion people remain offline and may be excluded from the frameworks being designed?
Yeah, thanks for that great question, and thanks for having me here. So I think it’s safe to say there’s no shortage of high-potential AI for Good use cases, everything from affordable health care to education for all, food security, disaster response, and also looking at more applications in the physical manifestations of AI that you see in robotics, embodied AI, brain-computer interface technologies. The best part of my job at AI for Good is I see these use cases coming across my desk every day. And I can tell you when we started AI for Good in 2017, it was mainly in PowerPoint slides. They didn’t really exist. But as we got into, say, 2023 with GenAI, last year, the unofficial theme of AI for Good was the rise of the AI agents, a bit scary, Terminator-like, but that’s what people were talking about.
And we’re really going from sort of the promise to the pilots to the use cases and now scaling. Now, when you’re looking at these use cases, I think one big challenge is trust. How do you trust them? I mean, there’s always the good intention, right? But is that trust there? And also, are they replicable and scalable? And I’ve yet to see, you know, a high-potential use case developed in Brussels work equally well in Johannesburg and Shenzhen and maybe Panama. Like, it’s just, we haven’t really reached that yet. And if you look at these sort of fast-emerging governance frameworks around the world, whether you’re in the U.S. or EU or China or everything in between, I think there’s a lot of good intentions, a lot of good thinking.
But how do you turn those ambitious words and principles into actions? Because the devil is in the details, and I think standards have details. So when you’re thinking about how do you – especially when you start to get into AI agents and you really – that trust element is becoming ever more critical, how can you bake in a lot of the common sense things that we’ve been talking about all week or even for the past years at AI for Good? Are they trustworthy? Are they verifiable? Are they secure? Are they safe? Are they designed with human rights principles in mind? Are they inclusive? Are people from the Global South participating? Are they at the table when we’re drafting and developing these standards?
So these are not always natural reflexes, and at the same time, it’s hard to turn words into action. So one of the tools, I’m not saying it’s the only tool, but I think as these solutions start to scale and businesses start to interact nationally or even internationally, at one point you’re going to need standards, and it’s within those standards that you can kind of bake in those common sense principles that we’ve been all talking about. And I forget the last part of your question. It was really a question about… Oh, connectivity. That was it, yes. …2.6 billion people who remain offline, yeah. Yeah. Yeah, so, you know, ITU’s mission is connecting the world, and a third of the world is still offline.
And, you know, large parts of the world actually have connectivity, but there’s actually no incentive to connect. So if there’s no content in your local language or dialect, or no access to government services or useful applications that are fit for purpose where you live, why would you connect? So I think AI can actually help to remove that friction where you have a lot of bottlenecks, for example literacy, disabilities, again, content in your own language or dialect. So one thing is closing the connectivity gap, but the other thing is actually using AI to remove that friction. And the last thing I would say is, I think sometimes there’s a comparison where, if you take East Africa, for example, you have the mobile payment miracle or revolution with M-Pesa, right? You effectively leapfrogged decades of legacy infrastructure. And there may be a kind of optimism that the same thing could happen with AI in the Global South. Maybe, but I don’t think we can take it for granted that if that happens, it goes in the right direction. It’s not a guarantee that just by putting the tool in the hands of the people, they’re going to create value, they’re going to use it responsibly, they’re going to use it to solve local challenges, build more cohesion and community. Those aren’t for granted.
So I think that whole AI skilling angle of really educating people from grade school to grad school to diplomats and everyone in between, if you don’t address that literacy piece, then it’s just going to be a crapshoot. We’re not sure.
Great. I mean, it’s a good transition. Speaking of standards, Owen, Google DeepMind recently deepened its partnership with the UK AI Security Institute on safety research, so including work on monitoring chain of thought and evaluations. So really from an industry perspective, you know, what does robust AI assurance look like? Where do you think the gaps and opportunities are between what Frontier Labs kind of do internally and what’s needed for broader public trust?
Yeah, thank you, Madhu. And thank you to Rebecca and Partnership on AI for convening this really important conversation. And a big congratulations to our Indian hosts for a fantastic week at the summit. Maybe I’ll start by talking a little bit about what agents are; we’re increasingly excited about them at Google DeepMind. They’re essentially more autonomous systems that instead of just following basic instructions can actually achieve goals. So let’s say I want to get my suit dry cleaned on Thursday. Instead of taking an AI system and saying, find a website for a dry cleaning company, see if it’s open on Thursday, see what the hours are, see if it’s within my budget, you can just say to your agentic system, go find a way to dry clean my suit, make sure it’s being picked up by Friday, and it will go and interact with those different websites and try and find a way to meet your goals.
All kinds of fantastic applications already that we’re seeing right across the economy. We’re using increasingly agentic coding systems at Google and Google DeepMind to do a lot of our coding. So we have our Antigravity framework, which is fantastic. You can interact with it in normal, natural language and say, build me a website, build me a tracking system to follow a particular bill that I’m interested in, and it will really help you achieve these goals. I think you’ll increasingly see agents used right across the economy as well. I think we’re just in the early years of a new AI-enabled agentic economy. I think you will have very normal interactions with agents on a regular basis that will pop up on your phone screen and say, hey, it’s been a few weeks since you bought toothpaste.
Would you like me to go and take care of that and get some more toothpaste for you? You mentioned standards, which I think is going to be a critical part of getting all of this right. There are a couple of dimensions to the standards. So firstly, we need to create the sort of technical protocols to actually underpin this agentic economy. So we’ve been trying to contribute to this conversation. There is the Agent2Agent protocol that Google has launched. There’s the universal commerce protocol. This is basically a way of helping agents talk to each other and agents talk to websites so that you have standardized sets of information. An agent will basically come to an agent or an agent will come to a website and say, this is my ID.
These are my capabilities. These are what I’m trying to do. I think in the same way that we developed protocols and standards in the early 90s to underpin the internet like HTTP, like URL, we’re going to have to build these out. There are then also assurance standards, which are related, but I think very important as well. We need to make sure that we’re understanding the capabilities of these systems. We need to keep making progress on how we can test for the risks that they may pose and then work right across society to come up with ways to mitigate that. I think the work that the safety and security institutes are doing around the world is absolutely critical.
So Minister Teo mentioned some of the work that we’re doing in Singapore. The UK AI Security Institute has been world leading on this. I think this is an area where we’re going to see more from the AISIs right across the world. The US government also, through CAISI, launched an agent standards initiative this week as well.
Great. And if you don’t mind a follow-up question, that’s a really important point that you pointed out, that we currently need interoperability. We need agents to flourish. We need to find a new way to kind of imagine this paradigm. But I’m curious if there’s a safety challenge when it comes to agents that keeps you up at night.
Yeah, I think there are definitely risks to be mindful of. So I think agent security is something that we should all be thinking a lot about. If we’re connecting increasingly autonomous systems into different accounts, different email accounts, different bank accounts, I think we want to be pretty careful about how we do that and come up with superior security protocols, and that can be helpful there. We’ve actually been doing some work with VirusTotal, which is part of the Google security operations team at Google, to make sure that when certain agentic systems are downloading skills or downloading apps from agentic websites, they’re being scanned for malware, or vulnerabilities are being detected, so that they can be addressed before people put them onto their computer. I think there’s also a concern that these agentic systems could create new capabilities that could be misused, across the cybersecurity domain, for example. I think some of the frameworks that we have already at Google DeepMind will be helpful here. So we have our Frontier Safety Framework, which we use to test models before we put them out into the real world.
We think about how those models are going to interact with systems, how they might be parts of agents as we’re doing that work.
All right. Just speaking for myself, I can’t wait to use agents. I feel like it’s a lot of developer communities that have, you know, started playing around with these systems, but I imagine it’s reaching lay consumers very soon. So, Vukosi, you have built Masakhane for African language NLP, really building AI for Africans, by Africans. When assurance frameworks are designed in the U.S., U.K., or Singapore, how well do they translate to contexts where the data, the languages, the deployment conditions are completely different? What do we think we’re missing?
that we do get to understand that it’s a very different thing. My experience has been that there’s likely not as much collection or annotation in Europe or North America as much as is happening now in the Global South. But then that also means that it feels like it’s further away, right? It’s not where the developers are. And that then requires more of this conversation in one place. So that, again, there must be kind of a local understanding. The last piece to that is going to be the capacity and the capabilities of the policymakers in those countries to be able to understand that part. It will not be top-down. I don’t believe that. It will be them understanding whether it’s labor laws, it’s data governance, it’s just monitoring of systems once they’re live.
If there is not that capacity or capability to actually do those things, again, it moves in an automated direction that is not necessarily what the values of those people actually are.
Those are important words right at the end of the conference, knowing just how much we have to get done here. So Steph, over to you. PAI just released work on closing the global assurance divide, a lot of what Vukosi just mentioned. What are the concrete gaps you’re identifying? Is it capacity to conduct third-party evaluations, as Minister Teo mentioned? Is it access to the models being tested, or is it something else? What would it take to really close those gaps?
Awesome. Thanks so much, Madhu. And as one of the PAI folks, thanks for being here, everyone. It’s great to see you all. I know it’s a Friday evening, so we’re in between you and cocktails or whatever you have planned, so we very much appreciate it in the last session of the day. So I think it’s such a good question, and I think your question recognizes that those challenges aren’t actually just a Global South challenge. I just want to start with the fact that we’ve released two papers. One is on closing the assurance divide, and the other is how we strengthen the global assurance ecosystem generally. And the question of access is one that impacts us all, actually.
In the UK, for example, the Department for Science, Innovation and Technology, I believe that’s what DSIT stands for, has made access to models as a means to support assurance a priority for 2026. And so I think that there are a few shared challenges, and I’ll come back to the point around north-south collaboration in a second. But just thinking about closing the AI assurance divide, we released this paper, and in it we talk about around six challenge areas, from infrastructure to skills. We talk about languages and risk profiles, so the things that you’ve heard about from Vukosi and a lot of the other speakers. So I’ll give you a sense of some of the examples that we have.
So on language, we’re at the India Summit, of course, and India has over, I believe, 120 languages and 19,500 dialects. When we think about Africa, we have about 1,500 to 3,000 spoken languages. So when we think about benchmarking and evals, and designing evals that think about how those systems are deployed in these various contexts, it’s so important to think about languages, and that just generally, I think, demonstrates the complexity of designing evals to meet the needs of this kind of diverse language ecosystem. Rebecca mentioned at the start that we had the declaration, of course, yesterday, and the commitment in the declaration to multilingual evals is really critical. Of course, there’s still a lot of work to determine how do we actually do that in practice in the most effective way, accounting for that complex and wide language diversity, but that’s one area that we talk about.
The second thing we need to account for in terms of closing the assurance divide is risk profiles, interestingly. In this paper, we actually interviewed a lot of assurance and safety experts internationally. And one of the things that they mentioned was differences in what they might prioritize when you think about assurance. So when you think about the Pacific Island nations, for example, they would be thinking about assuring for environmental impacts differently than maybe environmental impacts would be considered as important in the US at the moment, for example. Last year, we published a paper on post-deployment monitoring. And in that paper, we talk about sharing kind of data from companies. And one of the points that we talk about is environmental impacts.
And so it’s really interesting that, in terms of closing the divide, the starting point, or what you put emphasis on, might vary. And that’s important to note as we’re designing things like documentation, description, and so on. And so I think it’s really interesting to see what we’ve kind of focused on. The third I’ll just quickly mention is, of course, infrastructure. I think we’ve probably all heard a lot about this throughout the summit, and this idea of what it means to be sovereign and which parts of the stack to prioritize. And that is really, really important. But there are tradeoffs. So in terms of importance, I was looking at a stat that Stanford’s HELM evaluations used over 12 billion tokens and required 19,500 GPU hours alone.
And so when you think about those kinds of infrastructural needs, it creates barriers for a lot of countries in the Global South. But I was at an interesting roundtable, actually, that Carnegie was convening, and we were talking about how do you balance assurance needs. Where do you start from across the value chain? So at the moment, a lot of the discussion is kind of upstream, right? We need to have that infrastructure in place; that’s the point that we need to start with. But how do you do that in parallel, and how much of that resource should be put into other foundational tools for assurance, such as documentation artifacts, which is another area that we focus on a lot at PAI?
And so I think there will be a lot of questions around how do you weigh up all these challenges, again, knowing that even in kind of the G7 countries, the UK AI Safety Institute started with an inaugural $100 million alone. So that prioritization and balancing is going to be important. The last thing I’ll say, coming back to agents, and I will talk about this a bit more, is that north-south collaboration is a real opportunity as we think about agents. And it’s important that Global South countries aren’t always playing catch up. I think that’s a point that has come through for me from the summit, which is that NIST, or CAISI, the Center for AI Standards and Innovation, and this is almost like a test for me of kind of saying the names of these institutions through this panel, just announced a few days ago that they’re going to be working on standardizing work around agents, including that they’ve released an opportunity to comment on a paper around agent attribution and agent identity, I believe, which is really interesting. And there’s, of course, a lot of push for countries to collaborate. And you see a lot of the safety institutes collaborating on questions around assuring agents in the global north. But how do we ensure that Global South countries aren’t missing from that? That will have implications for how we attribute agents, how we test agents.
And, again, whilst those upstream points and infrastructure are important, we shouldn’t just assume that Global South countries are part of these kinds of thinking-ahead questions and frameworks; we need to make sure, in parallel, that they ultimately are.
Great. So I’m going to take the moderator’s prerogative and have us do a rapid fire. And by rapid fire, I mean every answer is a minute and 30 seconds, which, let’s be honest, is fairly rapid for AI policies. I’m going to start with Fred because I’m more nervous about your flight than perhaps you are. So a minute and 30 seconds. What role should multilateral institutions like ITU play in making globally inclusive AI assurance happen?
Yes, I think AI for Good has a pretty ambitious goal, right? It’s simply put, it’s to unlock AI’s potential to serve humanity. Pretty big. But we can’t do it alone and no one can. It’s not one country and not one institution, not one NGO. That’s why we have 50 plus UN sister agencies as part of AI for Good, but also making great efforts to bring as many diverse voices to the table from the global south, from NGOs, from civil society. It’s always been extremely open. I like to think of it as the Davos of AI, but instead of being very exclusive, it’s extremely inclusive, right? So I think that’s a bit of a philosophy behind AI for Good.
You know, I think AI is just moving so quick. So the focus has always been on practical applications, practical solutions. But in doing that, you can tease out the next generation of standards, of policy recommendations, of collaboration and partnerships around the world. So I like to think that in the doing, you have the learning, right? And it’s not just about talking. And that’s what AI for Good has always been all about.
Thank you. That was incredible. You have 56 seconds left. So, yeah, I’m going to move us ahead to Vukosi. So Singapore’s aim is test once, comply globally. So from a Global South perspective, what would make that interoperability real rather than a form of exclusion?
Yeah, that’s a hard one. I think, going back to it, the other thing that’s come out of a lot of the sessions here has been on the evaluations and how evaluations are used. And I think that’s a really important thing, because either, on one side, it’s going to take you a lot of resources to actually put up the evaluation to be so all-encompassing, or on the other side, to run it is going to be a lot. But then, when it comes down to the user, which I think was our second panel that I was in this week, and you’re trying to think about personalization, if you’re going down to an individual, what experience do they actually have, and how do you get to there?
There will be some more high-level safety things that will likely come out, and people will be working on that, and maybe that’s what I’m thinking Singapore is trying to go for. But then, when we’re getting to what the individual experience is, given that you have these stochastic systems, you don’t know what is going to happen necessarily. I know we’re trying to do that, but we don’t really know what’s going to happen at the individual experience, and we can’t remodel all of that. It’s going to require that, again, you do have, closer to where the user might be, things on what that experience actually was. So one of the hats I wear is I’m a co-founder of Lelapa AI, an AI startup.
And there you will be doing more testing towards, hey, we are serving this client, we’re serving them in this way. And then you’re trying to go in and say, where is your data coming from? What are the use cases? What are we testing for in terms of their operational kind of requirements? It would not necessarily be just one. But, yes, what you might want is…
Yeah, that’s a great point. Assurance needs to be globally decentralized. Owen, given everything we have discussed, what’s one commitment Frontier Labs should make on assurance that would actually move the needle?
Yeah, good question. I think there’s a question of access to the technology, which is important here. I think it’s one of the big themes of this conference, certainly one of the things that I’ll be taking away. So the multilingual part of this is really important, understanding and respecting local cultures. That’s important if you’re going to have a good product and if it’s going to be used broadly. We’ve been investing in Gemini for some time now to make it better, more representative across different languages. We have partnerships that we’re doing here in India, including with IIT Bombay, to help improve performance across various different Indic languages. It’s also really important on the safety and security front to have benchmarks that are available in different languages; there’s fantastic work that MLCommons are doing on this front that we’re pleased to support. The other bit of access that I think is really important is having things that are quick and cheap enough for everyone to use. One of the things about agentic systems is that they’re actually pretty compute intensive to use. We have a range of models that we have developed and are bringing to market at Google DeepMind, including our very quick Flash models, which are relatively cheap, quite efficient, very, very quick.
We think these can play a really important role in powering agentic systems. It’s also going to be really important if we’re going to do effective and rigorous testing of these systems, because that could be very compute intensive as well. So thinking about that access piece is something we all need to keep doing. And it’s not an easy question, really, to do it safely while ensuring that third-party assurance providers consider the security questions at hand. And it’s an open question.
So, Stephanie, no bias at all since we’re both at PAI, but I wanted to give you the final word. What concrete outcomes do you think we want to see from the global AI assurance work in the next 12 months? What would success look like?
So, Owen, now that you’ve said your one point, by the way, we can hold you accountable against delivering on the access question. But I think in the two papers, we talk about the need to kind of build a robust assurance ecosystem. And one of those things is changing incentives. So funny enough, in another session this week, there was a question about whether we have differences in the way we’re talking about safety over the last few years, and whether we still have those divergences or whether we’ve converged. And there are a few themes that we’ve actually converged on, which is nice. And I think assurance is one of them. And this week, a lot of the discussions we’ve had are in some of those incentive areas, like insurance to support assurance.
And so what does that look like? How do we drive new incentives or put some of these structures in place to drive a kind of more mature and robust ecosystem? I think that’s going to be really important. The second is professionalization. There are a lot of questions around how do you trust the assurer. And so how do we ensure that we’re thinking about the skills? What does accreditation look like for assurance organizations or individuals? And that will help, I think, with questions around kind of access. So that’s a kind of second piece. And the third, because this is also about agents: I think that some of those foundational questions haven’t yet been resolved.
And so I’m hoping that we can move the dial to start thinking about how do you apply that to some of these future questions. So just to shout you out, Madhu. Madhu is the brains behind our safety work. And she came up with a paper on real-time failure detection and monitoring of agents. And what I really like about that paper is it talks about a kind of tiered approach to assurance as well. So when you think about agent deployments, do you need to be thinking about assurance based on the risks or the stakes at hand? Is it in the financial services sector? Is it making medical decisions? So how do you tie it as close to the use case and the risks?
And that needs to be also linked to reversibility. What’s the possibility around reversibility of actions and the consequences of that? And then third, we have affordances. What are the kind of affordances you give to the agents? How much autonomy do they have? And so how do you design an assurance ecosystem with all of these different components in mind and a kind of tiered approach? And the more that we can advise, you know, the US CAISI and a lot of policymakers who clearly are trying to make decisions in this area, I think that’s what success would look like for us.
This was totally not planned, Steph plugging our work here, but I can’t imagine a better note to end on. It’s a field-wide challenge, but I just want to emphasize the field-wide opportunity. You know, no single organization can get this right. So hopefully that’s a helpful reminder as we end with this summit and move on to the next iteration. So thank you, everyone. Hope you have a great, safe flight back home. Fred, that’s tonight for you. And for a closing keynote, I’m going to welcome Natasha Crampton, who’s the Chief Responsible AI Officer at Microsoft. And after that, we’ll hear from Chris, who’s the CEO of the FMF. Thanks, everyone. Do you want to give it?
Okay, so we’re going to get mementos. Sorry, you might want to come back. You don’t want to miss this. Thank you very much.
Thanks so much, Madhu, and to all of our panellists for what was, I think, a very rich and grounded and also at times humorous discussion. Thank you. One of the things that came across clearly for me today is that we need AI assurance to no longer be just a theoretical exercise; we actually need to build it into an operational discipline. And that’s a discipline that really needs to work across borders, across languages and cultures, and, I think, increasingly across agentic systems, systems that don’t just generate outputs but actually take action. I heard this panel focus on the fact that assurance is pretty uneven today. It’s often strongest where there’s access to compute and data and evaluation infrastructure, and weakest where those things are scarce.
And as several of our panelists emphasized, if we don’t address that gap deliberately, the shift towards AI agents is only going to make that divide even worse, rather than closing it. When I think about the nature of assurance, I think with agentic systems, it does need to change in its emphasis somewhat. Pre-deployment testing has always been necessary for all types of systems, and so too has post-deployment testing, of course. But post-deployment testing in an agentic world takes on an even greater level of importance, in my view. When systems can plan and they can chain actions, they can interact with tools, they can adapt over time, assurance really has to move towards continuous monitoring, real-time detection, and clear accountabilities for when interventions need to take place.
That can be quite a hard technical problem, but it’s also a governance challenge. So I know that PAI is known for convening communities of not just thinkers, but also doers. And so I wanted to leave everyone with a couple of ideas of implications that really follow from some of the insights that we heard today. The first is that it’s really important that we build assurance into systems as part of the system development lifecycle. And we don’t just seek to bolt it on at the end. So that means that we need to design systems so that they can be observed and audited and constrained in practice, not just in policy documents. Second, assurance has to be interoperable.
We heard Prime Minister Modi speak yesterday about building in India and delivering to the world. That, I think, is absolutely an aspiration that we should strive towards. But that can only work if we have evidence, evaluation methods, documentation, and signals of risk that are usable across regions and adaptable to local languages, cultures, and deployment realities. Third, assurance has to be shared. No single company or government or institution can do this alone. And that’s especially true for agents, given how pervasive they are expected to become across the economy. We need shared evaluation infrastructure, shared taxonomies, and shared investment in capacity, particularly in the Global South. So for me, this is why organizations like the Partnership on AI, the many collaborators that have come together at this week’s India AI Impact Summit, and open engagement across the community are so important to making sure that we get this right.
It’s a really foundational area for collaboration for all of us. Now, my view is that if we do get assurance right, and by right I mean it needs to be global and inclusive and also dynamic, it really does become an enabler of trust and adoption, as Minister Teo said, not a brake on progress. One of the key things that I think we need to do as a community is really to treat assurance as infrastructure, infrastructure that we need to build together and put into practice together. Thanks very much.
Well, what a phenomenal session, from the opening and closing keynotes to a really rich and dynamic panel. I cannot think of a better way to close out what has been an extraordinarily rich and dynamic summit as well. I have the impossible task of trying to summarize everything that was just said here, so if you’ll bear with me, I’ll just offer three core themes that seem to jump out to me. One is that we need to evolve and mature our understanding of assurance. There was a lot of reference to agents here, and to the coming prospect of multi-agent environments as well. From evals to mitigations, we need an evolving understanding of how to do assurance.
Second, and probably more importantly, we also heard a lot about assurance as a global effort. Here I love Steph’s point about the need for greater north-south collaboration. There was a lot of discussion from Fred and others about the need for global standards, and about harmonizing those standards and making them interoperable. And there was also a lot of reference to some of the new institutions that have evolved to enable that global dialogue to happen, whether it’s the global network institution announced just an hour before this session, or the international network of ACs that has also been revitalized recently. And then the last point that really jumped out at me was assurance as a shared responsibility.
And, Vukosi, I love the point about assurance as a bottom-up effort, and I think we all have a role to play here. Regardless of which sector you are in, regardless of what aspect of assurance you’re taking part in, there’s a role for all of us. So with that, I’m going to leave you with just one final call to action, and that is to get involved. If we want this technology to be safe and secure and trusted, we all have a role to play. So download the reports, a very important thing, the great reports that have just come out on this topic. Get involved.
Look at the work that PAI and others are doing as well, and become a part of the conversation about how we’re going to take this amazing technology and really make sure that it’s safe and secure and that we have a way to trust it. You know, in the opening remarks, Rebecca used this great metaphor of the seed. One of the goals of the reports that they put out, and of the conversation in this panel, was to plant that seed and watch assurance grow. So I guess the parting thought I would give you is to say let’s all roll up our sleeves and get to work and make sure that the seed grows.
So with that, thank you, and thank you as well to our panelists and speakers.
Event“Both papers are available via QR codes for participants to download and discuss with the authors”
The knowledge base notes that QR codes and PDF downloads were provided to participants for accessing materials [S102].
“AI agents were not a thing a year ago, now they are emerging rapidly”
A source explicitly states that AI agents were not being discussed 12 months ago, matching the claim about their recent emergence [S23].
“Singapore’s sandbox partnership with Google lets the government “eat our own dog food” and build credibility before wider deployment”
Singapore’s Ministry of Communications and Information partnered with Google Cloud on an AI initiative (AI Trailblazers), which functions as a sandbox for testing AI solutions [S76].
“The discussion highlighted the need to apply AI assurance to autonomous “agentic” AI as the world moves in that direction”
Other sources discuss the growing adoption of agentic AI and the associated risks, underscoring the relevance of assurance for such systems [S54] and note that up to 90% of public-sector agencies plan to explore or implement agentic AI within two years [S110].
“Partnership on AI (PAI) has released two new resources: “Strengthening the AI Assurance Ecosystem” and a paper on global AI assurance”
The knowledge base confirms that the Partnership on AI is expanding and launching new initiatives related to AI challenges, though it does not list the exact titles mentioned in the report [S101].
The panel displayed strong consensus on the necessity of a robust, inclusive AI assurance ecosystem that incorporates rigorous testing, standards, third‑party verification, multilingual considerations, and shared multilateral effort. There is agreement that accessible tools, capacity building, and continuous monitoring for agentic AI are essential. The convergence across government, industry, and civil‑society voices signals a solid foundation for coordinated action.
High consensus across most speakers, indicating a shared understanding of the core pillars needed for trustworthy AI and suggesting that forthcoming policy and technical initiatives are likely to receive broad support.
The panel broadly agrees on the necessity of a robust, trustworthy AI assurance ecosystem, but diverges on where responsibility should lie (national sandbox vs multilateral coordination), the balance between centralized standards and local capacity, the role of financial incentives, and the emphasis on continuous post‑deployment monitoring.
Moderate to high disagreement: while there is consensus on the goal, the differing viewpoints on governance structures, incentive mechanisms and operational focus indicate significant strategic gaps that could impede coordinated action unless reconciled.
The discussion was shaped by a series of pivotal comments that moved the conversation from high‑level declarations to concrete, actionable frameworks. Josephine Teo’s shift toward proactive, sandbox‑based governance and her three‑pillar model set the conceptual foundation. Frederic Werner broadened the scope by highlighting connectivity and digital‑literacy gaps, prompting a focus on capacity building in the Global South. Owen Larter supplied tangible technical standards for agent interoperability, while Stephanie Ifayemi provided a structured taxonomy of assurance challenges and a tiered risk‑based approach. Natasha Crampton’s closing synthesis framed assurance as essential infrastructure, tying together the technical, policy, and equity strands. Collectively, these insights redirected the dialogue toward practical standards, inclusive capacity building, and a shared‑responsibility mindset, steering the panel toward concrete next steps rather than remaining in abstract debate.
Disclaimer: This is not an official session record. DiploAI generates these resources from audiovisual recordings, and they are presented as-is, including potential errors. Due to logistical challenges, such as discrepancies in audio/video or transcripts, names may be misspelled. We strive for accuracy to the best of our ability.