Setting the Rules: Global AI Standards for Growth and Governance

20 Feb 2026 17:00h - 18:00h


Session at a glance

Summary

This discussion focused on the critical need for global AI standards and the challenges of implementing them across different stakeholders. The panel, moderated by AI transformation consultant Bhushan Sethi, brought together representatives from major tech companies (Microsoft, Google DeepMind, OpenAI, Qualcomm), standards organizations (ML Commons, Bureau of Indian Standards), policy makers (Singapore government), and the Frontier Model Forum to explore why AI standards are essential and how they can be developed effectively.


The panelists agreed that AI standards serve multiple crucial purposes: building consumer and enterprise trust, enabling global cooperation, solving collective action problems, and providing a common language for risk management across the AI supply chain. They emphasized that standards help define “what good looks like” in AI development and deployment, particularly important as regulations often reference standards that don’t yet exist. The discussion highlighted three key areas where standards are most needed: testing methodologies for AI systems, transparency and disclosure practices, and incident reporting mechanisms.


A major theme was the challenge of measurement and benchmarking in AI systems. Rebecca Weiss from ML Commons explained that effective benchmarking requires developing taxonomies, datasets, and evaluator systems that can estimate uncertainty rather than provide binary safety assessments. The panelists stressed that standards must be inclusive and accessible, particularly for smaller companies that lack resources to develop their own risk management frameworks. They also addressed concerns about ensuring standards are substantive rather than performative, noting that regulatory requirements create pressure for meaningful compliance.


Looking forward, the panel emphasized the need for interoperable standards that can evolve with rapidly advancing AI capabilities while maintaining consistent processes for risk identification and management. The discussion concluded with recognition that successful AI standards require ongoing collaboration between industry, government, and civil society to address both technical challenges and diverse global needs, including language bias and cultural considerations.


Keypoints

Major Discussion Points:

The Need for AI Standards and Global Cooperation: Panelists emphasized that AI standards are essential for establishing trust, enabling global cooperation, and creating alignment on “what good looks like” across different stakeholders. Standards help solve collective action problems and provide legitimacy through open, inclusive processes that benefit companies of all sizes, not just large tech firms.


Technical Measurement and Benchmarking Challenges: The discussion highlighted the complexity of measuring AI systems, focusing on estimating uncertainty rather than binary “safe/unsafe” determinations. This involves developing taxonomies, datasets, and evaluator systems that can provide statistical guarantees about AI behavior under specific conditions, with different sectors having varying tolerance levels for uncertainty.


Standards vs. Regulation Relationship: Panelists explored how standards often fill gaps left by regulation, with some jurisdictions requiring AI frameworks without specifying their contents. Standards provide the technical details needed for regulatory compliance while offering market differentiation opportunities even without regulatory mandates.


Implementation and Future-Proofing: The conversation addressed practical challenges of implementing standards across the AI value chain, from model developers to application deployers. Emphasis was placed on creating process-oriented standards that can adapt to evolving AI capabilities while maintaining interoperability and avoiding the need to “reinvent the wheel” for each new development.


Inclusivity and Accessibility Concerns: Discussion covered ensuring standards are accessible to smaller companies and address diverse global needs, including language bias and cultural considerations. Panelists acknowledged the need for broader stakeholder participation beyond large tech companies to create truly representative standards.


Overall Purpose:

The discussion aimed to demystify AI standard-setting by bringing together diverse stakeholders (tech companies, standard-setting organizations, government representatives, and policy experts) to explore why AI standards are necessary, how they should be developed and measured, and what implementation looks like in practice. The goal was to demonstrate alignment across different sectors on the importance of collaborative, inclusive approaches to AI governance.


Overall Tone:

The discussion maintained a consistently collaborative and constructive tone throughout. Panelists demonstrated remarkable consensus on the importance of standards, with no significant disagreements or tensions. The tone was professional yet accessible, with participants building on each other’s points rather than challenging them. The atmosphere remained optimistic about the potential for global cooperation on AI standards, despite acknowledging significant technical and implementation challenges. The tone became slightly more technical during the measurement discussion but returned to broader strategic themes, maintaining engagement with the audience throughout.


Speakers

Speakers from the provided list:


Bhushan Sethi – AI transformation consultant; helps companies implement AI and drive return on investment in a responsible way


Lee Wan Sie – Works in Singapore government in AI governance and policy


Chris Meserole – Executive director of the Frontier Model Forum, focuses on advancing Frontier AI safety and security


Etienne Chaponniere – Vice president of technical standards at Qualcomm


Esther Tetruashvily – AI Standards Lead at OpenAI


Kshitij Bathla – Works at the Bureau of Indian Standards (BIS), the National Standards Body of India, representing ISO/IEC JTC 1/SC 42


Joslyn Barnhart – Works at Google DeepMind on AI standards, governance, and policy


Amanda Craig – Leads the AI public policy team in the Office of Responsible AI at Microsoft


Rebecca Weiss – Executive director of ML Commons, an AI benchmarking organization and engineering consortium


Audience – Multiple audience members who asked questions during the Q&A session


Additional speakers:


None – all speakers who participated in the discussion are included in the speakers list above.


Full session report

This comprehensive discussion on AI standards brought together a diverse panel of stakeholders to explore the critical need for global cooperation in establishing frameworks for artificial intelligence governance. Moderated by AI transformation consultant Bhushan Sethi at a summit in India focused on “planet, people, and prosperity,” the panel included representatives from major technology companies (Microsoft, Google DeepMind, OpenAI, Qualcomm), standards organisations (ML Commons, Bureau of Indian Standards), policy makers (Singapore government), and the Frontier Model Forum. The conversation aimed to demystify AI standard-setting whilst demonstrating the remarkable consensus that exists across different sectors on the importance of collaborative approaches to AI governance.


The Fundamental Need for AI Standards

The discussion began with establishing why AI standards are essential in today’s rapidly evolving technological landscape. Kshitij Bathla from the Bureau of Indian Standards emphasised that standards serve as tools enabling consumer trust whilst ensuring industry quality assurance across AI ecosystems. This perspective was reinforced by Chris Meserole from the Frontier Model Forum, who articulated that standards fundamentally solve collective action problems by ensuring no single actor is disadvantaged whilst managing AI risks effectively.


Lee Wan Sie from Singapore’s government highlighted that standards create alignment on “what good looks like” in AI governance, particularly in three key areas: testing methodologies for AI systems, transparency and disclosure practices, and incident reporting mechanisms. This framework provides a common language for stakeholders across the AI value chain to communicate about risk management and quality assurance.


Amanda Craig from Microsoft explained how standards function as translation mechanisms between internal company practices and external stakeholder understanding, noting that Microsoft has developed internal responsible AI standards that align all stakeholders around common expectations.


The Regulatory Context and Global Cooperation

A particularly striking revelation emerged from Joslyn Barnhart of Google DeepMind, who observed that “regulation has gone ahead and jumped to, you know, we’ve regulated and essentially made reference to standards that do not yet exist.” This comment fundamentally reframed the discussion from theoretical benefits to practical necessity, explaining why major technology companies are suddenly prioritising standards work with unprecedented urgency.


Chris Meserole elaborated on this phenomenon, noting that multiple jurisdictions are recognising frontier AI risks and delegating standard-setting to technical bodies rather than specifying requirements directly. This approach allows governments to address citizen concerns about AI risks whilst leveraging technical expertise for implementation details.


From India’s perspective, Kshitij Bathla explained how the country’s approach aligns with the “Manav mission” whilst adapting global standards to local use cases and conditions. He emphasised that standards bodies are interconnected globally, creating collaborative rather than siloed approaches to AI governance.


Technical Measurement and Industry Implementation

Rebecca Weiss from ML Commons provided crucial insights into benchmarking methodologies, explaining that effective benchmarking consists of three essential components: a taxonomy for categorising risks and capabilities, datasets for testing, and evaluator systems for assessment. She articulated a sophisticated approach to AI evaluation: “You’re trying to provide a sense of, I’m not going to tell you that your system is, quote-unquote, safe or not. What I’m going to tell you is, under these considerations, under these conditions, under these assumptions, the estimated likelihood of a particular risky behaviour is X.”


This probabilistic approach shifts responsibility to risk management professionals, deployers, and developers to determine whether the estimated risk levels are acceptable for their specific use cases, moving beyond binary “safe” or “unsafe” determinations.
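
To make this probabilistic framing concrete, the sketch below is an illustrative example only (it is not ML Commons code; the risk category, sample counts, and tolerance threshold are hypothetical). It shows how per-prompt evaluator verdicts for one taxonomy category could be turned into an estimated likelihood of risky behaviour with an uncertainty interval, which a deployer then compares against its own sector-specific tolerance.

```python
"""Minimal sketch of uncertainty-aware risk estimation for one benchmark
category. Illustrative only; names and thresholds are hypothetical."""

import math
from dataclasses import dataclass


@dataclass
class RiskEstimate:
    category: str          # taxonomy entry, e.g. "privacy_leak" (hypothetical)
    point_estimate: float  # observed rate of risky responses
    lower: float           # lower bound of the confidence interval
    upper: float           # upper bound of the confidence interval
    n_samples: int


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for a binomial proportion."""
    if n == 0:
        return 0.0, 1.0
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return max(0.0, centre - half), min(1.0, centre + half)


def estimate_risk(category: str, flags: list[bool]) -> RiskEstimate:
    """Turn per-prompt evaluator verdicts (True = risky behaviour observed)
    into an uncertainty-aware estimate for one taxonomy category."""
    n = len(flags)
    risky = sum(flags)
    lower, upper = wilson_interval(risky, n)
    return RiskEstimate(category, risky / n if n else 0.0, lower, upper, n)


if __name__ == "__main__":
    # 3% of 1,000 test prompts flagged by the evaluator (made-up numbers).
    verdicts = [False] * 970 + [True] * 30
    estimate = estimate_risk("privacy_leak", verdicts)
    sector_tolerance = 0.05  # hypothetical acceptable upper bound for this sector

    print(f"{estimate.category}: {estimate.point_estimate:.1%} "
          f"(95% CI {estimate.lower:.1%}-{estimate.upper:.1%}, n={estimate.n_samples})")
    print("within tolerance" if estimate.upper <= sector_tolerance else "needs mitigation")
```

The point of the sketch is the shape of the output, not the specific statistics: the benchmark reports an estimate plus uncertainty under stated conditions, and the acceptability judgment stays with the deployer, whose tolerance will differ by sector and use case.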


Esther Tetruashvily from OpenAI explained that standards serve multiple functions for frontier AI laboratories: translating risk management practices into language that customers can understand, creating universal language for consumer trust, and enabling interoperability. OpenAI’s recent certification under ISO 42001 exemplifies how companies are using voluntary standards adoption for market differentiation and credibility building.


Etienne Chaponniere from Qualcomm brought a unique perspective as a chipset provider, emphasising the democratising potential of standards. He noted that whilst large companies have resources to develop internal risk management systems, the numerous smaller companies entering the AI space daily lack such capabilities. Standards provide these smaller players with accessible pathways to compliance and quality assurance.


Addressing Inclusivity and Global Diversity

The discussion acknowledged significant challenges in ensuring AI standards serve diverse global populations. A computer science student raised concerns about language bias, noting India’s 22 official languages and the complexity of thinking across multiple linguistic contexts.


Esther Tetruashvily responded by describing OpenAI’s efforts to evaluate model performance across various languages and dialects, including specific testing for Indian linguistic diversity. However, she acknowledged that addressing these challenges requires collective effort and partnership with local ecosystems.


Etienne Chaponniere added that even individual users often think across multiple languages and cultural contexts, emphasising that whilst perfect coverage may be impossible, the focus should be on creating reusable software frameworks that can be adapted for different languages whilst maintaining efficiency.


Future-Proofing and Market Dynamics

Looking towards the future, Chris Meserole distinguished between process standards, which can be future-proofed, and specific evaluations and controls, which will need regular updating. He argued that robust processes for identifying, evaluating, and mitigating risks can remain stable even as specific risks and capabilities evolve.


Lee Wan Sie made a crucial observation that standards provide value even without regulatory mandates, citing voluntary certifications as evidence of market-driven demand for credible quality signals. Joslyn Barnhart explained the economic logic behind industry cooperation on AI safety standards, noting that safety mitigations for frontier AI risks can be costly, creating strong incentives for collective action. She emphasised that “the worst thing for adoption would be a safety incident.”


This market-driven approach suggests that consumer and enterprise demand for trustworthy AI systems creates natural incentives for companies to pursue credible standards compliance, enabling what Barnhart described as a “race to the top” in safety and quality.


Addressing Legitimacy and Accountability Concerns

The discussion was enriched by challenging questions from the audience that highlighted tensions between industry leadership and public accountability. One audience member raised concerns about industry-driven standards potentially serving commercial interests over public needs, questioning how governments with limited technical capacity could effectively audit sophisticated AI compliance programmes.


Jules Polonetsky from the Future of Privacy Forum added complexity by noting that AI governance encompasses broad social policy issues with significant stakeholder disagreements, raising questions about whether standards should seek minimum viable consensus or address different stakeholder priorities through alternative mechanisms.


These concerns highlighted the ongoing challenge of ensuring that technical standards development remains legitimate and serves broader public interests rather than merely facilitating industry coordination.


Conclusions and Path Forward

The discussion revealed significant consensus across diverse stakeholders on fundamental questions about AI standards. All participants agreed that standards are essential for building trust, enabling adoption, solving collective action problems, and creating common frameworks for risk management.


Key areas of convergence included the importance of process-oriented standards that can adapt to evolving capabilities, the need for uncertainty quantification rather than binary safety assessments, the value of global cooperation through interconnected standards bodies, and the necessity of inclusive participation from diverse stakeholders.


However, several significant challenges remain unresolved, including tensions around the pace of standards development, questions about ensuring industry-driven standards serve public interests, and concerns about government capacity for effective oversight. The panellists also acknowledged the complexity of operating across different jurisdictions, including differences between approaches in China and the United States.


The path forward appears to require continued collaboration between industry, regulators, and standards bodies to develop robust process standards whilst building technical capacity for measurement and evaluation. The success of this endeavour will depend on maintaining collaborative spirit whilst addressing legitimate concerns about accountability, inclusivity, and democratic participation in AI governance.


The discussion demonstrated that whilst significant technical and governance challenges remain, there exists a foundation of shared understanding and commitment to collaborative solutions across the AI ecosystem, providing a basis for continued progress on global AI standards development.


Session transcript

Bhushan Sethi

I’m going to provide a brief introduction and then I’ll have my panelists introduce themselves and we’ll get into the discussion. So I’m a consultant around AI transformation. I help companies implement AI and drive the return on investment in a responsible way. What’s really important about this discussion is we need to demystify what we mean by standard setting. There’s been a whole lot of discussion at this week’s summit around the importance of global cooperation, the importance of inclusion around AI, and driving solutions that meet everybody’s needs. The tech CEOs spoke about it yesterday. World leaders have spoken about it. We’re here in India where it’s about planet and people and prosperity. So that’s what the discussion is going to be about.

And we are going to have time for Q&A at the end. But I’m going to have my panelists introduce themselves first, in the order that they’re sitting, and also talk about what standards mean for them: what lens are they looking at from a standards perspective around AI?

Rebecca Weiss

Hello, my name is Rebecca Weiss. I’m the executive director of ML Commons. We are an AI benchmarking organization, an engineering consortium that focuses on that problem. So for us, as a technical standards organization around benchmarking, what that means is two things: one, we want to define the methodology for measurement, and two, we want to create the technical artifacts that allow engineers to integrate this methodology into their development life cycle. So for us, when we see what’s happening in the world today, the ability to measure risk is a big barrier to adoption, and that ability to understand and estimate the uncertainty around the behavior of an AI system is something where we think benchmarking can help.

So, since we have a large panel, I’m going to let everyone else have a chance to talk, and I’m sure more will come out in our dialogue.

Etienne Chaponniere

My name is Etienne Chaponniere. I work for Qualcomm; I’m a vice president of technical standards. What we do within that role is, effectively, we have a team dedicated to technical standards for AI, and we try to coordinate where it is that we need to go and how we make sure that we understand what it means to be compliant. I come from the world of telecom, as Qualcomm may evoke for some folks. And for us, it’s a very different thing, right? In the telecom world, you cannot ship a product unless you comply with a standard, because you need it for interoperability. In the world of AI standards, it’s a bit different.

So we’re talking more about safety standards, and those typically tend to trail the products. The products are out there, and then they’re going to comply with standards at some point, when the standards are available. What is common in all of this, however, is that the standards need to be available at scale for everyone, and in a way that engineering teams can adopt easily, at least from the product side. So I think I’ll leave it at that, and, yeah, that’s it.

Lee Wan Sie

I’m Wan Sie from the Singapore government. I work in AI governance and policy. So, many things, but specifically for standards, what it means to us is setting norms: alignment globally on what good looks like. And specifically in the area of AI governance, a lot of it at this stage has to do with common methodologies and processes that we have to follow. It’s still technical, it’s not a checkbox, but hopefully that helps us all align on what good looks like. Thanks.

Bhushan Sethi

And maybe before the next introduction, just so you can get a flavor, we have standard setters and measurers. We have people in industry and we have people who play in the policy and the regulatory environment. And that’s the importance around this topic.

Amanda Craig

Thank you. Hi, everyone. I’m Amanda from Microsoft. I lead the public policy team within the Office of Responsible AI at Microsoft. I think Wan Sie said it well when she described standards as really, like, aligning around what good looks like. And I would offer, you know, we actually at Microsoft in our office define something called our responsible AI standard that applies to all of our internal product groups, our engineering function, our sales function. And if you think about the role of that internal standard, it is to align all of the internal stakeholders we have around what good looks like. Externally, we need the same sort of mechanism, right? And that’s the role that standards can play in the broader ecosystem.

So we want to partner with our industry colleagues, and we want to partner with governments and others around the world to be able to define what good looks like, so we can all have that common language and set of expectations.

Joslyn Barnhart

Hello. Joslyn, Google DeepMind, where I also work on issues of AI standards, governance, and policy. Building on what’s been said: I think it was an interesting point that often technical standards come first, and process and safety standards often come later. In the space of AI at the moment, actually, regulation has gone ahead and jumped to, you know, we’ve regulated and essentially made reference to standards that do not yet exist. So for places like Google DeepMind, who have not invested heavily in the standards space in the past, this is now of the utmost priority, because we actually need this to assist with implementation and compliance. So that is a primary goal on our side.

Chris Meserole

I’m Chris Meserole. I’m the executive director of the Frontier Model Forum. Our mission is to advance Frontier AI safety and security, and we work with many of the leading Frontier AI developers and deployers, including several colleagues on the stage today, to advance best practices for risk management. For Frontier AI in particular, there’s a set of unique and novel risks, and over the last couple of years the community has really started to develop and converge around a set of best practices that now, I think, need to start to graduate into actual formal standards, and I think that’s why we’re here. That’s why we’re very interested in the standard-setting space.

Esther Tetruashvily

Hi, everyone. My name is Esther Tetruashvily, and I’m the AI Standards Lead at OpenAI. Echoing many of the things that have already been said, I think standards for us, especially as a frontier AI lab, is about translating some of our practices for risk management into the language of risk management for customers across the supply chain, and it’s also about creating a language for consumer trust and assurance. It’s also about, in the age of agents, thinking about interoperability and helping everyone benefit from this ecosystem that we’re developing here. So I’m really excited to be here and to talk about these issues with you all. Thank you.

Kshitij Bathla

Hello, everyone. I’m Kshitij Bathla from the Bureau of Indian Standards, the National Standards Body of India, and here representing ISO/IEC JTC 1/SC 42, because BIS is a part of SC 42. For us, I would say standards are the tools which enable consumer trust in whatever ecosystem they are developed for, and which enable the industry to ensure quality and consumer trust. That’s the main focus area for us. Thank you.

Bhushan Sethi

So let’s start with why we need standards. Why are we even here? Because there’s a lot of confusion between standards, regulation, and legislation. Are we going to get global cooperation around these things? Maybe start from a standard-setting perspective and then from a regulatory perspective. Why are we here? What’s the problem we’re solving, and for whom?

Kshitij Bathla

So I would say there are multiple problems in the standards domain. Specifically, it always starts with what we are tackling: what is AI? That was the primary focus of JTC 1 and SC 42 when it started, so it defined what AI is, then what generative AI is, and now they are talking about what agentic AI is. So I think the main point that needs to be taken care of is what is coming next, and keeping pace with that. And beyond that, once we have described what it is all about, then how do we verify and validate whatever is being claimed, that this is a system which has AI? For example, someone says they have an equipment, call it a washing machine, that is equipped with AI, but is it actually equipped with AI, or is it just a normal logic system? This is something that we are trying to standardize.

Bhushan Sethi

So it’s about trust, it’s about verifying. The tech firms represented here are moving very fast with model development, so we need standards there. From a regulatory perspective, what would you add?

Lee Wan Sie

I think the most important thing, and I wouldn’t say this is from a regulatory perspective, maybe more from an AI policy perspective, is why we think standards are helpful. Like I said, it’s about defining alignment on what should be in, let’s say, transparency. So if you ask what the top three things are today that we want to think about setting standards for, one would be testing: how do you do testing for AI, whether it’s AI models or AI applications? I think that’s one area, because it defines what good testing can look like. Two, perhaps transparency: what would disclosure look like? Everyone has their own way of sharing the information that they want to share.

One way is to standardize it so it’s easier for the readers, the people who are consuming this information, to understand. And I’m saying this in very, very broad terms; it depends on which reader you’re talking about, who’s going to consume it. Just in broad terms, that’s perhaps one way of standardizing it. Maybe the third area could be how you’re reporting or monitoring incidents. But it’s still very, very early days. That’s where standards, again in terms of alignment, might be useful: finding alignment in these areas.

Bhushan Sethi

So, how do we report? How do we disclose? How do we make it credible, so it’s not a subjective tick-the-box exercise? Chris and Rebecca, from a standard-setting perspective, what would you add to that before we get the industry view?

Rebecca Weiss

I’m happy to add to this. So I think there’s been a theme that has come across in this panel a couple of times, which is: what is good enough? And in order to define that, a standard represents a consensus about what is good enough. The problem that we have is who contributes to that consensus. It probably shouldn’t be exclusively an industry perspective; you need more stakeholders, more constituencies, represented in that definition. And then on top of that, what is good enough, as I think Joslyn mentioned earlier when we were talking before this panel, has a scientific element to it: how do you define the characteristics of a system such that you can actually create the kind of uncertainty estimation that lives up to a statistical guarantee? But then there’s also the political element, which represents a whole set of issues that I’m actually not qualified to talk about, so I will pass it to Chris.

Chris Meserole

I think it’s worth backing up a little bit to one of the original questions: what are standards for? And I think a big part of what standards are for is to try and solve this collective action problem. There’s a kind of unique set of risks that we are worried about. We want to make sure everyone’s on the same page so that no one actor is disadvantaged or advantaged compared to others. Having standards for how we’re going to manage risks across an ecosystem is extremely useful for that, so there’s a policy dimension to it.

There’s also an adoption dimension to it, right, because people want to know that there’s kind of a common way across industry of handling a certain class of risk. And I think being able to set standards and have a formal standard-setting body, to one of the points that was made earlier, by definition a standard-setting body is open, right? So there’s a legitimacy and a credibility to standard-setting bodies that you don’t have if it’s just industry or just government in many cases. And I think, you know, all of those kind of factors coming together are exactly why we’re so keen on pushing forward the standards discussions.

Bhushan Sethi

Yep. So maybe from a hyperscaler perspective, maybe Esther, then Joslyn, and we can make clear the difference: how is this showing up at your firms, and how are you thinking about this?

Esther Tetruashvily

Yeah, no, that’s a great question. I think from sort of a market adoption perspective, a lot of our technology, like general purpose AI models or foundation models, is being integrated into existing ecosystems or on top of existing stacks. And there’s a lot of confusion in terms of risk controls and risk management about what that means. We have our own risk management processes. They have their own risk management processes. And one of the barriers to adoption is having a common language to talk about how you map those controls onto one another. There’s a separate challenge, I think, of who is best positioned to control a particular risk. What are the risks? What are the net new risks?

What are the risks that already exist, where we don’t need to create something net new? And so for us, it’s both an imperative in some ways to translate what we’re doing in terms of managing risks into the language of upstream and downstream customers, so that they can understand and map those same practices onto their controls. And then we can create a universal language that can ease trust and assurance in an easy, workable way across the market. There’s also just space for, as several people have talked about, regulations moving ahead of the standards, where we are still developing methodologies, working out what is standardizable in what we’re doing, recognizing where the science has not caught up yet and where we maybe are in a place of more maturity.

Bhushan Sethi

And maybe just to bring it to life for the audience, given the huge amount of subscribers you have in India, around the world, growing every day, what’s changed in the standard vernacular at OpenAI?

Esther Tetruashvily

In terms of our adoption, or in terms of how we’re distributing it?

Bhushan Sethi

Yeah, the prominence of it, how people are thinking about it, the importance of the topic.

Esther Tetruashvily

So I think there’s both an aspect of it that’s, what already exists that we can use, that can reassure customers that we are following the best practices for the industry, say for privacy or cybersecurity. There’s an existing risk management standard, ISO 42001, that OpenAI just got certified in, and that definitely signals something to the market and to customers. Then there’s also sort of a transparency element, right? We have our safety frameworks, we update them, we disclose information in our model cards about performance on a variety of metrics. And then there are certain things we do to elevate and help stakeholders across the spectrum in terms of how to build evaluations. So we have published a safety hub that gets updated regularly, which tells how we’re performing on a variety of metrics, what the best methodologies are, and how to work with this.

Bhushan Sethi

Great. So Joslyn, can you bring to life how Google DeepMind is thinking about standard setting in that context?

Joslyn Barnhart

Yes. I’ll take it back to what Chris was talking about in terms of collective action problems. So some of the mitigations we’re talking about, associated with some of the more extreme risks that frontier AI poses, can be quite costly. And so I do think that there is just a strong industry incentive to work together to resolve this collective action problem. Again, as Chris said, doing this through standards, through an open, legitimate process, seems to be incredibly impactful. Again, the worst thing for adoption would be a safety incident. So we have a collective incentive as an industry to make sure that we raise the floor to avoid that, on all of our behalves.

So I do think that that is seen, you know, I think standards at this point are seen as a very clear and important strategic play for making, you know, essentially clearing the path for rapid adoption.

Bhushan Sethi

Amanda, how do these standards show up at Microsoft right now? Can you hear the question? Amanda’s going to speak about the Microsoft experience.

Amanda Craig

Thank you. Yeah, I was going to start by just noting that, at Microsoft, at Google, at other places, it’s not a totally new kind of process that we’re going through, right, in terms of thinking about standards and the importance of standards for adoption of this technology, for sufficient trust in order to have adoption, and in order to really enable compliance. I think Esther made a really good point in sort of acknowledging that, you know, especially as we are deploying this technology, we are working with customers that have their own set of standards and regulation, and part of the challenge that we find ourselves facing right now in AI governance is that we have a lot of high-level norms and expectations that, again, are not so different from the patterns we’ve seen before.

Basically, we want to know how AI providers are managing risk, but we are in the early days of defining what that really means in practice in a detailed way, especially across the AI value chain. What are model developers really responsible for doing for risk management? What are application developers really responsible for doing? How does that dock in with deployers of those applications, who are oftentimes implementing existing standards and meeting existing regulatory requirements? How does all that fit together? And, again, you know, we’ve done this with other digital technologies as well, like software, like cloud services, where we’re ultimately trying to define in practice what everyone is responsible for doing. How do we have a common language to be able to talk to each other among providers, the supply chain of technology, and those that are ultimately deploying it? But we actually really do need the standards to support that, right? Because otherwise we are stuck at the sort of high-level conversation about norms, around we want to evaluate risk, we want to figure out what the right transparency practices are, or we find ourselves in the deep technical weeds. Having a place in between, really at the level of technical standards, helps drive that kind of common set of expectations so that you can have trust.

Bhushan Sethi

So we need them. They’re important. We’ve got to drive adoption. There’s a collective action agreement here. From a Qualcomm perspective, Etienne, bring to life the business model, how you use this in engineering your products.

Etienne Chaponniere

Yeah, so I think there’s one thing that I’d like to note here. As Qualcomm, we basically provide chipsets, right? We’re not building the big models. What matters to us, and the reason why we’re engaged in those standards, whether it’s in ISO, CEN-CENELEC for Europe, or ML Commons when it’s other types of standards, is effectively that it provides scale, not only across the globe, but in a way that allows different types of companies to benefit from it. I mean, let’s be clear, right? If you look at the companies who have the type of resources to set up their own standards and risk management systems internally, they’re typically pretty big companies.

Now, the thing with AI is that there’s a huge number of companies being created every day, and they don’t have the resources to put this together. And so there are two conditions for making sure that the standards being put together are inclusive: one is that they’re open, as Rebecca was alluding to before. So whether it’s ML Commons, which has a very open governance model, or ISO, or CEN-CENELEC in Europe, there needs to be an opportunity for everyone to participate. That’s the first step. However, we know, and that’s the reality, that not everyone has the means to participate, because they’re super focused, they need to bring up their own LLM for that particular use case or maybe a very general use case, and they just don’t have the resources to do this.

So from that standpoint, having the standard as effectively a mechanism for them to go directly to product, and know that they’re going to comply with what the world or the community has set up, is really important. So for Qualcomm, the reason why we want to participate is to enable this type of accessibility for companies which are not always the biggest ones.

Bhushan Sethi

Yep. So agreement that we need them. Before we go into how we set standards, how we measure and benchmark them, and Rebecca will bring that to life, a wildcard question: there could be a lot of people listening to this saying, the world is not connected and cooperating around this. We don’t have global regulations on AI. But yet we have industry leaders and standard setters vehemently agreeing. How should the audience think about that? Is there a disconnect there, or would anyone like to comment on that?

Chris Meserole

So part of the reason why I think we’re all so interested in standards is that one of the things you’re seeing is multiple jurisdictions saying some version of: we think that there are new risks with frontier AI, and we as the government are concerned, on behalf of our citizens, that we are attending to those risks across industry. Those risks, and how to manage them, are probably best developed or managed through the standard-setting process, but the jurisdictions aren’t always setting the standards. So in the United States, for example, there are a couple of different states that have passed requirements for frontier AI developers to have a frontier AI framework, but they don’t specify what should actually be in the framework. They offload some of that to the standards process, which is why I think it’s so important to have these standards in place. There’s a clear policy and regulatory interest in there being mechanisms by which some of the risks that may come with frontier AI are managed, but we need to color in the lines a little bit exactly, like, you know, how we’re all going to do that.

Bhushan Sethi

And before we go to Rebecca, just from an India perspective: PM Modiji talked about Manav yesterday and the AI vision, and in there there was a lot of focus on validity and governance, so standards were implied. Do you want to just bring to life how India thinks about this, before we go to Rebecca and talk about measurement?

Kshitij Bathla

So I would say the Manav mission, it’s welfare, human-centric, and all those aspects are there. And from the governance perspective, what is going on is that the government is not going to be able to do everything about it alone; as of now, the India AI governance guidelines are there, providing a framework: these are the things that you should look into, just providing a reference. So this is the direction the Indian government is moving in as of now. Coming to the perspective of standardization, at the national level as well as the ISO level, and adding to the question that you asked previously: standards bodies are interconnected with each other.

In ISO, there are liaison mechanisms; we have ML Commons as a liaison there, IEEE is there, all the bodies are there. So they are all interconnected, and whatever comes out of these bodies is an outcome based on studies done by various forums, not just one body, not just ISO. So the Indian standards that we are working on and developing are also in that direction, because this is something which is global; we can’t have silos specifically for India. There could be risks, there could be specific use cases that are India-specific, and for those we need to have some specific guidance, but more or less it is the global picture that we are trying to look into.

Bhushan Sethi

So we need global standards, and then we adapt those to the specific use cases that we need, right? We need to adapt them to local conditions and use cases. So let’s get a bit more technical. Rebecca, why is this hard? How do we measure it? How does it compare to benchmarking? Maybe Rebecca first, and then, from a regulatory perspective, did you want to make a comment?


Do you want to make a quick comment? Yes, do you want to make a response to everything we’re getting to? Sorry, Rebecca. Please.

Lee Wan Sie

I just want to respond to Chris’ comment and your question about, you know, if there’s no regulations, then why do we care about standards, right? I mean, sure, I think there will be regulators who will say, yes, turn to the technical standards to define the expectations, which I think is the fair point that Chris made. But even when there’s no regulation, I think the standards are still useful. I mean, Esther just mentioned that OpenAI is certified for 42001. You didn’t need to do that, but why did you do it, right? And Anthropic has done that as well. And I think the idea is that perhaps there’s also a way to differentiate for organizations, for enterprises.

And it doesn’t have to be the frontier model labs only. It could be app developers and so on. A way to differentiate themselves and say, look, I’m adhering to a global standard; I’m demonstrating that I have actually implemented something that’s good enough; I’ve addressed a risk in this way. I think that’s one good reason for standards, even if there’s no regulatory cover. So the certification assurance part is helpful. Yeah, I just wanted to add that as a little bit of colour, just to give some benefits to the standards community that is still kind of very…

Bhushan Sethi

Thank you. Bringing the regulatory perspective and the Singapore experience. So let’s get into measurement. And fellow panellists, if you want to respond to anything, just give me the signal; we’re going to make this an interactive conversation. So, Rebecca, how do we measure this?

Rebecca Weiss

Well, let me solve all the problems in one definition. No, I’m kidding. But as I said earlier, benchmarking consists of two things, at least from our perspective: the way that we do benchmarking, it consists of a measurement methodology, and it consists of reference builds, implementations of that methodology, so that engineers can use it. And the definition of a benchmark, as we’ve been trying to operationalize it in places like ISO and others, is a taxonomy, a data set, and an evaluator system. And the point of all of that construct is, as Etienne pointed out, that this allows you to scale this kind of approach towards the type of deployments that we’re expecting to see in these types of AI settings.

The challenge behind all of this is that what you’re really trying to do is estimate uncertainty. You’re trying to provide a sense of: I’m not going to tell you that your system is, quote-unquote, safe or not. What I’m going to tell you is, under these considerations, under these conditions, under these assumptions, the estimated likelihood of a particular risky behavior is X. And then it is up to you as a risk management professional, a deployer, a developer, to decide: is that enough? Is that good enough for your needs? And I don’t think it’s going to be the same for different sectors. Some sectors will have a much higher bar for the amount of uncertainty that needs to be estimated, and other sectors will probably say, that’s good enough for me, I don’t necessarily need to get much further than what you are offering right off the bat. So we can go into all of the different questions that remain open, but those particular areas, developing that taxonomy, developing those data sets, and developing those evaluators, and the best practices and the standards to make it clear that this is the best in the industry, this is the way it is done, that’s what we need to get better at.

Bhushan Sethi

Yeah, so what I’m hearing is we need clarity: clarity of the taxonomy, clarity of what we’re measuring, and it needs to be verifiable and credible. From an industry perspective, would anyone like to pick up on how that’s going to work? What’s in place now? What might some of the challenges be? How do you get organizational buy-in? Anything to add from industry? Amanda, do you want to start us off?

Amanda Craig

Sure. I mean, I think there’s work to do across all the elements that Rebecca just laid out, and it’s really a reason why we are so invested in working with ML Commons, because I think we need places that are bringing industry and civil society and stakeholders together to actually work through these problems and resolve these hard questions in ways that are really going to be valid and reliable broadly. And so I think that’s really the work still ahead, but I think we are also making good progress, right? And thanks to ML Commons for helping to facilitate that. My thought on this is that we’ve been talking for years now about how nascent this field is, and actually, judging whether we are making progress, this too could be standardized, right?

Like, we don’t have common ways of assessing: are we still in a nascent stage? What levels of uncertainty do we have? So to Rebecca’s point, I think this is absolutely essential so we can all align exactly on: have we made some progress? Have we made sufficient progress to start relying on these things? To what degree can we rely on them for important decision-making around deployments?

Esther Tetruashvily

Yeah, I think I’ll just add, if we take this back down to the basics: I think whether you’re an enterprise customer or you’re a consumer of our products, you just want to know, is this thing going to be accurate? Can I rely on this thing? Is this going to get me into trouble? If I incorporate this in my workflows, am I going to carry some sort of liability? And at the core of standards is figuring out a way to have a common mechanism to provide an answer of reassurance: you can trust us, here’s a measurement, certified by somebody else, that this thing is reliable, that this thing is accurate, that I can rely on this thing and I can use this thing. And I think we’re in this moment where we’re still trying to figure out, as an industry and as a community, what that’s going to look like. And so whether it’s advancing the measurement science, because we currently don’t have enough of that in order to make sure that we can give an estimate of what is accurate, what is reliable, what is safe for specific risks, or, on the other side, what are the risks that we care about?

I think some countries, some jurisdictions, might have one list of risks; other countries might have a different list of risks. And then there’s going to be a question of, like, how do you control for that, right? And that’s kind of what Rebecca, ML Commons, and many others are working on: how do you provide some sort of mechanism of credibility that says we’ve measured this, this thing is safe, that can then be certified and, you know, understood in the same way by everyone? So at the end of the day, in order for us to really unlock the value of this new technology that is transformative, and I think many of us who are here today for the India AI Impact Summit recognize that potential.

We all also need to kind of answer those questions, and standards are the way you facilitate it.

Bhushan Sethi

Yeah, and so there’s a theme of trust running through this. So maybe, Chris, add to that, and then I’ll add a comment to that.

Chris Meserole

Yeah, just briefly, I also just want to situate how benchmarking standards and some of the scientific questions we’ve been talking about fit in. I think we’ve been talking a lot about different types of standards, and I just want to clarify that there’s a kind of broader, high-level set of process standards where you say: all right, for this class of risk, what we’re going to do is identify what the risk is, we’re then going to evaluate what that risk might actually be, and then we’re going to put in place certain kinds of mitigations and controls. That’s a process for how you’re going to walk through risk management for something.

That absolutely needs to be standardized. But then even within that, once we get to, all right, once we have agreed on what the risk is that we’re trying to evaluate, how do we actually do that? And that’s where the standards come in for the benchmarks that we want to see developed. And that’s where some of these scientific questions, I think, really come into play because we need to have, you know, those kind of credible scientific evaluations and tests for the whole kind of broader risk management effort to hang together. And it’s, you know, again, critical, I think, for this whole process.

Bhushan Sethi

Yes, this has got to live next to the risk management, identification, and mitigation strategy in any company. Go ahead, Joslyn.

Joslyn Barnhart

I’ll just add briefly: I think the possibility for comparison across models is also something that’s super important here. I think there’s an important safety dimension there. If we actually are all measuring the same thing and can give consumers some relative assessment of safety, of quality, this is actually going to potentially contribute to a race to the top as opposed to the bottom. And so we’re solving for that.

Bhushan Sethi

So that’s the question of who we’re solving for. Two of the panelists have mentioned consumers. It’s not just about enterprise. It’s not just about government. It’s all about consumer trust. Etienne, what would you add?

Etienne Chaponniere

What I wanted to add is that here, when we’re talking in general about trying to create standards to resolve the type of safety risks that we’re going to see, it’s also to reassure the audience that it’s not that we’re trying to solve every single risk that happens. There is a huge number of existing standards bodies, whether it’s in ISO and CEN-CENELEC and other places, where they have already identified risks for their particular verticals or their particular, not silos, but their particular industries; those are already at work, right? So how they’re going to use AI, how AI safety is effectively going to be translated into their own processes.

Those things are already happening, right? So it’s not only the people on this panel who are working on this; the entire community of standards, whether it’s automotive, the radio equipment directive, everybody is already looking at that, right? In the end, the difficult part is going to be to make sure that there is commonality in terms of the type of techniques that we’re using, wherever there’s an automated technique that we can use. Because from an industry standpoint, what is really useful, in particular if you’re a smaller company, is to make sure that you can run something efficiently and it addresses as many of the use cases that you run as possible. So that is an important thing that we need to keep in mind when we’re doing this.

So that’s why, from Qualcomm’s side, obviously we don’t address every single thing, but we want to make sure that, at least in the areas we’re involved in, there’s going to be as much commonality as possible in terms of the measurement techniques that we’re going to use.

Bhushan Sethi

So there’s consensus around the need to do it, consensus around the fact that it’s hard, but it’s important for consumers and business and investors. But Joslyn made a point that we’ve been talking about how this is a nascent topic, et cetera. I want to look forward. What does this look like over the next two years? What have we got to get right? The models are changing. There could be regulation that changes. There could be changes around China and the U.S. operating in different ways. What does this topic look like? How do we make sure we stay the course on this topic? Anyone want to offer a perspective as we look forward? And then we’ll start wrapping up and thinking about questions, so we can get questions from the audience.

Rebecca Weiss

I’ll take a crack at it. So at least from my perspective, there are a couple of things that I hope to see over the next couple of years. One is this idea of benchmarks and other standards representing consensus: we should be seeing more things like certification that represent more types of consensus. If benchmarking represents consensus around how to estimate and measure a thing, certification could end up representing agreement that a definition of what is good enough deserves some form of certification. I don’t know necessarily what that’s going to look like today, but I have to imagine that those sort of represent truces, temporary agreements that this is good enough for my industry, this is good enough for my deployment, this is good enough for my use case.

So that’s what I’m hoping we start to see over the next two years.

Bhushan Sethi

Anyone else want to add to that? Because, I mean, Chris, jump in, but we’ve seen some of these disclosures in the past, and people commit to environmental goals or DEI goals or other sets of standards or disclosures. Stakeholder capitalism was a big deal, and now it’s more about shareholders. So I’d love to understand your perspective on how we stay the course.

Chris Meserole

Yeah, I might distinguish a little bit between how we future-proof these standards and how we ensure that they’re implemented over time. And I think the way that we future-proof them is, to some extent, to go back to the point I was making earlier about process standards, right? The process is somewhat agnostic to the actual AI system itself and the capabilities it has. If you have a good process for identifying risks and evaluating risks, that process can be a bit future-proofed. The specific evals you run are probably going to have to be updated over time to account for the greater capabilities of models as they advance, right?

And I think it's similar with some of the controls that might need to be used to manage some of the risks, if there are certain thresholds or if the evaluations indicate a certain level of risk, right? So the subcomponents of it might need to be updated. The overarching framework hopefully can have some legs to it over time in terms of future-proofing. So we must commit to a process. We can't future-proof because we can't predict the future, but the process is so important. A good example of this would be something like, I think, 42001, which has come up a few times. There's a certain class of AI that 42001 is very much tailored to, but even that AI has changed over time.

But 42001 is still a very good standard for managing those kinds of risks for those kinds of applications of AI across a broad array of machine learning algorithms. But the other point that I would make: you alluded to the implementation of standards over time and making sure that they keep the same currency. And there, I think we can rely on some of the incentives and the need, again, for there to be collective action on this, which we've talked about before. Some of the incentive to make sure that the collective action problem is solved is going to rest with policymakers, which is why you've seen some regulatory activity.

Even in areas where there's not, to Wan Sie's point, there's a clear market need for these standards to be developed and implemented over time, because consumers want to see, whether it's individual consumers or enterprises, they want to trust that the model is actually safe and secure to use. And so I don't see the importance of standards diminishing over time. In fact, if anything, as the capabilities advance, consumers and enterprises are going to be more and more interested in making sure that they…

Bhushan Sethi

Yes, it's going to be consumer-driven. Wan Sie, just from a regulatory perspective, any thoughts? Chris mentioned implementation, which is the hard part, where a lot of this gets stuck. Any perspective on implementation, or from your experience as a regulator, to add here?

Lee Wan Sie

Implementation of standards? Yes. I mean, Chris put it very well, right? One, regulators could say, I expect you to comply with certain requirements and this is how you do it. And that's where the standards set out how you do it. Or regulators may not set out certain requirements or expectations, and the market sets them instead: if you do it, then we will buy your product, for example. So from an implementation point of view, I think there will be some momentum, either from the market or from regulations, to move standards. But back to your original question about what's going to happen in two years: I hope we can actually move faster on standards, in terms of the definition of standards.

I think that would be super useful. We're leading some work on testing, well, benchmarking and red teaming, primarily methodology definition. We hope that in the next year that can be done, sorted, and accepted within the ISO process. But experience has shown us that it takes a while. So in the next few years, hopefully we will find a way in which we can move standards faster.

Bhushan Sethi

So we need to move with speed from a regulatory perspective. Amanda is going to have the last word and then we’re going to go to questions. So please prepare them. Amanda?

Amanda Craig

I didn’t realize that. No, the one thing I wanted to add in terms of like a goal for where we can find ourselves two years from now is thinking about like a system of standards that are interoperable where we have a sort of modular approach, right, where across like general purpose technology and, for example, in different sort of deployment scenarios, different use cases, different sectors, we actually can get some efficiency from, you know, these standards are all going to need to continuously evolve and improve and we’re going to learn from the science. And we’re going to keep evolving the benchmarks and the kind of methodology around the evaluations. But we don’t want to like keep starting from scratch with every piece of that, you know, puzzle.

And so we need to figure out a way to actually ensure that where we are making progress on the evaluation science, how we are doing this in the context of evaluating AI models or systems, and how we are evaluating AI in deployment in critical sectors, for example, we actually have some synergy built into the standards ecosystem, so that we are making more dynamic progress across everything at the same time.

Bhushan Sethi

Yeah, so it needs to be interoperable and we can’t keep reinventing the wheel. So audience, questions? I’m going to collect questions, maybe three to five. So the gentleman at the front, the gentleman at the back, and then the lady with the hand up.

Audience

Hi there. Thanks for taking my question. Maybe I have a bit of a tricky question for you. You know, on the panel, obviously, we have a lot of commercial interests. My question is this. How do we know, in your assurance program or whatever you're proposing, since it's driven primarily by industry, that you're not just going to create something that cheaply satisfies the industry in front of us versus what the public actually needs? And assuming you do have a program that you're going to talk about, how does a government or external agency audit such a program, given the skill gap involved in creating such a sophisticated compliance program? How can world governments cope?

Because I've been on a lot of panels this week. The fear, uncertainty, and doubt is not just the policy gap. It's actually the technical gap, the inability of world governments to audit properly whatever you have. Thank you.

Bhushan Sethi

Thank you. So keep the questions brief. Thank you for that. So that’s about, like, how do we make it real? How do we make it not performative? I’m going to collect two other questions, and then we’ll throw them to the panelists. So keep your hands raised. We have a gentleman at the back. And I think there was a lady or a gentleman with a tie. Yeah, hi.

Audience

So… As a recent computer science student, I'm interested in building AI for India. With such a distinguished panel, I thought I'd shoot my shot. I'm a little nervous, so I apologize for that. I want to talk specifically about language bias. Being in India, there are 22 official languages, and I'm constantly thinking in two to three different languages. And when I use tools, such amazing tools built by everybody here, I'm wondering how you would go about tackling language bias and building guardrails around it, to ensure that a small model that a student like me is making does not go haywire.

Bhushan Sethi

Yeah, great question about language. Thank you, sir. And then, the gentleman with the tie. Which doesn't mean, like, more gentlemen wear ties, but, yes, please.

Audience

Hi, Jules Polonetsky at the Future of Privacy Forum and our AI Governance Center. Standards always seem to be an easier path when they are more technical than when they take on challenging social policy, and AI governance seems to capture the broadest potential collection of social policy. And given that there's a lot of disagreement, and some debate over whether one should even measure certain areas, do you imagine that we're talking about minimum viable consensus with the broadest number of stakeholders, or is there a path to in some way address issues that some stakeholders see as absolutely necessary and others don't want on the table?

Bhushan Sethi

Yep. All right. Soundbite responses, panel. How do we make it real? How do we deal with the skills gap? How do we deal with the minimum viable consensus? Anyone? Go on, Joslyn.

Joslyn Barnhart

On the performative question, I think now that standards have been referred to within actual regulation, to the extent that we want to use these standards as evidence of conformity with those particular regulations, that sets up a lot of the work that we're doing. That's a kind of minimum bar at the very least, because I think if we make these things too high-level, too abstract, or essentially too lowest-common-denominator, I don't think regulators are going to look at those standards as evidence of conformity. So I think there is that kind of interlocking pressure, created by the regulation itself, for some degree of quality. Thank you.

Bhushan Sethi

And Esther, do you want to comment on the language perspective and how you’re thinking about that at OpenAI? Thank you.

Esther Tetruashvily

Yes, we do a series of evaluations, like MMLU, for determining how well our models perform across a variety of languages. We also have a specific question-answering test that we evaluate our models on, which covers a variety of dialects within India. So I think the short answer is that this is an area where we need more participants. And I believe ML Commons is playing an active role in helping further that capacity building, and in working with local ecosystems to help clean and collect good data so that we can do this appropriately. This is another area, right, just like we've been saying, where we need to work in partnership to figure out how we collect the type of information, how we measure this stuff, how we build the evaluations, and then how we build an industry standard where all of the actors are held to that standard.

And it’s going to have to be a collective effort. Yeah. Okay.
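To make that concrete, here is a minimal sketch, not OpenAI's actual tooling, of how per-language accuracy on an MMLU-style multiple-choice benchmark might be tallied so that gaps between languages become visible; the tiny item list and the model_answer stub are purely illustrative assumptions.

from collections import defaultdict

# Illustrative MMLU-style items: (language, question, options, index of correct option).
# In practice these would come from curated, community-contributed datasets.
ITEMS = [
    ("English", "2 + 2 = ?", ["3", "4", "5", "6"], 1),
    ("Hindi", "2 + 2 = ?", ["3", "4", "5", "6"], 1),
    ("Tamil", "2 + 2 = ?", ["3", "4", "5", "6"], 1),
]

def model_answer(question, options):
    # Stand-in for a call to whatever model is being evaluated.
    return 0

def per_language_accuracy(items, answer_fn):
    correct, total = defaultdict(int), defaultdict(int)
    for lang, question, options, gold in items:
        total[lang] += 1
        if answer_fn(question, options) == gold:
            correct[lang] += 1
    return {lang: correct[lang] / total[lang] for lang in total}

if __name__ == "__main__":
    for lang, acc in per_language_accuracy(ITEMS, model_answer).items():
        print(f"{lang}: {acc:.0%}")

Comparing those per-language numbers, rather than a single aggregate score, is one simple way the kind of language gap the questioner describes can be surfaced.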

Etienne Chaponniere

Just to add a little bit on the question regarding language. In the end, I don't think there's a silver bullet solution, right? There's going to be a need for this type of safety test or safety prompt, which is required for different types of languages. And you're not going to be able to address every single thing, because there's just a huge amount of diversity. I mean, take me. I'm French by cultural background. I speak English and think in French and English all the time. There's weird stuff that I say that will not be captured by a model that's only for American English, right? So there's going to be a need for more than one language to be captured, and probably a lot of them, but this is where the community, basically everybody, needs to come and say, hey, this is what I want to capture for my type of language.

What matters, to make sure that there is scale and that it still remains efficient, is that hopefully the tool and the software framework around it can be reused. And that's really a big advantage. Thank you.
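As a rough illustration of that reuse, the sketch below is a hypothetical harness, not Qualcomm's or any standards body's actual framework: the evaluation loop and reporting stay fixed, while each language community registers its own set of safety prompts and expectations.

from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class SafetyPrompt:
    text: str
    should_refuse: bool  # True if the model is expected to refuse or deflect this prompt.

# Each community contributes prompts for its own language; the harness itself is reused.
PROMPT_SETS: Dict[str, List[SafetyPrompt]] = {
    "en": [SafetyPrompt("How do I pick a lock?", should_refuse=True)],
    "fr": [SafetyPrompt("Comment crocheter une serrure ?", should_refuse=True)],
}

def looks_like_refusal(response: str) -> bool:
    # Deliberately naive stand-in for a real evaluator model or grading rubric.
    return any(marker in response.lower() for marker in ("cannot", "can't", "won't"))

def evaluate(model: Callable[[str], str]) -> Dict[str, float]:
    # Return, per language, the share of prompts the model handled as expected.
    results = {}
    for lang, prompts in PROMPT_SETS.items():
        ok = sum(
            int(looks_like_refusal(model(p.text)) == p.should_refuse) for p in prompts
        )
        results[lang] = ok / len(prompts)
    return results

if __name__ == "__main__":
    dummy_model = lambda prompt: "I cannot help with that."
    print(evaluate(dummy_model))

New languages can then be added by contributing a prompt set alone, which is roughly the kind of reuse of tooling being described here.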

Bhushan Sethi

So in summary, and thank you, dear panelists, for the great discussion. You heard today that standards are important. This is a fast-moving world. We've got to be designing for consumers and for business people. There's a commitment here around measurement. It's both art and science. We need a process that's consistent. And across regulators, standard-setters, policymakers, and the business and tech community, there's a consistent understanding. So it's going to be an emerging topic, which I know we'll continue to discuss. Thank you, panelists, and thank you to the audience. Thank you.

B

Bhushan Sethi

Speech speed

110 words per minute

Speech length

1735 words

Speech time

943 seconds

Trust and collective‑action imperative

Explanation

Bhushan stresses that trust is the central theme of AI standards and that building consumer confidence is essential for collective‑action across the ecosystem.


Evidence

“Yeah, and so there’s a theme of trust that’s going through this.” [7]. “It’s all about consumer trust.” [8].


Major discussion point

Purpose of AI standards: building trust and solving collective action


Topics

Artificial intelligence | Building confidence and security in the use of ICTs


Benchmarking represents consensus

Explanation

He argues that benchmarks create a shared understanding of measurement, and that certification can embody that consensus as a form of agreement.


Evidence

“One is that I think this idea of benchmarks and other standards representing consensus, we should be seeing more things like certification that represent more types of consensus.” [3]. “If benchmarking represents consensus around how to estimate and measure a thing, certification could end up representing agreement.” [5].


Major discussion point

Measurement and benchmarking methodology


Topics

Artificial intelligence | Monitoring and measurement


C

Chris Meserole

Speech speed

204 words per minute

Speech length

1311 words

Speech time

385 seconds

Standards solve collective‑action and give policy legitimacy

Explanation

Chris points out that without standardisation the collective‑action problem cannot be resolved, and that standards provide a legitimate basis for policy and adoption.


Evidence

“That absolutely needs to be standardized.” [24].


Major discussion point

Purpose of AI standards: building trust and solving collective action


Topics

Artificial intelligence | The enabling environment for digital development


J

Joslyn Barnhart

Speech speed

188 words per minute

Speech length

459 words

Speech time

146 seconds

Standards provide a safety floor and strategic incentive

Explanation

Joslyn notes that having a recognised safety dimension through standards gives industry a baseline to operate safely while also creating strategic incentives.


Evidence

“I think there’s an important safety dimension there.” [11].


Major discussion point

Purpose of AI standards: building trust and solving collective action


Topics

Artificial intelligence | Building confidence and security in the use of ICTs


L

Lee Wan Sie

Speech speed

171 words per minute

Speech length

917 words

Speech time

320 seconds

Standards useful even without regulation for differentiation

Explanation

Lee argues that certification and assurance can be valuable market signals even when formal regulation is absent, helping firms differentiate themselves.


Evidence

“So the certification assurance part is helpful.” [1].


Major discussion point

Purpose of AI standards: building trust and solving collective action


Topics

Artificial intelligence | Building confidence and security in the use of ICTs


Regulators can cite technical standards to define compliance

Explanation

He highlights that regulators can reference technical standards to set compliance expectations without prescribing every detail.


Evidence

“But that’s where standards, again, in terms of alignment, that might be one that would be useful to find alignment in these areas.” [21].


Major discussion point

Global cooperation, regulatory interplay and local adaptation


Topics

Artificial intelligence | The enabling environment for digital development


R

Rebecca Weiss

Speech speed

205 words per minute

Speech length

679 words

Speech time

197 seconds

Benchmarking needs clear methodology, taxonomy, data sets and reference implementations

Explanation

Rebecca stresses that a robust benchmarking framework is required to overcome the current barrier of measuring risk and uncertainty in AI systems.


Evidence

“So for us, when we see what’s happening in the world today, the ability to measure risk is a big barrier to adoption and that ability to understand and estimate the uncertainty around the behavior of an AI system is something where we think benchmarking can help.” [48].


Major discussion point

Measurement and benchmarking methodology


Topics

Artificial intelligence | Monitoring and measurement


Benchmarking enables uncertainty estimation to decide what is “good enough”

Explanation

She adds that benchmarking helps decide sector‑specific thresholds for acceptable performance by quantifying uncertainty.


Evidence

“So for us, when we see what’s happening in the world today, the ability to measure risk is a big barrier to adoption and that ability to understand and estimate the uncertainty around the behavior of an AI system is something where we think benchmarking can help.” [48].


Major discussion point

Measurement and benchmarking methodology


Topics

Artificial intelligence | Monitoring and measurement


A

Amanda Craig

Speech speed

180 words per minute

Speech length

984 words

Speech time

327 seconds

Internal responsible‑AI standards create a common language across product, engineering and sales

Explanation

Amanda describes a vision of interoperable, modular standards that can be reused across deployment scenarios, creating a shared vocabulary for different functions.


Evidence

“No, the one thing I wanted to add in terms of like a goal for where we can find ourselves two years from now is thinking about like a system of standards that are interoperable where we have a sort of modular approach, right, where across like general purpose technology and, for example, in different sort of deployment scenarios, different use cases, different sectors, we actually can get some efficiency from, you know, these standards are all going to need to continuously evolve and improve and we’re going to learn from the science.” [17].


Major discussion point

Implementation, inclusivity and scalability


Topics

Artificial intelligence | Capacity development


Working with ML Commons helps bring together industry, civil‑society and stakeholders

Explanation

She notes that ongoing evolution of benchmarks and methodologies, driven by collaborative platforms, helps resolve hard technical questions.


Evidence

“And we’re going to keep evolving the benchmarks and the kind of methodology around the evaluations.” [41].


Major discussion point

Implementation, inclusivity and scalability


Topics

Artificial intelligence | Internet governance


E

Esther Tetruashvily

Speech speed

180 words per minute

Speech length

1072 words

Speech time

355 seconds

Certification (ISO 42001) and a public safety hub give measurable trust signals

Explanation

Esther points out that OpenAI’s ISO 42001 certification and its publicly updated safety hub provide concrete, market‑visible assurances of risk management.


Evidence

“There’s an existing risk management standard, ISO 42001, that OpenAI just got certified in.” [2]. “So we currently published a safety hub that gets updated regularly that kind of tells how we’re performing in a variety of metrics and what are the best methodologies and how to work with this.” [10].


Major discussion point

Measurement and benchmarking methodology


Topics

Artificial intelligence | Building confidence and security in the use of ICTs


Multilingual evaluation suites detect and mitigate language bias

Explanation

She explains that OpenAI uses multilingual benchmarks such as MMLU and India‑dialect tests to surface and address language bias in models.


Evidence

“Yes, we do a series of evaluations like MMLU for determining how well our models perform on a variety of languages.” [30]. “There’s also a specific test in QA that we also kind of test our models on that has a variety of dialects within India.” [32].


Major discussion point

Specific challenges: language bias, consumer trust and risk communication


Topics

Closing all digital divides | Artificial intelligence


E

Etienne Chaponniere

Speech speed

194 words per minute

Speech length

1066 words

Speech time

328 seconds

Open, inclusive governance lets smaller firms adopt standards without building their own risk‑management systems

Explanation

Etienne stresses that standards should be framed so that even small organisations can plug them into their processes without the overhead of creating bespoke risk‑management frameworks.


Evidence

“So how they’re going to use AI, how the AI is going to be effectively, the AI safety is going to be translated to their own processes.” [45].


Major discussion point

Implementation, inclusivity and scalability


Topics

Artificial intelligence | Capacity development


Standards must be easy to integrate into engineering pipelines and product life‑cycles

Explanation

He notes that standards should be lightweight and directly usable within existing development workflows.


Evidence

“What matters, to make sure that there is scale and that it still remains efficient, is that hopefully the tool and the software framework around it can be reused.”


Major discussion point

Implementation, inclusivity and scalability


Topics

Artificial intelligence | The enabling environment for digital development


K

Kshitij Bathla

Speech speed

149 words per minute

Speech length

526 words

Speech time

210 seconds

Global alignment needed but standards must be adaptable to national use‑cases (e.g., India’s “Manav” mission)

Explanation

Kshitij highlights that while standards should be globally consistent, they must allow for local adaptations such as India’s human‑centric “Manav” mission.


Evidence

“So I would say the Manav mission, it’s welfare, human-centric, and all those aspects are there.” [19].


Major discussion point

Global cooperation, regulatory interplay and local adaptation


Topics

Artificial intelligence | Closing all digital divides


A

Audience

Speech speed

159 words per minute

Speech length

387 words

Speech time

145 seconds

Governments face skill gaps that make auditing sophisticated compliance programs difficult

Explanation

The audience raises concerns about the capacity of governments to audit complex AI compliance regimes, pointing to broader skill‑gap challenges.


Evidence

“It’s actually the technical gap, the inability of world governments to audit properly whatever you have.”


Major discussion point

Global cooperation, regulatory interplay and local adaptation


Topics

Capacity development | The enabling environment for digital development


Agreements

Agreement points

Standards are essential for building trust and enabling AI adoption

Speakers

– Kshitij Bathla
– Chris Meserole
– Lee Wan Sie
– Amanda Craig
– Esther Tetruashvily
– Rebecca Weiss
– Joslyn Barnhart

Arguments

Standards enable consumer trust and industry quality assurance in AI ecosystems


Standards solve collective action problems by ensuring no actor is disadvantaged while managing AI risks


Standards provide alignment on what constitutes “good” practices in AI governance and create common methodologies


Standards create common language for risk management across AI supply chains and enable compliance with regulations


Standards translate risk management practices into language customers can understand and create consumer trust


Standards represent consensus about what is “good enough” and need diverse stakeholder input beyond just industry


Standards help avoid safety incidents that would harm industry adoption and provide strategic advantage


Summary

All speakers agree that standards are fundamental for building trust between consumers and AI providers, enabling adoption, and creating common frameworks for risk management across the AI ecosystem


Topics

Artificial intelligence | Building confidence and security in the use of ICTs


Standards must address measurement and benchmarking challenges with focus on uncertainty estimation

Speakers

– Rebecca Weiss
– Amanda Craig
– Esther Tetruashvily
– Joslyn Barnhart

Arguments

Benchmarking requires methodology definition and technical artifacts, focusing on measuring risk and uncertainty rather than binary safety assessments


Need for common mechanisms to assess progress and reliability, moving beyond nascent stage discussions


Standards must provide credible measurement that can be certified and understood universally


Comparison across models enables race to the top rather than bottom in safety and quality


Summary

Speakers agree that effective standards require robust measurement methodologies that estimate uncertainty rather than providing binary assessments, enabling credible comparisons across AI systems


Topics

Artificial intelligence | Monitoring and measurement


Global cooperation and inclusive participation are necessary for effective standards

Speakers

– Kshitij Bathla
– Chris Meserole
– Etienne Chaponniere
– Rebecca Weiss
– Bhushan Sethi

Arguments

Standards bodies are interconnected globally, creating collaborative rather than siloed approaches


Multiple jurisdictions recognize frontier AI risks and delegate standard-setting to technical bodies rather than specifying requirements directly


Standards enable accessibility for smaller companies that lack resources to develop their own risk management systems


Standards represent consensus about what is “good enough” and need diverse stakeholder input beyond just industry


Standards panel representation should include diverse stakeholders from standard setters, industry, policy, and regulatory environments


Summary

All speakers emphasize the importance of global cooperation and inclusive participation from diverse stakeholders, including smaller companies and various constituencies beyond just industry


Topics

Artificial intelligence | The enabling environment for digital development | Human rights and the ethical dimensions of the information society


Standards must be interoperable and avoid reinventing solutions for each use case

Speakers

– Amanda Craig
– Etienne Chaponniere
– Chris Meserole

Arguments

Standards ecosystem must be interoperable and modular to avoid reinventing approaches for each use case


Existing standards bodies in various verticals are already addressing AI integration into their specific risk frameworks


Process standards can be future-proofed while specific evaluations need updating as AI capabilities advance


Summary

Speakers agree that standards should be designed with interoperability and modularity in mind, building on existing frameworks while avoiding duplication of effort across different sectors and use cases


Topics

Artificial intelligence | The enabling environment for digital development


Implementation requires both regulatory support and market incentives

Speakers

– Chris Meserole
– Lee Wan Sie
– Joslyn Barnhart

Arguments

Market incentives and regulatory pressure will drive implementation as consumers demand trusted AI systems


Standards provide differentiation mechanism even without regulations, as demonstrated by voluntary certifications


Standards help avoid safety incidents that would harm industry adoption and provide strategic advantage


Summary

Speakers agree that successful implementation of AI standards will be driven by both regulatory frameworks and market forces, with companies having incentives to adopt standards even without mandatory requirements


Topics

Artificial intelligence | The digital economy | The enabling environment for digital development


Similar viewpoints

Both speakers from major tech companies emphasize that standards serve as translation mechanisms between internal company practices and external stakeholder understanding, facilitating trust and compliance

Speakers

– Esther Tetruashvily
– Amanda Craig

Arguments

Standards translate risk management practices into language customers can understand and create consumer trust


Standards create common language for risk management across AI supply chains and enable compliance with regulations


Topics

Artificial intelligence | Building confidence and security in the use of ICTs


Both speakers from AI safety organizations emphasize that standards address collective action problems in the industry, ensuring no single actor is disadvantaged while promoting overall safety

Speakers

– Chris Meserole
– Joslyn Barnhart

Arguments

Standards solve collective action problems by ensuring no actor is disadvantaged while managing AI risks


Standards help avoid safety incidents that would harm industry adoption and provide strategic advantage


Topics

Artificial intelligence | Building confidence and security in the use of ICTs


Both speakers emphasize the democratizing effect of standards, making AI development accessible to smaller companies while ensuring quality and trust across the ecosystem

Speakers

– Etienne Chaponniere
– Kshitij Bathla

Arguments

Standards enable accessibility for smaller companies that lack resources to develop their own risk management systems


Standards enable consumer trust and industry quality assurance in AI ecosystems


Topics

Artificial intelligence | The enabling environment for digital development


Both speakers from policy/governance backgrounds emphasize the urgency of developing standards while recognizing the need for adaptable frameworks that can evolve with technology

Speakers

– Lee Wan Sie
– Chris Meserole

Arguments

Need for faster movement on standards definition, particularly in testing and benchmarking methodologies


Process standards can be future-proofed while specific evaluations need updating as AI capabilities advance


Topics

Artificial intelligence | Monitoring and measurement


Unexpected consensus

Voluntary adoption of standards without regulatory mandate

Speakers

– Lee Wan Sie
– Esther Tetruashvily
– Chris Meserole

Arguments

Standards provide differentiation mechanism even without regulations, as demonstrated by voluntary certifications


Standards translate risk management practices into language customers can understand and create consumer trust


Market incentives and regulatory pressure will drive implementation as consumers demand trusted AI systems


Explanation

Despite representing different sectors (government, industry, safety organization), speakers unexpectedly agreed that standards have value and will be adopted even without regulatory requirements, driven by market forces and competitive differentiation


Topics

Artificial intelligence | The digital economy


Industry acknowledgment of need for external stakeholder participation

Speakers

– Rebecca Weiss
– Esther Tetruashvily
– Amanda Craig

Arguments

Standards represent consensus about what is “good enough” and need diverse stakeholder input beyond just industry


Language bias and multilingual challenges require community participation and local ecosystem collaboration


Standards must address both general-purpose AI models and sector-specific deployment scenarios


Explanation

Industry representatives unexpectedly showed strong agreement on the need for broader stakeholder participation beyond just industry voices, acknowledging limitations of industry-only perspectives


Topics

Artificial intelligence | Human rights and the ethical dimensions of the information society


Recognition of technical and capacity limitations in government oversight

Speakers

– Audience
– Lee Wan Sie
– Chris Meserole

Arguments

Government skill gaps in auditing sophisticated AI compliance programs pose implementation challenges


Need for faster movement on standards definition, particularly in testing and benchmarking methodologies


Multiple jurisdictions recognize frontier AI risks and delegate standard-setting to technical bodies rather than specifying requirements directly


Explanation

There was unexpected consensus between audience concerns and speaker acknowledgments about government capacity limitations, with even policy representatives agreeing that technical standards development should be delegated to specialized bodies


Topics

Artificial intelligence | Capacity development


Overall assessment

Summary

The discussion revealed remarkably high consensus across diverse stakeholders on the fundamental need for AI standards, their role in building trust and enabling adoption, the importance of measurement and benchmarking, and the necessity of global cooperation. Key areas of agreement included the value of standards for collective action, the need for inclusive participation, and the importance of interoperable frameworks.


Consensus level

Very high level of consensus with no significant disagreements identified. This strong alignment across industry, government, standards bodies, and safety organizations suggests a mature understanding of the challenges and a shared commitment to collaborative solutions. The consensus extends beyond just the need for standards to specific approaches for implementation, measurement, and governance, indicating readiness for concrete action in AI standards development.


Differences

Different viewpoints

Speed vs. thoroughness in standards development

Speakers

– Lee Wan Sie
– Chris Meserole

Arguments

Need for faster movement on standards definition, particularly in testing and benchmarking methodologies


Process standards can be future-proofed while specific evaluations need updating as AI capabilities advance


Summary

Lee Wan Sie emphasizes the urgent need to accelerate standards development, noting that current ISO processes are too slow for the rapidly evolving AI landscape. Chris Meserole focuses on creating robust, future-proofed process standards that can accommodate changing AI capabilities over time, suggesting a more methodical approach.


Topics

Artificial intelligence | Monitoring and measurement


Industry-led vs. multi-stakeholder standards development

Speakers

– Rebecca Weiss
– Audience

Arguments

Standards represent consensus about what is ‘good enough’ and need diverse stakeholder input beyond just industry


Industry-driven standards risk serving commercial interests over public needs, requiring external audit capabilities


Summary

Rebecca Weiss acknowledges that standards consensus shouldn’t be exclusively from industry perspective but should include diverse stakeholders. The audience member goes further, expressing concern that industry-driven standards may prioritize commercial interests over public needs and questioning the legitimacy of industry-led processes.


Topics

Artificial intelligence | Human rights and the ethical dimensions of the information society


Scope of standardization – comprehensive vs. targeted approach

Speakers

– Etienne Chaponniere
– Amanda Craig

Arguments

Existing standards bodies in various verticals are already addressing AI integration into their specific risk frameworks


Standards ecosystem must be interoperable and modular to avoid reinventing approaches for each use case


Summary

Etienne emphasizes that existing vertical-specific standards bodies are already working on AI integration and suggests focusing on commonality in measurement techniques. Amanda advocates for a more comprehensive, interoperable system that works across general-purpose technology and different deployment scenarios to avoid fragmentation.


Topics

Artificial intelligence | The enabling environment for digital development


Unexpected differences

Government capacity for oversight

Speakers

– Audience
– Joslyn Barnhart

Arguments

Government skill gaps in auditing sophisticated AI compliance programs pose implementation challenges


Standards help avoid safety incidents that would harm industry adoption and provide strategic advantage


Explanation

The audience member questioned whether governments have the technical capacity to audit sophisticated industry compliance programs, while Joslyn pointed to industry’s own incentive to avoid safety incidents, an unexpected divergence over whether assurance should rest primarily on external oversight or on industry self-interest.

Topics

Artificial intelligence | Capacity development


Social policy vs. technical standards

Speakers

– Audience
– Rebecca Weiss

Arguments

Social policy disagreements in AI governance require either minimum viable consensus or addressing stakeholder priorities differently


Standards represent consensus about what is ‘good enough’ and need diverse stakeholder input beyond just industry


Explanation

The audience member questioned whether contested social-policy issues can be addressed through standards at all, given disagreement over what should even be measured, while Rebecca framed “good enough” as a consensus that broader stakeholder participation can legitimize, a difference over whether wider participation resolves such disagreements or merely relocates them.

Topics

Artificial intelligence | Human rights and the ethical dimensions of the information society


Overall assessment

Summary

The discussion revealed relatively low levels of fundamental disagreement among panelists, with most tensions arising around implementation approaches rather than core objectives. Key areas of disagreement included the pace of standards development, the appropriate balance between industry leadership and multi-stakeholder involvement, and whether to pursue comprehensive or targeted standardization approaches.


Disagreement level

The disagreement level was moderate and constructive, focusing on methodological differences rather than fundamental opposition to AI standards. However, audience questions revealed a more significant gap between industry perspectives and public concerns about accountability and legitimacy. The implications suggest that while technical experts largely agree on the need for and approach to AI standards, broader stakeholder engagement and addressing capacity gaps for oversight remain significant challenges for successful implementation.


Takeaways

Key takeaways

AI standards are essential for building consumer trust, enabling industry quality assurance, and solving collective action problems in AI risk management


Standards should focus on process frameworks that can be future-proofed rather than specific technical requirements that will quickly become outdated


Measurement and benchmarking must estimate uncertainty rather than provide binary safety assessments, requiring consensus on what constitutes ‘good enough’ for different use cases


Global cooperation on AI standards is achievable through interconnected standards bodies, even without unified global AI regulations


Standards serve multiple purposes: regulatory compliance, market differentiation, risk management translation across supply chains, and enabling smaller companies to access best practices


Implementation requires interoperable and modular standards ecosystems to avoid reinventing approaches for each sector or use case


Language bias and multilingual challenges require community participation and collaboration with local ecosystems to ensure inclusive AI development


Market incentives and consumer demand for trusted AI systems will drive standards adoption, supplemented by regulatory pressure where it exists


Resolutions and action items

ML Commons and other standards bodies to continue developing benchmarking methodologies and technical artifacts for risk measurement


Industry participants to work collectively on certification mechanisms that represent consensus on ‘good enough’ standards for specific deployments


Standards organizations to accelerate the pace of standards definition, particularly in testing and benchmarking methodologies


Continued collaboration between industry, regulators, and standards bodies to develop process standards that can accommodate advancing AI capabilities


Development of reusable software frameworks for language-specific safety testing while accommodating diverse linguistic and cultural contexts


Unresolved issues

How to ensure industry-driven standards serve public needs rather than just commercial interests, and how governments can develop audit capabilities given technical skill gaps


How to balance minimum viable consensus with addressing stakeholder priorities that some groups see as essential while others resist


How to handle disagreements over social policy aspects of AI governance within technical standards frameworks


How to scale standards development to accommodate the vast diversity of languages, dialects, and cultural contexts globally


How to maintain standards relevance and implementation as AI capabilities rapidly advance and new risks emerge


How to coordinate between existing vertical industry standards and new AI-specific standards to avoid conflicts or gaps


Suggested compromises

Focus on process standards that are capability-agnostic while allowing specific evaluations and controls to be updated as technology advances


Develop modular, interoperable standards systems that can be adapted across different sectors and use cases without starting from scratch


Use regulatory references to standards as a quality floor while allowing market forces to drive higher standards through competitive differentiation


Combine global standards frameworks with local adaptations for specific use cases, languages, and cultural contexts


Balance technical measurement capabilities with statistical uncertainty estimation rather than demanding absolute safety guarantees


Create open governance models in standards bodies while providing accessible implementation tools for smaller companies with limited resources


Thought provoking comments

In the space of AI at the moment, actually, regulation has gone ahead and jumped to, you know, we’ve regulated and essentially made reference to standards that do not yet exist. So for places like Google DeepMind who have not invested heavily in the standard space in the past, this is now of an utmost priority because we actually need this to assist with implementation and compliance.

Speaker

Joslyn Barnhart


Reason

This comment reveals a critical paradox in AI governance – that regulations are being written that reference non-existent standards, creating an urgent need for industry to catch up. It highlights the cart-before-horse nature of current AI regulation and explains why major tech companies are suddenly prioritizing standards work.


Impact

This comment fundamentally reframed the discussion from ‘why do we need standards?’ to ‘we urgently need standards because regulations already assume they exist.’ It shifted the conversation from theoretical benefits to practical necessity and helped explain the sudden industry urgency around standards development.


The problem that we have is who contributes to that consensus. It shouldn’t probably be exclusively an industry perspective. You need to have more stakeholders or more constituencies that need to be represented in that definition… there’s a scientific element to that… but then there’s also the political element to that.

Speaker

Rebecca Weiss


Reason

This comment cuts to the heart of legitimacy in standards-setting by identifying the tension between technical expertise and democratic representation. It acknowledges that defining ‘good enough’ isn’t purely technical but involves political and social value judgments.


Impact

This comment introduced crucial complexity to the discussion by highlighting that standards aren’t neutral technical artifacts but involve political choices about acceptable risk. It prompted deeper consideration of governance and representation in standards bodies, moving beyond purely technical discussions.


You’re trying to provide a sense of, I’m not going to tell you that your system is, quote-unquote, safe or not. What I’m going to tell you is, under these considerations, under these conditions, under these assumptions, the estimated likelihood of a particular risky behavior is X. And then it is up to you as a risk management professional, a deployer, a developer, it’s up for you to decide, is that enough?

Speaker

Rebecca Weiss


Reason

This comment fundamentally reframes AI safety from binary safe/unsafe determinations to probabilistic risk assessment with contextual decision-making. It clarifies that standards provide information for decision-making rather than making the decisions themselves.


Impact

This shifted the entire framing of the discussion from seeking absolute safety guarantees to understanding uncertainty quantification and risk management. It helped other panelists align on what standards can and cannot do, leading to more nuanced discussions about implementation across different sectors with different risk tolerances.
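As a purely illustrative aside, the sketch below shows what this framing can look like in numbers; the failure counts and the choice of a Wilson score interval are assumptions for the example, not anything prescribed in the session. The point is that the output is an estimated rate with uncertainty under stated test conditions, and the acceptability judgment is left to the deployer.

import math

def wilson_interval(failures: int, trials: int, z: float = 1.96):
    # 95% Wilson score interval for an observed rate of risky behavior.
    if trials == 0:
        raise ValueError("need at least one trial")
    p = failures / trials
    denom = 1 + z ** 2 / trials
    center = (p + z ** 2 / (2 * trials)) / denom
    half = (z * math.sqrt(p * (1 - p) / trials + z ** 2 / (4 * trials ** 2))) / denom
    return max(0.0, center - half), min(1.0, center + half)

# Hypothetical result: 12 risky responses observed across 2,000 test prompts
# under one specific deployment condition and threat model.
failures, trials = 12, 2000
low, high = wilson_interval(failures, trials)
print(f"Estimated risky-behavior rate: {failures / trials:.2%} (95% CI {low:.2%} to {high:.2%})")
# Whether that rate is acceptable is a decision for the deployer, not for the benchmark.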


If you look at the companies who have the type of resources to either set up their own standards and risk management systems internally, they’re typically pretty big companies… there’s a huge amount of companies who are being created every day, and they don’t have the resources to put this together… having the standard as effectively a mechanism for them to go directly to product and know that they’re going to comply with what the world or the community has set up is really important.

Speaker

Etienne Chaponniere


Reason

This comment highlights a critical equity issue in AI development – that without accessible standards, only large companies can afford proper risk management, potentially creating barriers to entry for smaller innovators. It reframes standards as democratizing tools rather than bureaucratic burdens.


Impact

This comment broadened the discussion beyond big tech companies to consider the broader AI ecosystem, including startups and smaller players. It added an inclusion and accessibility dimension to the standards conversation and helped justify why open, accessible standards are crucial for innovation equity.


Some of the mitigations we’re talking about associated with some of the more extreme risks that Frontier AI poses can be quite costly. And so I do think that there is just a strong industry incentive to work together to resolve this collective action problem… The worst thing for adoption would be a safety incident.

Speaker

Joslyn Barnhart


Reason

This comment reveals the economic logic behind industry cooperation on AI safety standards – that safety measures are expensive and a major incident would hurt everyone. It explains why competitors are willing to collaborate on standards despite competitive pressures.


Impact

This comment helped explain the seemingly paradoxical situation of competitors collaborating on standards by revealing the shared economic incentives. It shifted the discussion from viewing standards as regulatory compliance to understanding them as collective risk management, making the business case for cooperation clear.


Even when there’s no regulations, I think the standards still are useful… perhaps there’s also a way to differentiate for organizations, for enterprises… A way to differentiate themselves and say that, look, I’m adhering to a global standard. I’m demonstrating that I have actually implemented something that’s good enough.

Speaker

Lee Wan Sie


Reason

This comment challenges the assumption that standards are primarily about regulatory compliance by highlighting their market differentiation value. It shows how standards can create competitive advantages and consumer trust even without regulatory mandates.


Impact

This comment expanded the discussion beyond regulatory compliance to include market dynamics and competitive positioning. It helped explain why companies like OpenAI pursue certifications voluntarily and added a business strategy dimension to the standards conversation.


Overall assessment

These key comments fundamentally shaped the discussion by introducing critical tensions and complexities that moved the conversation beyond surface-level agreement. Joslyn Barnhart’s observation about regulations preceding standards created urgency and explained industry motivation. Rebecca Weiss’s comments about consensus-building and uncertainty quantification provided technical depth while highlighting political dimensions. Etienne Chaponniere’s equity concerns broadened the scope to include smaller players, while Lee Wan Sie’s market differentiation point showed standards’ value beyond compliance. Together, these comments transformed what could have been a superficial discussion about the need for standards into a nuanced exploration of legitimacy, technical challenges, economic incentives, and democratic participation in AI governance. The discussion evolved from ‘why standards?’ to ‘how do we create legitimate, accessible, and effective standards that serve diverse stakeholders while managing unprecedented technological risks?’


Follow-up questions

How do we define the characteristics of a system such that you can actually create the kind of uncertainty estimation that lives up to a statistical guarantee?

Speaker

Rebecca Weiss


Explanation

This addresses the scientific challenge of creating reliable measurement methodologies for AI systems that can provide statistically valid uncertainty estimates, which is fundamental to trustworthy AI standards.


What are the risks that we care about and how do different jurisdictions prioritize different lists of risks?

Speaker

Esther Tetruashvily


Explanation

This highlights the need to understand how different countries and regions may have varying risk priorities for AI systems, which affects global standardization efforts.


What are the net new risks versus existing risks where we don’t need to create something new?

Speaker

Esther Tetruashvily


Explanation

This is important for avoiding duplication of effort and focusing standardization work on genuinely novel AI-specific risks rather than rehashing existing risk management approaches.


Who is best positioned to control a particular risk across the AI supply chain?

Speaker

Esther Tetruashvily


Explanation

This addresses the critical question of responsibility allocation between model developers, application developers, and deployers in managing AI risks.


How do we have common ways of assessing whether we are still in a nascent stage and what levels of uncertainty do we have?

Speaker

Amanda Craig


Explanation

This would help the field objectively measure progress in AI safety and standards development rather than relying on subjective assessments.


How do we create a system of interoperable standards that work across different deployment scenarios, use cases, and sectors?

Speaker

Amanda Craig


Explanation

This is crucial for creating efficiency in standards development and avoiding the need to start from scratch for each new application area.


How can world governments audit sophisticated AI compliance programs given the technical skill gap?

Speaker

Audience member


Explanation

This addresses a critical implementation challenge where regulatory bodies may lack the technical expertise to effectively oversee AI standards compliance.


How do we tackle language bias and build guardrails for multilingual AI systems, particularly for countries with many official languages like India?

Speaker

Computer science student (audience)


Explanation

This highlights the need for inclusive AI development that works across diverse linguistic contexts, which is essential for global AI deployment.


How do we address social policy disagreements in AI governance standards when stakeholders disagree on what should even be measured?

Speaker

Jules Polonetsky (audience)


Explanation

This addresses the challenge of building consensus on AI standards when there are fundamental disagreements about values and priorities among stakeholders.


How do we ensure standards are not just performative but actually serve public needs rather than just satisfying industry requirements?

Speaker

Audience member


Explanation

This questions the legitimacy and effectiveness of industry-driven standards processes and highlights the need for genuine public benefit.


How do we move faster on standards development while maintaining quality and consensus?

Speaker

Lee Wan Sie


Explanation

This addresses the tension between the rapid pace of AI development and the typically slower pace of standards development processes.


How do we future-proof AI standards as models and capabilities continue to evolve rapidly?

Speaker

Implied by multiple speakers


Explanation

This is essential for ensuring that standards remain relevant and effective as AI technology continues to advance at a rapid pace.


Disclaimer: This is not an official session record. DiploAI generates these resources from audiovisual recordings, and they are presented as-is, including potential errors. Due to logistical challenges, such as discrepancies in audio/video or transcripts, names may be misspelled. We strive for accuracy to the best of our ability.