10 Dec 2024

Are AI safety institutes shaping the future of trustworthy AI?

The primary functions of AI safety institutes include conducting research, developing standards, and fostering international cooperation. While these institutes have the potential to make significant advancements, they are not without challenges.

Contents

Summary

As AI advances at an extraordinary pace, governments worldwide are implementing measures to manage associated opportunities and risks. Beyond traditional regulatory frameworks, strategies include substantial investments in research, global standard setting, and international collaboration. A key development has been the establishment of AI safety institutes (AISIs), which aim to evaluate and verify AI models before public deployment, among other functions.

In November 2023, the UK and the USA launched their AI Safety Institutes, setting an example for others. In the following months, Japan, Canada, and the European Union followed suit through its AI Office. This wave of developments was further reinforced at the AI Seoul Summit in May 2024, where the Republic of Korea and Singapore introduced their institutes. Meanwhile, Australia, France, and Kenya announced similar initiatives.

Except for the EU AI Office, all other AI safety institutes established so far need more regulatory authority. Their primary functions include conducting research, developing standards, and fostering international cooperation. While AISIs have the potential to make significant advancements, they are not without challenges. Critics highlight issues such as overlapping mandates with existing standard-making bodies like the International Organization for Standardization that may create inefficiencies and the risk of undue industry influence shaping their agendas. Others argue that the narrow focus on safety sidelines broader risks, such as ethical misuse, economic disruption, and societal inequality. Some also warn that this approach could stifle innovation and competitiveness, raising concerns about balancing safety with progress.

Introduction

The AI revolution, while built on decades-old technology, has taken much of the world by surprise, including policymakers. The EU legislators, for instance, have had to scramble to update their advanced legal drafts to account for the rise of generative AI tools like ChatGPT. The risks are considerable, ranging from AI-driven disinformation, autonomous systems causing ethical dilemmas, potential malfunctions, and loss of oversight to cybersecurity vulnerabilities. The World Economic Forum’s Global Cybersecurity Outlook 2024 reports that half of industry leaders in sectors such as finance and agriculture view generative AI as a major cybersecurity threat within two years. These concerns, coupled with fears of economic upheaval and threats to national security, make clear that swift and coordinated action is essential.

The European Union’s AI Act, for instance, classifies AI systems by risk and mandates transparency along with rigorous testing protocols (among other requirements). Other regions are drafting similar legislation, while some governments opt for voluntary commitments from industry leaders. These measures alone cannot address the full scope of challenges posed by AI. In response, some countries have created specialised AI Safety Institutes to fill critical gaps. These institutes are meant to provide oversight and also advance empirical research, develop safety standards, and foster international collaboration – key components for responding to the rapid evolution of AI technologies.

In May 2024, a significant advancement in global AI safety collaboration was achieved by establishing the International Network of AI Safety Institutes. This coalition brings together AI safety institutions from different regions, including Australia, Canada, the EU, France, Japan, Kenya, the Republic of Korea, Singapore, the UK, and the USA.

In November 2024, the International Network of AI Safety Institutes convened for its inaugural meeting, marking an important step in global collaboration on AI safety. Discussions centred on advancing research, developing best practices for model testing, promoting global inclusion and knowledge-sharing, laying the foundation for future initiatives ahead of the AI Action Summit in Paris in February 2025.

The first wave of AI safety institutes, established primarily by developed nations, has centred on safeguarding national security and reinforcing democratic values. As other countries establish their institutes, whether they will replicate these models or pursue alternative frameworks more attuned to local needs and contexts remains unclear. As in other digital policy areas, future initiatives from China and India could potentially serve as influential models.

Furthermore, while there is widespread consensus on the importance of key concepts such as ‘AI ethics,’ ‘human oversight,’ and ‘responsible AI,’ their interpretation often varies significantly. These terms are frequently moulded to align with individual nations’ political and cultural priorities, resulting in diverse practical applications. This divergence will inevitably influence the collaboration between AI safety institutes as the global landscape grows increasingly varied.

Finally, a Trump presidency in the USA, with its expected emphasis on deregulation, a more detached US stance toward multilateral institutions, and heightened focus on national security and competitiveness, could further undermine the cooperation needed for these institutes to achieve meaningful impact on AI safety.

Overview of AI safety institutes

The UK AI Safety Institute

Established: In November 2023, with a mission to lead international efforts on AI safety governance and develop global standards. Backed by £100 million in funding through 2030, enabling comprehensive research and policy development.

Key initiatives:
– In November 2024, the UK and the US AI safety institutes jointly evaluated Anthropic’s updated Claude 3.5 Sonnet model, testing its biological, cyber, and software capabilities. The evaluation found that the model provided ‘answers that should have been prevented’ when tested on jailbreaks or actions that produce a response from a model that is intended to be restricted.

– Researched and created structured templates, such as the ‘inability’ template, to demonstrate AI systems’ safety within specific deployment contexts.

– Released tools like Inspect Evals to evaluate AI systems.
Offers up to £200,000 in grants for researchers advancing systemic AI safety.

– Partnered with institutes in the US and France to develop safety frameworks, share research insights, and foster talent exchange.

– Expanded globally with a San Francisco office and published major studies, such as the International Scientific Report on Advanced AI Safety.

The UK AI Safety Institute, launched in November 2023 with £100 million in funding through 2030, was created to spearhead global efforts in AI safety. Its mission centres on establishing robust international standards and advancing cutting-edge research. Key initiatives include risk assessments of advanced AI models (so-called ‘frontier models’) and fostering global collaboration to align safety practices. The institute’s flagship event, the Bletchley Park AI Safety Summit, highlighted the UK’s approach to tackling frontier AI risks, focusing on technical and empirical solutions. Frontier AI is being described as follows in the Bletchely declaration:

‘Particular safety risks arise at the ‘frontier’ of AI, understood as being those highly capable general-purpose AI models, including foundation models, that could perform a wide variety of tasks – as well as relevant specific narrow AI that could exhibit capabilities that cause harm – which match or exceed the capabilities present in today’s most advanced models. Substantial risks may arise from potential intentional misuse or unintended control issues relating to alignment with human intent. These issues are in part because those capabilities are not fully understood and are, therefore, hard to predict. We are especially concerned by such risks in domains such as cybersecurity and biotechnology and where frontier AI systems may amplify risks such as disinformation.‘

However, this narrow emphasis has drawn criticism, questioning whether it sufficiently addresses AI’s broader, everyday challenges.

At the 2024 Stanford AI+Policy Symposium, Oliver Ilott, Director of the AI Safety Institute, articulated the UK’s vision for AI governance. He underscored that AI risks are highly context- and scenario-specific, arguing that no single institution could address all the challenges AI presents. ‘Creating such an entity would be like duplicating government itself,’ Ilott explained, advocating instead for a cross-governmental engagement where each sector addresses AI risks relevant to its domain. This approach highlights the UK’s deliberate choice to concentrate on ‘frontier harms’ – the most advanced and potentially existential AI threats – rather than adopting the broader, risk-based regulatory model championed by the EU.

The Bletchley Park AI Safety Summit reinforced this philosophy, with participating countries agreeing on the need for a ‘technical, empirical, and measurable’ understanding of AI risks. Ilott noted that the ‘core problem for governments is one of ignorance,’ cautioning that policymakers risk being perpetually surprised by rapid AI advancements. While high-profile summits elevate the political discourse, Ilott stressed that consistent technical work between these events is critical. To this end, the UK institute has prioritised building advanced testing capabilities and coordinating efforts across the government to ensure preparedness.

The UK’s approach diverges significantly from the EU’s more comprehensive, risk-based framework. The EU has implemented sweeping regulations addressing various AI applications, from facial recognition to general-purpose systems. In contrast, the UK’s more laissez-faire policy focuses narrowly on frontier technologies, promoting flexibility and innovation. The Safety Institute, with its targeted focus on addressing frontier risks, illustrates the UK’s approach. However, this narrow focus may leave gaps in governance, overlooking pressing issues like algorithmic bias, data privacy, and the societal impacts of AI already integrated into daily life.

Ultimately, the long-term success of the UK AI Safety Institute depends on the government’s ability to coordinate effectively across departments and to ensure that its focus does not come at the expense of broader societal safeguards.

The US AI Safety Institute

Established: In 2023 under the National Institute of Standards and Technology, with a US$10 million budget, with a focus on empirical research, model testing, and safety guidelines.

Key initiatives:
– In November 2024, the US Artificial Intelligence Safety Institute at the US Department of Commerce’s National Institute of Standards and Technology announced the formation of the Testing Risks of AI for National Security Taskforce, which brings together partners from across the US government to identify, measure, and manage the emerging national security and public safety implications of rapidly evolving AI technology.

– Conducted joint pre-deployment evaluations (Anthropic’s Claude 3.5 model).

– Launched the International Network of AI Safety Institutes to foster international collaboration, with an inaugural convening in San Francisco in November 2024.

– Issued guidance documents, requested input on chemical/biological AI risks, and formed a consortium with over 200 stakeholders to advance AI safety.
Signed agreements with entities like Anthropic and OpenAI to enhance research and evaluation efforts.

– Expanded leadership and outlined a strategic vision for global cooperation, aligning with the Biden administration’s AI Executive Order.

The US AI Safety Institute, established in 2023 under the National Institute of Standards and Technology with a US$10 million budget, is a critical component of the US’s approach to AI governance. Focused on empirical research, rigorous model testing, and developing comprehensive safety guidelines, the institute has sought to bolster national and global AI safety. Elizabeth Kelly, the institute’s director, explained at the 2024 AI+Policy Symposium, ‘AI safety is far from straightforward and filled with many open questions.’ She underscored the institute’s dual objective of addressing future harms while simultaneously mitigating present risks, emphasising that ‘safety drives innovation’ and that a robust safety framework can fuel healthy competition.

Kelly highlighted the collaborative nature of the US approach, which involves working closely with agencies like the Department of Energy to leverage specialised expertise, particularly in high-stakes areas such as nuclear safety. The institute’s priorities include fundamental research, advanced testing and evaluation, and developing standards for content authentication, like watermarking, to combat AI-generated misinformation. According to Kelly, the institute’s success hinges on building ‘an AI safety ecosystem larger than any single government,’ underscoring a vision for broad, cross-sectoral engagement.

The institute’s strategy emphasises a decentralised and adaptive model of governance. By leveraging the expertise of various federal agencies, the US approach aims to remain nimble and responsive to emerging risks. Similar to the UK approach, this model contrasts the European Union’s AI Office, where AI Safety is just one of the five specialised units supported by two advisory roles. The EU AI Office distinguishes itself from other AI Safety Institutes by adopting a centralised and hierarchical model with a strong focus on compliance and harmonisation across the EU member states. Being part of a centralised structure, the AI Safety unit may face delays in responding to rapidly emerging challenges due to its reliance on more rigid decision-making processes.

The US model’s flexibility supports innovation but may leave gaps in areas such as ethical governance and long-term accountability. The Institute operates under a presidential order, making its directives susceptible to shifts in political priorities. The election of Donald Trump for a new mandate introduces significant uncertainty into the institute’s future. Given Trump’s history of favouring deregulation, his administration could alter or dismantle the institute’s initiatives, reduce funding, or pivot away from stringent AI oversight. Such a shift could undermine progress in AI safety and lead to inconsistencies in governance, particularly if policies become more relaxed or innovation-focused at the expense of rigorous safety measures.

A repeal of Biden’s AI Executive Order appears likely, signalling shifts in AI policy priorities. Yet, Trump’s earlier AI executive orders emphasised civil liberties, privacy, and trustworthy AI alongside innovation, and it is possible that his future policy initiatives could maintain this balance.

Ultimately, the future of the US AI Safety Institute will depend on whether it can secure more permanent legislative backing to withstand political fluctuations. Elon Musk, a tech billionaire entrepreneur and a prominent supporter of Trump, advocates extensively to shift the focus of the AI policy debate to existential AI risks, and these efforts might also impact the work of the US AI Safety Institute.

Japan’s AI Safety Institute

Established: In 2024, under the Council for Science, Technology, and Innovation, as part of the G7 Hiroshima AI Process.

Key initiatives:
– Conducts surveys, evaluates AI safety methods, and develops standards while acting as a central hub for collaboration between industry, academia, and AI safety-related organisations in Japan.

– Addresses a wide range of AI-related issues, including social impact, AI systems, data governance, and content, with flexibility to adapt to global trends.

– Focuses on creating safety assessment standards, exploring anti-disinformation tools, cybersecurity measures, and developing a testbed environment for AI evaluation.

– Engages in global collaboration with the AI safety institutes in the UK and USA to align efforts and share expertise.

The Japan AI Safety Institute plays a central role in the nation’s AI governance strategy, aligning its efforts with Japan’s broader commitments under the G7 Hiroshima AI Process. Operating under the Council for Science, Technology, and Innovation, the institute is dedicated to fostering a safe, secure, and trustworthy AI ecosystem.

Akiko Murakami, Executive Director of the institute, emphasised at the 2024 AI+Policy Symposium the need to ‘balance innovation and regulation,’ underscoring that AI safety requires both interagency efforts and robust international collaboration. Highlighting recent progress, she referenced the agreement on interoperable standards reached during the US-Japan Summit in April 2024, underscoring Japan’s commitment to global alignment in AI governance.

Murakami explained that the institute’s approach stands out in terms of integrating private sector expertise. Many members, including leadership figures, participate part-time while continuing their roles in the industry. This model promotes a continuous exchange of insights between policy and practice, ensuring that the institute remains attuned to real-world technological advancements. However, she acknowledged that the institute faces challenges in setting traditional key performance indicators due to the rapid pace of AI development, suggesting the need for ‘alternative metrics’ to assess success beyond conventional safety benchmarks.

The Japan AI Safety Institute’s model prioritises flexibility, real-world industry engagement, and collaboration. The institute benefits from up-to-date expertise and insights by incorporating part-time private sector professionals, making it uniquely adaptable. This hybrid structure differs significantly from the centralised model of the US AI Safety Institute, which relies on federal budgets and agency-specific mandates to drive empirical research and safety guidelines. Japan’s model is also distinct from the European Union’s AI Office, which, besides the AI Safety Unit, has broad enforcement responsibilities of the AI Act across all member states and from the UK’s primary focus on frontier risks.

Zooming out from the AI safety institutes and examining each jurisdiction’s broader AI governance systems reveals differences in approaches. The EU’s governance is defined by its top-down regulatory framework, exemplified by ex-ante regulatory frameworks such as the AI Act, which aims to enforce uniform risk-based oversight across member states. In contrast, Japan employs a participatory governance model integrating government, academia, and industry through voluntary guidelines such as the Social Principles of Human-Centric AI. This strategy fosters flexibility, with stakeholders contributing directly to policy developments through ongoing dialogues; however, the reliance on voluntary standards risks weaker enforcement and accountability. The USA takes an agency-driven, sector-specific approach, emphasising national security and economic competitiveness while leaving the broader AI impacts less regulated. The UK is closer to the US approach, with an enhanced focus on frontier risks addressed mostly through empirical research and technical safeguards.

Japan’s emphasis on international collaboration and developing interoperable standards is a strategic choice. By actively participating in global efforts and agreements, Japan positions itself as a key player in shaping the international AI safety landscape.

While the Hiroshima AI Process and partnerships like the one with the USA are central to Japan’s strategy, they also make its success contingent on stable international relations. If geopolitical tensions were to rise or if global cooperation were to wane, Japan’s AI governance efforts could face setbacks.

Singapore’s AI Safety Institute

Funding: $50 million grant, starting from October 2022.

Key initiatives:
– Focuses on rigorous evaluation of AI systems, including generative AI, to address gaps in global AI safety science.

– Develops frameworks for the design, development, and deployment of safe and reliable AI models.

– Researches and implements methods to ensure the accuracy and reliability of AI-generated content.

– Provides science-based input for AI governance and contributes to international AI safety frameworks.

– Works with other AI safety institutes, including those in the USA and UK, to advance shared goals in AI safety and governance.

– Led the launch of the ASEAN Guide on AI Governance and Ethics to address regional AI safety needs cohesively and interoperably.

Unlike the US and the UK that established new institutions, Singapore repurposed an existing government body, the Digital Trust Centre. At the time of this writing, not enough information is publicly available to assess the work of the Centre.

Canada’s AI Safety Institute

Established: November 2024, as part of Canada’s broader strategy to ensure the safe and responsible development of AI. Funding: C$50 million.

Key initiatives:
– CAISI operates under Innovation, Science and Economic Development Canada (ISED) and collaborates with the National Research Council of Canada (NRC) and the Canadian Institute for Advanced Research (CIFAR).

– It conducts applied and investigator-led research through CIFAR and government-directed projects to address AI safety risks.

– Plays a key role in the International Network of AI Safety Institutes, contributing to global efforts on AI safety and co-developing guidance for responsible AI practices.

– Supporting Canada’s Pan-Canadian Artificial Intelligence Strategy, the Artificial Intelligence and Data Act (Bill C-27), and voluntary codes of conduct for advanced AI systems.

As of this writing, more publicly available information is needed to evaluate the work of the Institute, which was only recently established.

European Union’s AI Office

Established: January 2024, the European Commission has launched an AI innovation package to support startups and SMEs in developing trustworthy AI that complies with EU values and rules. The AI office was part of this package. Funding: €46.5 million, setup funding.

Key Initiatives:
– Contributing to the coherent application of the AI Act across the member states, including the set-up of advisory bodies at EU level, facilitating support and information exchange.

– Developing tools, methodologies, and benchmarks for evaluating capabilities and reach of general-purpose AI models, and classifying models with systemic risks.

– Drawing up state-of-the-art codes of practice to detail out rules, in cooperation with leading AI developers, the scientific community and other experts

– Investigating possible infringements of rules, including evaluations to assess model capabilities, and requesting providers to take corrective action.

– Preparing guidance and guidelines, implementing and delegated acts, and other tools to support effective implementation of the AI Act and monitor compliance with the regulation.

The EU AI Office stands out as both an AI safety institute, through its AI Safety Unit, and a regulatory body with broad enforcement powers under the AI Act across EU member states. The AI Safety Unit fulfills the typical functions of a safety institute, conducting evaluations and representing the office internationally in meetings with its counterparts. It is not clear whether the AI Safety Unit will have the necessary resources, both in terms of personnel and funding, to perform similar model testing as its UK and US counterparts.

Republic of Korea’s AI Safety Institute

Established: November 2024, to ensure the safe use of artificial intelligence technology.

Key initiatives:
– Preemptively addresses risks like misuse, technical limitations, and loss of control to enhance AI reliability.

– Provides guidance to reduce AI side effects, such as deepfakes, and supports companies in navigating global regulations and certifications.

– Participates in international efforts to establish AI safety norms and align with global frameworks.

– Partners with 24 domestic organisations to strengthen AI safety research and create a secure R&D environment.

– Collaborates with companies like Naver, LG, and SK Telecom to promote ethical AI practices and manage potential risks.

As of this writing, insufficient publicly available information exists to evaluate the work of the Institute, which was only recently established.

Conclusion

The AI safety institutes are beginning their journey, having only established their first basis for collaboration. While early testing efforts offer a glimpse of their potential, it remains to be seen whether these actions alone can effectively curb deploying AI models that pose significant risks. Diverging priorities, including national security concerns, data-sharing policies, and the further weakening of multilateral systems, could undermine their collective effectiveness.

Notably, nations such as India, Brazil, and China have yet to establish AI safety institutes. The governance models these countries propose may differ from existing approaches, setting the stage for a competition between differing visions of global AI safety.

Building trust between the institutes and the AI industry will be critical for meaningful collaboration. This trust could be cultivated through transparent engagement and mutual accountability. Equally, civil society must play an active role in this ecosystem, acting as a watchdog to ensure accountability and safeguard the broader public interest.

Finally, the evolving geopolitical landscape will profoundly impact the trajectory of these initiatives. The success of the AI safety institutes will depend on their ability to adapt to technical and policy challenges and how effectively they navigate and influence the complex global dynamics shaping AI governance.