GPT-5.5 ranks among strongest models in UK cyber evaluation

The UK AI Security Institute has published cyber evaluations of OpenAI’s GPT-5.5, finding that the model is among the strongest it has tested on cyber tasks and only the second model to complete one of its end-to-end, multi-step cyber-attack simulations.

According to the institute, GPT-5.5’s results suggest that recent gains in cyber capability are not limited to a single model family. It says an earlier evaluation of Anthropic’s Claude Mythos Preview had already pointed to a step up over previous frontier systems, and GPT-5.5 appears to reinforce that broader trend across leading models.

The institute uses a suite of 95 narrow cyber tasks across four difficulty tiers to test capabilities such as reverse engineering, web exploitation, cryptography, and vulnerability research and exploitation. On expert-level tasks in its advanced suite, GPT-5.5 achieved an average pass rate of 71.4%, ahead of Mythos Preview at 68.6%, GPT-5.4 at 52.4%, and Opus 4.7 at 48.6%.

The UK AI Security Institute also tests models in cyber ranges designed to measure multi-step attack capability. In The Last Ones, a 32-step corporate network intrusion simulation modelled on an enterprise kill chain, GPT-5.5 completed the full attack chain in 2 of 10 attempts, becoming the second model to do so after Mythos Preview. In the Cooling Tower industrial control system simulation, GPT-5.5 did not complete the range, and no model has yet done so.

The institute stresses that these are controlled capability evaluations and do not necessarily reflect what is available to ordinary public users. It also notes that the current ranges do not yet include all the defensive conditions of real-world environments, such as active defenders, defensive tooling, or alert penalties.

Separately, the institute evaluated GPT-5.5’s cyber safeguards and OpenAI’s mitigations against malicious cyber use. It said expert red-teamers identified a universal jailbreak that elicited prohibited cyber content across all malicious cyber queries provided by OpenAI, including in multi-turn agentic settings. OpenAI later updated its safeguard stack, but the institute said a configuration issue prevented it from verifying the effectiveness of the final version.

The institute adds that if offensive cyber capability is emerging as a byproduct of broader gains in autonomy, reasoning, and coding, further increases in model cyber performance could follow quickly. At the same time, it notes that the same capabilities may also help defenders and points to related UK government work on cyber resilience, vulnerability management, and preparation for a possible ‘vulnerability patch wave’.

Why does it matter?

The significance of the evaluation is not only that GPT-5.5 performed strongly on cyber tasks, but that it adds to the evidence that offensive cyber capability may be improving across multiple frontier model families at roughly the same time. If those gains are being driven by broader advances in reasoning, coding, and agentic execution, then cyber risk may rise even when models are not explicitly optimised for offensive use. That makes evaluation, safeguards, and realistic testing environments increasingly important, especially as the same capabilities can also strengthen defensive work and shorten response times for cybersecurity teams.

Would you like to learn more about AI, tech, and digital diplomacy? If so, ask our Diplo chatbot!

UNDP highlights challenges in public sector digital transformation outcomes

According to UNDP, global public sector investment in digital technology now exceeds US$800 billion, yet most transformation efforts continue to fall short of expectations.

The report links persistent underperformance to structural and institutional barriers rather than technological limitations. It also notes that digital initiatives often lack alignment with broader policy goals, resulting in fragmented systems that improve internal processes but do not transform public services.

UNDP identifies six recurring issues that continue to undermine progress across governments. These include rigid funding models that treat software as a one-time investment, fragmented mandates across institutions, limited data sharing, shortages of specialised talent, and procurement systems that prioritise risk avoidance over adaptability.

The report suggests that closing the gap between digital potential and real-world results may require a shift in approach. According to the report, sustainable transformation depends on reforming governance, funding, and incentives so technology can deliver measurable public value.

Why does it matter?

The persistent gap between digital investment and actual outcomes signals a deeper governance challenge that goes far beyond technology. When most public sector transformation projects fail despite high spending, the issue is not innovation capacity but institutional design.

Outdated funding models, siloed mandates, and rigid procurement systems prevent governments from adapting at the speed required by modern digital tools, including AI. As a result, public institutions risk embedding inefficiency at scale while appearing digitally modern on the surface.

From a broader perspective, this has direct implications for state capacity and public trust. Governments that cannot translate digital investment into effective services will struggle to maintain competitiveness, especially as private sector systems become faster, more integrated, and more user-centric.

The issue also shapes global inequality in digital capability, as countries unable to reform underlying structures fall further behind in productivity and service delivery. Ultimately, the challenge is not technological adoption, but whether institutions can evolve fast enough to turn digital potential into real public value.

Would you like to learn more about AI, tech, and digital diplomacy? If so, ask our Diplo chatbot!  

Code for America highlights challenges in measuring AI use in public services across US states

According to Code for America, AI is reshaping how public services are delivered across the United States, yet adoption remains uneven and difficult to measure. The organisation adds that state governments are rapidly embracing AI through low-risk pilot programmes while still lacking clear frameworks to evaluate impact.

The report describes AI adoption as following a staged progression beginning with readiness, where leadership structures, workforce skills and infrastructure are developed.

Piloting then introduces experimentation through sandboxes and limited deployments, while implementation embeds AI into operational systems such as fraud detection, document automation, research support and citizen-facing chat assistants.

The report also notes that despite growing experimentation, most US states have not yet transitioned into fully operational and measurable systems.

Leading states, including Utah, New Jersey, Pennsylvania, North Carolina, Maryland, Texas and Vermont, are advancing institutional capabilities required to govern AI as a long-term public asset. Others, such as West Virginia, Wyoming, Nebraska, Alaska, Florida and Kansas, remain at earlier stages of readiness and adoption.

The report identifies measuring outcomes as a key challenge. It states that while AI promises efficiency gains and cost reductions, short-term deployment often increases workload for public employees before benefits materialise.

It adds that evaluation frameworks remain underdeveloped, leaving governments with strong governance structures but limited visibility into real performance improvements.

According to Amanda Renteria, CEO of Code for America, the opportunity extends beyond adoption alone, as governments must shape AI in ways that are human-centred and grounded in measurable public outcomes.

The report suggests that states that succeed in aligning technology with real community impact will move beyond experimentation and define the future of public service in the AI era.

Would you like to learn more about AI, tech and digital diplomacy? If so, ask our Diplo chatbot!

DeepSeek V4 trails US frontier by eight months, according to CAISI evaluation

The Center for AI Standards and Innovation (CAISI), a unit within the US National Institute of Standards and Technology (NIST), has published an evaluation of DeepSeek V4, finding that it is the most capable Chinese-developed model it has assessed to date, but that it still trails leading US models overall.

According to the evaluation, DeepSeek V4 was tested in April 2026 and lagged top US frontier models by about eight months in CAISI’s aggregate capability measure. The report says the model performed strongly across several domains and was the most capable PRC model assessed by CAISI so far.

The findings highlight DeepSeek V4’s strongest results in mathematics, software engineering, and natural sciences. In mathematics, the model achieved particularly strong scores on benchmarks such as OTIS-AIME-2025 and PUMaC 2024, while still lagging the top US systems in overall capability.

CAISI also says DeepSeek V4 is more cost-efficient than other models of similar capability. Compared with the most cost-competitive US reference model, GPT-5.4 mini, it was more cost-efficient on five of seven benchmarks, ranging from 53% less expensive to 41% more expensive depending on the task.
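
One way to read those figures: the comparison amounts to a per-benchmark relative cost difference against the reference model. The sketch below uses made-up prices, not numbers from the CAISI report, purely to show how a ‘53% less expensive’ or ‘41% more expensive’ result would be computed.

```python
# Illustrative only: hypothetical per-benchmark costs (USD) for the evaluated
# model and the reference model. These figures are NOT taken from the CAISI
# report; they simply reproduce the arithmetic behind the reported range.
hypothetical_costs = {
    "benchmark_a": {"deepseek_v4": 4.70, "gpt_5_4_mini": 10.00},   # cheaper case
    "benchmark_b": {"deepseek_v4": 14.10, "gpt_5_4_mini": 10.00},  # pricier case
}

for name, cost in hypothetical_costs.items():
    # Relative difference against the reference model's cost on the same task.
    diff = (cost["deepseek_v4"] - cost["gpt_5_4_mini"]) / cost["gpt_5_4_mini"]
    label = "less expensive" if diff < 0 else "more expensive"
    print(f"{name}: {abs(diff):.0%} {label} than the reference model")
```

Run as written, the example prints 53% less expensive and 41% more expensive, matching the end points of the range reported above.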

The report notes that CAISI selected a US reference model for comparison and evaluated both benchmark performance and token pricing. It adds that DeepSeek’s lower cost profile makes it notable in the current frontier model landscape, even though it remains behind the leading US systems in aggregate capability.

Would you like to learn more about AI, tech, and digital diplomacy? If so, ask our Diplo chatbot!

US military expands AI deployment across classified networks

The US Department of Defence has announced agreements with leading technology firms to deploy advanced AI capabilities across classified military networks. The initiative forms part of a broader effort to position the United States as a more AI-enabled military power.

Companies including OpenAI, Google, Microsoft, Amazon Web Services, NVIDIA, and SpaceX are reported to be involved in supporting deployment within high-security Impact Level 6 and 7 environments. The integration is intended to improve data synthesis, situational awareness, and operational decision-making across defence systems.

The department’s internal platform, GenAI.mil, is also being presented as a central part of this push, with senior officials describing it as a way to put advanced AI tools into the hands of personnel across the department and across different classification levels.

Officials have emphasised that maintaining access to a range of AI providers is important to avoid vendor lock-in and preserve long-term flexibility. In that sense, the move appears to reflect a wider attempt to strengthen national security through advanced technology while keeping the military AI stack diversified rather than dependent on a single company or model family, although this reading is an inference from the Pentagon’s reported framing of the agreements.

Would you like to learn more about AI, tech and digital diplomacy? If so, ask our Diplo chatbot!

Victorian officials outline approach to managing AI risks in public sector

Ian Pham at the Victorian Managed Insurance Authority (VMIA) outlined approaches to managing AI adoption during the PSN Victorian Government Cyber Security Showcase. Organisations face the challenge of adopting AI while maintaining effective risk management as these systems become more embedded in government operations.

Cybersecurity teams have traditionally operated with a risk-averse approach focused on minimising threats. Such an approach can slow innovation when applied to AI systems used in public sector environments.

A shift towards managing risk in line with organisational objectives is presented as necessary. This includes prioritising relevant risks and moving from reactive responses towards supporting decision-making processes.

AI adoption involves secure environments for experimentation with defined guardrails, including synthetic or non-sensitive data, monitoring mechanisms, usage conditions, and identity and access controls. Exposure can then be increased gradually, supported by governance and continuous reassessment.

Risks linked to AI systems include data leakage, privacy concerns, unauthorised use, and data quality issues. These risks are described as requiring visibility and management, alongside organisational awareness and engagement to support confidence in AI use.

Would you like to learn more about AI, tech and digital diplomacy? If so, ask our Diplo chatbot!

Singapore’s HTX signs agreements to advance public safety technologies

The Home Team Science and Technology Agency (HTX) has signed 10 agreements with partners across government, industry and academia to advance public safety technologies. The announcement was made at MTX 2026 in Singapore.

The partnerships focus on areas including AI, space technology and cybersecurity, aiming to accelerate development of next-generation capabilities for public safety operations.

Several agreements involve industry collaboration to apply commercial innovations, while others expand research links with academic institutions to deepen expertise in areas such as forensics and autonomous systems.

HTX said the partnerships will strengthen collaboration, innovation and knowledge sharing across the public safety ecosystem.

Would you like to learn more about AI, tech and digital diplomacy? If so, ask our Diplo chatbot!

Brazil’s Ceará state introduces AI assistant for document review

The Junta Comercial do Estado do Ceará has launched an AI-powered document analysis assistant, marking the first public-facing AI service by the Government of the State of Ceará in Brazil. The initiative was announced through an official statement.

The tool is integrated into the Jucec services portal and acts as a pre-analysis system. It reviews documents, cross-checks data and identifies inconsistencies before formal submission.

Officials say the AI system allows users to correct errors in advance, reducing delays and improving efficiency. The analysis is conducted quickly and clearly highlights issues for businesses and accountants.

The initiative is part of wider efforts to modernise public services and support digital transformation in Brazil.

Would you like to learn more about AI, tech and digital diplomacy? If so, ask our Diplo chatbot!

New MIT research hub targets future of advanced computation

IBM and the MIT Schwarzman College of Computing have launched the MIT-IBM Computing Research Lab, expanding their long-running partnership into a broader research agenda focused on AI, algorithms, and quantum computing.

The initiative builds on the earlier MIT-IBM Watson AI Lab and reflects the rapid shift towards AI deployment and emerging quantum technologies.

The lab aims to explore the convergence of AI and quantum systems, including hybrid computing models that combine classical infrastructure with next-generation quantum hardware.

Research priorities include efficient AI architectures, advanced optimisation methods, and new algorithmic frameworks designed to improve reliability, transparency, and real-world applicability of machine learning systems.

Alongside AI development, the lab will focus on quantum algorithms for complex scientific problems in fields such as chemistry, biology, and materials science. Work will also address the mathematical foundations of modelling dynamic systems, with potential applications ranging from improved weather prediction to financial forecasting and supply chain optimisation.

Leaders from both MIT and IBM describe the lab as a platform for shaping the next generation of computing systems through integrated advances in AI and quantum technologies.

Why does it matter? 

The launch of the MIT-IBM Computing Research Lab signals a broader shift in how foundational computing breakthroughs are now being shaped through close academic–industry collaboration.

As AI and quantum computing converge, the boundaries of what machines can model, predict, and optimise are being fundamentally redefined.

From a wider perspective, these developments could reshape entire sectors, including healthcare, finance, climate science, and global logistics, by enabling faster and more accurate problem-solving at scales that classical systems cannot handle.

The direction of this research also matters for technological sovereignty, as countries and institutions compete to lead in next-generation computing capabilities that will underpin future economic and scientific power.

Would you like to learn more about AI, tech, and digital diplomacy? If so, ask our Diplo chatbot!  

European Commission urges fast rollout of EU age verification app

The European Commission has adopted a recommendation urging member states to accelerate the rollout of the EU age verification app and make it available by the end of the year. The recommendation says the app can be deployed either as a standalone solution or integrated into a European Digital Identity Wallet.

According to the Commission, the app is intended to let users prove they meet a required age threshold without disclosing their exact age, identity, or other personal details. The Commission has also published a blueprint for the system, leaving it to member states to customise and produce the app for their citizens.
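
To make the privacy model concrete: the relying platform is meant to learn only whether the user clears the age threshold, not a birthdate or identity. The sketch below is a deliberately simplified illustration of that pattern, not the Commission’s blueprint, which would rely on standard credential formats and public-key or zero-knowledge techniques; all names and keys here are hypothetical.

```python
# Simplified illustration (not the Commission's blueprint): a trusted issuer
# signs only a boolean "over the threshold" claim, so the relying platform
# never sees a birthdate, name, or other identity attributes.
# A real deployment would use public-key signatures or zero-knowledge proofs
# rather than a shared secret.
import hmac, hashlib, json
from datetime import date

ISSUER_KEY = b"demo-issuer-secret"  # hypothetical key, for illustration only

def issue_age_attestation(birthdate: date, threshold_years: int = 18) -> dict:
    """Issuer side: turn a birthdate into a signed yes/no claim."""
    today = date.today()
    age = today.year - birthdate.year - (
        (today.month, today.day) < (birthdate.month, birthdate.day)
    )
    claim = {"over_threshold": age >= threshold_years, "threshold": threshold_years}
    payload = json.dumps(claim, sort_keys=True).encode()
    claim["signature"] = hmac.new(ISSUER_KEY, payload, hashlib.sha256).hexdigest()
    return claim  # contains no birthdate or identifier

def verify_age_attestation(claim: dict) -> bool:
    """Verifier side (e.g. a platform): check the signature, learn only the boolean."""
    payload = json.dumps(
        {"over_threshold": claim["over_threshold"], "threshold": claim["threshold"]},
        sort_keys=True,
    ).encode()
    expected = hmac.new(ISSUER_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, claim["signature"]) and claim["over_threshold"]

attestation = issue_age_attestation(date(2001, 5, 3))
print(verify_age_attestation(attestation))  # True, without revealing the exact age
```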

The recommendation sets out actions for member states to support rapid availability and interoperability, including implementation plans and coordination to ensure the swift rollout of the solution across the EU.

The measure forms part of the EU’s wider approach to protecting minors online under the Digital Services Act, which requires online platforms to ensure a high level of privacy, safety, and security for minors.

Executive Vice-President Henna Virkkunen said: ‘Effective and privacy-preserving age verification is the next piece of the puzzle that we are getting closer to completing, as we work towards an online space where our children are safe and empowered to use positively and responsibly without restricting the rights of adults.’

Why does it matter?

The move takes age verification in the EU from a general policy objective to a more concrete implementation phase. Rather than leaving platforms and member states to develop separate solutions, the Commission is trying to steer the bloc towards a common privacy-preserving model that can work across borders.

That matters for both child protection and regulatory coherence, because if countries adopt incompatible systems or move at very different speeds, enforcement under the Digital Services Act could become uneven in practice.

Would you like to learn more about AI, tech, and digital diplomacy? If so, ask our Diplo chatbot!