GPT-5.5 ranks among strongest models in UK cyber evaluation

The UK AI Security Institute has published cyber evaluations of OpenAI’s GPT-5.5, finding that the model is among the strongest it has tested on cyber tasks and only the second model to complete one of its end-to-end, multi-step cyber-attack simulations.

According to the institute, GPT-5.5’s results suggest that recent gains in cyber capability are not limited to a single model family. It says an earlier evaluation of Anthropic’s Claude Mythos Preview had already pointed to a step up over previous frontier systems, and GPT-5.5 appears to reinforce that broader trend across leading models.

The institute uses a suite of 95 narrow cyber tasks across four difficulty tiers to test capabilities such as reverse engineering, web exploitation, cryptography, vulnerability research, and exploitation. On expert-level tasks in its advanced suite, GPT-5.5 achieved an average pass rate of 71.4%, ahead of Mythos Preview at 68.6%, GPT-5.4 at 52.4%, and Opus 4.7 at 48.6%.

The UK AI Security Institute also tests models in cyber ranges designed to measure multi-step attack capability. In The Last Ones, a 32-step corporate network intrusion simulation modelled on an enterprise kill chain, GPT-5.5 completed the full attack chain in 2 of 10 attempts, becoming the second model to do so after Mythos Preview. In the Cooling Tower industrial control system simulation, GPT-5.5 did not complete the range, and no model has yet done so.

The institute stresses that these are controlled capability evaluations and do not necessarily reflect what is available to ordinary public users. It also notes that the current ranges do not yet include all the defensive conditions of real-world environments, such as active defenders, defensive tooling, or alert penalties.

Separately, the institute evaluated GPT-5.5’s cyber safeguards and OpenAI’s mitigations against malicious cyber use. It said expert red-teamers identified a universal jailbreak that elicited prohibited cyber content across all malicious cyber queries provided by OpenAI, including in multi-turn agentic settings. OpenAI later updated its safeguard stack, but the institute said a configuration issue prevented it from verifying the effectiveness of the final version.

The institute adds that if offensive cyber capability is emerging as a byproduct of broader gains in autonomy, reasoning, and coding, further increases in model cyber performance could follow quickly. At the same time, it notes that the same capabilities may also help defenders and points to related UK government work on cyber resilience, vulnerability management, and preparation for a possible ‘vulnerability patch wave’.

Why does it matter?

The significance of the evaluation is not only that GPT-5.5 performed strongly on cyber tasks, but that it adds to the evidence that offensive cyber capability may be improving across multiple frontier model families at roughly the same time. If those gains are being driven by broader advances in reasoning, coding, and agentic execution, then cyber risk may rise even when models are not explicitly optimised for offensive use. That makes evaluation, safeguards, and realistic testing environments increasingly important, especially as the same capabilities can also strengthen defensive work and shorten response times for cybersecurity teams.

Would you like to learn more about AI, tech, and digital diplomacy? If so, ask our Diplo chatbot!

UNDP highlights challenges in public sector digital transformation outcomes

According to UNDP, global public sector investment in digital technology now exceeds US$800 billion, yet most transformation efforts continue to fall short of expectations.

The report links persistent underperformance to structural and institutional barriers rather than technological limitations. It also notes that digital initiatives often lack alignment with broader policy goals, resulting in fragmented systems that improve internal processes but do not transform public services.

UNDP identifies six recurring issues that continue to undermine progress across governments. These include rigid funding models that treat software as a one-time investment, fragmented mandates across institutions, limited data sharing, shortages of specialised talent, and procurement systems that prioritise risk avoidance over adaptability.

The report suggests that closing the gap between digital potential and real-world results may require a shift in approach. According to the report, sustainable transformation depends on reforming governance, funding, and incentives so technology can deliver measurable public value.

Why does it matter?

The persistent gap between digital investment and actual outcomes signals a deeper governance challenge that goes far beyond technology. When most public sector transformation projects fail despite high spending, the issue is not innovation capacity but institutional design.

Outdated funding models, siloed mandates, and rigid procurement systems prevent governments from adapting at the speed required by modern digital tools, including AI. As a result, public institutions risk embedding inefficiency at scale while appearing digitally modern on the surface.

From a broader perspective, this has direct implications for state capacity and public trust. Governments that cannot translate digital investment into effective services will struggle to maintain competitiveness, especially as private sector systems become faster, more integrated, and more user-centric.

The issue also shapes global inequality in digital capability, as countries unable to reform underlying structures fall further behind in productivity and service delivery. Ultimately, the challenge is not technological adoption, but whether institutions can evolve fast enough to turn digital potential into real public value.

Would you like to learn more about AI, tech, and digital diplomacy? If so, ask our Diplo chatbot!  

Code for America highlights challenges in measuring AI use in US state public services

According to Code for America, AI is reshaping how public services are delivered across the United States, yet adoption remains uneven and difficult to measure. It added that state governments are rapidly embracing AI through low-risk pilot programmes while still lacking clear frameworks to evaluate impact.

The report describes AI adoption as following a staged progression beginning with readiness, where leadership structures, workforce skills and infrastructure are developed.

Piloting then introduces experimentation through sandboxes and limited deployments, while implementation embeds AI into operational systems such as fraud detection, document automation, research support and citizen-facing chat assistants.

The report also notes that despite growing experimentation, most US states have not yet transitioned into fully operational and measurable systems.

Leading states, including Utah, New Jersey, Pennsylvania, North Carolina, Maryland, Texas and Vermont, are advancing institutional capabilities required to govern AI as a long-term public asset. Others, such as West Virginia, Wyoming, Nebraska, Alaska, Florida and Kansas, remain at earlier stages of readiness and adoption.

The report identifies measuring outcomes as a key challenge. It states that while AI promises efficiency gains and cost reductions, short-term deployment often increases workload for public employees before benefits materialise.

It adds that evaluation frameworks remain underdeveloped, leaving governments with strong governance structures but limited visibility into real performance improvements.

According to Amanda Renteria, CEO of Code for America, the opportunity extends beyond adoption alone, as governments must shape AI in ways that are human-centred and grounded in measurable public outcomes.

The report suggests that states that succeed in aligning technology with real community impact will move beyond experimentation and define the future of public service in the AI era.

Would you like to learn more about AI, tech and digital diplomacy? If so, ask our Diplo chatbot!

DeepSeek V4 trails US frontier by eight months, according to CAISI evaluation

The Center for AI Standards and Innovation (CAISI), a unit within the US National Institute of Standards and Technology (NIST), has published an evaluation of DeepSeek V4, finding that it is the most capable Chinese-developed model it has assessed to date, but that it still trails leading US models overall.

According to the evaluation, DeepSeek V4 was tested in April 2026 and lagged top US frontier models by about eight months in CAISI’s aggregate capability measure. The report says the model performed strongly across several domains and was the most capable PRC model assessed by CAISI so far.

The findings highlight DeepSeek V4’s strongest results in mathematics, software engineering, and natural sciences. In mathematics, the model achieved particularly strong scores on benchmarks such as OTIS-AIME-2025 and PUMaC 2024, while still lagging the top US systems in overall capability.

CAISI also says DeepSeek V4 is more cost-efficient than other models of similar capability. Compared with the most cost-competitive US reference model, GPT-5.4 mini, it was more cost-efficient on five of seven benchmarks, ranging from 53% less expensive to 41% more expensive depending on the task.

The report notes that CAISI selected a US reference model for comparison and evaluated both benchmark performance and token pricing. It adds that DeepSeek’s lower cost profile makes it notable in the current frontier model landscape, even though it remains behind the leading US systems in aggregate capability.


Would you like to learn more about AI, tech, and digital diplomacy? If so, ask our Diplo chatbot!

Victorian officials outline approach to managing AI risks in public sector

Ian Pham of the Victorian Managed Insurance Authority (VMIA) outlined approaches to managing AI adoption during the PSN Victorian Government Cyber Security Showcase, noting that organisations face the challenge of adopting AI while maintaining effective risk management as these systems become more embedded in government operations.

Cybersecurity teams have traditionally operated with a risk-averse approach focused on minimising threats. Such an approach can slow innovation when applied to AI systems used in public sector environments.

A shift towards managing risk in line with organisational objectives is presented as necessary. This includes prioritising relevant risks and moving from reactive responses towards supporting decision-making processes.

AI adoption involves secure environments for experimentation with defined guardrails, including synthetic or non-sensitive data, monitoring mechanisms, usage conditions, and identity and access controls. Exposure can then be increased gradually, supported by governance and continuous reassessment.

Risks linked to AI systems include data leakage, privacy concerns, unauthorised use, and data quality issues. These risks are described as requiring visibility and management, alongside organisational awareness and engagement to support confidence in AI use.

Would you like to learn more about AI, tech and digital diplomacy? If so, ask our Diplo chatbot!

Singapore’s HTX signs agreements to advance public safety technologies

The Home Team Science and Technology Agency has signed 10 agreements with partners across government, industry and academia to advance public safety technologies. The announcement was made at MTX 2026.

The partnerships focus on areas including AI, space technology and cybersecurity, aiming to accelerate development of next-generation capabilities for public safety operations.

Several agreements involve industry collaboration to apply commercial innovations, while others expand research links with academic institutions to deepen expertise in areas such as forensics and autonomous systems.

HTX said the partnerships will strengthen collaboration, innovation and knowledge sharing across the public safety ecosystem.

Would you like to learn more about AI, tech and digital diplomacy? If so, ask our Diplo chatbot!

Microsoft report highlights growing use of AI in healthcare systems

Healthcare systems worldwide are entering a new phase of digital transformation, driven by the rapid adoption of AI, as highlighted in a Microsoft report.

Growing administrative pressure, complex workflows and rising patient demand are pushing hospitals to integrate AI not as a future concept, but as an immediate operational tool to improve efficiency and care quality.

Across different regions, AI is being deployed to reduce clinician workload and streamline hospital operations.

In the United States, AI-assisted documentation tools are helping medical staff reduce time spent on administrative tasks, allowing them to focus more on patient care. Similar approaches are being applied globally to improve workflow efficiency and support overstretched healthcare professionals.

In emerging and developed markets alike, AI is also strengthening system resilience and accessibility. Applications range from improving pharmacy inventory management in Kenya to enhancing cybersecurity in Japan’s hospital networks following ransomware attacks.

In Spain, AI-based diagnostic tools are helping accelerate the detection of rare diseases, improving both speed and accuracy of medical decisions.

These developments highlight a broader shift in healthcare systems towards AI-driven infrastructure that supports not only clinical outcomes but also operational stability and data security.

Collaboration among healthcare providers, technology companies, and policymakers is becoming increasingly important to ensure that AI integration remains effective, responsible, and scalable.

Why does it matter? 

AI-driven healthcare transformation is reshaping how modern health systems operate at a structural level, shifting the focus from reactive treatment to more efficient, data-informed, and system-wide care delivery.

As hospitals increasingly rely on digital tools, the balance between human clinical expertise and automated support systems is being redefined.

From a broader perspective, the impact extends beyond hospitals and patients, influencing national health resilience, cost efficiency, and equitable access to care.

Countries that successfully integrate AI into healthcare infrastructure are likely to gain significant advantages in service quality, system sustainability, and their ability to respond to future public health challenges.

Would you like to learn more about AI, tech, and digital diplomacy? If so, ask our Diplo chatbot!  

Brazil’s Ceará state introduces AI assistant for document review

The Junta Comercial do Estado do Ceará has launched an AI-powered document analysis assistant, marking the first public-facing AI service by the Government of the State of Ceará in Brazil. The initiative was announced through an official statement.

The tool is integrated into the Jucec services portal and acts as a pre-analysis system. It reviews documents, cross-checks data and identifies inconsistencies before formal submission.

Officials say the AI system allows users to correct errors in advance, reducing delays and improving efficiency. The analysis is conducted quickly and clearly highlights issues for businesses and accountants.

The initiative is part of wider efforts to modernise public services and support digital transformation in Brazil.

Would you like to learn more about AI, tech and digital diplomacy? If so, ask our Diplo chatbot!

New MIT research hub targets future of advanced computation

IBM and the MIT Schwarzman College of Computing have launched the MIT-IBM Computing Research Lab, expanding their long-running partnership into a broader research agenda focused on AI, algorithms, and quantum computing.

The initiative builds on the earlier MIT-IBM Watson AI Lab and reflects the rapid shift towards AI deployment and emerging quantum technologies.

The lab aims to explore the convergence of AI and quantum systems, including hybrid computing models that combine classical infrastructure with next-generation quantum hardware.

Research priorities include efficient AI architectures, advanced optimisation methods, and new algorithmic frameworks designed to improve reliability, transparency, and real-world applicability of machine learning systems.

Alongside AI development, the lab will focus on quantum algorithms for complex scientific problems in fields such as chemistry, biology, and materials science. Work will also address the mathematical foundations of modelling dynamic systems, with potential applications ranging from improved weather prediction to financial forecasting and supply chain optimisation.

Leaders from both MIT and IBM describe the lab as a platform for shaping the next generation of computing systems through integrated advances in AI and quantum technologies.

Why does it matter? 

The launch of the MIT-IBM Computing Research Lab signals a broader shift in how foundational computing breakthroughs are now being shaped through close academic–industry collaboration.

As AI and quantum computing converge, the boundaries of what machines can model, predict, and optimise are being fundamentally redefined.

From a wider perspective, these developments could reshape entire sectors, including healthcare, finance, climate science, and global logistics, by enabling faster and more accurate problem-solving at scales that classical systems cannot handle.

The direction of this research also matters for technological sovereignty, as countries and institutions compete to lead in next-generation computing capabilities that will underpin future economic and scientific power.

Would you like to learn more about AI, tech, and digital diplomacy? If so, ask our Diplo chatbot!  

Digital Dubai rolls out AI workforce programme across public sector

Digital Dubai has launched the AI Workforce Transformation Programme to train 50,000 government employees in AI skills. The initiative is being delivered with the Dubai Government Human Resources Department and the Dubai Centre for Artificial Intelligence.

The programme aims to equip staff with practical knowledge to apply AI in public services and internal processes. It includes tailored training tracks based on job roles, from leadership to general employees.

Officials say the initiative will improve productivity, support innovation and enable more efficient service delivery. It also forms part of wider efforts to strengthen AI adoption across government operations.

The programme is designed to build long-term institutional capabilities and support a technology-driven government model.

Would you like to learn more about AI, tech and digital diplomacy? If so, ask our Diplo chatbot!