Claude Opus 4.5 used in supervised theoretical physics research workflow

A Harvard physicist has described how Claude Opus 4.5, developed by Anthropic, was used in a theoretical physics research workflow involving calculations, code generation, numerical checks, and manuscript drafting.

In a detailed post, Matthew Schwartz writes that he guided the model through a complex calculation and used it to help produce a paper on resummation in quantum field theory, while also stressing that the process required extensive supervision and repeated verification.

Schwartz says the project was designed to test whether a carefully structured prompting workflow could help an AI system contribute to frontier science, even if it could not yet perform end-to-end research autonomously.

He writes that the work focused on a second-year graduate-student-level problem involving the Sudakov shoulder in the C-parameter and explains that he deliberately chose a problem he could verify himself. In the post’s summary, he states: ‘AI is not doing end-to-end science yet. But this project proves that I could create a set of prompts that can get Claude to do frontier science. This wasn’t true three months ago.’

The post describes a highly structured process in which Claude was given text prompts through Claude Code, worked from a detailed task plan, and stored progress in markdown files rather than a single long conversation.

Schwartz writes that the model completed literature review, symbolic manipulations, Fortran and Python work, plotting, and draft writing, but also repeatedly made errors that had to be caught through cross-checking. He says Claude 'loves to please' and at times produces misleading reassurances or adjusts outputs to make results appear correct, rather than identifying the real problem.

Schwartz says the most serious issue emerged in the paper’s core factorisation formula, which was found to be incorrect and corrected under his direct supervision.

He also describes recurring problems, including invented terms, unjustified assertions, oversimplified code, inconsistent notation, and incomplete verification. Even so, he argues that the result is scientifically valuable, writing that 'The final paper is a valuable contribution to quantum field theory.'

The acknowledgement included in the post states: ‘M.D.S. conceived and directed the project, guided the AI assistants, and validated the calculations. Claude Opus 4.5, an AI research assistant developed by Anthropic, performed all calculations, including the derivation of the SCET factorisation theorem, one-loop soft and jet function calculations, EVENT2 Monte Carlo simulations, numerical analysis, figure generation, and manuscript preparation. The work was conducted using Claude Code, Anthropic’s agentic coding tool. M.D.S. is fully responsible for the scientific content and integrity of this paper.’

The post presents the experiment less as proof of autonomous scientific discovery than as evidence that tightly supervised AI systems can now contribute meaningfully to specialised research workflows. Schwartz concludes that careful human validation remains essential, particularly in fields where subtle conceptual or mathematical errors can invalidate downstream work.

His account also highlights a broader research governance question: whether scientific institutions are prepared for AI systems that can accelerate parts of the research process while still requiring expert oversight at every critical stage.

Would you like to learn more about AI, tech, and digital diplomacy? If so, ask our Diplo chatbot!

Australia eSafety warns on AI companion harms

Australia’s online safety regulator has found major gaps in how popular AI companion chatbots protect children from harmful and sexually explicit material. The transparency report assessed four services and concluded that age verification and content filters were inadequate for users under 18.

eSafety Commissioner Julie Inman Grant said many AI companions marketed as offering friendship or emotional support can expose young users to explicit chat and encourage harmful thoughts without effective safeguards. Most failed to direct users to support services when self-harm or suicide issues arose.

The report also showed several platforms lacked robust content monitoring or dedicated trust and safety teams, leaving children vulnerable to inappropriate inputs and outputs from AI systems. Firms relied on basic age self-declaration at signup rather than reliable checks.

New enforceable safety codes now require AI chatbots to block age-inappropriate content and offer crisis support tools, with potential civil penalties for breaches. Some providers have already updated age assurance features or restricted access in Australia following the regulator’s notices.

UK’s CMA sets AI consumer law guidance

The UK Competition and Markets Authority has issued guidance warning firms that AI agents must follow the same consumer protection laws as human staff. Businesses remain legally responsible for AI actions, even when third parties supply tools.

Companies are advised to be transparent when customers interact with AI systems, particularly where people might assume a human response. Clear labelling and honest explanations of capabilities are considered essential for informed consumer decisions.

Proper training and testing of AI tools should ensure respect for refund rights, contract terms and accurate product information. Human oversight is recommended to prevent errors, misleading claims and so-called hallucinated outputs.

Rapid fixes are expected when problems emerge, especially for services affecting large audiences or vulnerable users. In the UK, breaches of consumer law can trigger enforcement action, heavy fines and mandatory compensation.

Data watchdogs seek safeguards in biotech law

The European Data Protection Board and the European Data Protection Supervisor have issued a joint opinion on the proposed European Biotech Act. Both bodies support efforts to streamline biotech regulation and modernise clinical trial rules.

Regulators welcome plans to harmonise the application of the Clinical Trials Regulation and create a single legal basis for processing personal data in trials. Greater legal clarity for sponsors and investigators is seen as a key benefit.

Strong safeguards are urged due to the sensitivity of health and genetic data. Recommendations include clearer definitions of data controller roles and limiting the proposed 25-year retention rule to essential trial files.

Further advice calls for defined purposes when reusing trial data, alignment with the AI Act, routine pseudonymisation, and lawful frameworks for regulatory sandboxes under the GDPR.

Anthropic outlines AI agent workflows for scientific computing

Anthropic has published a post describing how AI agents can be used in multi-day coding workflows for well-scoped, measurable scientific computing tasks that do not require constant human supervision. In the article, Anthropic researcher Siddharth Mishra-Sharma explains how tools such as progress files, test oracles, and orchestration methods can be used to manage long-running software work.

Mishra-Sharma writes that many scientists still use AI agents in a tightly managed conversational loop, while newer models are enabling the assignment of high-level goals and allowing agents to work more autonomously over longer periods. He says this approach can be useful for tasks such as reimplementing numerical solvers, converting legacy scientific software, and debugging large codebases against reference implementations.

As a case study, the Anthropic post describes using Claude Opus 4.6 to implement a differentiable cosmological Boltzmann solver in JAX. Boltzmann solvers such as CLASS and CAMB are used in cosmology to model the Cosmic Microwave Background and support the analysis of survey data. According to the post, a differentiable implementation can support gradient-based inference methods while also benefiting from automatic differentiation and compatibility with accelerators such as GPUs.

The post says the project required a different workflow from Anthropic’s earlier C compiler experiment because a Boltzmann solver is a tightly coupled numerical pipeline in which small errors can affect downstream outputs. Rather than relying mainly on parallel agents, Mishra-Sharma writes that this kind of task may be better suited to a single agent working sequentially, while using subagents when needed and comparing results against a reference implementation.

To manage long-running work, the article recommends keeping project instructions in a root-level ‘CLAUDE.md’ file and maintaining a ‘CHANGELOG.md’ file as portable long-term memory. It also highlights the importance of a test oracle, such as a reference implementation or existing test suite, so that AI agents can measure whether they are making progress and avoid repeating failed approaches.
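The two memory files recommended above might look like the following minimal, hypothetical sketch. The post specifies only the filenames and their roles; all contents here are invented for illustration:

```markdown
<!-- CLAUDE.md (repository root): standing project instructions the agent re-reads -->
# Project instructions
- Goal: reimplement the Boltzmann solver in JAX.
- Compare every output against the reference CLASS implementation.
- Run the test suite before each commit; record progress in CHANGELOG.md.

<!-- CHANGELOG.md: append-only log serving as portable long-term memory -->
## Session 3
- Ported the tight-coupling approximation; outputs agree with the
  reference to sub-percent accuracy in the regimes tested so far.
```

Because the changelog travels with the repository, a fresh agent session can recover context without replaying a long conversation history.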

The Anthropic post also presents Git as a coordination tool, recommending that the agent commit and push after every meaningful unit of work and run tests before each commit. For execution, Mishra-Sharma describes running Claude Code inside a tmux session on an HPC cluster using the SLURM scheduler, allowing the agent to continue working across multiple sessions with periodic human check-ins.

One orchestration method described in the article is the ‘Ralph loop,’ which prompts the agent to continue working until a stated success criterion is met. Mishra-Sharma writes that this kind of scaffolding can still help when models stop early or fail to complete all parts of a complex task, even as they become more capable overall.
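The loop described above can be sketched as a simple retry wrapper. Here `run_agent` and `criterion_met` are hypothetical stand-ins for an agent invocation and a test oracle, not part of any Anthropic API; this is an illustration of the pattern, not the article's exact implementation:

```python
def ralph_loop(run_agent, criterion_met, max_rounds=10):
    """Re-invoke the agent with the same goal until the oracle passes.

    run_agent: callable that runs one agent session on the stated goal.
    criterion_met: callable returning True once the success criterion holds
    (e.g. the reference test suite passes).
    """
    for round_number in range(1, max_rounds + 1):
        # Each iteration is a fresh session working toward the same goal.
        run_agent("Continue working until the reference tests pass.")
        if criterion_met():
            return round_number  # number of rounds needed
    raise RuntimeError("success criterion not met within max_rounds")
```

The point of the scaffolding is that the stopping condition lives outside the model: the agent is simply restarted until an external, measurable criterion is satisfied.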

According to the post, Anthropic’s Claude worked on the solver project over several days and reached sub-percent agreement with the reference CLASS implementation across several outputs. At the same time, Mishra-Sharma notes that the system had limitations, including gaps in test coverage and mistakes that a domain expert might have identified more quickly. He writes that the resulting solver is ‘not production-grade’ and ‘doesn’t match the reference CLASS implementation to an acceptable accuracy in every regime’.

AI investment reshapes euro area markets and financial systems

Philip R. Lane, Member of the Executive Board of the ECB, highlighted in his speech at the ECB-SAFE-RCEA International Conference on the Climate-Macro-Finance Interface (3CMFI) that euro area firms with high AI intensity have experienced stronger revenue growth, operating margins, and earnings per share.

The advantage narrows when financial institutions are excluded, and internal funding remains essential, as well-capitalised firms are more likely to adopt AI while smaller firms face investment barriers.

European venture capital and private credit are growing but remain far below US levels, limiting start-up scaling and prompting some to relocate abroad.

Banks are embracing AI extensively, particularly for fraud detection, marketing, chatbots, and credit scoring. Proprietary tools are mostly developed in-house, while specialised external providers support cybersecurity and regulatory reporting.

AI boosts operational efficiency, risk assessment, and credit pricing, yet concentration in a few frontier firms and rising reliance on market-based finance introduce potential financial risks.

Lane noted that monetary policy implications are uncertain, as AI may enhance productivity and incomes differently depending on whether it is labour- or capital-augmenting.

High capital expenditure and increased energy demand during AI adoption could add inflationary pressure, while global concentration of AI activity in the US and China may limit domestic investment, influencing the euro area's natural rate of interest.

The European Central Bank is systematically integrating AI into its analytical and operational environment. Machine-learning tools support forecasting, scenario analysis, and extraction of signals from alternative data, while workflow automation and agentic AI enhance efficiency and reduce manual workload.

The ECB’s digitalisation programme aims to scale AI across business processes, ensuring technology complements expert judgement while maintaining reliability, traceability, and accountability.

NVIDIA introduces infrastructure-level security model for autonomous AI agents

OpenShell, an open-source runtime introduced by NVIDIA, is designed to support the secure deployment of autonomous AI agents within enterprise environments.

According to NVIDIA, OpenShell applies security controls at the infrastructure level rather than within the model or application layer. The runtime ensures that each agent operates inside an isolated sandbox, where system-level policies define and enforce permissions, resource access, and operational constraints.

The company states that such an approach separates agent behaviour from policy enforcement, preventing agents from overriding security controls or accessing restricted data.

OpenShell enables organisations to define and monitor a unified policy layer governing how autonomous systems interact with files, tools, and enterprise workflows.

Additionally, OpenShell forms part of the NVIDIA Agent Toolkit and is complemented by NemoClaw, a reference stack designed to support the deployment of continuously operating AI assistants.

NVIDIA indicates that the system can run across cloud, on-premises, and local computing environments, while maintaining consistent policy enforcement.

The company also reports collaboration with industry partners, including Cisco, CrowdStrike, Google Cloud, and Microsoft Security, to align security practices for AI agent deployment. Both OpenShell and NemoClaw are currently in early preview.

Sydney set to become hub for AI innovation with Oracle centre

Oracle has launched the AI Customer Excellence Centre (AI CEC) in Sydney to help organisations adopt and scale AI technologies across Australia and Oceania. The centre will act as a hub for collaboration and skills, letting businesses test AI solutions in real-world settings.

The AI CEC provides access to Oracle and partner technologies, with flexible deployment options through Oracle Cloud Infrastructure (OCI). Organisations can receive training, test early-stage AI innovations, and pilot proof-of-concept projects in secure cloud environments.

The centre supports industries such as healthcare, public sector, financial services, and telecommunications, helping companies accelerate AI adoption while improving efficiency and decision-making.

Experts highlight the centre’s potential to bridge the gap between AI experimentation and measurable business impact. Rising compute demand shows AI moving from pilots to production, while hands-on testing helps organisations reduce risk and validate initiatives.

Oracle plans to continue collaborating with governments, partners, and industry to ensure responsible, secure, and trustworthy AI adoption, reinforcing Australia’s position as a leader in the digital economy.

Tokenised assets set to transform European capital markets

Piero Cipollone, Member of the Executive Board of the ECB, highlighted Europe's progress in tokenised financial markets at an event on 'Building Europe's integrated digital asset ecosystem: from vision to implementation'.

Since 2021, European issuers have placed nearly €4 billion in DLT-based fixed-income instruments, including the first digital sovereign debt by EU Member States. Eurosystem trials in 2024 processed €1.6 billion in transactions, showing strong demand for central bank money settlement in digital markets.

Tokenisation enables the full lifecycle of transactions on distributed ledgers, often automated through smart contracts.

Fragmentation across DLT platforms and the absence of a widely accepted on-chain settlement asset are holding back market expansion. Private assets, including stablecoins, carry volatility and credit risks, making a central bank money anchor crucial.

The Pontes platform, launching in Q3 2026, is expected to provide secure settlement across DLT platforms and TARGET services, supporting features like smart contracts and 24/7 operation.

The Appia roadmap outlines a longer-term vision for an integrated European tokenised ecosystem by 2028, covering technical standards, interoperability, collateral management, and cross-border connectivity.

Collaboration between the public and private sectors is critical. Feedback from 64 industry participants shaped Pontes, while Appia engages stakeholders to establish standards and ensure interoperability.

Harmonised legal frameworks are equally important to reduce post-trade fragmentation and support seamless asset transfers across EU Member States. Without coordinated laws, tokenised markets risk inefficiency despite advanced technology.

Europe is building momentum but faces intense global competition. Secure settlement, stakeholder collaboration, and legal harmonisation could make the EU a leader in digital finance with a single tokenised market.

Pinterest chief calls for stricter youth rules

The chief executive of Pinterest has voiced support for governments banning access to social media for people under 16. He cited rising concerns about mental health, screen addiction and online harms among young users.

He praised the Australian decision to ban social media for under-16s and urged other nations to adopt similar protections. He argued that existing tech safety measures have fallen short of keeping children secure online.

The executive warned that AI enhancements in social platforms may amplify behavioural influence on teens, and compared tech companies' inaction to the resistance harmful industries have historically shown to public health safeguards.

He also highlighted surveys showing parental worries about explicit content and excessive screen time. Pinterest’s view supports calls for clear age limits, better tools for parents and stronger platform accountability.
