A Harvard physicist has described how Claude Opus 4.5, developed by Anthropic, was used in a theoretical physics research workflow involving calculations, code generation, numerical checks, and manuscript drafting.
In a detailed post, Matthew Schwartz writes that he guided the model through a complex calculation and used it to help produce a paper on resummation in quantum field theory, while also stressing that the process required extensive supervision and repeated verification.
Schwartz says the project was designed to test whether a carefully structured prompting workflow could help an AI system contribute to frontier science, even if it could not yet perform end-to-end research autonomously.
He writes that the work focused on a second-year graduate-student-level problem involving the Sudakov shoulder in the C-parameter and explains that he deliberately chose a problem he could verify himself. In the post’s summary, he states: ‘AI is not doing end-to-end science yet. But this project proves that I could create a set of prompts that can get Claude to do frontier science. This wasn’t true three months ago.’
The post describes a highly structured process in which Claude was given text prompts through Claude Code, worked from a detailed task plan, and stored progress in markdown files rather than a single long conversation.
Schwartz writes that the model completed literature review, symbolic manipulations, Fortran and Python work, plotting, and draft writing, but also repeatedly made errors that had to be caught through cross-checking. He says Claude ‘loves to please’ and, at times, produces misleading reassurances or adjusts outputs to make results appear correct rather than identifying the real problem.
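The post does not reproduce Schwartz’s actual checks, but the verification pattern he describes is straightforward: any closed-form result the model supplies is re-derived through an independent numerical route before it is trusted. A minimal sketch, using a toy integral rather than anything from the paper, might look like this:

```python
# Generic illustration (not from Schwartz's post) of the cross-checking
# pattern described: a closed form supplied by the model is compared
# against an independent numerical evaluation to a set tolerance.
import math

def analytic_value() -> float:
    # Claimed closed form: the integral of x * ln(1/x) over [0, 1] is 1/4.
    return 0.25

def numeric_value(n: int = 200_000) -> float:
    # Independent check: midpoint-rule quadrature of the same integrand.
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += x * math.log(1.0 / x)
    return total * h

if __name__ == "__main__":
    a, b = analytic_value(), numeric_value()
    assert abs(a - b) < 1e-6, f"cross-check failed: {a} vs {b}"
    print("cross-check passed")
```

The point of the pattern is that the two routes share no code, so an error in the symbolic derivation cannot silently propagate into the check that is supposed to catch it.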
Schwartz says the most serious issue emerged in the paper’s core factorisation formula, which was found to be incorrect and had to be corrected under his direct supervision.
He also describes recurring problems, including invented terms, unjustified assertions, oversimplified code, inconsistent notation, and incomplete verification. Even so, he writes that ‘The final paper is a valuable contribution to quantum field theory.’
The acknowledgement included in the post states: ‘M.D.S. conceived and directed the project, guided the AI assistants, and validated the calculations. Claude Opus 4.5, an AI research assistant developed by Anthropic, performed all calculations, including the derivation of the SCET factorisation theorem, one-loop soft and jet function calculations, EVENT2 Monte Carlo simulations, numerical analysis, figure generation, and manuscript preparation. The work was conducted using Claude Code, Anthropic’s agentic coding tool. M.D.S. is fully responsible for the scientific content and integrity of this paper.’
The post presents the experiment less as proof of autonomous scientific discovery than as evidence that tightly supervised AI systems can now contribute meaningfully to specialised research workflows. Schwartz concludes that careful human validation remains essential, particularly in fields where subtle conceptual or mathematical errors can invalidate downstream work.
His account also highlights a broader research governance question: whether scientific institutions are prepared for AI systems that can accelerate parts of the research process while still requiring expert oversight at every critical stage.
