OpenAI model revises proof claim
Experts review frontier reasoning challenge results.
OpenAI has published its attempts to solve all 10 problems in the First Proof challenge, a research-level maths test designed to assess whether AI can produce checkable, domain-specific proofs. The problems, created by leading experts, require extended reasoning rather than short answers.
The company said at least five of its proof attempts are likely correct based on expert feedback, although one submission it had previously presented with confidence has since been judged incorrect. Several other attempts remain under review as specialists continue to assess the arguments.
According to OpenAI, the evaluation involved limited human supervision, with researchers occasionally prompting the model to refine or clarify its reasoning. The process also included exchanges between an internal model and ChatGPT for verification, formatting and style adjustments.
OpenAI described frontier research challenges, such as First Proof, as crucial for testing next-generation AI systems. The company said it plans to deepen its engagement with academics to develop more rigorous evaluation frameworks for research-grade reasoning.
