Apple study finds AI fails on complex tasks
Large reasoning models reduce effort as tasks grow harder.
A recent study by Apple researchers exposed significant limitations in advanced AI systems known as large reasoning models (LRMs).
These models, designed to solve complex problems through step-by-step thinking, experienced what the paper called a ‘complete accuracy collapse’ when faced with high-complexity tasks. Even when given an algorithm that should have ensured success, the models failed to deliver correct solutions.
Apple’s team suggested this may point to a fundamental limit in how current AI models scale up to general reasoning.
The study found that LRMs performed well with low- and medium-difficulty tasks but deteriorated sharply as the complexity increased.
Rather than increasing their effort as problems became harder, the models paradoxically reduced their reasoning, leading to complete failure.
Experts, including AI researcher Gary Marcus and Andrew Rogoyski of the University of Surrey in the UK, called the findings alarming and indicative of a potential dead end in current AI development.
The study tested systems from OpenAI, Google, Anthropic and DeepSeek, raising serious questions about how close the industry is to achieving AGI.