Qwen3-Max-Thinking hits perfect scores as Alibaba raises the bar on AI reasoning

Built on Qwen3-Max, the thinking variant emphasises deliberate, step-by-step solutions for algebra, number theory, and probability.

Alibaba's Qwen3-Max-Thinking scores 100 percent on AIME 2025 and HMMT, matching OpenAI's top model on elite reasoning tests.

Alibaba unveiled Qwen3-Max-Thinking, which scored 100 percent on AIME 2025 and HMMT, matching OpenAI's top model on reasoning tests. It targets high-precision problem-solving across algebra, number theory, and probability, and researchers regard elite maths contests as strong proxies for general reasoning ability.

Built on Qwen3-Max, a trillion-parameter flagship, the thinking variant emphasises deliberate, step-by-step solutions. Alibaba says it matches or beats Claude Opus 4, DeepSeek V3.1, Grok 4, and GPT-5 Pro, and its positioning stresses accuracy, traceability, and controllable latency.

Results from a live trading trial added momentum. In a two-week crypto experiment, Qwen3-Max returned 22.3 percent on a 10,000 US dollar stake. Competing systems underperformed, with DeepSeek at 4.9 percent and several US models booking losses.

Access is available via the Qwen web chatbot and Alibaba Cloud APIs. Early adopters can test tool use and stepwise reasoning on technical tasks, and enterprises are exploring use cases in finance, research, and operations that require reliability and auditability.
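For developers curious about the API route, the following is a minimal sketch of how a Qwen model might be queried through Alibaba Cloud's OpenAI-compatible Model Studio endpoint. The base URL, model identifier, and environment variable name are illustrative assumptions, not details confirmed by the article; consult Alibaba Cloud's documentation for the exact identifier of the thinking variant.

```python
# Minimal sketch: calling a Qwen model via an OpenAI-compatible endpoint.
# The base URL, model name, and key variable below are assumptions for
# illustration only, not confirmed product details.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DASHSCOPE_API_KEY"],  # assumed environment variable
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",  # assumed endpoint
)

response = client.chat.completions.create(
    model="qwen3-max",  # assumed identifier; the thinking variant may differ
    messages=[
        {"role": "user", "content": "Prove that the sum of two odd integers is even."},
    ],
)

print(response.choices[0].message.content)
```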

Alibaba researchers say further tuning will broaden task coverage without diluting peak maths performance. Plans include multilingual reasoning, safety alignment, and robustness under distribution shift. Community benchmarks and contests will track progress.

Would you like to learn more about AI, tech, and digital diplomacy? If so, ask our Diplo chatbot!