Alibaba’s Qwen3 upgrade beats OpenAI and DeepSeek on benchmarks

The context window is extended to 256k tokens, enabling longer and more complex document handling in non-thinking mode.


Alibaba has unveiled a significant update to its flagship open‑source Qwen3 family, spotlighting the Qwen3‑235B‑A22B‑Instruct‑2507‑FP8 model.

The revision delivers enhanced capabilities across multiple domains, including instruction following, logical reasoning, text comprehension, mathematics, science, coding, and tool usage, and pushes Qwen3 to the top of several key benchmarks.

The upgraded model scored 70.3 on the 2025 American Invitational Mathematics Examination (AIME) benchmark, well ahead of DeepSeek-V3 (46.6) and OpenAI's GPT-4o (26.7).

On MultiPL-E, which evaluates coding across multiple programming languages, it achieved 87.9, beating DeepSeek-V3 (82.2) and GPT-4o (82.7), though Anthropic's Claude Opus 4 edged ahead with 88.5.

A notable technical advancement is the eightfold increase in context capacity to 256k tokens, allowing it to process longer documents in non‑thinking mode.

The open-source release on platforms such as HuggingFace and ModelScope reinforces Alibaba's commitment to building a transparent, high-performance AI ecosystem.
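For readers who want to try the released checkpoint, a minimal sketch using the Hugging Face `transformers` library follows. It assumes a recent `transformers` version, hardware with enough GPU memory to host the FP8 weights, and the repository id matching the model name announced above; it is an illustration of typical usage, not an official quickstart from Alibaba.

```python
# Minimal sketch: loading the Qwen3 2507 FP8 release from HuggingFace.
# Assumes a recent transformers version and sufficient GPU memory;
# the repository id is taken from the model name in the announcement.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-235B-A22B-Instruct-2507-FP8"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the shipped FP8 weights as-is
    device_map="auto",    # shard across available GPUs
)

# The 2507 Instruct release operates in non-thinking mode, so the chat
# prompt is built without any thinking-related flags.
messages = [{"role": "user", "content": "Summarise this document: ..."}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```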

This update intensifies competition in China’s AI landscape, with Alibaba closing the benchmark gap versus Western leaders and rival Chinese startups such as DeepSeek, whose upgraded R1‑0528 has reportedly matched Qwen3 in some reasoning tasks.
