10 Jul 2025

xAI unveils Grok 4 with top benchmark scores

Elon Musk revealed Grok 4, claiming it surpasses PhD-level knowledge, despite recent backlash over offensive AI-generated content.

Elon Musk’s AI company, xAI, has launched its latest flagship model, Grok 4, alongside an ultra-premium $300 monthly plan named SuperGrok Heavy.

Grok 4, which competes with OpenAI’s ChatGPT and Google’s Gemini, can handle complex queries and interpret images. It is now integrated more deeply into the social media platform X, which Musk also owns.

Despite recent controversy, including antisemitic responses generated by Grok’s official X account, xAI focused on showcasing the model’s performance.

Musk claimed Grok 4 is ‘better than PhD level’ in all academic subjects and revealed a high-performing version called Grok 4 Heavy, which uses multiple AI agents to solve problems collaboratively.

The models scored strongly on benchmark exams, including a 25.4% score for Grok 4 on Humanity’s Last Exam, outperforming major rivals. With tools enabled, Grok 4 Heavy reached 44.4%, nearly doubling OpenAI’s and Google’s results.

It also achieved a leading score of 16.2% on the ARC-AGI-2 pattern recognition test, nearly double that of Claude Opus 4.

xAI is targeting developers through its API and enterprise partnerships while teasing upcoming tools: an AI coding model in August, a multi-modal agent in September, and video generation in October.

Yet the road ahead may be rocky, as the company works to overcome trust issues and position Grok as a serious rival in the AI arms race.

Would you like to learn more about AI, tech and digital diplomacy? If so, ask our Diplo chatbot!