Meta faces backlash over Llama 4 release
Instead of setting a new standard, Meta’s Llama 4 release highlighted how benchmark manipulation risks misleading the AI community.

Over the weekend, Meta unveiled two new Llama 4 models—Scout, a smaller version, and Maverick, a mid-sized variant it claims outperforms OpenAI’s GPT-4o and Google’s Gemini 2.0 Flash across multiple benchmarks.
Maverick quickly climbed to second place on LMArena, an AI benchmarking platform where human evaluators compare and vote on model outputs. Meta proudly pointed to Maverick’s Elo score of 1417, which placed it just below Gemini 2.5 Pro and ahead of the models that usually dominate the leaderboard.
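For readers unfamiliar with how such scores arise, the sketch below shows the classic Elo update that arena-style leaderboards are loosely based on: each human vote nudges the winner’s rating up and the loser’s down, in proportion to how surprising the result was. This is only an illustration of the general idea, not LMArena’s exact methodology, and the K-factor and example ratings are assumptions.

```python
# Minimal sketch of a classic Elo update for pairwise model votes.
# Illustrative only: LMArena's actual ranking method may differ,
# and the K-factor below is an assumed value.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def elo_update(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return both models' updated ratings after one human vote."""
    e_a = expected_score(rating_a, rating_b)
    s_a = 1.0 if a_won else 0.0
    new_a = rating_a + k * (s_a - e_a)
    new_b = rating_b + k * ((1.0 - s_a) - (1.0 - e_a))
    return new_a, new_b

# Hypothetical example: a 1417-rated model wins one vote against a 1400-rated rival.
print(elo_update(1417.0, 1400.0, a_won=True))
```

The key point for the Llama 4 dispute is that such a rating is only meaningful for the specific model checkpoint that collected the votes, which is why submitting a customised variant undermines the ranking.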
However, AI researchers noticed a critical detail buried in Meta’s documentation: the version of Maverick that ranked so highly wasn’t the one released to the public. Instead of using the standard model, Meta had submitted an ‘experimental’ version specifically optimised for conversations.
LMArena later criticised this move, saying Meta failed to clearly indicate the model was customised, prompting the platform to update its policies to ensure future evaluations remain fair and reproducible.
A Meta spokesperson acknowledged the use of experimental variants, saying the company routinely tests different configurations of its models.
While this wasn’t a violation of LMArena’s existing rules, the episode raised concerns about the credibility of benchmark rankings when companies submit fine-tuned models instead of the ones accessible to the wider community.
Independent AI researcher Simon Willison expressed frustration, saying the impressive ranking lost all meaning once it became clear the public couldn’t even use the same version.
The controversy unfolded against a backdrop of mounting competition in open-weight AI, with Meta under pressure following high-profile releases such as those from China’s DeepSeek.
Instead of offering a smooth rollout, Meta released Llama 4 on a Saturday—an unusual move—which CEO Mark Zuckerberg explained simply as ‘that’s when it was ready.’ But for many in the AI space, the launch has only deepened confusion around what these models can genuinely deliver.