Google’s Gemini AI completes Pokémon Blue with a little help

An independent experiment has sparked excitement across the AI world by pushing a powerful language model into the unpredictable realm of retro gaming.

The Gemini 2.5 Pro model shows notable improvements over previous versions, scoring well in maths and science benchmarks, and excelling in coding tasks like web app creation.

Google’s cutting-edge AI model, Gemini 2.5 Pro, has made headlines by completing the 1996 classic video game Pokémon Blue. Google did not run the playthrough itself; it was orchestrated by Joel Z, an independent software engineer who created a livestream called Gemini Plays Pokémon.

Despite being unaffiliated with the tech giant, Joel’s project has drawn enthusiastic support from Google executives, including CEO Sundar Pichai, who celebrated the victory on social media. The challenge of beating a game like Pokémon Blue has become an informal benchmark for testing the reasoning and adaptability of large language models.

Earlier this year, AI company Anthropic revealed that its Claude model was making strides in a similar title, Pokémon Red, though Claude has yet to complete it. While comparisons between the two AIs are inevitable, Joel Z clarified that such evaluations are flawed due to differences in tools, data access, and gameplay frameworks.

To play the game, Gemini relied on a complex system called an ‘agent harness,’ which feeds the model visual and contextual information from the game and translates its decisions into gameplay actions. Joel admits to making occasional interventions to improve Gemini’s reasoning but insists these did not include cheats or explicit hints. Instead, his guidance was limited to refining the model’s problem-solving capabilities.
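
For readers curious what such a harness looks like in practice, the loop described above can be sketched in a few lines of Python. This is only an illustrative toy, not Joel Z's actual code: the ToyEmulator, stub_model, parse_action and run_harness names are all hypothetical, the "game" is a four-tile corridor, and the "model" is a fixed function standing in for a real call to a language model.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical button set, modelled on a Game Boy's controls.
VALID_BUTTONS = {"up", "down", "left", "right", "a", "b", "start", "select"}


@dataclass
class Observation:
    screen_text: str  # stand-in for a screenshot or a description of it
    context: str      # stand-in for extra game state passed to the model


@dataclass
class ToyEmulator:
    """Stand-in for a real emulator: a corridor with the goal at tile 3."""
    position: int = 0
    goal: int = 3
    pressed: List[str] = field(default_factory=list)

    def observe(self) -> Observation:
        return Observation(
            screen_text=f"player at tile {self.position}",
            context=f"goal at tile {self.goal}",
        )

    def press(self, button: str) -> None:
        self.pressed.append(button)
        if button == "right":
            self.position += 1
        elif button == "left":
            self.position = max(0, self.position - 1)

    def done(self) -> bool:
        return self.position >= self.goal


def parse_action(reply: str) -> str:
    """Translate the model's free-text reply into one valid button press."""
    for word in reply.lower().split():
        token = word.strip(".,!?:;\"'")
        if token in VALID_BUTTONS:
            return token
    return "a"  # assumed fallback when the reply names no valid button


def run_harness(emulator: ToyEmulator,
                model: Callable[[str], str],
                max_steps: int = 50) -> int:
    """Feed observations to the model and apply its decisions until done."""
    steps = 0
    while not emulator.done() and steps < max_steps:
        obs = emulator.observe()
        prompt = (f"Screen: {obs.screen_text}\n"
                  f"Context: {obs.context}\n"
                  "Which button do you press?")
        reply = model(prompt)
        emulator.press(parse_action(reply))
        steps += 1
    return steps


def stub_model(prompt: str) -> str:
    """Placeholder for a real language model call; always walks right."""
    return "I will press RIGHT."


if __name__ == "__main__":
    emulator = ToyEmulator()
    taken = run_harness(emulator, stub_model)
    print(f"finished in {taken} steps: {emulator.pressed}")
```

A real harness would swap the toy emulator for an actual Game Boy emulator and the stub for a model API call, and would likely add memory and navigation aids; those differences in tooling are part of why Joel Z cautions against direct comparisons between models.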

The project remains a work in progress, and Joel continues to enhance the framework behind Gemini’s gameplay. While it may not be an official benchmark for AI performance, the achievement is a playful demonstration of how far AI systems have come in tackling creative and unexpected challenges.
