Gemini 2.5 Computer Use brings human-like interface control to AI agents

Google’s latest Gemini AI model blends visual reasoning, web control and multi-layer safety features to power next-generation autonomous agents for real-world applications.

The Gemini 2.5 Computer Use model extends Gemini 2.5 Pro with capabilities for navigating UIs, filling forms and automating complex digital workflows via the Gemini API.

Google DeepMind has launched the Gemini 2.5 Computer Use model, a specialised version of Gemini 2.5 Pro designed to let AI agents interact directly with digital user interfaces.

Available in preview through the Gemini API, the model lets developers build agents capable of performing web and mobile tasks such as form-filling, navigation and interaction within apps.

Unlike models limited to structured APIs, Gemini 2.5 Computer Use can reason visually about what it sees on screen, making it possible to complete tasks requiring clicks, scrolls and text input.
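In practice, the model works iteratively: the agent sends the user's goal together with a screenshot of the current screen, the model returns a proposed UI action, the client executes that action and sends back a fresh screenshot, and the cycle repeats until the task is complete. A minimal sketch of such a loop is shown below; the helper functions standing in for the Gemini API call and the browser automation layer (for example Playwright) are hypothetical placeholders, not the actual API surface.

```python
# Illustrative sketch of the screenshot -> action -> execute loop.
# propose_action() and execute_action() are hypothetical stand-ins for a
# Gemini 2.5 Computer Use API call and a browser automation layer such as
# Playwright; real names, signatures and action fields will differ.

from dataclasses import dataclass
from typing import Optional


@dataclass
class UIAction:
    kind: str           # e.g. "click", "scroll", "type"
    x: int = 0          # screen coordinates for pointer actions
    y: int = 0
    text: str = ""      # text payload for typing actions


def propose_action(goal: str, screenshot: bytes) -> Optional[UIAction]:
    """Hypothetical Gemini call: send the goal plus the current screenshot,
    receive the next UI action, or None when the model reports completion."""
    raise NotImplementedError("replace with a real Gemini API request")


def execute_action(action: UIAction) -> bytes:
    """Hypothetical browser step: perform the click/scroll/type action and
    return a screenshot of the resulting page state."""
    raise NotImplementedError("replace with e.g. Playwright calls")


def run_agent(goal: str, first_screenshot: bytes, max_steps: int = 20) -> None:
    """Drive the loop: the model proposes an action, the client executes it."""
    screenshot = first_screenshot
    for _ in range(max_steps):
        action = propose_action(goal, screenshot)
        if action is None:          # task finished (or model declined to act)
            break
        screenshot = execute_action(action)
```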

It outperforms rival models on several web and mobile control benchmarks, including Online-Mind2Web (as evaluated by Browserbase) and WebVoyager, while maintaining low latency.

The model’s safety design includes per-step risk checks, built-in safeguards against misuse and developer-controlled restrictions on high-risk actions such as payments or security changes.
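One way a developer might enforce such restrictions on the client side is sketched below: a deny-list of action types that are never executed automatically and are instead escalated to a human for confirmation. The action names and the confirmation helper are hypothetical illustrations; the actual API provides its own mechanisms for excluding actions and requesting per-step user confirmation.

```python
# Hypothetical client-side guardrail: proposed actions whose type appears in
# HIGH_RISK are never executed automatically; the agent pauses and asks the
# end user to approve them first. Action names and the confirmation flow are
# illustrative assumptions, not the model's real action vocabulary.

HIGH_RISK = {"submit_payment", "change_security_settings", "delete_account"}


def confirm_with_user(action_kind: str) -> bool:
    """Stand-in for an in-app confirmation dialog shown to the end user."""
    answer = input(f"Allow high-risk action '{action_kind}'? [y/N] ")
    return answer.strip().lower() == "y"


def is_permitted(action_kind: str) -> bool:
    """Allow low-risk actions; escalate anything on the deny-list."""
    return action_kind not in HIGH_RISK or confirm_with_user(action_kind)
```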

Google has already integrated the model into systems such as Project Mariner, the Firebase Testing Agent and AI Mode in Search, and early testers report faster, more reliable automation.

Gemini 2.5 Computer Use is now available in public preview via Google AI Studio and Vertex AI, enabling developers to experiment with advanced interface-aware agents that can perform complex digital workflows securely and efficiently.
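For developers, the main practical choice is which backend to call the model through. A minimal client-setup sketch using the google-genai Python SDK is shown below; the model identifier is a placeholder for the actual preview model ID, and the computer-use tool itself must still be enabled in the request configuration as described in Google's documentation.

```python
# Minimal client setup, assuming the google-genai Python SDK.
# The model ID below is a placeholder; use the exact preview identifier and
# computer-use tool configuration given in the official documentation.

from google import genai

MODEL = "gemini-2.5-computer-use-preview"  # placeholder model ID

# Option A: Gemini API via Google AI Studio, authenticated with an API key.
studio_client = genai.Client(api_key="YOUR_API_KEY")

# Option B: Vertex AI, authenticated against a Google Cloud project and region.
vertex_client = genai.Client(
    vertexai=True,
    project="your-gcp-project",
    location="us-central1",
)
```

Either client exposes the same request interface, so an agent loop like the one sketched earlier can be pointed at whichever backend matches the deployment.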
