4 Mar 2026

Developers gain early access to Gemini 3.1 Flash-Lite

Benchmarks show the new model surpasses prior Gemini generations, with faster response times and strong reasoning capabilities.

Google’s Gemini 3.1 Flash-Lite has launched in preview for developers via AI Studio and for enterprises through Vertex AI. Designed for high-volume workloads, it promises fast, cost-effective performance while maintaining high-quality outputs.

Priced at just $0.25 per million input tokens and $1.50 per million output tokens, 3.1 Flash-Lite offers 2.5X faster response times and 45% higher output speed than the previous 2.5 Flash model.

Benchmarks show strong performance across reasoning and multimodal tasks, including an Elo score of 1432 on Arena.ai, 86.9% on GPQA Diamond, and 76.8% on MMMU Pro, surpassing some older, larger Gemini models.

The model also provides adaptive intelligence features, allowing developers to adjust how much the AI ‘thinks’ for each task. The model handles both high-frequency tasks, such as translation, and complex tasks, such as interface generation and simulations.

Early-access developers and companies report that 3.1 Flash-Lite handles complex workloads with precision comparable to larger models. Its speed, affordability, and reasoning capabilities make it an attractive choice for scalable, real-time AI applications.

Would you like to learn more about AI, tech and digital diplomacy? If so, ask our Diplo chatbot!