21 May 2025

Google unveils Veo 3 with audio capabilities

Veo 3 can synchronise audio, such as sound effects and speech, with AI-generated video, moving beyond the silent output of earlier models.

Google has introduced Veo 3, its most advanced video-generating AI model to date, capable of producing sound effects, ambient noise and dialogue to accompany the footage it creates.

Announced at the Google I/O 2025 developer conference, Veo 3 is available through the Gemini chatbot for those subscribed to the $249.99-per-month AI Ultra plan. The model accepts both text and image prompts, allowing users to generate audiovisual scenes rather than silent clips.

According to Google, Veo 3 can analyse raw video pixels to synchronise audio automatically, which could give it a notable edge in an increasingly crowded field of video-generation platforms. While sound-generating AI isn’t new, Google claims Veo 3’s ability to match audio precisely with visual content sets it apart.

The progress builds on DeepMind’s earlier work in ‘video-to-audio’ AI and may rely on training data from YouTube, though Google hasn’t confirmed this.

To help prevent misuse, such as the creation of deepfakes, Google says Veo 3 includes SynthID, its proprietary watermarking technology that embeds invisible markers in every generated frame. Despite these safeguards, concerns remain within the creative industry.

Artists fear that tools like Veo 3 could replace thousands of jobs; a recent study predicts that more than 100,000 roles in film and animation could be affected by AI by 2026.

Alongside Veo 3, Google has also updated Veo 2. The earlier model now allows users to edit videos more precisely, adding or removing elements and adjusting camera movements. These features are expected to become available soon on Google’s Vertex AI API platform.
