DeepMind Blog · 12 Dec

Improved Gemini audio models for powerful voice experiences


Google DeepMind has released an upgraded Gemini 2.5 Flash Native Audio model, specifically designed to power more capable live voice agents.

The update significantly improves the model's function calling precision and its ability to follow complex user instructions.
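Function calling means the model emits a structured call (a function name plus arguments), and the voice agent must run the matching local code. Precision matters because a wrong name or argument breaks that step. The following is a minimal sketch under stated assumptions: the `{"name": ..., "args": {...}}` shape, the `dispatch` helper, and the `get_weather` tool are all hypothetical illustrations. They do not use or reproduce the Gemini API.

```python
import json
from typing import Any, Callable


# Hypothetical tool: a stub standing in for a real lookup an agent might perform.
def get_weather(city: str) -> dict[str, Any]:
    return {"city": city, "forecast": "sunny"}


# Hypothetical registry mapping function names the model may emit to local code.
TOOLS: dict[str, Callable[..., dict[str, Any]]] = {"get_weather": get_weather}


def dispatch(call_json: str) -> dict[str, Any]:
    """Execute a model-emitted call of the assumed form {"name": ..., "args": {...}}."""
    call = json.loads(call_json)
    name = call.get("name")
    if name not in TOOLS:
        # An imprecise call naming an unknown tool becomes an error result.
        return {"error": f"unknown function: {name}"}
    try:
        return TOOLS[name](**call.get("args", {}))
    except TypeError as exc:
        # Wrong or missing argument names also become an error result.
        return {"error": str(exc)}


print(dispatch('{"name": "get_weather", "args": {"city": "Mexico City"}}'))
```

Returning errors as data rather than raising lets an agent report a failed call back to the model instead of crashing mid-conversation.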

Conversations also flow more smoothly, because the model can retrieve and use context from earlier turns in the conversation.

This contextual awareness produces natural, continuous dialogue rather than a series of disconnected question-and-answer exchanges.

Earlier this week, Google also upgraded its Gemini 2.5 Pro and Flash Text-to-Speech models, giving developers greater control over expressive audio generation.

Together, these updates address both sides of voice interactions: understanding spoken input and generating natural-sounding audio output.

A new live speech translation feature is now available in the Google Translate app beta, launching first on Android in the US, Mexico, and India.

The feature supports more than 70 languages and preserves the speaker's intonation and vocal characteristics in real time.

Developers can start building voice agents with Gemini 2.5 Flash Native Audio today on Google's Vertex AI platform.

These improvements strengthen Google's position in the fast-moving voice AI space, where handling natural conversation is increasingly important.

Read original → deepmind.google