The Gemini 3.1 Flash TTS model is now available on Google AI Studio and Vertex AI, giving developers and enterprises a new option for building text-to-speech applications. It offers fine-grained control over speech delivery through more than 200 audio tags, which makes it suitable for contexts such as gaming, banking, and audiobooks.
Key Features
- High-Fidelity Speech: Supports over 70 languages with precise control over style, accent, and pacing.
- Watermarked Output: Generated audio is watermarked with SynthID so that it can be identified as AI-generated content.
- Customizable Voice Styles: Choose from 30 prebuilt voices and apply natural language instructions for stylization.
Using Audio Tags
Audio tags let users guide vocal style and pacing directly within the text input. Tags are written in square brackets and interleaved with the spoken text, following this pattern:
[pacing tag] + spoken text + [expressive tag] + spoken text + [pause tag] + spoken text
Common tags include:
- [enthusiasm]
- [whispers]
- [short pause]
- [laughs]
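The tag pattern above can be assembled programmatically. The following is a minimal sketch in Python that only builds and inspects the input string; it does not call any Google API. The helper names (`tag`, `build_tagged_text`, `find_tags`) and the `EXAMPLE_TAGS` set are hypothetical, and the set contains only the four tags listed above, not the full tag vocabulary.

```python
import re

# Assumption: only the four tags named in this article; the model's
# complete tag list is not reproduced here.
EXAMPLE_TAGS = {"enthusiasm", "whispers", "short pause", "laughs"}


def tag(name: str) -> str:
    """Wrap a known tag name in square brackets, e.g. 'laughs' -> '[laughs]'."""
    if name not in EXAMPLE_TAGS:
        raise ValueError(f"unknown tag: {name!r}")
    return f"[{name}]"


def build_tagged_text(segments) -> str:
    """Join (tag_name_or_None, spoken_text) pairs into one input string.

    Each tag is placed immediately before the text it accompanies,
    mirroring the '[tag] + spoken text' pattern.
    """
    parts = []
    for tag_name, text in segments:
        if tag_name is not None:
            parts.append(tag(tag_name))
        parts.append(text.strip())
    return " ".join(parts)


def find_tags(tagged_text: str) -> list:
    """Return the tag names found in a tagged string, in order of appearance."""
    return re.findall(r"\[([^\[\]]+)\]", tagged_text)


prompt = build_tagged_text([
    ("enthusiasm", "Welcome back!"),
    ("whispers", "I have a secret to share."),
    ("short pause", "Are you ready?"),
])
print(prompt)
```

Validating tag names before sending text to the model is a design choice of this sketch: it catches typos early, at the cost of having to keep the local tag set in sync with whatever tags the model actually supports.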
Applications of Gemini 3.1 Flash TTS
The model is suited to several sectors:
- Accessibility: Provides clear audio for screen readers and communication devices.
- Gaming: Enhances audio descriptions in games, ensuring clarity and engagement.
- Creative Content: Ideal for audiobooks and media, allowing for dramatic storytelling.
- Enterprise Solutions: Useful for banking notifications and customer service communications.
Getting Started
Developers can access Gemini 3.1 Flash TTS through:
- Vertex AI: For scalable applications.
- Google AI Studio: For rapid prototyping and testing.
For best practices, refer to the developer documentation and related resources for Google AI Studio and Vertex AI.