Gemini-TTS

Gemini-TTS is the latest evolution of our Text-to-Speech technology. It goes beyond natural-sounding speech and gives you granular control over the generated audio through text-based prompts. With Gemini-TTS, you can synthesize anything from short snippets to long-form narratives and precisely dictate style, accent, pace, tone, and even emotional expression, all steerable through natural-language prompts.

Gemini-TTS capabilities are supported by the following models:

  • gemini-2.5-flash-preview-tts: Gemini 2.5 Flash Preview is suited for cost-efficient everyday applications.

  • gemini-2.5-pro-preview-tts: Gemini 2.5 Pro Preview is suited for controllable speech generation and delivers state-of-the-art quality on complex prompts.

| Model | Optimized for | Input modality | Output modality | Single speaker |
|---|---|---|---|---|
| Gemini 2.5 Flash Preview TTS | Low-latency, controllable, single- and multi-speaker Text-to-Speech audio generation for cost-efficient everyday applications | Text | Audio | ✔️ |
| Gemini 2.5 Pro Preview TTS | High control for structured workflows like podcast generation, audiobooks, customer support, and more | Text | Audio | ✔️ |

Additional controls and capabilities include the following:

  1. Natural conversation: High-quality voice interactions with appropriate expressivity and natural prosody (patterns of rhythm and intonation) are delivered with very low latency, so you can converse fluidly.

  2. Style control: Using natural-language prompts, you can adapt the delivery within a conversation, steering it to adopt specific accents and to produce a range of tones and expressions, including whispering.

  3. Dynamic performance: These models can bring text to life for expressive readings of poetry, newscasts, and engaging storytelling. They can also perform with specific emotions and produce accents when requested.

  4. Enhanced pace and pronunciation control: Controlling delivery speed helps improve pronunciation accuracy, including for specific words.

Examples

model: "gemini-2.5-pro-preview-tts"
prompt: "You are having a casual conversation with a friend. Say the following in a friendly and amused way."
text: "hahah I did NOT expect that. Can you believe it!."
speaker: "Callirrhoe"

model: "gemini-2.5-flash-preview-tts"
prompt: "Say the following in a curious way"
text: "OK, so... tell me about this [uhm] AI thing."
speaker: "Orus"

model: "gemini-2.5-flash-preview-tts"
prompt: "Say the following"
text: "[extremely fast] Availability and terms may vary. Check our website or your local store for complete details and restrictions."
speaker: "Kore"
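Each example above maps onto the JSON body of the REST `text:synthesize` request, using the same field layout as the CURL command under Use Gemini-TTS: the style prompt and the text go under `input`, and the speaker and model go under `voice`. The sketch below illustrates that mapping; `build_request_body` is a hypothetical helper, and it assumes `en-US` (the only listed language) and `LINEAR16` (the default output format) unless you override them.

```python
import json

# Model IDs listed at the top of this page.
SUPPORTED_MODELS = {"gemini-2.5-flash-preview-tts", "gemini-2.5-pro-preview-tts"}


def build_request_body(model: str, prompt: str, text: str, speaker: str,
                       language_code: str = "en-US",
                       audio_encoding: str = "LINEAR16") -> dict:
    """Builds a text:synthesize request body (hypothetical helper).

    Field names mirror the CURL example on this page.
    """
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"unsupported model: {model}")
    return {
        # Style instructions and the text to speak travel together.
        "input": {"prompt": prompt, "text": text},
        # The speaker is the voice name; the Gemini model is chosen here too.
        "voice": {
            "languageCode": language_code,
            "name": speaker,
            "model_name": model,
        },
        "audioConfig": {"audioEncoding": audio_encoding},
    }


# The third example: inline markup such as [extremely fast] stays in the text.
body = build_request_body(
    model="gemini-2.5-flash-preview-tts",
    prompt="Say the following",
    text="[extremely fast] Availability and terms may vary. Check our website "
         "or your local store for complete details and restrictions.",
    speaker="Kore",
)
print(json.dumps(body, indent=2))
```

Validating the model name locally is a design choice for catching typos early; the service performs its own validation regardless.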

See the Use Gemini-TTS section for details on how to use these voices programmatically.

Voice Options

Gemini-TTS offers a wide range of voice options similar to our existing Chirp 3: HD Voices, each with distinct characteristics:

| Name | Gender |
|---|---|
| Achernar | Female |
| Achird | Male |
| Algenib | Male |
| Algieba | Male |
| Alnilam | Male |
| Aoede | Female |
| Autonoe | Female |
| Callirrhoe | Female |
| Charon | Male |
| Despina | Female |
| Enceladus | Male |
| Erinome | Female |
| Fenrir | Male |
| Gacrux | Female |
| Iapetus | Male |
| Kore | Female |
| Laomedeia | Female |
| Leda | Female |
| Orus | Male |
| Pulcherrima | Female |
| Puck | Male |
| Rasalgethi | Male |
| Sadachbia | Male |
| Sadaltager | Male |
| Schedar | Male |
| Sulafat | Female |
| Umbriel | Male |
| Vindemiatrix | Female |
| Zephyr | Female |
| Zubenelgenubi | Male |
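If you select voices programmatically, a local copy of the voice table lets you catch misspelled names before sending a request. The sketch below hard-codes the table as a dictionary; the `VOICES` constant and the `voices_for` and `check_voice` helpers are hypothetical, and the copy must be kept in sync by hand if the preview's voice list changes.

```python
# Voice names and genders copied from the Voice Options table.
VOICES = {
    "Achernar": "Female", "Achird": "Male", "Algenib": "Male",
    "Algieba": "Male", "Alnilam": "Male", "Aoede": "Female",
    "Autonoe": "Female", "Callirrhoe": "Female", "Charon": "Male",
    "Despina": "Female", "Enceladus": "Male", "Erinome": "Female",
    "Fenrir": "Male", "Gacrux": "Female", "Iapetus": "Male",
    "Kore": "Female", "Laomedeia": "Female", "Leda": "Female",
    "Orus": "Male", "Pulcherrima": "Female", "Puck": "Male",
    "Rasalgethi": "Male", "Sadachbia": "Male", "Sadaltager": "Male",
    "Schedar": "Male", "Sulafat": "Female", "Umbriel": "Male",
    "Vindemiatrix": "Female", "Zephyr": "Female", "Zubenelgenubi": "Male",
}


def voices_for(gender: str) -> list:
    """Returns the sorted voice names listed with the given gender."""
    return sorted(name for name, g in VOICES.items() if g == gender)


def check_voice(name: str) -> str:
    """Returns the name unchanged if it is a listed voice, else raises ValueError."""
    if name not in VOICES:
        raise ValueError(f"unknown Gemini-TTS voice: {name!r}")
    return name
```

For example, `check_voice("Callirhoe")` raises `ValueError` because the table spells that voice `Callirrhoe`.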

Language availability

Gemini-TTS supports the following language:

| Language | BCP-47 code |
|---|---|
| English (United States) | en-US |

Regional availability

Gemini-TTS models are available in the following Google Cloud regions:

| Google Cloud region | Launch readiness |
|---|---|
| us | Public Preview |

Supported output formats

The default response format is LINEAR16. Other supported formats include the following:

| API method | Format |
|---|---|
| batch | ALAW, MULAW, MP3, OGG_OPUS, and PCM |
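When you save the returned audio, the file extension should match the requested encoding. The mapping below is a sketch under stated assumptions: the extensions are conventional choices rather than anything the API requires, and it assumes that LINEAR16, ALAW, and MULAW content arrives wrapped in a WAV header while PCM is raw samples with no header, as the general Cloud Text-to-Speech encoding reference describes. Verify this for the Gemini-TTS preview. The helpers `extension_for` and `looks_like_wav` are hypothetical.

```python
# Conventional file extensions per encoding (an assumption, not an API rule).
EXTENSIONS = {
    "LINEAR16": ".wav",  # assumed WAV-wrapped 16-bit linear PCM
    "ALAW": ".wav",      # assumed WAV-wrapped G.711 A-law
    "MULAW": ".wav",     # assumed WAV-wrapped G.711 mu-law
    "MP3": ".mp3",
    "OGG_OPUS": ".ogg",
    "PCM": ".pcm",       # assumed raw samples with no container header
}


def extension_for(encoding: str) -> str:
    """Returns a file extension for an encoding name such as "MP3"."""
    try:
        return EXTENSIONS[encoding.upper()]
    except KeyError:
        raise ValueError(f"unsupported audio encoding: {encoding}") from None


def looks_like_wav(data: bytes) -> bool:
    """True if data starts with a RIFF/WAVE header (bytes 0-3 and 8-11)."""
    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE"
```

Checking the header before naming the file is a cheap guard against requesting one encoding and saving it under another extension.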

Use Gemini-TTS

Discover how to use Gemini-TTS models to synthesize single-speaker speech.

Perform a synchronous speech synthesis request

Python

# google-cloud-texttospeech minimum version 2.29.0 is required.
import os

from google.cloud import texttospeech

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")


def synthesize(prompt: str, text: str, model_name: str, output_filepath: str = "output.mp3"):
    """Synthesizes speech from the input text and saves it to an MP3 file.

    Args:
        prompt: Styling instructions on how to synthesize the content in
            the text field.
        text: The text to synthesize.
        model_name: Gemini model to use. Currently, the available models are
            gemini-2.5-flash-preview-tts and gemini-2.5-pro-preview-tts.
        output_filepath: The path to save the generated audio file.
            Defaults to "output.mp3".
    """
    client = texttospeech.TextToSpeechClient()

    synthesis_input = texttospeech.SynthesisInput(text=text, prompt=prompt)

    # Select the voice you want to use.
    voice = texttospeech.VoiceSelectionParams(
        language_code="en-US",
        name="Charon",  # Example voice, adjust as needed
        model_name=model_name,
    )

    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )

    # Perform the text-to-speech request on the text input with the selected
    # voice parameters and audio file type.
    response = client.synthesize_speech(
        input=synthesis_input, voice=voice, audio_config=audio_config
    )

    # The response's audio_content is binary.
    with open(output_filepath, "wb") as out:
        out.write(response.audio_content)
    print(f"Audio content written to file: {output_filepath}")

CURL

# Make sure to install the gcloud CLI and sign in to your project.
# Make sure to use your PROJECT_ID value.
# Currently, the available models are gemini-2.5-flash-preview-tts and gemini-2.5-pro-preview-tts.
# To parse the JSON output and play the audio directly, see the last line of the command.
# Requires jq and ffplay (part of FFmpeg) to be installed.
PROJECT_ID=YOUR_PROJECT_ID

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "x-goog-user-project: $PROJECT_ID" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "prompt": "Say the following in a curious way",
      "text": "OK, so... tell me about this [uhm] AI thing."
    },
    "voice": {
      "languageCode": "en-us",
      "name": "Kore",
      "model_name": "gemini-2.5-flash-preview-tts"
    },
    "audioConfig": {
      "audioEncoding": "LINEAR16"
    }
  }' \
  "https://texttospeech.googleapis.com/v1/text:synthesize" \
  | jq -r '.audioContent' | base64 -d | ffplay - -autoexit
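The same REST request can also be sent from Python without the client library, which can help when debugging the raw payload. This minimal sketch uses only the standard library. It obtains a token by running the same `gcloud auth application-default print-access-token` command as the CURL example, sends the same headers to the same endpoint, and uses `decode_audio` in place of the `jq -r '.audioContent' | base64 -d` stage of the pipeline. The helper names `build_request`, `decode_audio`, and `synthesize_rest` are hypothetical.

```python
import base64
import json
import subprocess
import urllib.request

ENDPOINT = "https://texttospeech.googleapis.com/v1/text:synthesize"


def build_request(project_id: str, token: str, body: dict) -> urllib.request.Request:
    """Builds the same POST request as the CURL example, headers included."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "x-goog-user-project": project_id,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def decode_audio(response_json: bytes) -> bytes:
    """Equivalent of `jq -r '.audioContent' | base64 -d`."""
    return base64.b64decode(json.loads(response_json)["audioContent"])


def synthesize_rest(project_id: str, body: dict) -> bytes:
    """Sends the request and returns raw audio bytes (needs gcloud and network)."""
    token = subprocess.run(
        ["gcloud", "auth", "application-default", "print-access-token"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    with urllib.request.urlopen(build_request(project_id, token, body)) as resp:
        return decode_audio(resp.read())
```

To use it, pass the JSON body from the CURL command as a Python dict and write the returned bytes to a file whose extension matches the requested `audioEncoding`.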