Gemini-TTS is the latest evolution of our Text-to-Speech technology. It moves beyond naturalness to give granular control over generated audio through text-based prompts. With Gemini-TTS, you can synthesize speech ranging from short snippets to long-form narratives and precisely dictate style, accent, pace, tone, and even emotional expression, all steerable through natural-language prompts.
Gemini-TTS capabilities are supported by the following models:
- `gemini-2.5-flash-preview-tts`: Gemini 2.5 Flash Preview is good for cost-efficient everyday applications.
- `gemini-2.5-pro-preview-tts`: Gemini 2.5 Pro Preview is good for controllable speech generation (TTS) and for state-of-the-art quality on complex prompts.
Model | Optimized for | Input modality | Output modality | Single speaker |
---|---|---|---|---|
Gemini 2.5 Flash Preview TTS | Low latency, controllable, single- and multi-speaker Text-to-Speech audio generation for cost-efficient everyday applications | Text | Audio | ✔️ |
Gemini 2.5 Pro Preview TTS | High control for structured workflows like podcast generation, audiobooks, customer support, and more | Text | Audio | ✔️ |
Additional controls and capabilities include the following:
- Natural conversation: Voice interactions of remarkable quality, with more appropriate expressivity and prosody (patterns of rhythm), are delivered with very low latency so you can converse fluidly.
- Style control: Using natural-language prompts, you can adapt the delivery within the conversation by steering it to adopt specific accents and produce a range of tones and expressions, including a whisper.
- Dynamic performance: These models can bring text to life for expressive readings of poetry, newscasts, and engaging storytelling. They can also perform with specific emotions and produce accents when requested.
- Enhanced pace and pronunciation control: Controlling delivery speed helps improve pronunciation accuracy, including for specific words.
Examples
model: "gemini-2.5-pro-preview-tts" prompt: "You are having a casual conversation with a friend. Say the following in a friendly and amused way." text: "hahah I did NOT expect that. Can you believe it!." speaker: "Callirrhoe"
model: "gemini-2.5-flash-preview-tts" prompt: "Say the following in a curious way" text: "OK, so... tell me about this [uhm] AI thing." speaker: "Orus"
model: "gemini-2.5-flash-preview-tts" prompt: "Say the following" text: "[extremely fast] Availability and terms may vary. Check our website or your local store for complete details and restrictions." speaker: "Kore"
See the Use Gemini-TTS section for details on how to use these voices programmatically.
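The example fields above (model, prompt, text, speaker) can be arranged into a request body. The following is a minimal sketch, not an official client: `build_request` is a hypothetical helper name, and the field layout simply mirrors the cURL request shown later on this page. The default `language_code` and `audio_encoding` values are assumptions taken from this page's language table and default output format.

```python
import json


def build_request(model, prompt, text, speaker,
                  language_code="en-US", audio_encoding="LINEAR16"):
    """Builds a synthesize request body from the example fields.

    Hypothetical helper for illustration; the keys mirror the cURL
    example on this page.
    """
    return {
        "input": {"prompt": prompt, "text": text},
        "voice": {
            "languageCode": language_code,
            "name": speaker,
            "model_name": model,
        },
        "audioConfig": {"audioEncoding": audio_encoding},
    }


# Second example from the list above.
body = build_request(
    model="gemini-2.5-flash-preview-tts",
    prompt="Say the following in a curious way",
    text="OK, so... tell me about this [uhm] AI thing.",
    speaker="Orus",
)
print(json.dumps(body, indent=2))
```

Keeping the style instruction in `prompt` and the spoken words in `text` matches how the examples separate the two.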
Voice Options
Gemini-TTS offers a wide range of voice options similar to our existing Chirp 3: HD Voices, each with distinct characteristics:
Name | Gender |
---|---|
Achernar | Female |
Achird | Male |
Algenib | Male |
Algieba | Male |
Alnilam | Male |
Aoede | Female |
Autonoe | Female |
Callirrhoe | Female |
Charon | Male |
Despina | Female |
Enceladus | Male |
Erinome | Female |
Fenrir | Male |
Gacrux | Female |
Iapetus | Male |
Kore | Female |
Laomedeia | Female |
Leda | Female |
Orus | Male |
Pulcherrima | Female |
Puck | Male |
Rasalgethi | Male |
Sadachbia | Male |
Sadaltager | Male |
Schedar | Male |
Sulafat | Female |
Umbriel | Male |
Vindemiatrix | Female |
Zephyr | Female |
Zubenelgenubi | Male |
Language availability
Gemini-TTS supports the following languages:
Language | BCP-47 Code |
---|---|
English (United States) | en-US |
Regional availability
Gemini-TTS models are available in the following Google Cloud regions:
Google Cloud region | Launch readiness |
---|---|
us | Public Preview |
Supported output formats
The default response format is LINEAR16. Other supported formats include the following:
API method | Format |
---|---|
batch | ALAW, MULAW, MP3, OGG_OPUS, and PCM |
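A small guard can catch an unsupported format name before a request is sent. The sketch below is only an illustration under stated assumptions: `audio_config` is a hypothetical helper, and the set of names is copied from this section (the LINEAR16 default plus the formats listed for the batch method), not queried from the API.

```python
# Format names taken from this section; LINEAR16 is the documented default.
DEFAULT_FORMAT = "LINEAR16"
SUPPORTED_FORMATS = {"LINEAR16", "ALAW", "MULAW", "MP3", "OGG_OPUS", "PCM"}


def audio_config(audio_encoding=None):
    """Returns an audioConfig dict, falling back to the default format.

    Hypothetical helper: raises ValueError for names not listed above.
    """
    fmt = (audio_encoding or DEFAULT_FORMAT).upper()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported audio encoding: {audio_encoding!r}")
    return {"audioEncoding": fmt}


print(audio_config())       # uses the default format
print(audio_config("mp3"))  # name is normalized to upper case
```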
Use Gemini-TTS
Discover how to use Gemini-TTS models to synthesize single-speaker speech.
Perform a synchronous speech synthesis request
Python
```python
# google-cloud-texttospeech minimum version 2.29.0 is required.
import os

from google.cloud import texttospeech

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")


def synthesize(
    prompt: str,
    text: str,
    model_name: str,
    output_filepath: str = "output.mp3",
):
    """Synthesizes speech from the input text and saves it to an MP3 file.

    Args:
        prompt: Styling instructions on how to synthesize the content in
            the text field.
        text: The text to synthesize.
        model_name: Gemini model to use. Currently, the available models are
            gemini-2.5-flash-preview-tts and gemini-2.5-pro-preview-tts.
        output_filepath: The path to save the generated audio file.
            Defaults to "output.mp3".
    """
    client = texttospeech.TextToSpeechClient()

    synthesis_input = texttospeech.SynthesisInput(text=text, prompt=prompt)

    # Select the voice you want to use.
    voice = texttospeech.VoiceSelectionParams(
        language_code="en-US",
        name="Charon",  # Example voice, adjust as needed
        model_name=model_name,
    )

    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )

    # Perform the text-to-speech request on the text input with the selected
    # voice parameters and audio file type.
    response = client.synthesize_speech(
        input=synthesis_input, voice=voice, audio_config=audio_config
    )

    # The response's audio_content is binary.
    with open(output_filepath, "wb") as out:
        out.write(response.audio_content)
        print(f"Audio content written to file: {output_filepath}")
```
CURL
```bash
# Make sure to install the gcloud CLI and sign in to your project.
# Make sure to use your PROJECT_ID value.
# Currently, the available models are gemini-2.5-flash-preview-tts and gemini-2.5-pro-preview-tts.
# The last line of the command parses the JSON output and plays the audio directly.
# Requires jq and ffplay to be installed.
PROJECT_ID=YOUR_PROJECT_ID
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "x-goog-user-project: $PROJECT_ID" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "prompt": "Say the following in a curious way",
      "text": "OK, so... tell me about this [uhm] AI thing."
    },
    "voice": {
      "languageCode": "en-us",
      "name": "Kore",
      "model_name": "gemini-2.5-flash-preview-tts"
    },
    "audioConfig": {
      "audioEncoding": "LINEAR16"
    }
  }' \
  "https://texttospeech.googleapis.com/v1/text:synthesize" \
  | jq -r '.audioContent' | base64 -d | ffplay - -autoexit
```
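The final pipeline stage of the cURL command extracts the `audioContent` field and base64-decodes it. The same step can be sketched in Python for cases where the audio should be saved rather than played. This is a minimal sketch: `save_audio` is a hypothetical helper, and the response below is a stand-in with dummy bytes, not real API output.

```python
import base64
import json
import os
import tempfile


def save_audio(response_json: str, output_path: str) -> int:
    """Decodes the base64 audioContent field and writes it to a file.

    Hypothetical helper; returns the number of bytes written.
    """
    payload = json.loads(response_json)
    audio_bytes = base64.b64decode(payload["audioContent"])
    with open(output_path, "wb") as out:
        out.write(audio_bytes)
    return len(audio_bytes)


# Stand-in response with four dummy bytes in place of real audio.
fake_response = json.dumps(
    {"audioContent": base64.b64encode(b"\x00\x01\x02\x03").decode("ascii")}
)
out_path = os.path.join(tempfile.gettempdir(), "gemini_tts_demo.raw")
written = save_audio(fake_response, out_path)
print(f"Wrote {written} bytes to {out_path}")
```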