Using speech-to-speech translation

The speech-to-speech translation feature uses AI to translate spoken language, enabling conversations between people and systems that speak different languages. Your application can use this feature to process an audio stream that contains speech in one language and translate it into another language in real time.

Unlike other Live API features that support turn-based conversations, speech-to-speech translation continuously processes audio input and streams the following outputs as they become available:

Transcription: The recognized text from the input audio stream in the original language.
Translation: The translated text in the target language.
Synthesized audio: An audio stream of the translated text spoken in the target language that matches the original speaker's voice.
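A single server message can carry any combination of these three outputs. The following sketch shows one way to split a decoded message into transcription, translation, and audio. It assumes the camelCase field names used in the WebSocket example on this page (`serverContent`, `inputTranscription`, `outputTranscription`, `modelTurn`, `inlineData`). The function name `split_outputs` is hypothetical.

```python
import base64
from typing import Any, Dict, Optional, Tuple


def split_outputs(message: Dict[str, Any]) -> Tuple[Optional[str], Optional[str], bytes]:
    """Return (transcription, translation, audio) from one decoded server message.

    Assumes the field layout of the WebSocket example. Missing pieces come back
    as None (text) or b"" (audio), so a message without serverContent yields
    (None, None, b"").
    """
    content = message.get("serverContent") or {}

    # Recognized text in the original language.
    transcription = (content.get("inputTranscription") or {}).get("text")

    # Translated text in the target language.
    translation = (content.get("outputTranscription") or {}).get("text")

    # Synthesized audio: base64-encoded PCM carried in inlineData parts.
    audio = bytearray()
    for part in (content.get("modelTurn") or {}).get("parts", []):
        data = (part.get("inlineData") or {}).get("data")
        if data:
            audio.extend(base64.b64decode(data))

    return transcription, translation, bytes(audio)
```

In a receive loop, you could pass each decoded response through `split_outputs`, print the two text fields, and append the audio bytes to a playback buffer.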

Supported models

You can use speech-to-speech translation with the following model:

Model version	Availability level
`gemini-2.5-flash-s2st-11-2025-exp`	Private experimental

Input audio requirements

Speech-to-speech translation supports only audio input. For information about supported audio formats, codecs, and specifications such as sample rate, see Supported audio formats.
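The WebSocket example sends its input with the `audio/pcm` MIME type, so a WAV file needs its header removed before the samples are encoded. The sketch below reads a WAV file with Python's standard `wave` module and checks it against an assumed format of 16-bit mono audio at 16 kHz. Confirm the actual requirements in Supported audio formats. It then yields `realtime_input` messages in the same shape as the WebSocket example, so the audio can be streamed in small pieces instead of one payload. The helper names `wav_to_pcm` and `realtime_input_messages`, and the 100 ms chunk size, are hypothetical choices.

```python
import base64
import io
import json
import wave
from typing import Iterator

# Assumed input format (not confirmed by this page); check Supported audio formats.
EXPECTED_RATE = 16000   # samples per second
EXPECTED_WIDTH = 2      # bytes per sample (16-bit)
EXPECTED_CHANNELS = 1   # mono


def wav_to_pcm(wav_bytes: bytes) -> bytes:
    """Return the raw PCM frames from a WAV file after checking the assumed format."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        if wav.getsampwidth() != EXPECTED_WIDTH:
            raise ValueError(f"expected 16-bit samples, got {8 * wav.getsampwidth()}-bit")
        if wav.getnchannels() != EXPECTED_CHANNELS:
            raise ValueError(f"expected mono audio, got {wav.getnchannels()} channels")
        if wav.getframerate() != EXPECTED_RATE:
            raise ValueError(f"expected {EXPECTED_RATE} Hz, got {wav.getframerate()} Hz")
        return wav.readframes(wav.getnframes())


def realtime_input_messages(pcm: bytes, chunk_ms: int = 100) -> Iterator[str]:
    """Yield JSON realtime_input messages, each carrying chunk_ms of audio."""
    bytes_per_chunk = EXPECTED_RATE * EXPECTED_WIDTH * EXPECTED_CHANNELS * chunk_ms // 1000
    for start in range(0, len(pcm), bytes_per_chunk):
        chunk = pcm[start:start + bytes_per_chunk]
        yield json.dumps({
            "realtime_input": {
                "audio": {
                    "mime_type": "audio/pcm",
                    "data": base64.b64encode(chunk).decode("utf-8"),
                }
            }
        })
```

With an open connection `ws`, each message could then be sent in order with `await ws.send(message)`. Smaller chunks let the server start producing output before the whole clip has been uploaded.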

Use speech-to-speech translation

To use speech-to-speech translation, see the following code examples:

Gen AI SDK for Python

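    import asyncio

    from google import genai
    from google.genai.types import (
        AudioTranscriptionConfig,
        Blob,
        LiveConnectConfig,
        SpeechConfig,
    )

    # Assumes PROJECT_ID, LOCATION, and MODEL_ID are already defined, and that
    # pcm_data holds raw 16-bit PCM audio to translate.
    client = genai.Client(vertexai=True, project=PROJECT_ID, location=LOCATION)

    # Set language_code to your target language, in this case, Mandarin Chinese.
    speech_config = SpeechConfig(language_code="cmn")
    config = LiveConnectConfig(
        response_modalities=["AUDIO"],
        speech_config=speech_config,
        # Empty configs turn on transcription of the input and output audio.
        input_audio_transcription=AudioTranscriptionConfig(),
        output_audio_transcription=AudioTranscriptionConfig(),
    )


    async def translate(pcm_data: bytes) -> bytes:
        translated_audio = bytearray()
        # Open a Live API session with the translation config.
        async with client.aio.live.connect(model=MODEL_ID, config=config) as session:
            await session.send_realtime_input(
                audio=Blob(data=pcm_data, mime_type="audio/pcm")
            )
            # receive() yields server messages until the model completes its turn.
            async for message in session.receive():
                content = message.server_content
                if content is None:
                    continue
                if content.input_transcription and content.input_transcription.text:
                    print(f"Input > {content.input_transcription.text}")
                if content.output_transcription and content.output_transcription.text:
                    print(f"Response > {content.output_transcription.text}")
                if content.model_turn:
                    for part in content.model_turn.parts or []:
                        if part.inline_data and part.inline_data.data:
                            translated_audio.extend(part.inline_data.data)
        return bytes(translated_audio)


    # In a notebook, use: translated = await translate(pcm_data)
    translated = asyncio.run(translate(pcm_data))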

Python (WebSocket)

    import asyncio
    import base64
    import json

    import numpy as np
    import websockets
    from IPython.display import Audio, Markdown, display
    from websockets.asyncio.client import connect

    # Assumes SERVICE_URL, MODEL, bearer_token, and wav_data are already defined.
    # wav_data is assumed to hold raw 16-bit PCM samples, matching the
    # audio/pcm MIME type used below.

    # Set model generation_config
    CONFIG = {
        "response_modalities": ["AUDIO"],
        "speech_config": {
            "language_code": "cmn",
        },
    }

    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {bearer_token[0]}",
    }


    async def translate():
        # Connect to the server
        async with connect(SERVICE_URL, additional_headers=headers) as ws:
            # Set up the session
            await ws.send(
                json.dumps(
                    {
                        "setup": {
                            "model": MODEL,
                            "generation_config": CONFIG,
                            "input_audio_transcription": {},
                            "output_audio_transcription": {},
                            "enable_speech_to_speech_translation": True,
                        }
                    }
                )
            )

            # Receive setup response
            raw_response = await ws.recv(decode=False)
            setup_response = json.loads(raw_response.decode())
            print(setup_response)

            # Send the input audio
            msg = {
                "realtime_input": {
                    "audio": {
                        "mime_type": "audio/pcm",
                        "data": base64.b64encode(wav_data).decode("utf-8"),
                    }
                }
            }
            await ws.send(json.dumps(msg))

            overall_responses = []
            timeout_seconds = 10  # Give up after 10 seconds without a server message

            # Receive chunks of the server response with a timeout
            try:
                while True:
                    try:
                        raw_response = await asyncio.wait_for(
                            ws.recv(decode=False), timeout_seconds
                        )
                    except asyncio.TimeoutError:
                        print(
                            "Timeout: No response received from the websocket "
                            f"within {timeout_seconds} seconds."
                        )
                        break  # Exit the loop on timeout

                    response = json.loads(raw_response.decode())
                    server_content = response.pop("serverContent", None)
                    if server_content is None:
                        break

                    # Input transcription (original language)
                    input_transcription = server_content.pop("inputTranscription", None)
                    if input_transcription is not None:
                        raw_text = input_transcription.pop("text", None)
                        if raw_text is not None:
                            display(Markdown(f"**Input >** {raw_text}"))

                    # Output transcription (translated text)
                    output_transcription = server_content.pop("outputTranscription", None)
                    if output_transcription is not None:
                        raw_text = output_transcription.pop("text", None)
                        if raw_text is not None:
                            display(Markdown(f"**Response >** {raw_text}"))

                    # Synthesized audio: base64-encoded 16-bit PCM chunks
                    model_turn = server_content.pop("modelTurn", None)
                    if model_turn is not None:
                        for part in model_turn.get("parts", []):
                            inline_data = part.get("inlineData")
                            if inline_data is not None:
                                pcm_data = base64.b64decode(inline_data["data"])
                                overall_responses.append(
                                    np.frombuffer(pcm_data, dtype=np.int16)
                                )

                    # To stop at the end of a turn instead of waiting for the
                    # timeout, break when server_content.get("turnComplete") is set.
            except websockets.exceptions.ConnectionClosed as e:
                print(f"Connection closed by exception, code: {e.code}, reason: {e.reason}")
            except Exception as e:
                print(f"An unexpected error occurred: {e}")
            finally:
                # Play the collected translated audio at 24 kHz
                if overall_responses:
                    display(Audio(np.concatenate(overall_responses), rate=24000, autoplay=True))
                await ws.close(code=1000, reason="Normal closure")


    # In a notebook, run: await translate()
    # In a script, run: asyncio.run(translate())
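The WebSocket example plays the collected `int16` chunks back at 24,000 Hz inside a notebook. Outside a notebook, you could write the same chunks to a WAV file with the standard `wave` module instead of `IPython.display.Audio`. The sketch below assumes, as the example does, mono 16-bit PCM at 24 kHz. The function name `save_translation` is hypothetical.

```python
import wave
from typing import Sequence

import numpy as np

OUTPUT_RATE = 24000  # Playback rate used in the WebSocket example (assumed output rate).


def save_translation(chunks: Sequence[np.ndarray], path: str) -> float:
    """Write int16 PCM chunks to a mono WAV file and return its duration in seconds."""
    if not chunks:
        raise ValueError("no audio chunks to save")
    # Join the chunks and force little-endian 16-bit samples, as WAV requires.
    audio = np.concatenate(chunks).astype("<i2")
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(OUTPUT_RATE)
        wav.writeframes(audio.tobytes())
    return len(audio) / OUTPUT_RATE
```

For example, `save_translation(overall_responses, "translation.wav")` could replace the `display(Audio(...))` call when the code runs as a script.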

Supported languages

Language Code	Language
aa	Afar
ab	Abkhazian
ace	Achinese
ach	Acoli
af	Afrikaans
ak	Akan
alz	Alur
am	Amharic
an	Aragonese
ar	Arabic
as	Assamese
av	Avaric
awa	Awadhi
ay	Aymara
az	Azerbaijani
ba	Bashkir
bal	Baluchi
ban	Balinese
bbc	Batak Toba
bci	Baoulé
be	Belarusian
bem	Bemba
ber	Berber
bew	Betawi
bg	Bulgarian
bgc	Haryanvi
bho	Bhojpuri
bi	Bislama
bm	Bambara
bn	Bengali
bo	Tibetan
br	Breton
bs	Bosnian
bts	Batak Simalungun
btx	Batak Karo
ca	Catalan
ce	Chechen
ceb	Cebuano
cgg	Chiga
ch	Chamorro
chk	Chuukese
cmn	Mandarin Chinese
cnh	Hakha Chin
co	Corsican
cr	Cree
crh	Crimean Tatar
crs	Seselwa Creole French
cs	Czech
cv	Chuvash
cy	Welsh
da	Danish
de	German
din	Dinka
doi	Dogri
dov	Dombe
dv	Divehi
dyu	Dyula
dz	Dzongkha
ee	Ewe
el	Greek
en	English
eo	Esperanto
es	Spanish
et	Estonian
eu	Basque
fa	Farsi
ff	Fulah
fi	Finnish
fil	Filipino
fj	Fijian
fo	Faroese
fon	Fon
fr	French
fur	Friulian
fy	Western Frisian
ga	Irish
gaa	Ga
gd	Gaelic
gl	Galician
gn	Guarani
gu	Gujarati
gv	Manx
ha	Hausa
haw	Hawaiian
he	Hebrew
hi	Hindi
hil	Hiligaynon
hmn	Hmong
ho	Hiri Motu
hr	Croatian
hrx	Hunsrik
ht	Haitian, Haitian Creole
hu	Hungarian
hy	Armenian
hz	Herero
iba	Iban
id	Indonesian
ig	Igbo
ilo	Iloko
is	Icelandic
it	Italian
iu	Inuktitut
ja	Japanese
jam	Jamaican Creole English
jv	Javanese
ka	Georgian
kac	Kachin
kek	Kekchi
kg	Kongo
kha	Khasi
ki	Kikuyu
kj	Kuanyama
kk	Kazakh
kl	Greenlandic
km	Central Khmer
kn	Kannada
ko	Korean
kok	Konkani
kr	Kanuri
kri	Krio
ks	Kashmiri
ktu	Kituba
ku	Kurdish
kv	Komi
kw	Cornish
ky	Kyrgyz
la	Latin
lb	Luxembourgish
lg	Ganda
li	Limburgan
lij	Ligurian
lmo	Lombard
ln	Lingala
lo	Lao
lt	Lithuanian
lu	Luba-Katanga
lua	Luba-Lulua
luo	Dholuo
lus	Mizo
lv	Latvian
mad	Madurese
mai	Maithili
mak	Makasar
mam	Mam
mfe	Morisyen
mg	Malagasy
mh	Marshallese
min	Minangkabau
mk	Macedonian
ml	Malayalam
mn	Mongolian
mr	Marathi
ms	Malay
mt	Maltese
mwr	Marwari
my	Burmese
na	Nauru
nb	Norwegian Bokmål
nd	North Ndebele
ndc	Ndau
ne	Nepali
new	Newari
ng	Ndonga
nhe	Eastern Huasteca Nahuatl
nl	Dutch
nn	Norwegian Nynorsk
nr	South Ndebele
nso	Pedi
nus	Nuer
nv	Navajo
ny	Chichewa
oc	Occitan
oj	Ojibwa
om	Oromo
or	Oriya
os	Ossetian
pa	Punjabi
pag	Pangasinan
pam	Pampanga
pap	Papiamento
pl	Polish
ps	Pashto
pt	Portuguese
qu	Quechua
rm	Romansh
rn	Rundi
ro	Romanian
ru	Russian
rw	Kinyarwanda
sa	Sanskrit
sah	Yakut
sat	Santali
sc	Sardinian
scn	Sicilian
sd	Sindhi
se	Northern Sami
sg	Sango
shn	Shan
si	Sinhala
sk	Slovak
sl	Slovenian
sm	Samoan
sn	Shona
so	Somali
sq	Albanian
sr	Serbian
ss	Swati
st	Southern Sotho
su	Sundanese
sv	Swedish
sw	Swahili
szl	Silesian
ta	Tamil
tcy	Tulu
te	Telugu
tet	Tetum
tg	Tajik
th	Thai
ti	Tigrinya
tiv	Tiv
tk	Turkmen
tl	Tagalog
tn	Tswana
to	Tonga
tpi	Tok Pisin
tr	Turkish
trp	Kok Borok
ts	Tsonga
tt	Tatar
tum	Tumbuka
tw	Twi
ty	Tahitian
tyv	Tuvinian
udm	Udmurt
ug	Uighur
uk	Ukrainian
ur	Urdu
uz	Uzbek
ve	Venda
vec	Venetian
vi	Vietnamese
wa	Walloon
war	Waray
wo	Wolof
xh	Xhosa
yi	Yiddish
yo	Yoruba
yua	Yucatec Maya
yue	Cantonese
za	Zhuang
zh	Chinese
zu	Zulu
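When the target language comes from user input, checking the code against this table before you open a session catches typos early. The sketch below builds the `generation_config` dictionary in the shape used by the WebSocket example. For brevity it includes only a few codes from the table, so you would extend `SUPPORTED_LANGUAGES` with the full list. The helper name `build_generation_config` is hypothetical.

```python
from typing import Any, Dict

# A small subset of the supported-languages table; extend with the full list.
SUPPORTED_LANGUAGES = {
    "cmn": "Mandarin Chinese",
    "en": "English",
    "es": "Spanish",
    "fr": "French",
    "ja": "Japanese",
    "yue": "Cantonese",
}


def build_generation_config(language_code: str) -> Dict[str, Any]:
    """Return a generation_config targeting language_code, or raise ValueError."""
    code = language_code.strip().lower()
    if code not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {language_code!r}")
    return {
        "response_modalities": ["AUDIO"],
        "speech_config": {"language_code": code},
    }
```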

Billing

Because speech-to-speech translation is an experimental feature, you aren't charged for using it.

For more information about pricing and billing, see Vertex AI pricing.
