Using speech-to-speech translation

The speech-to-speech translation feature uses AI to interpret language, enabling conversations between individuals and systems who speak different languages. Your application can use this feature to process an audio stream containing speech in one language and translate it into another language in real time.

Unlike other Live API features that support turn-based conversations, speech-to-speech translation continuously processes audio input and streams the following outputs as they become available:

  • Transcription: The recognized text from the input audio stream in the original language.
  • Translation: The translated text in the target language.
  • Synthesized audio: An audio stream of the translated text spoken in the target language that matches the original speaker's voice.
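
As a rough illustration of how a client might tell these three outputs apart, the sketch below sorts one parsed server message into transcription, translation, and audio. The field names (`serverContent`, `inputTranscription`, `outputTranscription`, `modelTurn`, `inlineData`) are taken from the WebSocket example later on this page. The `StreamOutputs` class and the `split_outputs` function are hypothetical helpers, not part of any SDK.

```python
import base64
from dataclasses import dataclass
from typing import Optional


@dataclass
class StreamOutputs:
    """Hypothetical container for the outputs carried by one server message."""
    transcription: Optional[str] = None  # recognized source-language text
    translation: Optional[str] = None    # translated target-language text
    audio: bytes = b""                   # raw synthesized audio bytes


def split_outputs(message: dict) -> StreamOutputs:
    """Sort one parsed JSON server message into its three output kinds."""
    outputs = StreamOutputs()
    content = message.get("serverContent") or {}

    outputs.transcription = (content.get("inputTranscription") or {}).get("text")
    outputs.translation = (content.get("outputTranscription") or {}).get("text")

    # Audio arrives base64-encoded inside the parts of the model turn.
    chunks = []
    for part in (content.get("modelTurn") or {}).get("parts") or []:
        data = (part.get("inlineData") or {}).get("data")
        if data:
            chunks.append(base64.b64decode(data))
    outputs.audio = b"".join(chunks)
    return outputs
```

Any of the three outputs can be absent from a given message, which is why every field has an empty default.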

Supported models

You can use speech-to-speech translation with the following model:

Model version                        Availability level
gemini-2.5-flash-s2st-11-2025-exp    Private experimental
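
For orientation, here is a minimal sketch of how this model version might be placed in a session setup message. The field names mirror the setup message in the WebSocket example later on this page. `build_setup_message` is a hypothetical helper, and whether the service expects the bare model ID or a full resource path is not shown on this page, so treat the `model` value as an assumption.

```python
import json

MODEL_VERSION = "gemini-2.5-flash-s2st-11-2025-exp"  # from the table above


def build_setup_message(model: str, target_language: str) -> str:
    """Serialize a session setup message that enables translation."""
    return json.dumps({
        "setup": {
            "model": model,  # assumption: the exact expected form isn't documented here
            "generation_config": {
                "response_modalities": ["AUDIO"],
                "speech_config": {"language_code": target_language},
            },
            "input_audio_transcription": {},
            "output_audio_transcription": {},
            "enable_speech_to_speech_translation": True,
        }
    })


setup_message = build_setup_message(MODEL_VERSION, "cmn")
```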

Input audio requirements

Speech-to-speech translation supports only audio input. For information on supported audio formats, codecs, and specifications such as sample rate, see Supported audio formats.
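
To make the input side concrete, the following sketch reads PCM frames from a WAV file with the standard-library `wave` module and wraps them in messages shaped like the `realtime_input` message in the WebSocket example on this page. The helper names, the 4096-byte chunk size, and the 16,000 Hz rate in the usage lines are illustrative assumptions. This page doesn't state a required chunk size, and the audio must still meet the requirements in Supported audio formats.

```python
import base64
import io
import json
import wave
from typing import BinaryIO, Iterator, Union


def read_pcm_frames(source: Union[str, BinaryIO]) -> bytes:
    """Return the raw PCM frames of a WAV file, without the RIFF header."""
    with wave.open(source, "rb") as wav:
        return wav.readframes(wav.getnframes())


def realtime_audio_messages(pcm: bytes, chunk_size: int = 4096) -> Iterator[str]:
    """Yield JSON messages that each carry one base64-encoded PCM chunk.

    chunk_size is an arbitrary illustrative value, not a documented limit.
    """
    for start in range(0, len(pcm), chunk_size):
        chunk = pcm[start:start + chunk_size]
        yield json.dumps({
            "realtime_input": {
                "audio": {
                    "mime_type": "audio/pcm",
                    "data": base64.b64encode(chunk).decode("utf-8"),
                }
            }
        })


# Usage: build one second of 16-bit mono silence in memory, then chunk it.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)      # 16-bit samples
    wav.setframerate(16000)  # illustrative rate; check Supported audio formats
    wav.writeframes(b"\x00\x00" * 16000)
buffer.seek(0)
messages = list(realtime_audio_messages(read_pcm_frames(buffer)))
```

Sending audio in small chunks rather than as one large message fits the continuous, streaming nature of the feature, though the right chunk size depends on your latency needs.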

Use speech-to-speech translation

To use speech-to-speech translation, see the following code examples:

Python

from google.genai.types import LiveConnectConfig, Part, SpeechConfig
from IPython.display import Markdown, display

# Assumed to be defined earlier: client, MODEL_ID, audio_url,
# input_transcription, and output_transcription.

# Set language_code to your desired language, in this case, Mandarin Chinese.
speech_config = SpeechConfig(language_code="cmn")
config = LiveConnectConfig(
    response_modalities=["AUDIO"],
    speech_config=speech_config,
    input_audio_transcription=input_transcription,
    output_audio_transcription=output_transcription,
)

audio_file = Part.from_uri(file_uri=audio_url, mime_type="audio/mpeg")
contents = [audio_file]

response = client.models.generate_content(model=MODEL_ID, contents=contents)
display(Markdown(response.text))

Python

import asyncio
import base64
import json

import numpy as np
import websockets
from IPython.display import Audio, Markdown, display
from websockets.asyncio.client import connect

# Assumed to be defined earlier: SERVICE_URL, MODEL, bearer_token, and wav_data.
# The top-level "async with" below runs as-is in a notebook; in a script,
# wrap it in an async function and call asyncio.run().

# Set model generation_config
CONFIG = {
    "response_modalities": ["AUDIO"],
    "speech_config": {
        "language_code": "cmn",
    },
}

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {bearer_token[0]}",
}

# Connect to the server
async with connect(SERVICE_URL, additional_headers=headers) as ws:
    # Set up the session
    await ws.send(
        json.dumps(
            {
                "setup": {
                    "model": MODEL,
                    "generation_config": CONFIG,
                    "input_audio_transcription": {},
                    "output_audio_transcription": {},
                    "enable_speech_to_speech_translation": True,
                }
            }
        )
    )

    # Receive setup response
    raw_response = await ws.recv(decode=False)
    setup_response = json.loads(raw_response.decode("ascii"))
    print(setup_response)

    # Send the input audio as base64-encoded PCM
    msg = {
        "realtime_input": {
            "audio": {
                "mime_type": "audio/pcm",
                "data": base64.b64encode(wav_data).decode("utf-8"),
            }
        }
    }
    await ws.send(json.dumps(msg))

    overall_responses = []
    timeout_seconds = 10  # Seconds to wait for each server message

    # Receive chunks of server response with a timeout
    try:
        while True:
            try:
                raw_response = await asyncio.wait_for(
                    ws.recv(decode=False), timeout_seconds
                )
                response = json.loads(raw_response.decode())
                server_content = response.pop("serverContent", None)
                if server_content is None:
                    break

                # Input transcription (source language)
                input_transcription = server_content.pop("inputTranscription", None)
                if input_transcription is not None:
                    raw_text = input_transcription.pop("text", None)
                    if raw_text is not None:
                        display(Markdown(f"**Input >** {raw_text}"))

                # Output transcription (translated text)
                output_transcription = server_content.pop("outputTranscription", None)
                if output_transcription is not None:
                    raw_text = output_transcription.pop("text", None)
                    if raw_text is not None:
                        display(Markdown(f"**Response >** {raw_text}"))

                # Synthesized audio, delivered as base64-encoded 16-bit PCM
                model_turn = server_content.pop("modelTurn", None)
                if model_turn is not None:
                    parts = model_turn.pop("parts", None)
                    if parts is not None:
                        for part in parts:
                            pcm_data = base64.b64decode(part["inlineData"]["data"])
                            overall_responses.append(
                                np.frombuffer(pcm_data, dtype=np.int16)
                            )

                # End of turn
                # turn_complete = server_content.pop("turnComplete", None)
                # if turn_complete:
                #     break

            except asyncio.TimeoutError:
                print(
                    "Timeout: No response received from the websocket within "
                    f"{timeout_seconds} seconds."
                )
                if overall_responses:
                    display(
                        Audio(np.concatenate(overall_responses), rate=24000, autoplay=True)
                    )
                break  # Exit the loop on timeout

            except websockets.exceptions.ConnectionClosed as e:
                print(f"Connection closed by exception, code: {e.code}, reason: {e.reason}")
                if overall_responses:
                    display(
                        Audio(np.concatenate(overall_responses), rate=24000, autoplay=True)
                    )
                break  # Exit the loop on connection closed

            except Exception as e:
                print(f"An unexpected error occurred: {e}")
                if overall_responses:
                    display(
                        Audio(np.concatenate(overall_responses), rate=24000, autoplay=True)
                    )
                break  # Exit the loop on other exceptions

    finally:
        try:
            await ws.close(code=1000, reason="Normal closure")  # Example close
        except websockets.exceptions.ConnectionClosed as e:
            print(f"Connection closed by exception, code: {e.code}, reason: {e.reason}")
        except Exception as e:
            print(f"An unexpected error occurred: {e}")
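
The example above plays the collected audio in a notebook. Outside a notebook, the same chunks could be written to a WAV file instead. The sketch below relies on what the example implies, namely 16-bit samples at 24,000 Hz (`dtype=np.int16`, `rate=24000`), and additionally assumes mono audio, which this page does not state. `save_pcm_as_wav` is a hypothetical helper name.

```python
import io
import wave
from typing import BinaryIO, Iterable, Union


def save_pcm_as_wav(
    chunks: Iterable[bytes],
    target: Union[str, BinaryIO],
    sample_rate: int = 24000,  # matches rate=24000 in the example above
    channels: int = 1,         # assumption: mono output
) -> int:
    """Write raw 16-bit PCM chunks to a WAV file and return the frame count."""
    pcm = b"".join(chunks)
    with wave.open(target, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample, i.e. int16
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return len(pcm) // (2 * channels)


# Usage: two decoded chunks, written to an in-memory buffer.
buffer = io.BytesIO()
frames = save_pcm_as_wav([b"\x00\x00" * 100, b"\x01\x00" * 50], buffer)
```

If the chunks are NumPy arrays, as in the example above, pass `chunk.tobytes()` for each one. A file path can be given in place of the buffer.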

Supported languages

Language Code Language
aa Afar
ab Abkhazian
ace Achinese
ach Acoli
af Afrikaans
ak Akan
alz Alur
am Amharic
an Aragonese
ar Arabic
as Assamese
av Avaric
awa Awadhi
ay Aymara
az Azerbaijani
ba Bashkir
bal Baluchi
ban Balinese
bbc Batak Toba
bci Baoulé
be Belarusian
bem Bemba
ber Berber
bew Betawi
bg Bulgarian
bgc Haryanvi
bho Bhojpuri
bi Bislama
bm Bambara
bn Bengali
bo Tibetan
br Breton
bs Bosnian
bts Batak Simalungun
btx Batak Karo
ca Catalan
ce Chechen
ceb Cebuano
cgg Chiga
ch Chamorro
chk Chuukese
cmn Mandarin Chinese
cnh Hakha Chin
co Corsican
cr Cree
crh Crimean Tatar
crs Seselwa Creole French
cs Czech
cv Chuvash
cy Welsh
da Danish
de German
din Dinka
doi Dogri
dov Dombe
dv Divehi
dyu Dyula
dz Dzongkha
ee Ewe
el Greek
en English
eo Esperanto
es Spanish
et Estonian
eu Basque
fa Farsi
ff Fulah
fi Finnish
fil Filipino
fj Fijian
fo Faroese
fon Fon
fr French
fur Friulian
fy Western Frisian
ga Irish
gaa Ga
gd Gaelic
gl Galician
gn Guarani
gu Gujarati
gv Manx
ha Hausa
haw Hawaiian
he Hebrew
hi Hindi
hil Hiligaynon
hmn Hmong
ho Hiri Motu
hr Croatian
hrx Hunsrik
ht Haitian, Haitian Creole
hu Hungarian
hy Armenian
hz Herero
iba Iban
id Indonesian
ig Igbo
ilo Iloko
is Icelandic
it Italian
iu Inuktitut
ja Japanese
jam Jamaican Creole English
jv Javanese
ka Georgian
kac Kachin
kek Kekchi
kg Kongo
kha Khasi
ki Kikuyu
kj Kuanyama
kk Kazakh
kl Greenlandic
km Central Khmer
kn Kannada
ko Korean
kok Konkani
kr Kanuri
kri Krio
ks Kashmiri
ktu Kituba
ku Kurdish
kv Komi
kw Cornish
ky Kyrgyz
la Latin
lb Luxembourgish
lg Ganda
li Limburgan
lij Ligurian
lmo Lombard
ln Lingala
lo Lao
lt Lithuanian
lu Luba-Katanga
lua Luba-Lulua
luo Dholuo
lus Mizo
lv Latvian
mad Madurese
mai Maithili
mak Makasar
mam Mam
mfe Morisyen
mg Malagasy
mh Marshallese
min Minangkabau
mk Macedonian
ml Malayalam
mn Mongolian
mr Marathi
ms Malay
mt Maltese
mwr Marwari
my Burmese
na Nauru
nb Norwegian Bokmål
nd North Ndebele
ndc Ndau
ne Nepali
new Newari
ng Ndonga
nhe Eastern Huasteca Nahuatl
nl Dutch
nn Norwegian Nynorsk
nr South Ndebele
nso Pedi
nus Nuer
nv Navajo
ny Chichewa
oc Occitan
oj Ojibwa
om Oromo
or Oriya
os Ossetian
pa Punjabi
pag Pangasinan
pam Pampanga
pap Papiamento
pl Polish
ps Pashto
pt Portuguese
qu Quechua
rm Romansh
rn Rundi
ro Romanian
ru Russian
rw Kinyarwanda
sa Sanskrit
sah Yakut
sat Santali
sc Sardinian
scn Sicilian
sd Sindhi
se Northern Sami
sg Sango
shn Shan
si Sinhala
sk Slovak
sl Slovenian
sm Samoan
sn Shona
so Somali
sq Albanian
sr Serbian
ss Swati
st Southern Sotho
su Sundanese
sv Swedish
sw Swahili
szl Silesian
ta Tamil
tcy Tulu
te Telugu
tet Tetum
tg Tajik
th Thai
ti Tigrinya
tiv Tiv
tk Turkmen
tl Tagalog
tn Tswana
to Tonga
tpi Tok Pisin
tr Turkish
trp Kok Borok
ts Tsonga
tt Tatar
tum Tumbuka
tw Twi
ty Tahitian
tyv Tuvinian
udm Udmurt
ug Uighur
uk Ukrainian
ur Urdu
uz Uzbek
ve Venda
vec Venetian
vi Vietnamese
wa Walloon
war Waray
wo Wolof
xh Xhosa
yi Yiddish
yo Yoruba
yua Yucatec Maya
yue Cantonese
za Zhuang
zh Chinese
zu Zulu
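
One way to catch an unsupported code early is to check it against the table before opening a session. The sketch below builds the `speech_config` dictionary used in the WebSocket example on this page. `SUPPORTED_CODES` holds only a handful of codes copied from the table for brevity, and `build_speech_config` is a hypothetical helper; a real check would load the full list.

```python
# A small illustrative subset of the table above, not the full list.
SUPPORTED_CODES = {
    "cmn": "Mandarin Chinese",
    "de": "German",
    "en": "English",
    "es": "Spanish",
    "fr": "French",
    "ja": "Japanese",
    "yue": "Cantonese",
}


def build_speech_config(language_code: str) -> dict:
    """Return a speech_config dict, or raise if the code isn't in the subset."""
    code = language_code.strip()
    if code not in SUPPORTED_CODES:
        raise ValueError(f"Unsupported language code: {language_code!r}")
    return {"language_code": code}


config = {
    "response_modalities": ["AUDIO"],
    "speech_config": build_speech_config("cmn"),
}
```

The helper deliberately does not change letter case, because this page doesn't say whether language codes are case-sensitive.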

Billing

Because speech-to-speech translation is an experimental feature, you aren't charged to use it.

For more information on pricing and billing, see Vertex AI pricing.
