The speech-to-speech translation feature uses AI to interpret spoken language, enabling conversations between people and systems that use different languages. Your application can use this feature to process an audio stream containing speech in one language and translate it into another language in real time.
Unlike other Live API features that support turn-based conversations, speech-to-speech translation continuously processes audio input and streams the following outputs as they become available:
- Transcription: The recognized text from the input audio stream in the original language.
- Translation: The translated text in the target language.
- Synthesized audio: An audio stream of the translated text spoken in the target language that matches the original speaker's voice.
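As a minimal sketch, a client might separate these three outputs from one decoded server message as follows. The field names (`serverContent`, `inputTranscription`, `outputTranscription`, `modelTurn`, `inlineData`) are the ones used in the WebSocket example on this page; the helper name `split_outputs` and the sample message are hypothetical and for illustration only.

```python
import base64


def split_outputs(message: dict) -> dict:
    """Split one decoded server message into its three output streams.

    Hypothetical helper: missing fields come back as None or empty bytes.
    """
    content = message.get("serverContent") or {}

    # Recognized text in the original language.
    transcription = (content.get("inputTranscription") or {}).get("text")
    # Translated text in the target language.
    translation = (content.get("outputTranscription") or {}).get("text")

    # Synthesized audio arrives as base64-encoded chunks in modelTurn parts.
    audio = b""
    for part in (content.get("modelTurn") or {}).get("parts") or []:
        inline = part.get("inlineData")
        if inline and "data" in inline:
            audio += base64.b64decode(inline["data"])

    return {
        "transcription": transcription,
        "translation": translation,
        "audio": audio,
    }


# Hand-built sample message, not real service output.
sample = {
    "serverContent": {
        "inputTranscription": {"text": "Hello"},
        "outputTranscription": {"text": "你好"},
        "modelTurn": {
            "parts": [
                {"inlineData": {"data": base64.b64encode(b"\x00\x01").decode("ascii")}},
                {"inlineData": {"data": base64.b64encode(b"\x02\x03").decode("ascii")}},
            ]
        },
    }
}
outputs = split_outputs(sample)
```

Because the outputs stream as they become available, a single message may carry only one or two of the three fields, so the helper treats each field as optional.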
Supported models
You can use speech-to-speech translation with the following model:
| Model version | Availability level |
|---|---|
| gemini-2.5-flash-s2st-11-2025-exp | Private experimental |
Input audio requirements
Speech-to-speech translation supports only audio input. For information on supported audio formats, codecs, and specifications such as sample rate, see Supported audio formats.
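The following sketch shows one way to package raw audio bytes into `realtime_input` messages for a WebSocket session. The message shape and the `audio/pcm` MIME type are taken from the WebSocket example on this page; the helper name `build_audio_messages` and the `chunk_size` default are assumptions for illustration, not documented limits.

```python
import base64
import json


def build_audio_messages(pcm_bytes: bytes, chunk_size: int = 4096) -> list:
    """Split raw PCM audio into JSON `realtime_input` messages.

    chunk_size is an arbitrary illustrative value, not a documented limit.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")

    messages = []
    for start in range(0, len(pcm_bytes), chunk_size):
        chunk = pcm_bytes[start:start + chunk_size]
        messages.append(
            json.dumps(
                {
                    "realtime_input": {
                        "audio": {
                            "mime_type": "audio/pcm",
                            # Binary audio must be base64-encoded to fit in JSON.
                            "data": base64.b64encode(chunk).decode("utf-8"),
                        }
                    }
                }
            )
        )
    return messages


# Placeholder input: 10,000 bytes of silence instead of real speech.
fake_pcm = bytes(10000)
messages = build_audio_messages(fake_pcm)
```

Each string in `messages` could then be sent with `await ws.send(...)`, one chunk at a time, rather than sending the whole recording in one message.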
Use speech-to-speech translation
To use speech-to-speech translation, see the following code examples:
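Python

    from google.genai.types import AudioTranscriptionConfig, LiveConnectConfig, Part, SpeechConfig
    from IPython.display import Markdown, display

    # `client`, `MODEL_ID`, and `audio_url` are assumed to be defined earlier.
    # Assumption: default (empty) transcription configs. The original snippet
    # does not show how these two variables are created.
    input_transcription = AudioTranscriptionConfig()
    output_transcription = AudioTranscriptionConfig()

    # Set language_code to your desired language, in this case, Mandarin Chinese.
    speech_config = SpeechConfig(language_code="cmn")
    config = LiveConnectConfig(
        response_modalities=["AUDIO"],
        speech_config=speech_config,
        input_audio_transcription=input_transcription,
        output_audio_transcription=output_transcription,
    )

    audio_file = Part.from_uri(file_uri=audio_url, mime_type="audio/mpeg")
    contents = [audio_file]

    # Note: as written, `config` is not passed to this request.
    response = client.models.generate_content(model=MODEL_ID, contents=contents)
    display(Markdown(response.text))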
Python

    import asyncio
    import base64
    import json

    import numpy as np
    import websockets
    from IPython.display import Audio, Markdown, display
    # Assumes a recent websockets release whose asyncio client supports
    # `additional_headers` and `recv(decode=...)`.
    from websockets.asyncio.client import connect

    # SERVICE_URL, MODEL, bearer_token, and wav_data are assumed to be defined earlier.

    # Set model generation_config
    CONFIG = {
        "response_modalities": ["AUDIO"],
        "speech_config": {
            "language_code": "cmn",
        },
    }

    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {bearer_token[0]}",
    }

    # Connect to the server.
    # Top-level `async with` works in a notebook; in a script, wrap this code
    # in an async function and run it with asyncio.run().
    async with connect(SERVICE_URL, additional_headers=headers) as ws:
        # Set up the session
        await ws.send(
            json.dumps(
                {
                    "setup": {
                        "model": MODEL,
                        "generation_config": CONFIG,
                        "input_audio_transcription": {},
                        "output_audio_transcription": {},
                        "enable_speech_to_speech_translation": True,
                    }
                }
            )
        )

        # Receive setup response
        raw_response = await ws.recv(decode=False)
        setup_response = json.loads(raw_response.decode("ascii"))
        print(setup_response)

        # Send the input audio
        msg = {
            "realtime_input": {
                "audio": {
                    "mime_type": "audio/pcm",
                    "data": base64.b64encode(wav_data).decode("utf-8"),
                }
            }
        }
        await ws.send(json.dumps(msg))

        overall_responses = []
        timeout_seconds = 10  # Stop waiting after 10 seconds without a response

        # Receive chunks of server response with a timeout
        try:
            while True:
                try:
                    raw_response = await asyncio.wait_for(
                        ws.recv(decode=False), timeout_seconds
                    )
                    response = json.loads(raw_response.decode())
                    server_content = response.pop("serverContent", None)
                    if server_content is None:
                        break

                    # Input Transcription.
                    input_transcription = server_content.pop("inputTranscription", None)
                    if input_transcription is not None:
                        raw_text = input_transcription.pop("text", None)
                        if raw_text is not None:
                            display(Markdown(f"**Input >** {raw_text}"))

                    # Output Transcription.
                    output_transcription = server_content.pop("outputTranscription", None)
                    if output_transcription is not None:
                        raw_text = output_transcription.pop("text", None)
                        if raw_text is not None:
                            display(Markdown(f"**Response >** {raw_text}"))

                    # Synthesized audio: collect the decoded 16-bit PCM samples.
                    model_turn = server_content.pop("modelTurn", None)
                    if model_turn is not None:
                        parts = model_turn.pop("parts", None)
                        if parts is not None:
                            for part in parts:
                                pcm_data = base64.b64decode(part["inlineData"]["data"])
                                overall_responses.append(
                                    np.frombuffer(pcm_data, dtype=np.int16)
                                )

                    # End of turn
                    # turn_complete = server_content.pop("turnComplete", None)
                    # if turn_complete:
                    #     break

                except asyncio.TimeoutError:
                    print(
                        "Timeout: No response received from the websocket "
                        f"within {timeout_seconds} seconds."
                    )
                    if overall_responses:
                        display(Audio(np.concatenate(overall_responses), rate=24000, autoplay=True))
                    break  # Exit the loop on timeout

                except websockets.exceptions.ConnectionClosed as e:
                    print(f"Connection closed by exception, code: {e.code}, reason: {e.reason}")
                    if overall_responses:
                        display(Audio(np.concatenate(overall_responses), rate=24000, autoplay=True))
                    break  # Exit the loop on connection closed

                except Exception as e:
                    print(f"An unexpected error occurred: {e}")
                    if overall_responses:
                        display(Audio(np.concatenate(overall_responses), rate=24000, autoplay=True))
                    break  # Exit the loop on other exceptions

        finally:
            try:
                await ws.close(code=1000, reason="Normal closure")  # Example close
            except websockets.exceptions.ConnectionClosed as e:
                print(f"Connection closed by exception, code: {e.code}, reason: {e.reason}")
            except Exception as e:
                print(f"An unexpected error occurred: {e}")
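The WebSocket example plays the synthesized audio inline in a notebook. Outside a notebook, the same decoded bytes could be written to a WAV file instead. The sketch below uses only the standard library and assumes 16-bit mono PCM at 24000 Hz, which matches the `np.int16` dtype and `rate=24000` used in the example; the helper name `save_pcm_as_wav`, the output file name, and the placeholder chunks are hypothetical.

```python
import os
import tempfile
import wave


def save_pcm_as_wav(pcm_chunks, path, sample_rate=24000):
    """Write 16-bit mono PCM chunks to a WAV file and return the frame count."""
    pcm = b"".join(pcm_chunks)
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)  # Assumption: mono output
        wav_file.setsampwidth(2)  # 2 bytes per sample (16-bit)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    return len(pcm) // 2


# Usage with two placeholder chunks of silence (4 samples + 6 samples).
out_path = os.path.join(tempfile.mkdtemp(), "translated.wav")
frames_written = save_pcm_as_wav([bytes(8), bytes(12)], out_path)

# Read the file back to confirm the header values.
with wave.open(out_path, "rb") as check:
    frame_rate = check.getframerate()
    frame_count = check.getnframes()
```

In a real session, `pcm_chunks` would be the base64-decoded `inlineData` payloads collected while receiving responses.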
Supported languages
| Language Code | Language |
|---|---|
| aa | Afar |
| ab | Abkhazian |
| ace | Achinese |
| ach | Acoli |
| af | Afrikaans |
| ak | Akan |
| alz | Alur |
| am | Amharic |
| an | Aragonese |
| ar | Arabic |
| as | Assamese |
| av | Avaric |
| awa | Awadhi |
| ay | Aymara |
| az | Azerbaijani |
| ba | Bashkir |
| bal | Baluchi |
| ban | Balinese |
| bbc | Batak Toba |
| bci | Baoulé |
| be | Belarusian |
| bem | Bemba |
| ber | Berber |
| bew | Betawi |
| bg | Bulgarian |
| bgc | Haryanvi |
| bho | Bhojpuri |
| bi | Bislama |
| bm | Bambara |
| bn | Bengali |
| bo | Tibetan |
| br | Breton |
| bs | Bosnian |
| bts | Batak Simalungun |
| btx | Batak Karo |
| ca | Catalan |
| ce | Chechen |
| ceb | Cebuano |
| cgg | Chiga |
| ch | Chamorro |
| chk | Chuukese |
| cmn | Mandarin Chinese |
| cnh | Hakha Chin |
| co | Corsican |
| cr | Cree |
| crh | Crimean Tatar |
| crs | Seselwa Creole French |
| cs | Czech |
| cv | Chuvash |
| cy | Welsh |
| da | Danish |
| de | German |
| din | Dinka |
| doi | Dogri |
| dov | Dombe |
| dv | Divehi |
| dyu | Dyula |
| dz | Dzongkha |
| ee | Ewe |
| el | Greek |
| en | English |
| eo | Esperanto |
| es | Spanish |
| et | Estonian |
| eu | Basque |
| fa | Farsi |
| ff | Fulah |
| fi | Finnish |
| fil | Filipino |
| fj | Fijian |
| fo | Faroese |
| fon | Fon |
| fr | French |
| fur | Friulian |
| fy | Western Frisian |
| ga | Irish |
| gaa | Ga |
| gd | Gaelic |
| gl | Galician |
| gn | Guarani |
| gu | Gujarati |
| gv | Manx |
| ha | Hausa |
| haw | Hawaiian |
| he | Hebrew |
| hi | Hindi |
| hil | Hiligaynon |
| hmn | Hmong |
| ho | Hiri Motu |
| hr | Croatian |
| hrx | Hunsrik |
| ht | Haitian, Haitian Creole |
| hu | Hungarian |
| hy | Armenian |
| hz | Herero |
| iba | Iban |
| id | Indonesian |
| ig | Igbo |
| ilo | Iloko |
| is | Icelandic |
| it | Italian |
| iu | Inuktitut |
| ja | Japanese |
| jam | Jamaican Creole English |
| jv | Javanese |
| ka | Georgian |
| kac | Kachin |
| kek | Kekchi |
| kg | Kongo |
| kha | Khasi |
| ki | Kikuyu |
| kj | Kuanyama |
| kk | Kazakh |
| kl | Greenlandic |
| km | Central Khmer |
| kn | Kannada |
| ko | Korean |
| kok | Konkani |
| kr | Kanuri |
| kri | Krio |
| ks | Kashmiri |
| ktu | Kituba |
| ku | Kurdish |
| kv | Komi |
| kw | Cornish |
| ky | Kyrgyz |
| la | Latin |
| lb | Luxembourgish |
| lg | Ganda |
| li | Limburgan |
| lij | Ligurian |
| lmo | Lombard |
| ln | Lingala |
| lo | Lao |
| lt | Lithuanian |
| lu | Luba-Katanga |
| lua | Luba-Lulua |
| luo | Dholuo |
| lus | Mizo |
| lv | Latvian |
| mad | Madurese |
| mai | Maithili |
| mak | Makasar |
| mam | Mam |
| mfe | Morisyen |
| mg | Malagasy |
| mh | Marshallese |
| min | Minangkabau |
| mk | Macedonian |
| ml | Malayalam |
| mn | Mongolian |
| mr | Marathi |
| ms | Malay |
| mt | Maltese |
| mwr | Marwari |
| my | Burmese |
| na | Nauru |
| nb | Norwegian Bokmål |
| nd | North Ndebele |
| ndc | Ndau |
| ne | Nepali |
| new | Newari |
| ng | Ndonga |
| nhe | Eastern Huasteca Nahuatl |
| nl | Dutch |
| nn | Norwegian Nynorsk |
| nr | South Ndebele |
| nso | Pedi |
| nus | Nuer |
| nv | Navajo |
| ny | Chichewa |
| oc | Occitan |
| oj | Ojibwa |
| om | Oromo |
| or | Oriya |
| os | Ossetian |
| pa | Punjabi |
| pag | Pangasinan |
| pam | Pampanga |
| pap | Papiamento |
| pl | Polish |
| ps | Pashto |
| pt | Portuguese |
| qu | Quechua |
| rm | Romansh |
| rn | Rundi |
| ro | Romanian |
| ru | Russian |
| rw | Kinyarwanda |
| sa | Sanskrit |
| sah | Yakut |
| sat | Santali |
| sc | Sardinian |
| scn | Sicilian |
| sd | Sindhi |
| se | Northern Sami |
| sg | Sango |
| shn | Shan |
| si | Sinhala |
| sk | Slovak |
| sl | Slovenian |
| sm | Samoan |
| sn | Shona |
| so | Somali |
| sq | Albanian |
| sr | Serbian |
| ss | Swati |
| st | Southern Sotho |
| su | Sundanese |
| sv | Swedish |
| sw | Swahili |
| szl | Silesian |
| ta | Tamil |
| tcy | Tulu |
| te | Telugu |
| tet | Tetum |
| tg | Tajik |
| th | Thai |
| ti | Tigrinya |
| tiv | Tiv |
| tk | Turkmen |
| tl | Tagalog |
| tn | Tswana |
| to | Tonga |
| tpi | Tok Pisin |
| tr | Turkish |
| trp | Kok Borok |
| ts | Tsonga |
| tt | Tatar |
| tum | Tumbuka |
| tw | Twi |
| ty | Tahitian |
| tyv | Tuvinian |
| udm | Udmurt |
| ug | Uighur |
| uk | Ukrainian |
| ur | Urdu |
| uz | Uzbek |
| ve | Venda |
| vec | Venetian |
| vi | Vietnamese |
| wa | Walloon |
| war | Waray |
| wo | Wolof |
| xh | Xhosa |
| yi | Yiddish |
| yo | Yoruba |
| yua | Yucatec Maya |
| yue | Cantonese |
| za | Zhuang |
| zh | Chinese |
| zu | Zulu |
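As a rough sketch, an application could check a requested language code against this table before opening a session. The setup message below mirrors the one in the WebSocket example; the helper name `build_setup_message` and the `LANGUAGE_SAMPLE` dictionary (a five-row excerpt of the table, not the full list) are assumptions, and the exact model string your endpoint expects may differ from the bare model ID used here.

```python
import json

# Excerpt of the supported-languages table; extend it with the rows you need.
LANGUAGE_SAMPLE = {
    "cmn": "Mandarin Chinese",
    "en": "English",
    "es": "Spanish",
    "fr": "French",
    "yue": "Cantonese",
}


def build_setup_message(model: str, language_code: str) -> str:
    """Return a JSON setup message for the given target language.

    Raises ValueError if the code is not in the local excerpt.
    """
    if language_code not in LANGUAGE_SAMPLE:
        raise ValueError(f"Unsupported or unlisted language code: {language_code!r}")

    return json.dumps(
        {
            "setup": {
                "model": model,
                "generation_config": {
                    "response_modalities": ["AUDIO"],
                    "speech_config": {"language_code": language_code},
                },
                "input_audio_transcription": {},
                "output_audio_transcription": {},
                "enable_speech_to_speech_translation": True,
            }
        }
    )


# Example: request Cantonese output.
setup = json.loads(build_setup_message("gemini-2.5-flash-s2st-11-2025-exp", "yue"))
```

Validating locally gives a clear error message before any network traffic, instead of relying on the service to reject an unknown code.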
Billing
Because speech-to-speech translation is an experimental feature, you aren't charged to use it.
For more information on pricing and billing, see Vertex AI pricing.

