Chirp 3 Transcription: Enhanced multilingual accuracy

Chirp 3 is the latest generation of Google's multilingual Automatic Speech Recognition (ASR)-specific generative models, designed around user feedback and experience. It offers higher accuracy and speed than previous Chirp models, and adds speaker diarization and automatic language detection.

Model details

Chirp 3: Transcription is available exclusively in the Speech-to-Text API V2.

Model identifiers

You can use Chirp 3: Transcription like any other model: specify its model identifier in your recognition request when using the API, or select its model name in the Google Cloud console.

Model | Model identifier
Chirp 3 | chirp_3
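As an illustration, a minimal sketch of the JSON body for a V2 Recognize REST call that selects Chirp 3 through its identifier might look like this. The helper name `build_recognize_body` and the sample bucket URI are hypothetical, and the camelCase field names assume the standard REST mapping of the V2 `RecognitionConfig` message.

```python
import json


def build_recognize_body(audio_uri: str, language_code: str = "en-US") -> dict:
    """Hypothetical helper: a Speech-to-Text V2 Recognize request body that
    selects Chirp 3 through its model identifier."""
    return {
        "config": {
            "autoDecodingConfig": {},  # let the API detect the audio encoding
            "languageCodes": [language_code],
            "model": "chirp_3",  # the model identifier from the table above
        },
        "uri": audio_uri,  # Cloud Storage URI of a short audio file
    }


if __name__ == "__main__":
    print(json.dumps(build_recognize_body("gs://example-bucket/audio.wav"), indent=2))
```

Only the `model` value changes when you switch between transcription models; the rest of the request stays the same.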

API methods

Because Chirp 3 is available only in the Speech-to-Text API V2, it supports the following recognition methods. Not all methods support the same set of languages; see the language tables below.

API | Method | Support
v2 | Speech.StreamingRecognize (streaming and real-time audio) | Supported
v2 | Speech.Recognize (audio shorter than one minute) | Supported
v2 | Speech.BatchRecognize (long audio, from 1 minute to 1 hour) | Supported
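The guidance in the table can be expressed as a small decision function. This is a hypothetical helper (`choose_recognition_method` is not part of any Google library) that returns a method name based on the audio duration and whether the audio arrives as a live stream; the one-minute and one-hour limits are the ones stated in the table.

```python
from datetime import timedelta


def choose_recognition_method(duration: timedelta, streaming: bool = False) -> str:
    """Hypothetical helper: maps the table's guidance to a V2 method name."""
    if streaming:
        return "StreamingRecognize"  # live or real-time audio
    if duration < timedelta(minutes=1):
        return "Recognize"  # short audio, synchronous request
    if duration <= timedelta(hours=1):
        return "BatchRecognize"  # long audio, long-running operation
    raise ValueError("audio longer than one hour is outside the range in the table")
```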

Regional availability

Chirp 3 is available in the following Google Cloud location, with more planned:

Google Cloud location | Launch readiness
us | Public Preview

You can use the locations API to find the latest list of supported Google Cloud regions, languages and locales, and features for each transcription model.
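Because Chirp 3 is served from the `us` location, the client endpoint and the recognizer resource name should both point at that location. A minimal sketch, assuming the `<location>-speech.googleapis.com` endpoint pattern and the `_` default recognizer used by the V2 samples; the helper name `speech_client_settings` is hypothetical.

```python
def speech_client_settings(project_id: str, location: str = "us") -> tuple:
    """Hypothetical helper: returns (api_endpoint, recognizer) for one location.

    The endpoint and the recognizer's location are derived from the same
    value so they cannot disagree. "_" names the default recognizer, so the
    request must carry its own RecognitionConfig.
    """
    api_endpoint = f"{location}-speech.googleapis.com"
    recognizer = f"projects/{project_id}/locations/{location}/recognizers/_"
    return api_endpoint, recognizer
```

Deriving both strings from one `location` argument avoids the common mistake of sending a request for one location to another location's endpoint.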

Language availability for transcription

Chirp 3 supports transcription in StreamingRecognize, Recognize, and BatchRecognize in the following languages:

Language BCP-47 Code
Arabic (United Arab Emirates) ar-AE
Arabic (Bahrain) ar-BH
Arabic (Algeria) ar-DZ
Arabic (Egypt) ar-EG
Arabic (Israel) ar-IL
Arabic (Iraq) ar-IQ
Arabic (Jordan) ar-JO
Arabic (Kuwait) ar-KW
Arabic (Lebanon) ar-LB
Arabic (Morocco) ar-MA
Arabic (Mauritania) ar-MR
Arabic (Oman) ar-OM
Arabic (State of Palestine) ar-PS
Arabic (Qatar) ar-QA
Arabic (Saudi Arabia) ar-SA
Arabic (Syria) ar-SY
Arabic (Tunisia) ar-TN
Arabic ar-XA
Arabic (Yemen) ar-YE
Bulgarian (Bulgaria) bg-BG
Bengali (Bangladesh) bn-BD
Bengali (India) bn-IN
Catalan (Spain) ca-ES
Chinese (Simplified, China) cmn-Hans-CN
Chinese, Cantonese (Traditional, Hong Kong) yue-Hant-HK
Chinese, Mandarin (Traditional, Taiwan) cmn-Hant-TW
Czech (Czech Republic) cs-CZ
Danish (Denmark) da-DK
German (Germany) de-DE
Greek (Greece) el-GR
English (Australia) en-AU
English (United Kingdom) en-GB
English (India) en-IN
English (Philippines) en-PH
English (United States) en-US
Spanish (Mexico) es-MX
Spanish (Spain) es-ES
Spanish (United States) es-US
Estonian (Estonia) et-EE
Persian (Iran) fa-IR
French (France) fr-FR
Finnish (Finland) fi-FI
Filipino (Philippines) fil-PH
French (Canada) fr-CA
Gujarati (India) gu-IN
Hindi (India) hi-IN
Croatian (Croatia) hr-HR
Hungarian (Hungary) hu-HU
Armenian (Armenia) hy-AM
Indonesian (Indonesia) id-ID
Italian (Italy) it-IT
Hebrew (Israel) iw-IL
Japanese (Japan) ja-JP
Khmer (Cambodia) km-KH
Kannada (India) kn-IN
Korean (Korea) ko-KR
Lao (Laos) lo-LA
Lithuanian (Lithuania) lt-LT
Latvian (Latvia) lv-LV
Malayalam (India) ml-IN
Marathi (India) mr-IN
Malay (Malaysia) ms-MY
Burmese (Myanmar) my-MM
Nepali (Nepal) ne-NP
Dutch (Netherlands) nl-NL
Norwegian (Norway) no-NO
Polish (Poland) pl-PL
Portuguese (Brazil) pt-BR
Portuguese (Portugal) pt-PT
Romanian (Romania) ro-RO
Russian (Russia) ru-RU
Slovak (Slovakia) sk-SK
Slovenian (Slovenia) sl-SI
Serbian (Serbia) sr-RS
Swedish (Sweden) sv-SE
Swahili sw
Tamil (India) ta-IN
Telugu (India) te-IN
Thai (Thailand) th-TH
Turkish (Turkey) tr-TR
Ukrainian (Ukraine) uk-UA
Uzbek (Uzbekistan) uz-UZ
Vietnamese (Vietnam) vi-VN
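Before sending a request, you may want to reject language codes that are not in the table. The sketch below is hypothetical: it copies only a subset of the table into `SUPPORTED_TRANSCRIPTION_CODES` (extend it with the codes you need) and accepts `"auto"`, the value the batch sample's comment suggests for automatic language detection. Matching is exact and case-sensitive, which is a simplifying assumption.

```python
# A subset of the transcription table above; extend it as needed.
SUPPORTED_TRANSCRIPTION_CODES = frozenset({
    "en-US", "en-GB", "es-ES", "fr-FR", "de-DE", "ja-JP", "hi-IN", "sw",
})


def language_codes_for(code: str) -> list:
    """Hypothetical helper: the language_codes value for a Chirp 3 request."""
    if code == "auto" or code in SUPPORTED_TRANSCRIPTION_CODES:
        return [code]  # the API expects a list of codes
    raise ValueError(f"{code!r} is not in the Chirp 3 transcription table")
```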

Language availability for diarization

Chirp 3 supports speaker diarization, together with transcription, only in BatchRecognize and Recognize, and only in the following languages:

Language BCP-47 Code
Chinese (Simplified, China) cmn-Hans-CN
German (Germany) de-DE
English (United Kingdom) en-GB
English (India) en-IN
English (United States) en-US
Spanish (Spain) es-ES
Spanish (United States) es-US
French (Canada) fr-CA
French (France) fr-FR
Hindi (India) hi-IN
Italian (Italy) it-IT
Japanese (Japan) ja-JP
Korean (Korea) ko-KR
Portuguese (Brazil) pt-BR
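Diarization therefore depends on two conditions: the language must appear in the table above, and the request must not use StreamingRecognize. A minimal, hypothetical check (`supports_diarization` is not a library function) could combine both:

```python
# Languages copied from the diarization table above.
DIARIZATION_CODES = frozenset({
    "cmn-Hans-CN", "de-DE", "en-GB", "en-IN", "en-US", "es-ES", "es-US",
    "fr-CA", "fr-FR", "hi-IN", "it-IT", "ja-JP", "ko-KR", "pt-BR",
})

# Diarization is available only through these recognition methods.
DIARIZATION_METHODS = frozenset({"BatchRecognize", "Recognize"})


def supports_diarization(language_code: str, method: str) -> bool:
    """Hypothetical helper: True if Chirp 3 can diarize this request."""
    return language_code in DIARIZATION_CODES and method in DIARIZATION_METHODS
```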

Feature support and limitations

Chirp 3 supports the following features:

Feature | Description | Launch stage
Automatic punctuation | Generated automatically by the model; can optionally be disabled. | Preview
Automatic capitalization | Generated automatically by the model; can optionally be disabled. | Preview
Speaker diarization | Automatically identifies the different speakers in a single-channel audio sample. | Preview
Language-agnostic audio transcription | The model automatically infers the spoken language in your audio and transcribes it in the most prevalent language. | Preview
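To combine language-agnostic transcription with diarization, the configuration sets `languageCodes` to `"auto"` and adds a `diarizationConfig` under `features`, mirroring the `SpeakerDiarizationConfig` in the batch sample. This is a sketch of the REST JSON shape under that assumption; `build_chirp3_config` is a hypothetical helper, and it does not model the punctuation or capitalization switches.

```python
def build_chirp3_config(language_code: str = "auto", diarize: bool = False,
                        min_speakers: int = 1, max_speakers: int = 6) -> dict:
    """Hypothetical helper: a V2 RecognitionConfig (REST JSON) for Chirp 3."""
    config = {
        "autoDecodingConfig": {},
        "languageCodes": [language_code],  # "auto" lets the model infer the language
        "model": "chirp_3",
    }
    if diarize:
        if not 1 <= min_speakers <= max_speakers:
            raise ValueError("expected 1 <= min_speakers <= max_speakers")
        # Speaker counts are hints for the expected number of distinct speakers.
        config["features"] = {
            "diarizationConfig": {
                "minSpeakerCount": min_speakers,
                "maxSpeakerCount": max_speakers,
            }
        }
    return config
```

Check the diarization language table before enabling `diarize`, since diarization is limited to specific languages.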

Chirp 3 doesn't support the following features:

Feature | Description
Word timings (timestamps) | Start and end time offsets for each word in the transcript.
Word-level confidence scores | The API returns a value, but it isn't a true confidence score.
Speech adaptation (biasing) | Hints, in the form of phrases or words, that improve recognition accuracy for specific terms or proper nouns.
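If you reuse request configurations written for other models, a quick pre-flight check can flag the unsupported settings listed above. The sketch assumes the REST field names `enableWordTimeOffsets` and `enableWordConfidence` under `features`, and `adaptation` at the top level of `RecognitionConfig`; the function name `unsupported_settings` is hypothetical.

```python
# REST field names for the unsupported features listed in the table above.
UNSUPPORTED_FEATURE_FIELDS = ("enableWordTimeOffsets", "enableWordConfidence")


def unsupported_settings(config: dict) -> list:
    """Hypothetical helper: names of enabled settings Chirp 3 doesn't support."""
    problems = []
    features = config.get("features", {})
    for field in UNSUPPORTED_FEATURE_FIELDS:
        if features.get(field):  # only flag settings that are turned on
            problems.append(field)
    # Speech adaptation (biasing) lives outside "features".
    if config.get("adaptation"):
        problems.append("adaptation")
    return problems
```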

Using Chirp 3

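This section shows how to use Chirp 3 for transcription and diarization tasks.

Transcribe using a Chirp 3 batch request with diarization

Perform batch speech recognition

The following Python sample transcribes an audio file stored in Cloud Storage with BatchRecognize and enables speaker diarization: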

```python
import os

from google.api_core.client_options import ClientOptions
from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")


def transcribe_batch_chirp3(
    audio_uri: str,
) -> cloud_speech.BatchRecognizeResults:
    """Transcribes an audio file from a Google Cloud Storage URI using the
    Chirp 3 model of the Google Cloud Speech-to-Text V2 API, with diarization.

    Args:
        audio_uri (str): The Google Cloud Storage URI of the input
            audio file. E.g., gs://[BUCKET]/[FILE]

    Returns:
        cloud_speech.BatchRecognizeResults: The transcription results
            returned by the Speech-to-Text API for the file.
    """
    # Instantiates a client for the "us" location, where Chirp 3 is available.
    client = SpeechClient(
        client_options=ClientOptions(
            api_endpoint="us-speech.googleapis.com",
        )
    )

    # Expected range of distinct speakers in the audio.
    speaker_diarization_config = cloud_speech.SpeakerDiarizationConfig(
        min_speaker_count=1,  # minimum number of speakers
        max_speaker_count=6,  # maximum expected number of speakers
    )

    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],  # Use "auto" to detect language
        model="chirp_3",
        features=cloud_speech.RecognitionFeatures(
            diarization_config=speaker_diarization_config,
        ),
    )

    file_metadata = cloud_speech.BatchRecognizeFileMetadata(uri=audio_uri)

    request = cloud_speech.BatchRecognizeRequest(
        # "_" selects the default recognizer; the location must match the endpoint.
        recognizer=f"projects/{PROJECT_ID}/locations/us/recognizers/_",
        config=config,
        files=[file_metadata],
        # Return results in the response instead of writing them to Cloud Storage.
        recognition_output_config=cloud_speech.RecognitionOutputConfig(
            inline_response_config=cloud_speech.InlineOutputConfig(),
        ),
    )

    # Transcribes the audio into text (long-running operation).
    operation = client.batch_recognize(request=request)

    print("Waiting for operation to complete...")
    # Increase the timeout (in seconds) for longer audio files.
    response = operation.result(timeout=120)

    for result in response.results[audio_uri].transcript.results:
        if not result.alternatives:
            continue  # skip results without a recognition hypothesis
        print(f"Transcript: {result.alternatives[0].transcript}")
        print(f"Detected Language: {result.language_code}")
        print(f"Speakers per word: {result.alternatives[0].words}")

    return response.results[audio_uri].transcript
```
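The sample prints the raw word list; for readable output you can group consecutive words by speaker. A minimal sketch, assuming each word object exposes `word` and `speaker_label` attributes, as V2 `WordInfo` messages do when diarization is enabled. The function name `speaker_turns` is hypothetical, and the demo uses stand-in objects rather than real API output.

```python
from itertools import groupby
from types import SimpleNamespace
from typing import Iterable, List, Tuple


def speaker_turns(words: Iterable) -> List[Tuple[str, str]]:
    """Collapses consecutive words that share a speaker label into turns.

    Each item needs `word` and `speaker_label` attributes.
    """
    return [
        (label, " ".join(w.word for w in group))
        for label, group in groupby(words, key=lambda w: w.speaker_label)
    ]


if __name__ == "__main__":
    # Stand-in word objects for illustration; real ones come from the API.
    demo = [
        SimpleNamespace(word="Hello", speaker_label="1"),
        SimpleNamespace(word="there", speaker_label="1"),
        SimpleNamespace(word="Hi", speaker_label="2"),
    ]
    for label, text in speaker_turns(demo):
        print(f"Speaker {label}: {text}")
```

Inside the sample's result loop, you could call `speaker_turns(result.alternatives[0].words)` instead of printing the word list directly. `groupby` only merges adjacent words, so a speaker who talks twice produces two separate turns, which preserves conversation order.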
 

Use Chirp 3 in the Google Cloud console
