Chirp 3 is the latest generation of Google's multilingual, Automatic Speech Recognition (ASR)-specific generative models, designed based on user feedback and experience. It delivers better accuracy and speed than previous Chirp models, and adds speaker diarization and automatic language detection.
Model details
Chirp 3: Transcription is available exclusively in the Speech-to-Text API V2.
Model identifiers
You can use Chirp 3: Transcription like any other model by specifying the appropriate model identifier in your recognition request when using the API, or by selecting the model name in the Google Cloud console.
Model | Model identifier
---|---
Chirp 3 | `chirp_3`
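For illustration, a minimal sketch of a V2 recognition config that selects Chirp 3, built as a plain JSON-style request body. The project ID is a placeholder, and `_` refers to the default recognizer:

```python
# Minimal sketch of a V2 recognition request targeting Chirp 3.
# PROJECT_ID is a placeholder; "_" is the default recognizer.
PROJECT_ID = "my-project"

recognizer = f"projects/{PROJECT_ID}/locations/us/recognizers/_"

config = {
    "autoDecodingConfig": {},   # let the API detect the audio encoding
    "languageCodes": ["en-US"],
    "model": "chirp_3",         # the Chirp 3 model identifier
}

print(recognizer)
print(config["model"])
```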
API methods
Not all recognition methods support the same set of languages. Because Chirp 3 is available in the Speech-to-Text API V2, it supports the following recognition methods:
API | API method | Support
---|---|---
V2 | `Speech.StreamingRecognize` (good for streaming and real-time audio) | Supported
V2 | `Speech.Recognize` (good for audio shorter than one minute) | Supported
V2 | `Speech.BatchRecognize` (good for long audio, 1 minute to 1 hour) | Supported
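As a rule of thumb, the method choice follows audio length. A small illustrative helper (not part of the API) whose thresholds mirror the table above:

```python
def pick_method(duration_seconds: float, realtime: bool = False) -> str:
    """Suggest a V2 recognition method based on audio length.

    Thresholds mirror the table above: Recognize for audio shorter
    than one minute, BatchRecognize for one minute to one hour, and
    StreamingRecognize for real-time audio.
    """
    if realtime:
        return "Speech.StreamingRecognize"
    if duration_seconds < 60:
        return "Speech.Recognize"
    if duration_seconds <= 3600:
        return "Speech.BatchRecognize"
    raise ValueError("Audio longer than one hour; split it into chunks.")

print(pick_method(30))   # Speech.Recognize
print(pick_method(600))  # Speech.BatchRecognize
```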
Regional availability
Chirp 3 is available in the following Google Cloud regions, with more planned:
Google Cloud region | Launch readiness
---|---
`us` | Public Preview
Using the locations API, you can find the latest list of supported Google Cloud regions, languages, locales, and features for each transcription model.
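As a sketch, the locations endpoint can be queried over REST. The path shape below follows the standard Google Cloud locations API pattern and is an assumption here; check the locations API reference for the exact form:

```python
# Sketch: build the URL for the Speech-to-Text V2 locations endpoint.
# The path follows the generic Cloud locations pattern
# (/v2/projects/{project}/locations) and is an assumption here.
PROJECT_ID = "my-project"  # placeholder

url = f"https://speech.googleapis.com/v2/projects/{PROJECT_ID}/locations"
print(url)
```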
Language availability for transcription
Chirp 3 supports transcription in `StreamingRecognize`, `Recognize`, and `BatchRecognize` in the following languages:
Language | BCP-47 code
---|---
Arabic (United Arab Emirates) | ar-AE
Arabic (Bahrain) | ar-BH
Arabic (Algeria) | ar-DZ
Arabic (Egypt) | ar-EG
Arabic (Israel) | ar-IL
Arabic (Iraq) | ar-IQ
Arabic (Jordan) | ar-JO
Arabic (Kuwait) | ar-KW
Arabic (Lebanon) | ar-LB
Arabic (Morocco) | ar-MA
Arabic (Mauritania) | ar-MR
Arabic (Oman) | ar-OM
Arabic (State of Palestine) | ar-PS
Arabic (Qatar) | ar-QA
Arabic (Saudi Arabia) | ar-SA
Arabic (Syria) | ar-SY
Arabic (Tunisia) | ar-TN
Arabic | ar-XA
Arabic (Yemen) | ar-YE
Bulgarian (Bulgaria) | bg-BG
Bengali (Bangladesh) | bn-BD
Bengali (India) | bn-IN
Catalan (Spain) | ca-ES
Chinese, Mandarin (Simplified, China) | cmn-Hans-CN
Chinese, Cantonese (Traditional, Hong Kong) | yue-Hant-HK
Chinese, Mandarin (Traditional, Taiwan) | cmn-Hant-TW
Czech (Czech Republic) | cs-CZ
Danish (Denmark) | da-DK
German (Germany) | de-DE
Greek (Greece) | el-GR
English (Australia) | en-AU
English (United Kingdom) | en-GB
English (India) | en-IN
English (Philippines) | en-PH
English (United States) | en-US
Spanish (Mexico) | es-MX
Spanish (Spain) | es-ES
Spanish (United States) | es-US
Estonian (Estonia) | et-EE
Persian (Iran) | fa-IR
French (France) | fr-FR
Finnish (Finland) | fi-FI
Filipino (Philippines) | fil-PH
French (Canada) | fr-CA
Gujarati (India) | gu-IN
Hindi (India) | hi-IN
Croatian (Croatia) | hr-HR
Hungarian (Hungary) | hu-HU
Armenian (Armenia) | hy-AM
Indonesian (Indonesia) | id-ID
Italian (Italy) | it-IT
Hebrew (Israel) | iw-IL
Japanese (Japan) | ja-JP
Khmer (Cambodia) | km-KH
Kannada (India) | kn-IN
Korean (Korea) | ko-KR
Lao (Laos) | lo-LA
Lithuanian (Lithuania) | lt-LT
Latvian (Latvia) | lv-LV
Malayalam (India) | ml-IN
Marathi (India) | mr-IN
Malay (Malaysia) | ms-MY
Burmese (Myanmar) | my-MM
Nepali (Nepal) | ne-NP
Dutch (Netherlands) | nl-NL
Norwegian (Norway) | no-NO
Polish (Poland) | pl-PL
Portuguese (Brazil) | pt-BR
Portuguese (Portugal) | pt-PT
Romanian (Romania) | ro-RO
Russian (Russia) | ru-RU
Slovak (Slovakia) | sk-SK
Slovenian (Slovenia) | sl-SI
Serbian (Serbia) | sr-RS
Swedish (Sweden) | sv-SE
Swahili | sw
Tamil (India) | ta-IN
Telugu (India) | te-IN
Thai (Thailand) | th-TH
Turkish (Turkey) | tr-TR
Ukrainian (Ukraine) | uk-UA
Uzbek (Uzbekistan) | uz-UZ
Vietnamese (Vietnam) | vi-VN
Language availability for diarization
Chirp 3 supports transcription and diarization only in `BatchRecognize` and `Recognize`, in the following languages:
Language | BCP-47 Code |
---|---|
Chinese (Simplified, China) | cmn-Hans-CN |
German (Germany) | de-DE |
English (United Kingdom) | en-GB |
English (India) | en-IN |
English (United States) | en-US |
Spanish (Spain) | es-ES |
Spanish (United States) | es-US |
French (Canada) | fr-CA |
French (France) | fr-FR |
Hindi (India) | hi-IN |
Italian (Italy) | it-IT |
Japanese (Japan) | ja-JP |
Korean (Korea) | ko-KR |
Portuguese (Brazil) | pt-BR |
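A small illustrative helper (not part of the API) that checks a locale against the diarization-supported set from the table above:

```python
# Locales that support diarization with Chirp 3, per the table above.
DIARIZATION_LOCALES = {
    "cmn-Hans-CN", "de-DE", "en-GB", "en-IN", "en-US", "es-ES", "es-US",
    "fr-CA", "fr-FR", "hi-IN", "it-IT", "ja-JP", "ko-KR", "pt-BR",
}

def supports_diarization(language_code: str) -> bool:
    """Return True if the locale supports diarization with Chirp 3."""
    return language_code in DIARIZATION_LOCALES

print(supports_diarization("en-US"))  # True
print(supports_diarization("sv-SE"))  # False (transcription only)
```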
Feature support and limitations
Chirp 3 supports the following features:
Feature | Description | Launch stage
---|---|---
Automatic punctuation | Automatically generated by the model; can be optionally disabled. | Preview
Automatic capitalization | Automatically generated by the model; can be optionally disabled. | Preview
Speaker diarization | Automatically identifies the different speakers in a single-channel audio sample. | Preview
Language-agnostic audio transcription | The model automatically infers the spoken language in your audio file and transcribes in the most prevalent language. | Preview
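Language-agnostic transcription is requested through the language code: passing `"auto"` instead of a fixed locale asks the model to infer the spoken language. A minimal JSON-style config sketch (field names follow the V2 REST naming):

```python
# Sketch: request language-agnostic transcription with Chirp 3 by
# passing "auto" instead of a fixed locale.
config = {
    "autoDecodingConfig": {},
    "languageCodes": ["auto"],  # model detects the most prevalent language
    "model": "chirp_3",
}
print(config["languageCodes"][0])
```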
Chirp 3 doesn't support the following features:
Feature | Description
---|---
Word timings (timestamps) | Start and end time offsets for each transcribed word.
Word-level confidence scores | The API returns a value, but it isn't truly a confidence score.
Speech adaptation (biasing) | Hints provided to the model as phrases or words to improve recognition accuracy for specific terms or proper nouns.
Using Chirp 3
Use Chirp 3 for transcription and diarization tasks as follows.
Transcribe using a Chirp 3 batch request with diarization
Perform batch speech recognition with diarization:
```python
import os

from google.api_core.client_options import ClientOptions
from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")


def transcribe_batch_chirp3(
    audio_uri: str,
) -> cloud_speech.BatchRecognizeResults:
    """Transcribes an audio file from a Google Cloud Storage URI using the
    Chirp 3 model of the Google Cloud Speech-to-Text V2 API.

    Args:
        audio_uri (str): The Google Cloud Storage URI of the input
            audio file. E.g., gs://[BUCKET]/[FILE]

    Returns:
        cloud_speech.BatchRecognizeResults: The response from the
            Speech-to-Text API containing the transcription results.
    """
    # Instantiates a client
    client = SpeechClient(
        client_options=ClientOptions(
            api_endpoint="us-west1-speech.googleapis.com",
        )
    )

    speaker_diarization_config = cloud_speech.SpeakerDiarizationConfig(
        min_speaker_count=1,  # minimum number of speakers
        max_speaker_count=6,  # maximum expected number of speakers
    )

    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],  # Use "auto" to detect language
        model="chirp_3",
        features=cloud_speech.RecognitionFeatures(
            diarization_config=speaker_diarization_config,
        ),
    )

    file_metadata = cloud_speech.BatchRecognizeFileMetadata(uri=audio_uri)

    request = cloud_speech.BatchRecognizeRequest(
        recognizer=f"projects/{PROJECT_ID}/locations/us-west1/recognizers/_",
        config=config,
        files=[file_metadata],
        recognition_output_config=cloud_speech.RecognitionOutputConfig(
            inline_response_config=cloud_speech.InlineOutputConfig(),
        ),
    )

    # Transcribes the audio into text
    operation = client.batch_recognize(request=request)

    print("Waiting for operation to complete...")
    response = operation.result(timeout=120)

    for result in response.results[audio_uri].transcript.results:
        print(f"Transcript: {result.alternatives[0].transcript}")
        print(f"Detected Language: {result.language_code}")
        print(f"Speakers per word: {result.alternatives[0].words}")

    return response.results[audio_uri].transcript
```
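With diarization enabled, each word in `result.alternatives[0].words` carries a speaker label (the `speaker_label` field of the V2 `WordInfo` message). A small illustrative helper, using a stand-in dataclass for the two fields it reads, that collapses the per-word list into speaker turns:

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Stand-in for the V2 WordInfo fields used here.
    word: str
    speaker_label: str


def group_by_speaker(words):
    """Collapse a per-word list into (speaker, utterance) turns."""
    turns = []
    for w in words:
        if turns and turns[-1][0] == w.speaker_label:
            # Same speaker as the previous word: extend the current turn.
            turns[-1] = (w.speaker_label, turns[-1][1] + " " + w.word)
        else:
            turns.append((w.speaker_label, w.word))
    return turns


words = [
    Word("hello", "1"), Word("there", "1"),
    Word("hi", "2"),
]
print(group_by_speaker(words))  # [('1', 'hello there'), ('2', 'hi')]
```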