Chirp 3, the latest generation of Google's multilingual, Automatic Speech Recognition (ASR)-specific generative models offered by Google Cloud's Speech-to-Text (STT) API v2, is available for voice transcription.
Set up
Follow these steps to enable transcription with Speech-to-Text Chirp 3.
Console
When you create or update a conversation profile using the Agent Assist console, follow these steps to configure Speech-to-Text settings to use the Chirp 3 model.
- Click Conversation profiles.
- Click the name of your profile.
- Navigate to the Speech to Text Config section.
- Choose Chirp 3 for the model.
- (Optional) Select Use Long Form Model for AA Telephony SipRec Integration if the audio is transmitted through Telephony Integration.
- (Optional) Configure Language Code and up to one Alternative Language Code for language-restricted transcription.
- (Optional) Configure auto as the language code for language-agnostic transcription.
- (Optional) Configure Phrases for speech adaptation to improve accuracy with model adaptation.
REST API
You can call the API directly to create or update a conversation profile. Enable STT V2 with the ConversationProfile.sttConfig.useSttV2
field, as shown in the following example.
Example Configuration:
{
  "name": "projects/PROJECT_ID/locations/global/conversationProfiles/CONVERSATION_PROFILE_ID",
  "displayName": "CONVERSATION_PROFILE_NAME",
  "automatedAgentConfig": {},
  "humanAgentAssistantConfig": {
    "notificationConfig": {
      "topic": "projects/PROJECT_ID/topics/FEATURE_SUGGESTION_TOPIC_ID",
      "messageFormat": "JSON"
    },
    "humanAgentSuggestionConfig": {
      "featureConfigs": [{
        "enableEventBasedSuggestion": true,
        "suggestionFeature": {
          "type": "ARTICLE_SUGGESTION"
        },
        "conversationModelConfig": {}
      }]
    },
    "messageAnalysisConfig": {}
  },
  "sttConfig": {
    "model": "chirp_3",
    "useSttV2": true
  },
  "languageCode": "en-US"
}
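As a minimal sketch, the configuration above can be applied with a PATCH request against the Dialogflow v2 REST surface. The endpoint path and the `updateMask` value are assumptions based on the standard `conversationProfiles.patch` method, and authentication is left out, so adapt this to your environment.

```python
import json
import urllib.request

# Assumptions (not from this page): replace the IDs with your own values and
# supply a real OAuth access token, e.g. from `gcloud auth print-access-token`.
PROJECT_ID = "PROJECT_ID"
PROFILE_ID = "CONVERSATION_PROFILE_ID"
ACCESS_TOKEN = "ACCESS_TOKEN"

# conversationProfiles.patch takes an updateMask naming the changed fields.
url = (
    "https://dialogflow.googleapis.com/v2/"
    f"projects/{PROJECT_ID}/locations/global/conversationProfiles/{PROFILE_ID}"
    "?updateMask=sttConfig"
)
body = {"sttConfig": {"model": "chirp_3", "useSttV2": True}}

request = urllib.request.Request(
    url,
    data=json.dumps(body).encode("utf-8"),
    method="PATCH",
    headers={
        "Authorization": f"Bearer {ACCESS_TOKEN}",
        "Content-Type": "application/json",
    },
)
# urllib.request.urlopen(request) would send the call; it is left out here so
# the sketch runs without credentials.
```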
Best practices
Follow these suggestions to get the most from voice transcription with the Chirp 3 model.
Audio streaming
To maximize Chirp 3 performance, send audio in near real time. This means that if you have X seconds of audio, stream it in roughly X seconds. Break your audio into small chunks, each with a frame size of 100 ms. For more audio streaming best practices, see the Speech-to-Text documentation.
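The chunking and pacing above can be sketched as follows. The sample rate and encoding are assumptions (16 kHz, 16-bit mono LINEAR16); adjust the constants for your audio, and replace the `send` callback with your actual streaming call.

```python
import time

# Assumed audio format: 16 kHz, 16-bit (2-byte) samples, mono LINEAR16.
SAMPLE_RATE_HZ = 16000
BYTES_PER_SAMPLE = 2
FRAME_MS = 100  # recommended chunk duration

# Bytes of audio in one 100 ms frame: 16000 * 2 * 100 / 1000 = 3200.
CHUNK_BYTES = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * FRAME_MS // 1000

def stream_in_real_time(audio: bytes, send) -> None:
    """Split `audio` into 100 ms chunks and emit each at real-time pace,
    so X seconds of audio take roughly X seconds to send."""
    for offset in range(0, len(audio), CHUNK_BYTES):
        send(audio[offset:offset + CHUNK_BYTES])
        time.sleep(FRAME_MS / 1000)  # pace the stream at real time

# Example: 0.5 s of silence becomes five 100 ms chunks of 3200 bytes each.
chunks = []
stream_in_real_time(b"\x00" * (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE // 2),
                    chunks.append)
```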
Use speech adaptation
Chirp 3 supports speech adaptation only with inline phrases configured in the conversation profile.
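For reference, inline phrases in the Speech-to-Text v2 API take the shape below (the `adaptation.phraseSets[].inlinePhraseSet` message from STT v2). How Agent Assist maps the console's Phrases for speech adaptation field onto the conversation profile is an assumption here, so treat this as an illustration of the phrase format rather than the exact profile field layout.

```python
# Illustrative STT v2 inline phrase-set shape; phrase values and boosts are
# made-up examples.
adaptation = {
    "phraseSets": [
        {
            "inlinePhraseSet": {
                "phrases": [
                    {"value": "Chirp", "boost": 10},
                    {"value": "Agent Assist", "boost": 5},
                ]
            }
        }
    ]
}
```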
Regional and language support
Chirp 3 is available for all Speech-to-Text languages, with varying launch readiness, and in all Agent Assist regions except northamerica-northeast1, northamerica-northeast2, and asia-south1.
Quotas
The number of transcription requests using the Chirp 3 model is limited by the SttV2StreamingRequestsPerMinutePerResourceTypePerRegion quota, with chirp_3 labeled as the resource type. See the Google Cloud quotas guide for information on quota usage and how to request a quota increase.
For quota purposes, transcription requests to the global Dialogflow endpoints count against the us-central1 region.

