Migrating from Speech-to-Text v1 to v2

Speech-to-Text API v2 brings the latest Google Cloud API design, helping customers meet enterprise security and regulatory requirements out of the box.

These requirements are realized through the following:

  • Data Residency: Speech-to-Text v2 offers a broad range of existing transcription models in Google Cloud regions such as Belgium or Singapore, so you can invoke transcription models through a fully regionalized service.

  • Recognizer Resources: Recognizers are reusable recognition configurations that combine a model, language, and features. Because recognizers are managed resources, dedicated service accounts are no longer needed for authentication and authorization.

  • Logging: Resource creation and transcriptions generate logs available in the Google Cloud console, allowing for better telemetry and debugging.

  • Encryption: Speech-to-Text v2 supports customer-managed encryption keys (CMEK) for all resources as well as batch transcription.

  • Audio Auto-Detect: Speech-to-Text v2 automatically detects the sample rate, channel count, and format of your audio files, so you don't need to provide that information in the request configuration.

Migrating from v1 to v2

Migration from the v1 API to the v2 API does not happen automatically, but only minimal implementation changes are required to take advantage of the new feature set.

Migrating in API

As in Speech-to-Text v1, to transcribe audio you create a RecognitionConfig that selects the language of your audio and the recognition model of your choice:

Python

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")


def quickstart_v2(audio_file: str) -> cloud_speech.RecognizeResponse:
    """Transcribe an audio file.

    Args:
        audio_file (str): Path to the local audio file to be transcribed.

    Returns:
        cloud_speech.RecognizeResponse: The response from the recognize request,
            containing the transcription results.
    """
    # Reads a file as bytes
    with open(audio_file, "rb") as f:
        audio_content = f.read()

    # Instantiates a client
    client = SpeechClient()

    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="long",
    )

    request = cloud_speech.RecognizeRequest(
        recognizer=f"projects/{PROJECT_ID}/locations/global/recognizers/_",
        config=config,
        content=audio_content,
    )

    # Transcribes the audio into text
    response = client.recognize(request=request)

    for result in response.results:
        print(f"Transcript: {result.alternatives[0].transcript}")

    return response
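A central change from v1 is that every v2 request addresses a recognizer resource path; the `_` ID above selects the ad-hoc default recognizer. The naming scheme can be sketched as a plain string helper (the helper name and project ID below are our own illustration, not part of the client library):

```python
def recognizer_path(project_id: str, location: str = "global",
                    recognizer_id: str = "_") -> str:
    # Fully qualified resource name used in RecognizeRequest.recognizer.
    # The "_" ID selects the ad-hoc default recognizer, so no recognizer
    # resource has to be created before transcribing.
    return f"projects/{project_id}/locations/{location}/recognizers/{recognizer_id}"


print(recognizer_path("my-project"))
# → projects/my-project/locations/global/recognizers/_
```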

If needed, select a region in which you want to use the Speech-to-Text API, and check the language and model availability in that region:

Python

import os

from google.api_core.client_options import ClientOptions
from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")


def change_speech_v2_location(
    audio_file: str, location: str
) -> cloud_speech.RecognizeResponse:
    """Transcribe an audio file in a specific region. It allows for specifying
    the location to potentially reduce latency and meet data residency
    requirements.

    Args:
        audio_file (str): Path to the local audio file to be transcribed.
        location (str): The region where the Speech API will be accessed.
            E.g., "europe-west3"

    Returns:
        cloud_speech.RecognizeResponse: The full response object which includes
            the transcription results.
    """
    # Reads a file as bytes
    with open(audio_file, "rb") as f:
        audio_content = f.read()

    # Instantiates a client to a regionalized Speech endpoint.
    client = SpeechClient(
        client_options=ClientOptions(
            api_endpoint=f"{location}-speech.googleapis.com",
        )
    )

    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="long",
    )

    request = cloud_speech.RecognizeRequest(
        recognizer=f"projects/{PROJECT_ID}/locations/{location}/recognizers/_",
        config=config,
        content=audio_content,
    )

    # Transcribes the audio into text
    response = client.recognize(request=request)

    for result in response.results:
        print(f"Transcript: {result.alternatives[0].transcript}")

    return response
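Only two strings change between the global and regional calls: the API endpoint passed to the client and the location segment of the recognizer path. The endpoint pattern can be sketched as (the helper name is our own, and the location value is illustrative):

```python
def regional_endpoint(location: str) -> str:
    # Regionalized Speech-to-Text v2 endpoint, passed to the client
    # via ClientOptions(api_endpoint=...).
    return f"{location}-speech.googleapis.com"


print(regional_endpoint("europe-west3"))
# → europe-west3-speech.googleapis.com
```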

Optionally, create a recognizer resource if you need to reuse a specific recognition configuration across many transcription requests:

Python

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")


def create_recognizer(recognizer_id: str) -> cloud_speech.Recognizer:
    """Creates a recognizer with a unique ID and default recognition configuration.

    Args:
        recognizer_id (str): The unique identifier for the recognizer to be created.

    Returns:
        cloud_speech.Recognizer: The created recognizer object with configuration.
    """
    # Instantiates a client
    client = SpeechClient()

    request = cloud_speech.CreateRecognizerRequest(
        parent=f"projects/{PROJECT_ID}/locations/global",
        recognizer_id=recognizer_id,
        recognizer=cloud_speech.Recognizer(
            default_recognition_config=cloud_speech.RecognitionConfig(
                language_codes=["en-US"], model="long"
            ),
        ),
    )

    # Sends the request to create a recognizer and waits for the operation to complete
    operation = client.create_recognizer(request=request)
    recognizer = operation.result()

    print("Created Recognizer:", recognizer.name)

    return recognizer
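Once a recognizer exists, subsequent requests can rely on its stored default_recognition_config instead of repeating the configuration per request. As a rough sketch of the request shape, using a plain dict (the helper name and sample IDs are our own; an actual call would build a cloud_speech.RecognizeRequest):

```python
def reuse_recognizer_fields(project_id: str, recognizer_id: str,
                            audio_content: bytes,
                            location: str = "global") -> dict:
    # Omitting "config" lets the service fall back to the recognizer's
    # default_recognition_config, so repeat requests only carry the
    # resource name and the audio bytes.
    return {
        "recognizer": (
            f"projects/{project_id}/locations/{location}"
            f"/recognizers/{recognizer_id}"
        ),
        "content": audio_content,
    }


fields = reuse_recognizer_fields("my-project", "my-recognizer", b"...")
print(sorted(fields))
# → ['content', 'recognizer']
```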

There are other differences in the requests and responses in the new v2 API. For more details, see the reference documentation.

Migrating in UI

To migrate using the Speech Google Cloud console, follow these steps:

  1. Go to the Speech Google Cloud console.

  2. Navigate to the Transcriptions page.

  3. Click New Transcription and select your audio in the Audio configuration tab.

  4. In the Transcription options tab, select V2.
