Select a transcription model

This page describes how to use a specific machine learning model for audio transcription requests to Speech-to-Text.

Transcription models

Speech-to-Text detects words in an audio clip by comparing the input to one of many machine learning models. Each model has been trained by analyzing millions of examples, in this case a very large number of audio recordings of people speaking.

Speech-to-Text has specialized models trained on audio from specific sources, for example phone calls or videos. Because of this training process, these specialized models provide better results when applied to similar kinds of audio data.

For example, Speech-to-Text has transcription models trained to recognize speech recorded over the phone. When Speech-to-Text uses the telephony or telephony_short model to transcribe phone audio, it produces more accurate results than if it had used the latest_short or latest_long model.

The following list describes the transcription models available for use with Speech-to-Text.

latest_long
Use this model for any kind of long-form content, such as media or spontaneous speech and conversations. Consider using it in place of the video model, especially if the video model is not available in your target language. You can also use it in place of the default model.

latest_short
Use this model for short utterances that are a few seconds in length. It is useful for capturing commands or other single-shot directed speech. Consider using this model instead of the command_and_search model.

telephony
Improved version of the phone_call model. Best for audio that originated from a phone call, typically recorded at an 8 kHz sampling rate.

telephony_short
Dedicated version of the telephony model for short or even single-word utterances in audio that originated from a phone call, typically recorded at an 8 kHz sampling rate.

medical_dictation
Use this model to transcribe notes dictated by a medical professional. This is a premium model that costs more than the standard rate; see the pricing page for more details.

medical_conversation
Use this model to transcribe a conversation between a medical professional and a patient. This is a premium model that costs more than the standard rate; see the pricing page for more details.

The following models are based mostly on classic, non-conformer architectures and are kept primarily for legacy and backwards-compatibility reasons.

command_and_search
Best for short or single-word utterances such as voice commands or voice search.

default
Best for audio that does not fit the other models, such as long-form audio or dictation. The default model produces transcription results for any type of audio, including audio (such as video clips) that has its own specifically tailored model. However, recognizing video-clip audio with the default model will likely yield lower-quality results than using the video model. Ideally, the audio is high-fidelity, recorded at a 16 kHz or greater sampling rate.

phone_call
Best for audio that originated from a phone call, typically recorded at an 8 kHz sampling rate.

video
Best for audio from video clips or other sources (such as podcasts) that have multiple speakers. This model is also often the best choice for audio recorded with a high-quality microphone or audio with lots of background noise. For best results, provide audio recorded at a 16 kHz or greater sampling rate.
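
As a quick illustration, the guidance above maps directly onto a request. The following is a minimal sketch using the Python client library that pairs the telephony model with an 8 kHz phone recording; the file path call_recording.wav is hypothetical, and any 8 kHz LINEAR16 phone recording would do:

# Minimal sketch: choosing a model to match the audio source.
# The file path is hypothetical.
from google.cloud import speech

client = speech.SpeechClient()

with open("call_recording.wav", "rb") as f:
    audio = speech.RecognitionAudio(content=f.read())

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=8000,  # phone audio is typically recorded at 8 kHz
    language_code="en-US",
    model="telephony",  # per the list above; use "latest_long" for long-form media
)

response = client.recognize(config=config, audio=audio)
for result in response.results:
    print(result.alternatives[0].transcript)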

Select a model for audio transcription

To specify a model for audio transcription, set the model field to one of the allowed values, such as latest_long, latest_short, telephony, or telephony_short, in the RecognitionConfig parameters for the request. Speech-to-Text supports model selection for all speech recognition methods: speech:recognize, speech:longrunningrecognize, and streaming recognition.
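
The samples that follow demonstrate the speech:recognize and speech:longrunningrecognize methods. For streaming recognition, the model field is set on the RecognitionConfig nested inside a StreamingRecognitionConfig. The following is a minimal sketch using the Python client library; the file path audio.raw and the chunk size are illustrative, standing in for a real audio stream:

# Minimal streaming sketch: model selection works the same way when streaming.
from google.cloud import speech

client = speech.SpeechClient()

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
    model="latest_short",  # selected model, exactly as in the non-streaming methods
)
streaming_config = speech.StreamingRecognitionConfig(config=config)

def request_stream(path, chunk_size=4096):
    # Yields the audio in small chunks, as a live audio source would.
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            yield speech.StreamingRecognizeRequest(audio_content=chunk)

responses = client.streaming_recognize(streaming_config, request_stream("audio.raw"))
for response in responses:
    for result in response.results:
        print(result.alternatives[0].transcript)

Results arrive incrementally as the audio is processed, rather than in a single response.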

Perform transcription of a local audio file

Protocol

Refer to the speech:recognize API endpoint for complete details.

To perform synchronous speech recognition, make a POST request and provide the appropriate request body. The following shows an example of a POST request using curl. The example uses the Google Cloud CLI to generate an access token. For instructions on installing the gcloud CLI, see the quickstart.

curl -s \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  https://speech.googleapis.com/v1/speech:recognize \
  --data '{
    "config": {
        "encoding": "LINEAR16",
        "sampleRateHertz": 16000,
        "languageCode": "en-US",
        "model": "video"
    },
    "audio": {
        "uri": "gs://cloud-samples-tests/speech/Google_Gnome.wav"
    }
}'

See the RecognitionConfig reference documentation for more information on configuring the request body.

If the request is successful, the server returns a 200 OK HTTP status code and the response in JSON format:

{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "OK Google stream stranger things from
            Netflix to my TV okay stranger things from
            Netflix playing on TV from the people that brought you
            Google home comes the next evolution of the smart home
            and it's just outside your window me Google know hi
            how can I help okay no what's the weather like outside
            the weather outside is sunny and 76 degrees he's right
            okay no turn on the hose I'm holding sure okay no I'm can
            I eat this lemon tree leaf yes what about this Daisy yes
            but I wouldn't recommend it but I could eat it okay
            Nomad milk to my shopping list I'm sorry that sounds like
            an indoor request I keep doing that sorry you do keep
            doing that okay no is this compost really we're all
            compost if you think about it pretty much everything is
            made up of organic matter and will return",
          "confidence": 0.9251011
        }
      ]
    }
  ]
}

Go

To learn how to install and use the client library for Speech-to-Text, see Speech-to-Text client libraries. For more information, see the Speech-to-Text Go API reference documentation.

To authenticate to Speech-to-Text, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import (
	"context"
	"fmt"
	"io"
	"os"
	"strings"

	speech "cloud.google.com/go/speech/apiv1"
	"cloud.google.com/go/speech/apiv1/speechpb"
)

// modelSelection transcribes the given local audio file synchronously
// with the selected model.
func modelSelection(w io.Writer) error {
	ctx := context.Background()

	client, err := speech.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("NewClient: %w", err)
	}
	defer client.Close()

	data, err := os.ReadFile("../testdata/Google_Gnome.wav")
	if err != nil {
		return fmt.Errorf("ReadFile: %w", err)
	}

	req := &speechpb.RecognizeRequest{
		Config: &speechpb.RecognitionConfig{
			Encoding:        speechpb.RecognitionConfig_LINEAR16,
			SampleRateHertz: 16000,
			LanguageCode:    "en-US",
			Model:           "video",
		},
		Audio: &speechpb.RecognitionAudio{
			AudioSource: &speechpb.RecognitionAudio_Content{Content: data},
		},
	}

	resp, err := client.Recognize(ctx, req)
	if err != nil {
		return fmt.Errorf("recognize: %w", err)
	}

	for i, result := range resp.Results {
		fmt.Fprintf(w, "%s\n", strings.Repeat("-", 20))
		fmt.Fprintf(w, "Result %d\n", i+1)
		for j, alternative := range result.Alternatives {
			fmt.Fprintf(w, "Alternative %d: %s\n", j+1, alternative.Transcript)
		}
	}
	return nil
}
 

Java

To learn how to install and use the client library for Speech-to-Text, see Speech-to-Text client libraries. For more information, see the Speech-to-Text Java API reference documentation.

To authenticate to Speech-to-Text, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

/**
 * Performs transcription of the given audio file synchronously with the selected model.
 *
 * @param fileName the path to an audio file to transcribe
 */
public static void transcribeModelSelection(String fileName) throws Exception {
  Path path = Paths.get(fileName);
  byte[] content = Files.readAllBytes(path);

  try (SpeechClient speech = SpeechClient.create()) {
    // Configure request with video media type
    RecognitionConfig recConfig =
        RecognitionConfig.newBuilder()
            // encoding may either be omitted or must match the value in the file header
            .setEncoding(AudioEncoding.LINEAR16)
            .setLanguageCode("en-US")
            // sample rate hertz may either be omitted or must match the value in the file
            // header
            .setSampleRateHertz(16000)
            .setModel("video")
            .build();

    RecognitionAudio recognitionAudio =
        RecognitionAudio.newBuilder().setContent(ByteString.copyFrom(content)).build();

    RecognizeResponse recognizeResponse = speech.recognize(recConfig, recognitionAudio);

    // Just print the first result here.
    SpeechRecognitionResult result = recognizeResponse.getResultsList().get(0);
    // There can be several alternative transcripts for a given chunk of speech. Just use the
    // first (most likely) one here.
    SpeechRecognitionAlternative alternative = result.getAlternativesList().get(0);
    System.out.printf("Transcript : %s\n", alternative.getTranscript());
  }
}
 

Node.js

To learn how to install and use the client library for Speech-to-Text, see Speech-to-Text client libraries. For more information, see the Speech-to-Text Node.js API reference documentation.

To authenticate to Speech-to-Text, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

// Imports the Google Cloud client library for Beta API
/**
 * TODO(developer): Update client library import to use new
 * version of API when desired features become available
 */
const speech = require('@google-cloud/speech').v1p1beta1;
const fs = require('fs');

// Creates a client
const client = new speech.SpeechClient();

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// const filename = 'Local path to audio file, e.g. /path/to/audio.raw';
// const model = 'Model to use, e.g. phone_call, video, default';
// const encoding = 'Encoding of the audio file, e.g. LINEAR16';
// const sampleRateHertz = 16000;
// const languageCode = 'BCP-47 language code, e.g. en-US';

const config = {
  encoding: encoding,
  sampleRateHertz: sampleRateHertz,
  languageCode: languageCode,
  model: model,
};
const audio = {
  content: fs.readFileSync(filename).toString('base64'),
};

const request = {
  config: config,
  audio: audio,
};

// Detects speech in the audio file
const [response] = await client.recognize(request);
const transcription = response.results
  .map(result => result.alternatives[0].transcript)
  .join('\n');
console.log('Transcription: ', transcription);
 

Python

To learn how to install and use the client library for Speech-to-Text, see Speech-to-Text client libraries. For more information, see the Speech-to-Text Python API reference documentation.

To authenticate to Speech-to-Text, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

from google.cloud import speech

# Instantiates a client
client = speech.SpeechClient()

# Reads a file as bytes
with open("resources/Google_Gnome.wav", "rb") as f:
    audio_content = f.read()

audio = speech.RecognitionAudio(content=audio_content)

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
    model="video",  # Chosen model
)

response = client.recognize(config=config, audio=audio)

for i, result in enumerate(response.results):
    alternative = result.alternatives[0]
    print("-" * 20)
    print(f"First alternative of result {i}")
    print(f"Transcript: {alternative.transcript}")
 

Additional languages

C#: Please follow the C# setup instructions on the client libraries page and then visit the Speech-to-Text reference documentation for .NET.

PHP: Please follow the PHP setup instructions on the client libraries page and then visit the Speech-to-Text reference documentation for PHP.

Ruby: Please follow the Ruby setup instructions on the client libraries page and then visit the Speech-to-Text reference documentation for Ruby.

Perform transcription of a Cloud Storage audio file

Go

To learn how to install and use the client library for Speech-to-Text, see Speech-to-Text client libraries. For more information, see the Speech-to-Text Go API reference documentation.

To authenticate to Speech-to-Text, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import (
	"context"
	"fmt"
	"io"
	"strings"

	speech "cloud.google.com/go/speech/apiv1"
	"cloud.google.com/go/speech/apiv1/speechpb"
)

// transcribe_model_selection_gcs transcribes the given audio file asynchronously with
// the selected model.
func transcribe_model_selection_gcs(w io.Writer) error {
	ctx := context.Background()

	client, err := speech.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("NewClient: %w", err)
	}
	defer client.Close()

	audio := &speechpb.RecognitionAudio{
		AudioSource: &speechpb.RecognitionAudio_Uri{Uri: "gs://cloud-samples-tests/speech/Google_Gnome.wav"},
	}

	// The speech recognition model to use
	// See, https://cloud.google.com/speech-to-text/docs/speech-to-text-requests#select-model
	recognitionConfig := &speechpb.RecognitionConfig{
		Encoding:        speechpb.RecognitionConfig_LINEAR16,
		SampleRateHertz: 16000,
		LanguageCode:    "en-US",
		Model:           "video",
	}

	longRunningRecognizeRequest := &speechpb.LongRunningRecognizeRequest{
		Config: recognitionConfig,
		Audio:  audio,
	}

	operation, err := client.LongRunningRecognize(ctx, longRunningRecognizeRequest)
	if err != nil {
		return fmt.Errorf("error running recognize %w", err)
	}

	response, err := operation.Wait(ctx)
	if err != nil {
		return err
	}

	for i, result := range response.Results {
		alternative := result.Alternatives[0]
		fmt.Fprintf(w, "%s\n", strings.Repeat("-", 20))
		fmt.Fprintf(w, "First alternative of result %d", i)
		fmt.Fprintf(w, "Transcript: %s", alternative.Transcript)
	}
	return nil
}
 

Java

To learn how to install and use the client library for Speech-to-Text, see Speech-to-Text client libraries. For more information, see the Speech-to-Text Java API reference documentation.

To authenticate to Speech-to-Text, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

/**
 * Performs transcription of the remote audio file asynchronously with the selected model.
 *
 * @param gcsUri the path to the remote audio file to transcribe.
 */
public static void transcribeModelSelectionGcs(String gcsUri) throws Exception {
  try (SpeechClient speech = SpeechClient.create()) {
    // Configure request with video media type
    RecognitionConfig config =
        RecognitionConfig.newBuilder()
            // encoding may either be omitted or must match the value in the file header
            .setEncoding(AudioEncoding.LINEAR16)
            .setLanguageCode("en-US")
            // sample rate hertz may either be omitted or must match the value in the file
            // header
            .setSampleRateHertz(16000)
            .setModel("video")
            .build();

    RecognitionAudio audio = RecognitionAudio.newBuilder().setUri(gcsUri).build();

    // Use non-blocking call for getting file transcription
    OperationFuture<LongRunningRecognizeResponse, LongRunningRecognizeMetadata> response =
        speech.longRunningRecognizeAsync(config, audio);

    while (!response.isDone()) {
      System.out.println("Waiting for response...");
      Thread.sleep(10000);
    }

    List<SpeechRecognitionResult> results = response.get().getResultsList();

    // Just print the first result here.
    SpeechRecognitionResult result = results.get(0);
    // There can be several alternative transcripts for a given chunk of speech. Just use the
    // first (most likely) one here.
    SpeechRecognitionAlternative alternative = result.getAlternativesList().get(0);
    System.out.printf("Transcript : %s\n", alternative.getTranscript());
  }
}
 

Node.js

To learn how to install and use the client library for Speech-to-Text, see Speech-to-Text client libraries. For more information, see the Speech-to-Text Node.js API reference documentation.

To authenticate to Speech-to-Text, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

// Imports the Google Cloud client library for Beta API
/**
 * TODO(developer): Update client library import to use new
 * version of API when desired features become available
 */
const speech = require('@google-cloud/speech').v1p1beta1;

// Creates a client
const client = new speech.SpeechClient();

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// const gcsUri = 'gs://my-bucket/audio.raw';
// const model = 'Model to use, e.g. phone_call, video, default';
// const encoding = 'Encoding of the audio file, e.g. LINEAR16';
// const sampleRateHertz = 16000;
// const languageCode = 'BCP-47 language code, e.g. en-US';

const config = {
  encoding: encoding,
  sampleRateHertz: sampleRateHertz,
  languageCode: languageCode,
  model: model,
};
const audio = {
  uri: gcsUri,
};

const request = {
  config: config,
  audio: audio,
};

// Detects speech in the audio file.
const [response] = await client.recognize(request);
const transcription = response.results
  .map(result => result.alternatives[0].transcript)
  .join('\n');
console.log('Transcription: ', transcription);
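
Python

The local-file section above includes a Python sample, but none is shown here, so the following is a minimal sketch (not an official sample) of the same Cloud Storage flow using the Python client library's long_running_recognize method. The gs:// URI is the same test file used by the other samples, and the 90-second timeout is an illustrative value:

# Minimal sketch: asynchronous transcription of a Cloud Storage file
# with a selected model.
from google.cloud import speech

client = speech.SpeechClient()

audio = speech.RecognitionAudio(uri="gs://cloud-samples-tests/speech/Google_Gnome.wav")
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
    model="video",
)

# Cloud Storage audio uses the long-running (asynchronous) method, which
# returns an operation to poll instead of a direct response.
operation = client.long_running_recognize(config=config, audio=audio)
response = operation.result(timeout=90)

for i, result in enumerate(response.results):
    print("-" * 20)
    print(f"First alternative of result {i}")
    print(f"Transcript: {result.alternatives[0].transcript}")

Because Cloud Storage files can be long, the long-running method is used here; polling the returned operation avoids holding a request open for the entire transcription.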
 

Additional languages

C#: Please follow the C# setup instructions on the client libraries page and then visit the Speech-to-Text reference documentation for .NET.

PHP: Please follow the PHP setup instructions on the client libraries page and then visit the Speech-to-Text reference documentation for PHP.

Ruby: Please follow the Ruby setup instructions on the client libraries page and then visit the Speech-to-Text reference documentation for Ruby.
