Transcribe speech to text by using client libraries

This page shows you how to send a speech recognition request to Speech-to-Text in your favorite programming language using the Google Cloud Client Libraries.

Speech-to-Text enables easy integration of Google speech recognition technologies into developer applications. You can send audio data to the Speech-to-Text API, which then returns a text transcription of that audio file. For more information about the service, see Speech-to-Text basics.

Before you begin

Before you send a request to the Speech-to-Text API, complete the following steps. For details, see the before you begin page.

Enable Speech-to-Text on a Google Cloud project.
Make sure billing is enabled for Speech-to-Text.
Install the Google Cloud CLI. After installation, initialize the Google Cloud CLI by running the following command:
```
gcloud init
```
If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.
If you're using a local shell, then create local authentication credentials for your user account:
```
gcloud auth application-default login
```
You don't need to do this if you're using Cloud Shell.

If an authentication error is returned, and you are using an external identity provider (IdP), confirm that you have signed in to the gcloud CLI with your federated identity.
(Optional) Create a new Google Cloud Storage bucket to store your audio data.
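
If you store audio in your own bucket, the samples on this page refer to it with a `gs://bucket/object` URI. The following is a minimal sketch of how you might check such a URI before sending a request. The helper name `split_gcs_uri` is hypothetical and is not part of any Google Cloud library; it only inspects the string and does not contact Cloud Storage.

```python
def split_gcs_uri(uri: str) -> tuple[str, str]:
    """Split a gs://bucket/object URI into (bucket, object_name).

    Hypothetical helper for illustration only; it validates the shape of
    the string and does not check that the bucket or object exists.
    """
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"Not a Cloud Storage URI: {uri!r}")
    # Everything up to the first "/" is the bucket; the rest is the object.
    bucket, _, object_name = uri[len(prefix):].partition("/")
    if not bucket or not object_name:
        raise ValueError(f"URI must name both a bucket and an object: {uri!r}")
    return bucket, object_name


bucket, object_name = split_gcs_uri(
    "gs://cloud-samples-data/speech/brooklyn_bridge.raw"
)
print(bucket)       # cloud-samples-data
print(object_name)  # speech/brooklyn_bridge.raw
```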

Install the client library

Go

```
go get cloud.google.com/go/speech/apiv1
```

Java

If you are using Maven, add the following to your pom.xml file. For more information about BOMs, see The Google Cloud Platform Libraries BOM.

```
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.google.cloud</groupId>
      <artifactId>libraries-bom</artifactId>
      <version>26.66.0</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>

<dependencies>
  <dependency>
    <groupId>com.google.cloud</groupId>
    <artifactId>google-cloud-speech</artifactId>
  </dependency>
</dependencies>
```

If you are using Gradle, add the following to your dependencies:

```
implementation 'com.google.cloud:google-cloud-speech:4.67.0'
```

If you are using sbt, add the following to your dependencies:

```
libraryDependencies += "com.google.cloud" % "google-cloud-speech" % "4.67.0"
```

If you're using Visual Studio Code, IntelliJ, or Eclipse, you can add client libraries to your project by using the IDE plugin for your editor. The plugins provide additional functionality, such as key management for service accounts. Refer to each plugin's documentation for details.

Node.js

Before installing the library, make sure you've prepared your environment for Node.js development.

```
npm install @google-cloud/speech
```

Python

Before installing the library, make sure you've prepared your environment for Python development.

```
pip install --upgrade google-cloud-speech
```

Make an audio transcription request

Now you can use Speech-to-Text to transcribe an audio file to text. Use the following code to send a recognize request to the Speech-to-Text API.
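
Whichever language you pick, the request in the samples below has the same two parts: a `config` that describes the audio (encoding, sample rate, language) and an `audio` part that points at the file. The following sketch builds that structure as a plain dictionary so you can see its shape. The field names mirror the Node.js sample on this page; the helper name `build_recognize_request` is hypothetical, and the code does not send anything to the API.

```python
import json


def build_recognize_request(
    gcs_uri: str,
    encoding: str = "LINEAR16",
    sample_rate_hertz: int = 16000,
    language_code: str = "en-US",
) -> dict:
    """Return a dictionary shaped like the recognize request in the samples.

    Hypothetical helper for illustration; it only assembles data.
    """
    return {
        # Describes how the audio is encoded and which language it contains.
        "config": {
            "encoding": encoding,
            "sampleRateHertz": sample_rate_hertz,
            "languageCode": language_code,
        },
        # Points at the audio file in Cloud Storage.
        "audio": {"uri": gcs_uri},
    }


request = build_recognize_request(
    "gs://cloud-samples-data/speech/brooklyn_bridge.raw"
)
print(json.dumps(request, indent=2))
```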

Go

```
// Sample speech-quickstart uses the Google Cloud Speech API to transcribe
// audio.
package main

import (
	"context"
	"fmt"
	"log"

	speech "cloud.google.com/go/speech/apiv1"
	"cloud.google.com/go/speech/apiv1/speechpb"
)

func main() {
	ctx := context.Background()

	// Creates a client.
	client, err := speech.NewClient(ctx)
	if err != nil {
		log.Fatalf("Failed to create client: %v", err)
	}
	defer client.Close()

	// The path to the remote audio file to transcribe.
	fileURI := "gs://cloud-samples-data/speech/brooklyn_bridge.raw"

	// Detects speech in the audio file.
	resp, err := client.Recognize(ctx, &speechpb.RecognizeRequest{
		Config: &speechpb.RecognitionConfig{
			Encoding:        speechpb.RecognitionConfig_LINEAR16,
			SampleRateHertz: 16000,
			LanguageCode:    "en-US",
		},
		Audio: &speechpb.RecognitionAudio{
			AudioSource: &speechpb.RecognitionAudio_Uri{Uri: fileURI},
		},
	})
	if err != nil {
		log.Fatalf("failed to recognize: %v", err)
	}

	// Prints the results.
	for _, result := range resp.Results {
		for _, alt := range result.Alternatives {
			fmt.Printf("\"%v\" (confidence=%3f)\n", alt.Transcript, alt.Confidence)
		}
	}
}
```

Java

```
// Imports the Google Cloud client library
import com.google.cloud.speech.v1.RecognitionAudio;
import com.google.cloud.speech.v1.RecognitionConfig;
import com.google.cloud.speech.v1.RecognitionConfig.AudioEncoding;
import com.google.cloud.speech.v1.RecognizeResponse;
import com.google.cloud.speech.v1.SpeechClient;
import com.google.cloud.speech.v1.SpeechRecognitionAlternative;
import com.google.cloud.speech.v1.SpeechRecognitionResult;
import java.util.List;

public class QuickstartSample {

  /** Demonstrates using the Speech API to transcribe an audio file. */
  public static void main(String... args) throws Exception {
    // Instantiates a client
    try (SpeechClient speechClient = SpeechClient.create()) {

      // The path to the audio file to transcribe
      String gcsUri = "gs://cloud-samples-data/speech/brooklyn_bridge.raw";

      // Builds the sync recognize request
      RecognitionConfig config =
          RecognitionConfig.newBuilder()
              .setEncoding(AudioEncoding.LINEAR16)
              .setSampleRateHertz(16000)
              .setLanguageCode("en-US")
              .build();
      RecognitionAudio audio = RecognitionAudio.newBuilder().setUri(gcsUri).build();

      // Performs speech recognition on the audio file
      RecognizeResponse response = speechClient.recognize(config, audio);
      List<SpeechRecognitionResult> results = response.getResultsList();

      for (SpeechRecognitionResult result : results) {
        // There can be several alternative transcripts for a given chunk of speech. Just use the
        // first (most likely) one here.
        SpeechRecognitionAlternative alternative = result.getAlternativesList().get(0);
        System.out.printf("Transcription: %s%n", alternative.getTranscript());
      }
    }
  }
}
```

Node.js

Before running the example, make sure you've prepared your environment for Node.js development.

```
// Imports the Google Cloud client library
const speech = require('@google-cloud/speech');

// Creates a client
const client = new speech.SpeechClient();

async function quickstart() {
  // The path to the remote LINEAR16 file
  const gcsUri = 'gs://cloud-samples-data/speech/brooklyn_bridge.raw';

  // The audio file's encoding, sample rate in hertz, and BCP-47 language code
  const audio = {
    uri: gcsUri,
  };
  const config = {
    encoding: 'LINEAR16',
    sampleRateHertz: 16000,
    languageCode: 'en-US',
  };
  const request = {
    audio: audio,
    config: config,
  };

  // Detects speech in the audio file
  const [response] = await client.recognize(request);
  const transcription = response.results
    .map(result => result.alternatives[0].transcript)
    .join('\n');
  console.log(`Transcription: ${transcription}`);
}
quickstart();
```

Python

Before running the example, make sure you've prepared your environment for Python development.

```
# Imports the Google Cloud client library
from google.cloud import speech


def run_quickstart() -> speech.RecognizeResponse:
    # Instantiates a client
    client = speech.SpeechClient()

    # The name of the audio file to transcribe
    gcs_uri = "gs://cloud-samples-data/speech/brooklyn_bridge.raw"

    audio = speech.RecognitionAudio(uri=gcs_uri)

    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="en-US",
    )

    # Detects speech in the audio file
    response = client.recognize(config=config, audio=audio)

    for result in response.results:
        print(f"Transcript: {result.alternatives[0].transcript}")

    # Return the response so that it matches the declared return type
    return response
```
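
All four samples declare the audio as LINEAR16 (uncompressed 16-bit PCM) at 16000 Hz, and those values are expected to describe the audio you actually send. The sample file is headerless raw audio, so there is nothing to inspect, but if you later use a WAV file of your own, the following sketch shows one way you might read its properties with Python's standard `wave` module before filling in the config. The helper names `make_silent_wav` and `describe_wav` are hypothetical, and the silent clip is generated only so that the example runs without any external file.

```python
import io
import wave


def make_silent_wav(sample_rate_hertz: int = 16000, seconds: int = 1) -> bytes:
    """Build an in-memory mono 16-bit WAV file containing silence."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(sample_rate_hertz)
        wav_file.writeframes(b"\x00\x00" * sample_rate_hertz * seconds)
    return buffer.getvalue()


def describe_wav(data: bytes) -> dict:
    """Read the properties you would copy into the recognition config."""
    with wave.open(io.BytesIO(data), "rb") as wav_file:
        rate = wav_file.getframerate()
        return {
            "sample_rate_hertz": rate,
            "channels": wav_file.getnchannels(),
            "bits_per_sample": wav_file.getsampwidth() * 8,
            "duration_seconds": wav_file.getnframes() / rate,
        }


info = describe_wav(make_silent_wav())
print(info)
```

If `sample_rate_hertz` or `bits_per_sample` differs from what your request declares, adjust the config or convert the audio first.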

Congratulations! You've sent your first request to Speech-to-Text.

If you receive an error or an empty response from Speech-to-Text, take a look at the troubleshooting and error mitigation steps.
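
An empty response is easy to miss because the samples index `alternatives[0]` directly. The following sketch shows one defensive way to collect transcripts. It assumes only the response shape used in the samples above (`results`, each with `alternatives` that carry a `transcript`). The dataclasses are hypothetical stand-ins so that the example runs without the client library or credentials, and the transcript text is made-up test data, not real API output.

```python
from dataclasses import dataclass, field


@dataclass
class Alternative:
    """Hypothetical stand-in for one recognition alternative."""
    transcript: str
    confidence: float = 0.0


@dataclass
class Result:
    """Hypothetical stand-in for one recognition result."""
    alternatives: list = field(default_factory=list)


@dataclass
class Response:
    """Hypothetical stand-in for a recognize response."""
    results: list = field(default_factory=list)


def best_transcript(response) -> str:
    """Join the first alternative of each result, skipping empty results.

    Returns an empty string when the response contains no results.
    """
    pieces = [
        result.alternatives[0].transcript
        for result in response.results
        if result.alternatives
    ]
    return "\n".join(pieces)


# Made-up data, used only to exercise the helper.
populated = Response(results=[
    Result(alternatives=[Alternative("first segment", 0.9)]),
    Result(alternatives=[]),
    Result(alternatives=[Alternative("second segment", 0.8)]),
])

print(best_transcript(populated))
print(repr(best_transcript(Response())))
```

Because the helper only reads attributes, you could pass it the response object returned by the Python sample as well, assuming it exposes the same attribute names.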

Clean up

To avoid incurring charges to your Google Cloud account for the resources used on this page, follow these steps.

Use the Google Cloud console to delete your project if you do not need it.

What's next

Practice transcribing short audio files.
Learn how to batch long audio files for speech recognition.
Learn how to transcribe streaming audio, such as audio from a microphone.
Get started with Speech-to-Text in your language of choice by using a Speech-to-Text client library.
Work through the sample applications.
For best performance, accuracy, and other tips, see the best practices documentation.