This page describes how to use a specific machine learning model for audio transcription requests to Speech-to-Text.
Transcription models
Speech-to-Text detects words in an audio clip by comparing the input to one of many machine learning models. Each model has been trained by analyzing millions of examples, in this case a large number of audio recordings of people speaking.
Speech-to-Text has specialized models trained on audio from specific sources, for example phone calls or videos. Because of this training process, these specialized models provide better results when applied to similar kinds of audio data.
For example, Speech-to-Text has a transcription model trained to recognize speech recorded over the phone. When Speech-to-Text uses the telephony or telephony_short model to transcribe phone audio, it produces more accurate transcription results than if it had transcribed phone audio using the latest_short or latest_long models.
The following table shows the transcription models available for use with Speech-to-Text.
latest_long 
latest_short 
telephony 
telephony_short 
medical_dictation 
This is a premium model that costs more than the standard rate. See the pricing page for more details.
medical_conversation 
This is a premium model that costs more than the standard rate. See the pricing page for more details.
command_and_search 
default 
phone_call 
video 
Best for audio from video clips or other sources (such as podcasts) that have multiple speakers. This model is also often the best choice for audio that was recorded with a high-quality microphone or that has lots of background noise. For best results, provide audio recorded at a sampling rate of 16,000 Hz or greater.
Select a model for audio transcription
To use a specific model for audio transcription, set the model field to one of the allowed values, such as latest_long, latest_short, telephony, or telephony_short, in the RecognitionConfig parameters for the request.
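As a minimal sketch of this step, the following Python builds the config portion of a request body and sets its model field. The helper name build_recognition_config and the KNOWN_MODELS set are hypothetical; the set only mirrors the model names listed on this page and is not an authoritative list from the API. The other field names and default values are taken from the curl example on this page.

```python
import json

# Model names copied from the table on this page (assumption: this set is
# illustrative and is not checked against the live API).
KNOWN_MODELS = {
    "latest_long", "latest_short", "telephony", "telephony_short",
    "medical_dictation", "medical_conversation", "command_and_search",
    "default", "phone_call", "video",
}


def build_recognition_config(model, language_code="en-US",
                             encoding="LINEAR16", sample_rate_hertz=16000):
    """Return a dict shaped like the JSON "config" object in the curl example."""
    if model not in KNOWN_MODELS:
        raise ValueError(f"unknown model: {model!r}")
    return {
        "encoding": encoding,
        "sampleRateHertz": sample_rate_hertz,
        "languageCode": language_code,
        "model": model,  # the field that selects the transcription model
    }


if __name__ == "__main__":
    # Print the config for phone audio as JSON.
    print(json.dumps(build_recognition_config("telephony"), indent=2))
```

Rejecting unknown names locally is only a convenience in this sketch; the service itself decides which model values are valid for a given request.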
Speech-to-Text supports model selection for all speech recognition methods: speech:recognize, speech:longrunningrecognize, and Streaming.
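The following sketch illustrates that the same model setting can be carried in the request body for either REST method. It assumes that speech:longrunningrecognize lives under the same v1 base URL as the speech:recognize endpoint used in the curl example on this page. The function build_rest_call is a hypothetical helper, and streaming recognition is left out because this sketch covers only the two REST methods.

```python
BASE_URL = "https://speech.googleapis.com/v1"

# REST method names as written on this page.
REST_METHODS = ("speech:recognize", "speech:longrunningrecognize")


def build_rest_call(method, model, audio_uri):
    """Return (url, body) for a REST recognition call; nothing is sent."""
    if method not in REST_METHODS:
        raise ValueError(f"unsupported REST method: {method!r}")
    body = {
        "config": {
            "encoding": "LINEAR16",
            "sampleRateHertz": 16000,
            "languageCode": "en-US",
            "model": model,  # same field regardless of the method
        },
        "audio": {"uri": audio_uri},
    }
    return f"{BASE_URL}/{method}", body


if __name__ == "__main__":
    for name in REST_METHODS:
        url, body = build_rest_call(
            name, "video", "gs://cloud-samples-tests/speech/Google_Gnome.wav")
        print(url, body["config"]["model"])
```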
Perform transcription of a local audio file
Protocol
Refer to the speech:recognize API endpoint for complete details.
To perform synchronous speech recognition, make a POST request and provide the appropriate request body. The following shows an example of a POST request using curl. The example uses the Google Cloud CLI to generate an access token. For instructions on installing the gcloud CLI, see the quickstart.
curl -s -H "Content-Type: application/json" \
    -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
    https://speech.googleapis.com/v1/speech:recognize \
    --data '{
  "config": {
    "encoding": "LINEAR16",
    "sampleRateHertz": 16000,
    "languageCode": "en-US",
    "model": "video"
  },
  "audio": {
    "uri": "gs://cloud-samples-tests/speech/Google_Gnome.wav"
  }
}'
See the RecognitionConfig reference documentation for more information on configuring the request body.
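For readers who prefer Python to curl, the same POST request can be sketched with only the standard library. This is an illustration, not the client library sample for this page: build_recognize_request and send are hypothetical helper names, and the access token is assumed to be obtained separately (for example, from the gcloud command shown in the curl example).

```python
import json
import urllib.request

RECOGNIZE_URL = "https://speech.googleapis.com/v1/speech:recognize"


def build_recognize_request(
        access_token,
        model="video",
        audio_uri="gs://cloud-samples-tests/speech/Google_Gnome.wav"):
    """Build (but do not send) the POST request from the curl example."""
    body = {
        "config": {
            "encoding": "LINEAR16",
            "sampleRateHertz": 16000,
            "languageCode": "en-US",
            "model": model,
        },
        "audio": {"uri": audio_uri},
    }
    return urllib.request.Request(
        RECOGNIZE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {access_token}",
        },
        method="POST",
    )


def send(request):
    """Send the request and decode the JSON response (needs network access)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it possible to inspect the URL, headers, and body without valid credentials.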
If the request is successful, the server returns a 200 OK HTTP status code and the response in JSON format:
{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "OK Google stream stranger things from
            Netflix to my TV okay stranger things from
            Netflix playing on TV from the people that brought you
            Google home comes the next evolution of the smart home
            and it's just outside your window me Google know hi
            how can I help okay no what's the weather like outside
            the weather outside is sunny and 76 degrees he's right
            okay no turn on the hose I'm holding sure okay no I'm can
            I eat this lemon tree leaf yes what about this Daisy yes
            but I wouldn't recommend it but I could eat it okay
            Nomad milk to my shopping list I'm sorry that sounds like
            an indoor request I keep doing that sorry you do keep
            doing that okay no is this compost really we're all
            compost if you think about it pretty much everything is
            made up of organic matter and will return",
          "confidence": 0.9251011
        }
      ]
    }
  ]
} 
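A response of this shape can be reduced to its transcripts with a few lines of Python. The sketch below assumes the response has already been decoded into a dict and that the first entry in each alternatives list is the one to keep; top_transcripts is a hypothetical helper name.

```python
def top_transcripts(response):
    """Return (transcript, confidence) for the first alternative of each result."""
    pairs = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            top = alternatives[0]
            # confidence may be absent, in which case None is returned for it
            pairs.append((top.get("transcript", ""), top.get("confidence")))
    return pairs


if __name__ == "__main__":
    sample = {
        "results": [
            {"alternatives": [
                {"transcript": "OK Google stream stranger things",
                 "confidence": 0.9251011}
            ]}
        ]
    }
    for transcript, confidence in top_transcripts(sample):
        print(confidence, transcript)
```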
Go
To learn how to install and use the client library for Speech-to-Text, see Speech-to-Text client libraries. For more information, see the Speech-to-Text Go API reference documentation.
To authenticate to Speech-to-Text, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Java
To learn how to install and use the client library for Speech-to-Text, see Speech-to-Text client libraries. For more information, see the Speech-to-Text Java API reference documentation.
To authenticate to Speech-to-Text, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Node.js
To learn how to install and use the client library for Speech-to-Text, see Speech-to-Text client libraries. For more information, see the Speech-to-Text Node.js API reference documentation.
To authenticate to Speech-to-Text, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Python
To learn how to install and use the client library for Speech-to-Text, see Speech-to-Text client libraries. For more information, see the Speech-to-Text Python API reference documentation.
To authenticate to Speech-to-Text, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Additional languages
C#: Please follow the C# setup instructions on the client libraries page and then visit the Speech-to-Text reference documentation for .NET.
PHP: Please follow the PHP setup instructions on the client libraries page and then visit the Speech-to-Text reference documentation for PHP.
Ruby: Please follow the Ruby setup instructions on the client libraries page and then visit the Speech-to-Text reference documentation for Ruby.
Perform transcription of a Cloud Storage audio file
Go
To learn how to install and use the client library for Speech-to-Text, see Speech-to-Text client libraries. For more information, see the Speech-to-Text Go API reference documentation.
To authenticate to Speech-to-Text, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Java
To learn how to install and use the client library for Speech-to-Text, see Speech-to-Text client libraries. For more information, see the Speech-to-Text Java API reference documentation.
To authenticate to Speech-to-Text, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Node.js
To learn how to install and use the client library for Speech-to-Text, see Speech-to-Text client libraries. For more information, see the Speech-to-Text Node.js API reference documentation.
To authenticate to Speech-to-Text, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Additional languages
C#: Please follow the C# setup instructions on the client libraries page and then visit the Speech-to-Text reference documentation for .NET.
PHP: Please follow the PHP setup instructions on the client libraries page and then visit the Speech-to-Text reference documentation for PHP.
Ruby: Please follow the Ruby setup instructions on the client libraries page and then visit the Speech-to-Text reference documentation for Ruby.

