This page describes how to use a specific machine learning model for audio transcription requests to Speech-to-Text.
Transcription models
Speech-to-Text detects words in an audio clip by comparing input to one of many machine learning models . Each model has been trained by analyzing millions of examples—in this case, many, many audio recordings of people speaking.
Speech-to-Text has specialized models trained from audio from specific sources, for example phone calls or videos. Because of this training process, these specialized models provide better results when applied towards similar kinds of audio data.
For example, Speech-to-Text has a transcription model trained
to recognize speech recorded over the phone. When Speech-to-Text
uses the telephony
or telephony_short
model to transcribe phone audio, it
produces more accurate transcription results than if it had transcribed phone
audio using the latest_short
or latest_long
models.
The following table shows the transcriptions models available for use with Speech-to-Text.
latest_long
latest_short
telephony
telephony_short
medical_dictation
This is a premium model that costs more than the standard rate. See the pricing page for more details.
medical_conversation
This is a premium model that costs more than the standard rate. See the pricing page for more details.
command_and_search
default
phone_call
video
Best for audio from video clips or other sources (such as podcasts) that have multiple speakers. This model is also often the best choice for audio that was recorded with a high-quality microphone or that has lots of background noise. For best results, provide audio recorded at 16,000Hz or greater sampling rate.
Select a model for audio transcription
To specify a specific model to use for audio transcription, you
must set the model
field to one of the allowed values—such as latest_long
, latest_short
, telephony
, or telephony_short
—in the RecognitionConfig
parameters for the request.
Speech-to-Text supports model selection for all speech
recognition methods: speech:recognize
, speech:longrunningrecognize
,
and Streaming
.
Perform transcription of a local audio file
Protocol
Refer to the speech:recognize
API endpoint for
complete details.
To perform synchronous speech recognition, make a POST
request and provide the
appropriate request body. The following shows an example of a POST
request using curl
. The example uses the Google Cloud CLI
to generate an access
token. For instructions on installing the gcloud CLI,
see the quickstart
.
curl -s -H "Content-Type: application/json" \ -H "Authorization: Bearer $( gcloud auth application-default print-access-token ) " \ https://speech.googleapis.com/v1/speech:recognize \ --data '{ "config": { "encoding": "LINEAR16", "sampleRateHertz": 16000, "languageCode": "en-US", "model": "video" }, "audio": { "uri": "gs://cloud-samples-tests/speech/Google_Gnome.wav" } }'
See the RecognitionConfig
reference
documentation for more information on configuring the request body.
If the request is successful, the server returns a 200 OK
HTTP
status code and the response in JSON format:
{ "results": [ { "alternatives": [ { "transcript": "OK Google stream stranger things from Netflix to my TV okay stranger things from Netflix playing on TV from the people that brought you Google home comes the next evolution of the smart home and it's just outside your window me Google know hi how can I help okay no what's the weather like outside the weather outside is sunny and 76 degrees he's right okay no turn on the hose I'm holding sure okay no I'm can I eat this lemon tree leaf yes what about this Daisy yes but I wouldn't recommend it but I could eat it okay Nomad milk to my shopping list I'm sorry that sounds like an indoor request I keep doing that sorry you do keep doing that okay no is this compost really we're all compost if you think about it pretty much everything is made up of organic matter and will return", "confidence": 0.9251011 } ] } ] }
Go
To learn how to install and use the client library for Speech-to-Text, see Speech-to-Text client libraries . For more information, see the Speech-to-Text Go API reference documentation .
To authenticate to Speech-to-Text, set up Application Default Credentials. For more information, see Set up authentication for a local development environment .
Java
To learn how to install and use the client library for Speech-to-Text, see Speech-to-Text client libraries . For more information, see the Speech-to-Text Java API reference documentation .
To authenticate to Speech-to-Text, set up Application Default Credentials. For more information, see Set up authentication for a local development environment .
Node.js
To learn how to install and use the client library for Speech-to-Text, see Speech-to-Text client libraries . For more information, see the Speech-to-Text Node.js API reference documentation .
To authenticate to Speech-to-Text, set up Application Default Credentials. For more information, see Set up authentication for a local development environment .
Python
To learn how to install and use the client library for Speech-to-Text, see Speech-to-Text client libraries . For more information, see the Speech-to-Text Python API reference documentation .
To authenticate to Speech-to-Text, set up Application Default Credentials. For more information, see Set up authentication for a local development environment .
Additional languages
C#: Please follow the C# setup instructions on the client libraries page and then visit the Speech-to-Text reference documentation for .NET.
PHP: Please follow the PHP setup instructions on the client libraries page and then visit the Speech-to-Text reference documentation for PHP.
Ruby: Please follow the Ruby setup instructions on the client libraries page and then visit the Speech-to-Text reference documentation for Ruby.
Perform transcription of a Cloud Storage audio file
Go
To learn how to install and use the client library for Speech-to-Text, see Speech-to-Text client libraries . For more information, see the Speech-to-Text Go API reference documentation .
To authenticate to Speech-to-Text, set up Application Default Credentials. For more information, see Set up authentication for a local development environment .
Java
To learn how to install and use the client library for Speech-to-Text, see Speech-to-Text client libraries . For more information, see the Speech-to-Text Java API reference documentation .
To authenticate to Speech-to-Text, set up Application Default Credentials. For more information, see Set up authentication for a local development environment .
Node.js
To learn how to install and use the client library for Speech-to-Text, see Speech-to-Text client libraries . For more information, see the Speech-to-Text Node.js API reference documentation .
To authenticate to Speech-to-Text, set up Application Default Credentials. For more information, see Set up authentication for a local development environment .
Additional languages
C#: Please follow the C# setup instructions on the client libraries page and then visit the Speech-to-Text reference documentation for .NET.
PHP: Please follow the PHP setup instructions on the client libraries page and then visit the Speech-to-Text reference documentation for PHP.
Ruby: Please follow the Ruby setup instructions on the client libraries page and then visit the Speech-to-Text reference documentation for Ruby.