Migrating from Speech-to-Text v1 to v2
Speech-to-Text API v2 brings the latest Google Cloud API design for
customers to meet enterprise security and regulatory requirements out of
the box.
These requirements are realized through the following:

Data Residency: Speech-to-Text v2 offers the broad range of our existing transcription models in Google Cloud regions such as Belgium or Singapore. This allows the invocation of our transcription models through a fully regionalized service.

Recognizer Resourcefulness: Recognizers are reusable recognition configurations that can contain a combination of model, language, and features. This implementation eliminates the need for dedicated service accounts for authentication and authorization.

Logging: Resource creation and transcriptions generate logs available in the Google Cloud console, allowing for better telemetry and debugging.

Encryption: Speech-to-Text v2 supports customer-managed encryption keys (CMEK) for all resources as well as batch transcription.

Audio Auto-Detect: Speech-to-Text v2 can automatically detect the sample rate, channel count, and format of your audio files, without needing to provide that information in the request configuration.
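Audio Auto-Detect means the service reads these parameters from the audio itself. As a rough local analogy only (using the standard-library wave module, not the Speech API), the sample rate, channel count, and sample width that auto-detect spares you from specifying live in the WAV header and can be read like this:

```python
import io
import wave


def wav_params(data: bytes) -> dict:
    """Read sample rate, channel count, and sample width from a WAV header."""
    with wave.open(io.BytesIO(data), "rb") as w:
        return {
            "sample_rate": w.getframerate(),
            "channels": w.getnchannels(),
            "sample_width_bytes": w.getsampwidth(),
        }


# Build a tiny in-memory WAV file (0.1 s of 16 kHz mono silence) to demonstrate.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(16000)
    w.writeframes(b"\x00\x00" * 1600)

print(wav_params(buf.getvalue()))  # {'sample_rate': 16000, 'channels': 1, 'sample_width_bytes': 2}
```

With v2 you simply pass the bytes and let AutoDetectDecodingConfig recover this information server-side.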
Migrating from v1 to v2
Migration from the v1 API to the v2 API does not happen automatically. Minimal
implementation changes are required to take advantage of the feature set.
Migrating in API
Similar to Speech-to-Text v1, to transcribe audio,
you need to create a RecognitionConfig by
selecting the language of your audio and the recognition model of your
choice:

Note: In the definition of the RecognitionConfig message, the difference between the v1 and v2 versions of the Speech-to-Text API is the addition of the AutoDetectDecodingConfig message, which automatically detects the audio specifications.
Python
import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")


def quickstart_v2(audio_file: str) -> cloud_speech.RecognizeResponse:
    """Transcribe an audio file.
    Args:
        audio_file (str): Path to the local audio file to be transcribed.
    Returns:
        cloud_speech.RecognizeResponse: The response from the recognize request,
        containing the transcription results.
    """
    # Reads a file as bytes
    with open(audio_file, "rb") as f:
        audio_content = f.read()

    # Instantiates a client
    client = SpeechClient()

    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="long",
    )

    request = cloud_speech.RecognizeRequest(
        recognizer=f"projects/{PROJECT_ID}/locations/global/recognizers/_",
        config=config,
        content=audio_content,
    )

    # Transcribes the audio into text
    response = client.recognize(request=request)

    for result in response.results:
        print(f"Transcript: {result.alternatives[0].transcript}")

    return response
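In the request above, the recognizer field is a full resource path, and the trailing underscore selects the ad-hoc default recognizer rather than a stored one. As an illustration only (this helper is made up for the example, not part of the client library), the path follows this pattern:

```python
def recognizer_path(
    project_id: str, location: str = "global", recognizer_id: str = "_"
) -> str:
    """Build the resource path used in RecognizeRequest.recognizer.

    An underscore as recognizer_id targets the ad-hoc default recognizer.
    """
    return f"projects/{project_id}/locations/{location}/recognizers/{recognizer_id}"


print(recognizer_path("my-project"))
# projects/my-project/locations/global/recognizers/_
print(recognizer_path("my-project", "europe-west3", "my-recognizer"))
# projects/my-project/locations/europe-west3/recognizers/my-recognizer
```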
If needed, select a region in which you want to use the Speech-to-Text API, and check the language and model availability in that region:

Python

import os

from google.api_core.client_options import ClientOptions
from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")


def change_speech_v2_location(
    audio_file: str, location: str
) -> cloud_speech.RecognizeResponse:
    """Transcribe an audio file in a specific region. Specifying the location
    can reduce latency and meet data residency requirements.
    Args:
        audio_file (str): Path to the local audio file to be transcribed.
        location (str): The region where the Speech API will be accessed.
            E.g., "europe-west3"
    Returns:
        cloud_speech.RecognizeResponse: The full response object, which includes
        the transcription results.
    """
    # Reads a file as bytes
    with open(audio_file, "rb") as f:
        audio_content = f.read()

    # Instantiates a client to a regionalized Speech endpoint.
    client = SpeechClient(
        client_options=ClientOptions(
            api_endpoint=f"{location}-speech.googleapis.com",
        )
    )

    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=["en-US"],
        model="long",
    )

    request = cloud_speech.RecognizeRequest(
        recognizer=f"projects/{PROJECT_ID}/locations/{location}/recognizers/_",
        config=config,
        content=audio_content,
    )

    # Transcribes the audio into text
    response = client.recognize(request=request)

    for result in response.results:
        print(f"Transcript: {result.alternatives[0].transcript}")

    return response
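Note the pairing in the regional sample: the same location appears both in the api_endpoint ({location}-speech.googleapis.com) and in the recognizer path. A small sketch of that endpoint convention (the helper name is hypothetical, not part of the library; speech.googleapis.com is the default global endpoint):

```python
def speech_endpoint(location: str = "global") -> str:
    """Return the Speech-to-Text endpoint to use for a location.

    The global location uses the default endpoint; regions are served by a
    regionalized host such as europe-west3-speech.googleapis.com.
    """
    if location == "global":
        return "speech.googleapis.com"
    return f"{location}-speech.googleapis.com"


print(speech_endpoint())  # speech.googleapis.com
print(speech_endpoint("europe-west3"))  # europe-west3-speech.googleapis.com
```

Using mismatched values (a regional recognizer path against the global endpoint, or vice versa) will cause requests to fail, so derive both from the same location variable as the sample above does.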
Optionally, create a recognizer resource if you need to reuse a
specific recognition configuration across many transcription requests:
Python
import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")


def create_recognizer(recognizer_id: str) -> cloud_speech.Recognizer:
    """Creates a recognizer with a unique ID and default recognition configuration.
    Args:
        recognizer_id (str): The unique identifier for the recognizer to be created.
    Returns:
        cloud_speech.Recognizer: The created recognizer object with configuration.
    """
    # Instantiates a client
    client = SpeechClient()

    request = cloud_speech.CreateRecognizerRequest(
        parent=f"projects/{PROJECT_ID}/locations/global",
        recognizer_id=recognizer_id,
        recognizer=cloud_speech.Recognizer(
            default_recognition_config=cloud_speech.RecognitionConfig(
                language_codes=["en-US"], model="long"
            ),
        ),
    )

    # Sends the request to create a recognizer and waits for the operation to complete
    operation = client.create_recognizer(request=request)
    recognizer = operation.result()

    print("Created Recognizer:", recognizer.name)
    return recognizer
There are other differences in the requests and responses in the new v2 API.
For more details, see the reference documentation.
Migrating in UI
To migrate through the Speech Google Cloud console, follow these steps:
1. Go to the Speech Google Cloud console (https://console.cloud.google.com/speech).

2. Navigate to the Transcriptions page.

3. Click New Transcription and select your audio in the Audio configuration tab.

4. In the Transcription options tab, select V2.

Last updated 2025-09-04 UTC.