Synthesize speech with bidirectional streaming

This document walks you through the process of synthesizing audio using bidirectional streaming.

Bidirectional streaming lets you send text input and receive audio data at the same time. You can start synthesizing speech before the complete input text has been sent, which reduces latency and enables real-time interactions. Applications such as voice assistants and interactive games can use bidirectional streaming to be more dynamic and responsive.
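To illustrate how sending and receiving overlap, here is a toy model in plain Python that does not call the real API. The hypothetical `fake_synthesize` function stands in for the service: it pulls one text chunk at a time from a lazy generator and immediately yields a placeholder "audio" chunk. The event log therefore shows audio arriving before the last text chunk has been sent. The function names and the one-byte-per-character "audio" are illustrative assumptions only.

```python
from typing import Iterable, Iterator, List


def send_text(chunks: List[str], log: List[str]) -> Iterator[str]:
    """Lazily 'send' text chunks, recording when each one leaves the client."""
    for chunk in chunks:
        log.append(f"send text {chunk!r}")
        yield chunk


def fake_synthesize(text_chunks: Iterable[str], log: List[str]) -> Iterator[bytes]:
    """Hypothetical stand-in for a bidirectional streaming TTS service.

    Consumes text lazily and yields one placeholder audio chunk per text chunk.
    This is NOT the Cloud Text-to-Speech API.
    """
    for chunk in text_chunks:
        # Pretend each character becomes one byte of "audio".
        audio = b"\x00" * len(chunk)
        log.append(f"recv audio ({len(audio)} bytes)")
        yield audio


if __name__ == "__main__":
    events: List[str] = []
    chunks = ["Hello there. ", "How are you ", "today?"]
    for _ in fake_synthesize(send_text(chunks, events), events):
        pass
    # Send and receive events alternate instead of all sends happening first.
    print("\n".join(events))
```

In the real service, the server decides how text maps to audio responses, so you should not assume a one-to-one correspondence between text inputs and audio chunks.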

To learn more about the fundamental concepts in Cloud Text-to-Speech, read Cloud Text-to-Speech Basics.

Before you begin

Before you can send a request to the Cloud Text-to-Speech API, you must have completed the following actions. See the before you begin page for details.

1. Enable Cloud Text-to-Speech on a Google Cloud project.
2. Make sure billing is enabled for Cloud Text-to-Speech.
3. Install the Google Cloud CLI. After installation, initialize the Google Cloud CLI by running the following command:
```
gcloud init
```
If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

Synthesize speech with bidirectional streaming

Install the client library

Python

Before installing the library, make sure you've prepared your environment for Python development.

```
pip install --upgrade google-cloud-texttospeech
```

Send a stream of text and receive a stream of audio

The API accepts a stream of requests of type StreamingSynthesizeRequest, each of which contains either a StreamingSynthesisInput or a StreamingSynthesizeConfig.

Before sending any StreamingSynthesizeRequest messages that carry StreamingSynthesisInput (the text input), send exactly one StreamingSynthesizeRequest that carries a StreamingSynthesizeConfig.
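The ordering rule can be captured in a small generic generator. This is a sketch, not part of the client library: `build_request_stream` is a hypothetical helper that takes the config request plus a factory for text requests, so it can be exercised without Google Cloud credentials. It yields the config exactly once and then pulls text chunks lazily, so `text_chunks` can be any iterator, including a streamed LLM response.

```python
from typing import Callable, Iterable, Iterator, TypeVar

R = TypeVar("R")


def build_request_stream(
    config_request: R,
    text_chunks: Iterable[str],
    make_text_request: Callable[[str], R],
) -> Iterator[R]:
    """Yield the config request first, then one text request per chunk.

    Hypothetical helper illustrating the config-first ordering rule.
    Chunks are pulled only when the consumer asks for the next request,
    so a slow producer can feed the stream as text becomes available.
    """
    # Exactly one config request, always first.
    yield config_request
    # Every subsequent request carries text input.
    for chunk in text_chunks:
        yield make_text_request(chunk)
```

With the client library, you would pass `texttospeech.StreamingSynthesizeRequest(streaming_config=streaming_config)` as `config_request` and `lambda t: texttospeech.StreamingSynthesizeRequest(input=texttospeech.StreamingSynthesisInput(text=t))` as `make_text_request`, then hand the generator to `client.streaming_synthesize`, which mirrors what the Python sample in this page does.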

Streaming Cloud Text-to-Speech is only compatible with Chirp 3: HD voices.
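If your application lets users choose a voice, you might reject incompatible voices in your own code before opening a stream. The check below is a heuristic and an assumption, not an API call: it relies on Chirp 3: HD voice names containing `-Chirp3-HD-`, as in the `en-US-Chirp3-HD-Charon` voice used in the sample. Confirm the naming pattern against the published voice list before relying on it.

```python
def is_chirp3_hd_voice(voice_name: str) -> bool:
    """Heuristic: True if the name matches "<language_code>-Chirp3-HD-<voice>".

    Based on the sample voice "en-US-Chirp3-HD-Charon"; not an official check.
    """
    return "-Chirp3-HD-" in voice_name


def require_streaming_voice(voice_name: str) -> str:
    """Fail fast locally if the voice does not look streaming-compatible."""
    if not is_chirp3_hd_voice(voice_name):
        raise ValueError(f"{voice_name!r} does not look like a Chirp 3: HD voice")
    return voice_name
```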

Python

Before running the example, make sure you've prepared your environment for Python development.

```python
#!/usr/bin/env python
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
"""Google Cloud Text-To-Speech API streaming sample application.

Example usage:
    python streaming_tts_quickstart.py
"""


def run_streaming_tts_quickstart():
    """Synthesizes speech from a stream of input text."""
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()

    # See https://cloud.google.com/text-to-speech/docs/voices for all voices.
    streaming_config = texttospeech.StreamingSynthesizeConfig(
        voice=texttospeech.VoiceSelectionParams(
            name="en-US-Chirp3-HD-Charon",
            language_code="en-US",
        )
    )

    # Set the config for your stream. The first request must contain your
    # config, and then each subsequent request must contain text.
    config_request = texttospeech.StreamingSynthesizeRequest(
        streaming_config=streaming_config
    )

    text_iterator = [
        "Hello there. ",
        "How are you ",
        "today? It's ",
        "such nice weather outside.",
    ]

    # Request generator. Consider using Gemini or another LLM with output
    # streaming as a generator.
    def request_generator():
        yield config_request
        for text in text_iterator:
            yield texttospeech.StreamingSynthesizeRequest(
                input=texttospeech.StreamingSynthesisInput(text=text)
            )

    streaming_responses = client.streaming_synthesize(request_generator())

    for response in streaming_responses:
        print(f"Audio content size in bytes is: {len(response.audio_content)}")


if __name__ == "__main__":
    run_streaming_tts_quickstart()
```
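The sample only prints the size of each response. To keep the audio, you can write each `audio_content` chunk to a file as it arrives. This sketch assumes, as the sample does, that each response exposes `audio_content` as bytes; `save_streamed_audio` is a hypothetical helper. Chunks are written exactly as received, so whether the file is directly playable depends on the audio encoding of the stream, which the sample does not set. Check the StreamingSynthesizeConfig reference for the encoding options your library version supports.

```python
import time
from typing import Iterable, Optional, Protocol, Tuple


class HasAudioContent(Protocol):
    audio_content: bytes


def save_streamed_audio(
    responses: Iterable[HasAudioContent], path: str
) -> Tuple[int, Optional[float]]:
    """Write each audio chunk to `path` as it arrives.

    Returns (total_bytes_written, seconds_until_first_chunk). The timing is
    measured from when this function starts consuming the stream, and is
    None if the stream produced no audio.
    """
    start = time.monotonic()
    first_chunk_after: Optional[float] = None
    total = 0
    with open(path, "wb") as out:
        for response in responses:
            chunk = response.audio_content
            if first_chunk_after is None and chunk:
                # Time to first audio is the latency that streaming aims to reduce.
                first_chunk_after = time.monotonic() - start
            out.write(chunk)
            total += len(chunk)
    return total, first_chunk_after
```

Inside `run_streaming_tts_quickstart`, you could replace the printing loop with something like `total, first = save_streamed_audio(streaming_responses, "output.audio")`, where the file name is an arbitrary placeholder.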

Clean up

To avoid unnecessary Google Cloud Platform charges, use the Google Cloud console to delete your project if you do not need it.

What's next

Learn more about Cloud Text-to-Speech by reading the basics.
Review the list of available voices you can use for synthetic speech.