Method: text.synthesize

Synthesizes speech synchronously: the client receives results after all text input has been processed.

HTTP request

POST https://texttospeech.googleapis.com/v1beta1/text:synthesize

The URL uses gRPC Transcoding syntax.

Request body

The request body contains data with the following structure:

JSON representation
{
  "input": {
    object (SynthesisInput)
  },
  "voice": {
    object (VoiceSelectionParams)
  },
  "audioConfig": {
    object (AudioConfig)
  },
  "enableTimePointing": [
    enum (TimepointType)
  ],
  "advancedVoiceOptions": {
    object (AdvancedVoiceOptions)
  }
}
Fields
input

object (SynthesisInput)

Required. The Synthesizer requires either plain text or SSML as input.

voice

object (VoiceSelectionParams)

Required. The desired voice of the synthesized audio.

audioConfig

object (AudioConfig)

Required. The configuration of the synthesized audio.

enableTimePointing[]

enum (TimepointType)

Whether and what timepoints are returned in the response.

advancedVoiceOptions

object (AdvancedVoiceOptions)

Advanced voice options.
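For illustration, the sketch below posts a minimal request to this endpoint with Python's requests library. The endpoint and body structure are as documented above; the access-token placeholder, voice, and encoding values are assumptions for the example, not part of this reference.

# A minimal sketch of a text.synthesize call. Assumes the `requests`
# package and a valid OAuth 2.0 access token (see "Authorization
# scopes" below). Voice and encoding values are illustrative.
import requests

ACCESS_TOKEN = "..."  # e.g. from `gcloud auth print-access-token`

body = {
    "input": {"text": "Hello, world!"},       # SynthesisInput
    "voice": {"languageCode": "en-US"},       # VoiceSelectionParams
    "audioConfig": {"audioEncoding": "MP3"},  # AudioConfig
}

resp = requests.post(
    "https://texttospeech.googleapis.com/v1beta1/text:synthesize",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    json=body,
    timeout=30,
)
resp.raise_for_status()
print(list(resp.json()))  # e.g. ['audioContent', ...]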

Response body

The message returned to the client by the text.synthesize method.

If successful, the response body contains data with the following structure:

JSON representation
{
  "audioContent": string,
  "timepoints": [
    {
      object (Timepoint)
    }
  ],
  "audioConfig": {
    object (AudioConfig)
  }
}
Fields
audioContent

string (bytes format)

The audio data bytes encoded as specified in the request, including the header for encodings that are wrapped in containers (e.g. MP3, OGG_OPUS). For LINEAR16 audio, the WAV header is included. Note: as with all bytes fields, protocol buffers use a pure binary representation, whereas JSON representations use base64.

A base64-encoded string.

timepoints[]

object (Timepoint)

A link between a position in the original request input and a corresponding time in the output audio. Timepoints are supported only via <mark> tags in SSML input.

audioConfig

object (AudioConfig)

The audio metadata of audioContent.
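Because the JSON representation base64-encodes bytes fields, clients must decode audioContent before writing it to a file. A minimal sketch, assuming a parsed JSON response and Python's standard library:

import base64

def save_audio(response_json: dict, path: str = "output.mp3") -> None:
    """Decode the base64 audioContent of a synthesize response and
    write the raw audio bytes to disk. For LINEAR16, the decoded
    bytes already include the WAV header, per the field note above."""
    audio_bytes = base64.b64decode(response_json["audioContent"])
    with open(path, "wb") as f:
        f.write(audio_bytes)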

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.
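One way to obtain a token carrying this scope is through Application Default Credentials with the google-auth Python package; a minimal sketch, assuming ADC is configured in the environment:

# Fetch an OAuth 2.0 access token with the cloud-platform scope via
# Application Default Credentials (ADC). Assumes the google-auth
# package is installed and ADC is configured.
import google.auth
import google.auth.transport.requests

credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
credentials.refresh(google.auth.transport.requests.Request())
token = credentials.token  # send as "Authorization: Bearer <token>"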

TimepointType

The type of timepoint information that is returned in the response.

Enums
TIMEPOINT_TYPE_UNSPECIFIED Not specified. No timepoint information will be returned.
SSML_MARK Timepoint information of <mark> tags in SSML input will be returned.

AdvancedVoiceOptions

Used for advanced voice options.

JSON representation
{
  "lowLatencyJourneySynthesis": boolean
}
Fields
lowLatencyJourneySynthesis

boolean

Only for Journey voices. If false, the synthesis is context-aware and has higher latency.

Timepoint

This contains a mapping between a certain point in the input text and a corresponding time in the output audio.

JSON representation
{
  "markName": string,
  "timeSeconds": number
}
Fields
markName

string

Timepoint name as received from the client within the <mark> tag.

timeSeconds

number

Time offset in seconds from the start of the synthesized audio.
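Putting the mark-related pieces together: a request whose SSML input contains <mark> tags and whose enableTimePointing lists SSML_MARK yields one Timepoint per mark in the response. A sketch of such a request body (the mark name and the sample offset are illustrative):

# Request body sketch: SSML input with a <mark> tag plus
# enableTimePointing, so the response's timepoints[] carries a
# Timepoint (markName, timeSeconds) for the mark. Values are
# illustrative.
body = {
    "input": {
        "ssml": '<speak>Hello <mark name="here"/> world.</speak>'
    },
    "voice": {"languageCode": "en-US"},
    "audioConfig": {"audioEncoding": "MP3"},
    "enableTimePointing": ["SSML_MARK"],
}
# A matching response might contain, e.g.:
# "timepoints": [{"markName": "here", "timeSeconds": 0.32}]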
