Method: text.synthesize

Synthesizes speech synchronously: the client receives results after all text input has been processed.

HTTP request

POST https://texttospeech.googleapis.com/v1beta1/text:synthesize

The URL uses gRPC Transcoding syntax.

Request body

The request body contains data with the following structure:

JSON representation
{
  "input": {
    object (SynthesisInput)
  },
  "voice": {
    object (VoiceSelectionParams)
  },
  "audioConfig": {
    object (AudioConfig)
  },
  "enableTimePointing": [
    enum (TimepointType)
  ],
  "advancedVoiceOptions": {
    object (AdvancedVoiceOptions)
  }
}
Fields
input

object (SynthesisInput)

Required. The Synthesizer requires either plain text or SSML as input.

voice

object (VoiceSelectionParams)

Required. The desired voice of the synthesized audio.

audioConfig

object (AudioConfig)

Required. The configuration of the synthesized audio.

enableTimePointing[]

enum (TimepointType)

Whether and what timepoints are returned in the response.

advancedVoiceOptions

object (AdvancedVoiceOptions)

Advanced voice options.
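For illustration, the sketch below posts a minimal request to this endpoint with Python's requests library. The endpoint and body structure are as documented above; the access-token placeholder, voice, and encoding values are assumptions for the example, not part of this reference.

# A minimal sketch of a text.synthesize call. Assumes the `requests`
# package and a valid OAuth 2.0 access token (see "Authorization
# scopes" below). Voice and encoding values are illustrative.
import requests

ACCESS_TOKEN = "..."  # e.g. from `gcloud auth print-access-token`

body = {
    "input": {"text": "Hello, world!"},       # SynthesisInput
    "voice": {"languageCode": "en-US"},       # VoiceSelectionParams
    "audioConfig": {"audioEncoding": "MP3"},  # AudioConfig
}

resp = requests.post(
    "https://texttospeech.googleapis.com/v1beta1/text:synthesize",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    json=body,
    timeout=30,
)
resp.raise_for_status()
print(list(resp.json()))  # e.g. ['audioContent', ...]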

Response body

The message returned to the client by the text.synthesize method.

If successful, the response body contains data with the following structure:

JSON representation
{
  "audioContent": string,
  "timepoints": [
    {
      object (Timepoint)
    }
  ],
  "audioConfig": {
    object (AudioConfig)
  }
}
Fields
audioContent

string (bytes format)

The audio data bytes encoded as specified in the request, including the header for encodings that are wrapped in containers (e.g. MP3, OGG_OPUS). For LINEAR16 audio, the WAV header is included. Note: as with all bytes fields, protocol buffers use a pure binary representation, whereas JSON representations use base64.

A base64-encoded string.

timepoints[]

object (Timepoint)

A link between a position in the original request input and a corresponding time in the output audio. Timepoints are supported only via <mark> tags in SSML input.

audioConfig

object (AudioConfig)

The audio metadata of audioContent.
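Because the JSON representation base64-encodes bytes fields, clients must decode audioContent before writing it to a file. A minimal sketch, assuming a parsed JSON response and Python's standard library:

import base64

def save_audio(response_json: dict, path: str = "output.mp3") -> None:
    """Decode the base64 audioContent of a synthesize response and
    write the raw audio bytes to disk. For LINEAR16, the decoded
    bytes already include the WAV header, per the field note above."""
    audio_bytes = base64.b64decode(response_json["audioContent"])
    with open(path, "wb") as f:
        f.write(audio_bytes)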

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.
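One way to obtain a token carrying this scope is through Application Default Credentials with the google-auth Python package; a minimal sketch, assuming ADC is configured in the environment:

# Fetch an OAuth 2.0 access token with the cloud-platform scope via
# Application Default Credentials (ADC). Assumes the google-auth
# package is installed and ADC is configured.
import google.auth
import google.auth.transport.requests

credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
credentials.refresh(google.auth.transport.requests.Request())
token = credentials.token  # send as "Authorization: Bearer <token>"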

TimepointType

The type of timepoint information that is returned in the response.

Enums
TIMEPOINT_TYPE_UNSPECIFIED Not specified. No timepoint information will be returned.
SSML_MARK Timepoint information of <mark> tags in SSML input will be returned.

AdvancedVoiceOptions

Used for advanced voice options.

JSON representation
{
  "lowLatencyJourneySynthesis": boolean
}
Fields
lowLatencyJourneySynthesis

boolean

Only for Journey voices. If false, the synthesis is context-aware and has higher latency.

Timepoint

This contains a mapping between a certain point in the input text and a corresponding time in the output audio.

JSON representation
{
  "markName": string,
  "timeSeconds": number
}
Fields
markName

string

Timepoint name as received from the client within the <mark> tag.

timeSeconds

number

Time offset in seconds from the start of the synthesized audio.
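Putting the mark-related pieces together: a request whose SSML input contains <mark> tags and whose enableTimePointing lists SSML_MARK yields one Timepoint per mark in the response. A sketch of such a request body (the mark name and the sample offset are illustrative):

# Request body sketch: SSML input with a <mark> tag plus
# enableTimePointing, so the response's timepoints[] carries a
# Timepoint (markName, timeSeconds) for the mark. Values are
# illustrative.
body = {
    "input": {
        "ssml": '<speak>Hello <mark name="here"/> world.</speak>'
    },
    "voice": {"languageCode": "en-US"},
    "audioConfig": {"audioEncoding": "MP3"},
    "enableTimePointing": ["SSML_MARK"],
}
# A matching response might contain, e.g.:
# "timepoints": [{"markName": "here", "timeSeconds": 0.32}]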
