SynthesisInput

Contains text input to be synthesized. Either text or ssml must be supplied. Supplying both or neither returns google.rpc.Code.INVALID_ARGUMENT. The input size is limited to 5000 bytes.
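As a minimal sketch of the size limit, the following Python helpers check an input string before a request is sent. They assume the limit is counted on the UTF-8 encoding of the string, which this page does not state. `MAX_INPUT_BYTES`, `input_size_bytes`, and `fits_size_limit` are hypothetical names, not part of the API.

```python
MAX_INPUT_BYTES = 5000  # limit stated in the description of SynthesisInput


def input_size_bytes(content: str) -> int:
    """Return the input size, ASSUMING it is counted as UTF-8 bytes."""
    return len(content.encode("utf-8"))


def fits_size_limit(content: str) -> bool:
    """Return True if the input does not exceed the documented limit."""
    return input_size_bytes(content) <= MAX_INPUT_BYTES
```

Under this assumption, characters outside ASCII take more than one byte, so a string can exceed the limit with fewer than 5000 characters.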

JSON representation

{
  "customPronunciations": { object (CustomPronunciations) },

  // Union field input_source can be only one of the following:
  "text": string,
  "markup": string,
  "ssml": string,
  "multiSpeakerMarkup": { object (MultiSpeakerMarkup) }
  // End of list of possible types for union field input_source.

  "prompt": string
}

Fields

customPronunciations

object ( CustomPronunciations )

Optional. The pronunciation customizations are applied to the input. If this is set, the input is synthesized using the given pronunciation customizations.

The initial support is for en-us, with plans to expand to other locales in the future. Instant Clone voices aren't supported.

To customize the pronunciation of a phrase, the input must contain an exact match of that phrase. If using SSML, the phrase must not be inside a phoneme tag.
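The matching rule can be approximated with a client-side pre-check. The sketch below is not the service's matching logic: it assumes "exact match" means a case-sensitive substring match, and for SSML it searches only the character data outside `phoneme` elements. `text_outside_phoneme` and `phrase_can_match` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET


def _local_name(tag: str) -> str:
    # Drop an XML namespace prefix such as "{http://...}phoneme".
    return tag.rsplit("}", 1)[-1]


def text_outside_phoneme(ssml: str) -> str:
    """Concatenate the character data that is not inside any <phoneme> element."""
    parts = []

    def walk(element, inside_phoneme):
        is_phoneme = _local_name(element.tag) == "phoneme"
        if is_phoneme and not inside_phoneme:
            # Sentinel keeps text on either side of the tag from joining up.
            parts.append("\x00")
        inside_phoneme = inside_phoneme or is_phoneme
        if not inside_phoneme and element.text:
            parts.append(element.text)
        for child in element:
            walk(child, inside_phoneme)
            # A child's tail text belongs to this element, not to the child.
            if not inside_phoneme and child.tail:
                parts.append(child.tail)

    walk(ET.fromstring(ssml), False)
    return "".join(parts)


def phrase_can_match(phrase: str, content: str, is_ssml: bool = False) -> bool:
    """ASSUMPTION: an exact match is a case-sensitive substring match."""
    searchable = text_outside_phoneme(content) if is_ssml else content
    return phrase in searchable
```

For example, `phrase_can_match("tomato", ssml, is_ssml=True)` is false when the only occurrence of "tomato" is wrapped in a `phoneme` element.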

Union field input_source. The input source, which is either plain text or SSML. input_source can be only one of the following:

text

string

The raw text to be synthesized.

markup

string

Markup for HD voices specifically. This field may not be used with any other voices.

ssml

string

The SSML document to be synthesized. The SSML document must be valid and well-formed. Otherwise, the RPC fails and returns google.rpc.Code.INVALID_ARGUMENT. For more information, see SSML.
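A malformed document can be caught locally before the RPC is made. The sketch below uses Python's standard XML parser to test well-formedness and checks that the root element is `speak`, as the W3C SSML specification requires. It does not verify that every element or attribute is valid or supported by the service. `is_well_formed_ssml` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def is_well_formed_ssml(ssml: str) -> bool:
    """Return True if the string parses as XML and its root element is <speak>."""
    try:
        root = ET.fromstring(ssml)
    except ET.ParseError:
        return False
    # Ignore an optional XML namespace prefix on the root tag.
    return root.tag.rsplit("}", 1)[-1] == "speak"
```

A document that passes this check can still be rejected by the service, so the INVALID_ARGUMENT error should be handled regardless.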

multiSpeakerMarkup

object ( MultiSpeakerMarkup )

The multi-speaker input to be synthesized. Only applicable for multi-speaker synthesis.
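Because input_source is a union field, exactly one of its members may be set in a request body. The following sketch enforces that on a plain dictionary before it is serialized to JSON. `INPUT_SOURCE_FIELDS` and `selected_input_source` are hypothetical names; only the field names come from this page.

```python
# Members of the input_source union, as listed in the JSON representation.
INPUT_SOURCE_FIELDS = ("text", "markup", "ssml", "multiSpeakerMarkup")


def selected_input_source(synthesis_input: dict) -> str:
    """Return the single input_source field that is set, or raise ValueError."""
    present = [name for name in INPUT_SOURCE_FIELDS if name in synthesis_input]
    if len(present) != 1:
        raise ValueError(
            "exactly one of %s must be set, found: %s"
            % (", ".join(INPUT_SOURCE_FIELDS), present or "none")
        )
    return present[0]
```

Fields outside the union, such as customPronunciations and prompt, do not affect the check.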

prompt

string

A system instruction, supported only for controllable (promptable) voice models. If this field is set, its text is passed unedited to Gemini-TTS. Otherwise, a default system instruction is used. AI Studio calls this system instruction "Style Instructions".

MultiSpeakerMarkup

A collection of turns for multi-speaker synthesis.

JSON representation
{
  "turns": [
    { object (Turn) }
  ]
}

Fields

turns[]

object ( Turn )

Required. Speaker turns.

Turn

A multi-speaker turn.

JSON representation
{
  "speaker": string,
  "text": string
}

Fields

speaker

string

Required. The speaker of the turn, for example, 'O' or 'Q'. Please refer to documentation for available speakers.

text

string

Required. The text to speak.
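As an illustration of how Turn and MultiSpeakerMarkup nest inside SynthesisInput, the sketch below builds the JSON body from a list of (speaker, text) pairs. `make_turn` and `make_multi_speaker_input` are hypothetical helpers. Rejecting an empty list of turns is an assumption drawn from turns[] being marked Required, and the speaker labels are the examples given above, not a confirmed list of available speakers.

```python
import json


def make_turn(speaker: str, text: str) -> dict:
    """Build one Turn; both fields are marked Required."""
    if not speaker or not text:
        raise ValueError("speaker and text are both required")
    return {"speaker": speaker, "text": text}


def make_multi_speaker_input(turns) -> dict:
    """Build a SynthesisInput dict that uses the multiSpeakerMarkup source."""
    turn_list = [make_turn(speaker, text) for speaker, text in turns]
    if not turn_list:
        # ASSUMPTION: a Required repeated field needs at least one entry.
        raise ValueError("at least one turn is required")
    return {"multiSpeakerMarkup": {"turns": turn_list}}


if __name__ == "__main__":
    body = make_multi_speaker_input([("O", "Hello there."), ("Q", "Hi, how are you?")])
    print(json.dumps(body, indent=2))
```

The resulting dictionary sets only multiSpeakerMarkup, so it satisfies the rule that a single input_source member is present.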
