SynthesisInput

Contains text input to be synthesized. Either text or ssml must be supplied. Supplying both or neither returns google.rpc.Code.INVALID_ARGUMENT. The input size is limited to 5000 bytes.
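The constraints above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the helper name build_synthesis_input is hypothetical, and it assumes the 5000-byte limit is measured on the UTF-8 encoding of the supplied field, which this page does not state.

```python
MAX_INPUT_BYTES = 5000  # limit stated in the description above


def build_synthesis_input(text=None, ssml=None):
    """Return a SynthesisInput-shaped dict, or raise ValueError.

    Hypothetical client-side helper. Assumes the size limit applies to the
    UTF-8 encoded length of the supplied field.
    """
    # Exactly one of text / ssml must be present: both or neither is an error.
    if (text is None) == (ssml is None):
        raise ValueError("supply exactly one of text or ssml")

    key, value = ("text", text) if text is not None else ("ssml", ssml)

    size = len(value.encode("utf-8"))
    if size > MAX_INPUT_BYTES:
        raise ValueError(f"input is {size} bytes; limit is {MAX_INPUT_BYTES}")

    return {key: value}
```

A failed check here corresponds to the case in which the service would return google.rpc.Code.INVALID_ARGUMENT; the service remains the authority on what is accepted.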

JSON representation
{
  "customPronunciations": {
    object (CustomPronunciations)
  },

  // Union field input_source can be only one of the following:
  "text": string,
  "markup": string,
  "ssml": string,
  "multiSpeakerMarkup": {
    object (MultiSpeakerMarkup)
  }
  // End of list of possible types for union field input_source.
  "prompt": string
}
Fields
customPronunciations

object ( CustomPronunciations )

Optional. The pronunciation customizations are applied to the input. If this is set, the input is synthesized using the given pronunciation customizations.

The initial support is for en-us, with plans to expand to other locales in the future. Instant Clone voices aren't supported.

To customize the pronunciation of a phrase, the phrase must appear as an exact match in the input, whichever input type is used. If using SSML, the phrase must not be inside a phoneme tag.
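The matching rule above can be approximated locally. The sketch below is illustrative only: the helper name phrase_is_customizable is hypothetical, and it assumes that "exact match" means a case-sensitive substring match, which this page does not spell out. For SSML input it ignores any text enclosed in a phoneme element.

```python
import re

# Matches a complete <phoneme ...>...</phoneme> element, including its content.
_PHONEME_RE = re.compile(
    r"<phoneme\b[^>]*>.*?</phoneme>", re.DOTALL | re.IGNORECASE
)


def phrase_is_customizable(input_value, phrase, is_ssml=False):
    """Return True if phrase occurs verbatim outside any phoneme tag.

    Hypothetical pre-check. Assumes a case-sensitive substring match.
    """
    if is_ssml:
        # Replace phoneme elements with a NUL separator so that the text on
        # either side of a removed element cannot join into a false match.
        input_value = _PHONEME_RE.sub("\x00", input_value)
    return phrase in input_value
```

For example, the phrase "read" in plain text "I read the book" passes the check, while the same phrase wrapped in a phoneme element does not.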

Union field input_source. The input source, which is either plain text or SSML. input_source can be only one of the following:
text

string

The raw text to be synthesized.

markup

string

Markup for HD voices specifically. This field may not be used with any other voices.

ssml

string

The SSML document to be synthesized. The SSML document must be valid and well-formed. Otherwise the RPC will fail and return google.rpc.Code.INVALID_ARGUMENT. For more information, see SSML.
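Well-formedness can be checked locally before calling the service. The sketch below uses Python's standard XML parser; the helper name is_well_formed_ssml is hypothetical. It verifies only that the document parses as XML and has a speak root element, so a document that passes may still be rejected by the service as invalid SSML.

```python
import xml.etree.ElementTree as ET


def is_well_formed_ssml(ssml):
    """Return True if ssml parses as XML and its root element is <speak>.

    Hypothetical pre-check; it does not validate element or attribute usage.
    """
    try:
        root = ET.fromstring(ssml)
    except ET.ParseError:
        return False
    # Strip an optional "{namespace}" prefix that ElementTree adds to tags.
    local_name = root.tag.rsplit("}", 1)[-1]
    return local_name == "speak"
```

An unclosed tag such as "<speak>Hello" fails this check, which is the kind of input for which the RPC would return google.rpc.Code.INVALID_ARGUMENT.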

multiSpeakerMarkup

object ( MultiSpeakerMarkup )

The multi-speaker input to be synthesized. Only applicable for multi-speaker synthesis.

prompt

string

This system instruction is supported only for controllable/promptable voice models. If this system instruction is used, the unedited text is passed to Gemini-TTS. Otherwise, a default system instruction is used. AI Studio refers to this system instruction as Style Instructions.
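Because prompt sits outside the input_source union, it can accompany one of the input fields. The following sketch builds such a body as a plain dict; the helper name build_prompted_input and the sample strings are illustrative assumptions, and the prompt key is omitted when no prompt is given so that the default system instruction described above applies.

```python
import json


def build_prompted_input(text, prompt=None):
    """Return a SynthesisInput-shaped dict with an optional prompt.

    Hypothetical helper. The prompt is only meaningful for
    controllable/promptable voice models.
    """
    body = {"text": text}
    if prompt:
        # Include the system instruction only when the caller provides one.
        body["prompt"] = prompt
    return body


example = build_prompted_input(
    "The train leaves at nine.", prompt="Speak calmly and slowly."
)
print(json.dumps(example, indent=2))
```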

MultiSpeakerMarkup

A collection of turns for multi-speaker synthesis.

JSON representation
{
  "turns": [
    {
      object (Turn)
    }
  ]
}
Fields
turns[]

object ( Turn )

Required. Speaker turns.

Turn

A multi-speaker turn.

JSON representation
{
  "speaker": string,
  "text": string
}
Fields
speaker

string

Required. The speaker of the turn, for example, 'O' or 'Q'. Refer to the documentation for available speakers.

text

string

Required. The text to speak.
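The MultiSpeakerMarkup and Turn structures above can be assembled from a simple list of speaker and text pairs. This is a minimal sketch with a hypothetical helper name, build_multi_speaker_input; the speaker labels 'O' and 'Q' are the examples given above and are not checked against the list of available speakers.

```python
def build_multi_speaker_input(turns):
    """Return a SynthesisInput-shaped dict from (speaker, text) pairs.

    Hypothetical helper. Both Turn fields are required, so empty values
    are rejected here rather than sent to the service.
    """
    if not turns:
        raise ValueError("at least one turn is required")

    built = []
    for speaker, text in turns:
        if not speaker or not text:
            raise ValueError("each turn needs a speaker and text")
        built.append({"speaker": speaker, "text": text})

    # multiSpeakerMarkup is one member of the input_source union.
    return {"multiSpeakerMarkup": {"turns": built}}
```

Calling build_multi_speaker_input([("O", "Hello."), ("Q", "Hi there.")]) yields a body with two turns in the order given.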
