Contains text input to be synthesized. Either text or ssml must be supplied. Supplying both or neither returns google.rpc.Code.INVALID_ARGUMENT. The input size is limited to 5000 bytes.
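The following Python sketch checks an input object on the client before it is sent, as a rough illustration of the constraints above. The helper name check_synthesis_input and its error messages are hypothetical and are not part of any Google client library. The sketch makes three assumptions: text, markup, and ssml are treated as mutually exclusive (following the input_source union field), the 5000-byte limit is measured on the UTF-8 encoding of whichever field is set, and ssml is only checked for XML well-formedness, not for valid SSML.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict

# Assumption: the limit applies to the UTF-8 bytes of the single input field.
MAX_INPUT_BYTES = 5000
# Assumption: these are the mutually exclusive input-source fields to check.
INPUT_SOURCE_FIELDS = ("text", "markup", "ssml")


def check_synthesis_input(synthesis_input: Dict[str, Any]) -> str:
    """Return the name of the one input-source field that is set.

    Raises ValueError if both/neither are set (the service would return
    INVALID_ARGUMENT), if the value is over the byte limit, or if ssml
    does not parse as XML.
    """
    present = [f for f in INPUT_SOURCE_FIELDS if synthesis_input.get(f)]
    if len(present) != 1:
        raise ValueError(f"set exactly one of {INPUT_SOURCE_FIELDS}; got {present}")

    field = present[0]
    value = synthesis_input[field]
    size = len(value.encode("utf-8"))  # bytes, not characters
    if size > MAX_INPUT_BYTES:
        raise ValueError(f"{field} is {size} bytes; the limit is {MAX_INPUT_BYTES}")

    if field == "ssml":
        try:
            ET.fromstring(value)  # well-formedness only, not SSML validity
        except ET.ParseError as exc:
            raise ValueError(f"ssml is not well-formed XML: {exc}") from exc
    return field
```

Because the limit is in bytes, a string of 2501 copies of "é" (2 bytes each in UTF-8) is rejected by this check even though it is only 2501 characters long.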
JSON representation |
---|
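{ "customPronunciations" : { object ( CustomPronunciations ) } , "text" : string , "markup" : string , "ssml" : string , "multiSpeakerMarkup" : { object ( MultiSpeakerMarkup ) } , "prompt" : string } |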
customPronunciations
object (CustomPronunciations)
Optional. The pronunciation customizations are applied to the input. If this is set, the input is synthesized using the given pronunciation customizations.
Initial support is for en-us, with plans to expand to other locales in the future. Instant Clone voices aren't supported.
To customize the pronunciation of a phrase, the phrase must appear as an exact match in the input. If using SSML, the phrase must not be inside a phoneme tag.
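The exact-match rule can be approximated locally before sending a request. The sketch below uses a hypothetical helper, phrase_matches_input, that is not part of the API. It assumes case-sensitive substring matching. For SSML it discards phoneme elements and then strips the remaining tags, which is only a rough stand-in for however the service actually matches phrases.

```python
import re
from typing import Optional

# Whole <phoneme ...>...</phoneme> elements; phrases inside them don't count.
_PHONEME_ELEMENT = re.compile(r"<phoneme\b[^>]*>.*?</phoneme>", re.DOTALL)
# Any other tag, e.g. <speak>, <break/>.
_ANY_TAG = re.compile(r"<[^>]+>")


def phrase_matches_input(
    phrase: str, text: Optional[str] = None, ssml: Optional[str] = None
) -> bool:
    """Approximate check that `phrase` can be customized for this input.

    Assumption: matching is an exact, case-sensitive substring match.
    """
    if text is not None:
        return phrase in text
    if ssml is not None:
        # Remove phoneme elements entirely, then drop the remaining tags.
        outside = _PHONEME_ELEMENT.sub(" ", ssml)
        outside = _ANY_TAG.sub("", outside)
        return phrase in outside
    return False
```

For example, "tomato" matches `<speak>I say tomato.</speak>`, but it does not match when the only occurrence is wrapped in a phoneme tag.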
Union field input_source. The input source, such as plain text or SSML. input_source can be only one of the following:
text
string
The raw text to be synthesized.
markup
string
Markup for HD voices only. This field can't be used with other voices.
ssml
string
The SSML document to be synthesized. The SSML document must be valid and well-formed; otherwise, the RPC fails and returns google.rpc.Code.INVALID_ARGUMENT. For more information, see SSML.
multiSpeakerMarkup
object (MultiSpeakerMarkup)
The multi-speaker input to be synthesized. Only applicable for multi-speaker synthesis.
prompt
string
This system instruction is supported only for controllable (promptable) voice models. If it is set, the unedited text is passed to Gemini-TTS; otherwise, a default system instruction is used. AI Studio calls this system instruction Style Instructions.
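The following minimal sketch shows how a client might set prompt alongside text. It assumes the input object is built as a plain dict and serialized into a request later. The helper name build_prompted_input is hypothetical. Voice selection, which determines whether the model is controllable, happens outside this object and is not shown.

```python
from typing import Dict, Optional


def build_prompted_input(text: str, prompt: Optional[str] = None) -> Dict[str, str]:
    """Hypothetical helper: build an input object for a promptable voice.

    The "prompt" key is added only when a prompt is given; when it is
    omitted, the default system instruction applies.
    """
    synthesis_input = {"text": text}
    if prompt is not None:
        synthesis_input["prompt"] = prompt
    return synthesis_input


if __name__ == "__main__":
    print(build_prompted_input("Have a wonderful day!", prompt="Say this in a cheerful, upbeat tone."))
```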
MultiSpeakerMarkup
A collection of turns for multi-speaker synthesis.
JSON representation |
---|
{ "turns" : [ { object ( Turn ) } ] } |
Fields | |
---|---|
turns[] | Required. Speaker turns. |
Turn
A multi-speaker turn.
JSON representation |
---|
{ "speaker" : string , "text" : string } |
Fields | |
---|---|
speaker | Required. The speaker of the turn, for example, 'O' or 'Q'. Refer to the documentation for available speakers. |
text | Required. The text to speak. |
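The following Python sketch builds a multiSpeakerMarkup input from (speaker, text) pairs. Because both Turn fields are required, it rejects any turn with an empty speaker or text. The helper name is hypothetical, and treating an empty turns list as invalid is an assumption. The speaker names 'Q' and 'O' come from the example in the speaker description; check the speaker documentation for the names your voice actually accepts.

```python
import json
from typing import Any, Dict, Iterable, Tuple


def build_multi_speaker_input(turns: Iterable[Tuple[str, str]]) -> Dict[str, Any]:
    """Hypothetical helper: wrap (speaker, text) pairs as Turn objects."""
    turn_objects = []
    for speaker, text in turns:
        # Both Turn fields are required, so empty values are rejected.
        if not speaker or not text:
            raise ValueError("each turn needs a non-empty speaker and text")
        turn_objects.append({"speaker": speaker, "text": text})
    if not turn_objects:
        # Assumption: a required repeated field must have at least one turn.
        raise ValueError("turns is required and must not be empty")
    return {"multiSpeakerMarkup": {"turns": turn_objects}}


if __name__ == "__main__":
    body = build_multi_speaker_input([("Q", "How are you?"), ("O", "Fine, thanks.")])
    print(json.dumps(body, indent=2))
```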