Limits and specifications of the Live API


This page describes various limits and specifications for using the Live API and its models.

Session-related limits

For the Live API , a session refers to a persistent connection where input and output are streamed continuously over the same connection.

If the session exceeds any of the following limits, the connection is terminated.

  • Connection lengthis limited to around 10 minutes.

  • Session lengthdepends on the input modalities:

    • Audio-only input sessions are limited to 15 minutes.
    • Video + audio input are limited to 2 minutes.
  • Session context windowis limited to 128k tokens.

Rate limits

The Live API has rate limits for both concurrent sessions per Firebase project as well as tokens per minute (TPM).

  • Gemini Developer API :

  • Vertex AI Gemini API :

    • 1,000 concurrent sessions per Firebase project
    • 4M tokens per minute

Audio formats

The Live API supports the following audio formats:

  • Input audio format:Raw 16 bit PCM audio at 16kHz little-endian
  • Output audio format:Raw 16 bit PCM audio at 24kHz little-endian

  • Supported MIME types: audio/x-aac , audio/flac , audio/mp3 , audio/m4a , audio/mpeg , audio/mpga , audio/mp4 , audio/ogg , audio/pcm , audio/wav , audio/webm

To convey the sample rate of input audio, set the MIME type of each audio-containing Blob to a value like audio/pcm;rate=16000 .

Video formats

The Live API expects a sequence of discrete image frames and supports video frames input at 1 frame per second (FPS).

  • Recommended input: native 768x768 resolution at 1 FPS.

  • Supported MIME types: video/x-flv , video/quicktime , video/mpeg , video/mpegs , video/mpg , video/mp4 , video/webm , video/wmv , video/3gpp

Note that this specification makes the Live API unsuitable for use cases that require analyzing fast-changing video, such as play-by-play in high-speed sports.

Response voices

The Live API supports the following response voice options. For demos of what each voice sounds like, see Chirp 3: HD voices .

If you don't specify a response voice, the default is Puck .

Learn how to specify the response voice .

Zephyr -- Bright
Kore -- Firm
Orus -- Firm
Autonoe -- Bright
Umbriel -- Easy-going
Erinome -- Clear
Laomedeia -- Upbeat
Schedar -- Even
Achird -- Friendly
Sadachbia -- Lively
Puck -- Upbeat
Fenrir -- Excitable
Aoede -- Breezy
Enceladus -- Breathy
Algieba -- Smooth
Algenib -- Gravelly
Achernar -- Soft
Gacrux -- Mature
Zubenelgenubi -- Casual
Sadaltager -- Knowledgeable
Charon -- Informative
Leda -- Youthful
Callirrhoe -- Easy-going
Iapetus -- Clear
Despina -- Smooth
Rasalgethi -- Informative
Alnilam -- Firm
Pulcherrima -- Forward
Vindemiatrix -- Gentle
Sulafat -- Warm

Languages

The Live API supports the following languages. Learn how to influence the response language .

Language BCP-47 Code Language BCP-47 Code
Arabic (Egyptian)
ar-EG German (Germany) de-DE
English (US)
en-US Spanish (US) es-US
French (France)
fr-FR Hindi (India) hi-IN
Indonesian (Indonesia)
id-ID Italian (Italy) it-IT
Japanese (Japan)
ja-JP Korean (Korea) ko-KR
Portuguese (Brazil)
pt-BR Russian (Russia) ru-RU
Dutch (Netherlands)
nl-NL Polish (Poland) pl-PL
Thai (Thailand)
th-TH Turkish (Turkey) tr-TR
Vietnamese (Vietnam)
vi-VN Romanian (Romania) ro-RO
Ukrainian (Ukraine)
uk-UA Bengali (Bangladesh) bn-BD
English (India)
en-IN & hi-IN bundle Marathi (India) mr-IN
Tamil (India)
ta-IN Telugu (India) te-IN
Design a Mobile Site
View Site in Mobile | Classic
Share by: