Gemini 2.5 Flash with Gemini Live API

Gemini 2.5 Flash with Gemini Live API native audio provides our latest native audio functionality for the Gemini Live API. In addition to the standard Gemini Live API features, this model includes:

Enhanced audio quality: Experience dramatically improved audio quality that feels like speaking with a person.
Enhanced voice quality and adaptability: Gemini Live API native audio provides richer, more natural voice interactions with 30 HD voices in 24 languages.
Proactive Audio (Preview): When Proactive Audio is enabled, the model responds only when a query is relevant. It generates text transcripts and audio responses only for queries directed to the device, and does not respond to queries that are not directed to the device.
Affective Dialog: Models using Gemini Live API native audio can understand and respond appropriately to users' emotional expressions for more nuanced conversations.
Improved barge-in: Interrupt Gemini more naturally and reliably, even in loud and noisy environments.
Robust function calling: We've improved the triggering rate, allowing Gemini to successfully execute the functions you define to support your use cases.
Accurate transcription: The accuracy of audio-to-text transcription has been significantly enhanced.
Seamless multilingual support: Speak to Gemini in multiple languages, and it switches between them without any pre-configuration.

For more information on Gemini Live API, see:

Our standalone Gemini Live API documentation.
Our Gemini Live API supported audio formats.
Our Gemini Live API concurrent session limits.

Live 2.5 Flash Native Audio

Model ID

gemini-live-2.5-flash-native-audio

Supported inputs & outputs

Inputs:
Text, Images, Audio, Video
Outputs:
Text, Audio

Token limits

Maximum input tokens: 32K (default), upgradable to 128K
Maximum output tokens: 64K

Capabilities

Supported

Not supported

Usage types

Supported

Up to 1000 concurrent sessions

Not supported

Technical specifications

Images

Maximum images per prompt: 3,000
Maximum file size per file for inline data or direct uploads through the console: 7 MB
Maximum file size per file from Google Cloud Storage: 30 MB
Supported MIME types:
image/png, image/jpeg, image/webp, image/heic, image/heif
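
The image limits above can be checked on the client before a request is sent. The following is a minimal sketch, not part of any Google SDK: the helper name `check_image` is hypothetical, and it assumes that "MB" means 2**20 bytes, which the limits above do not specify.

```python
# Limits copied from the image specifications above.
SUPPORTED_IMAGE_MIME_TYPES = {
    "image/png", "image/jpeg", "image/webp", "image/heic", "image/heif",
}
MB = 1024 * 1024            # assumption: "MB" is interpreted as 2**20 bytes
MAX_INLINE_BYTES = 7 * MB   # inline data or direct console upload
MAX_GCS_BYTES = 30 * MB     # file read from Google Cloud Storage
MAX_IMAGES_PER_PROMPT = 3000


def check_image(mime_type, size_bytes, from_gcs=False):
    """Return a list of problems; an empty list means the image passes.

    Hypothetical helper for illustration only.
    """
    problems = []
    if mime_type not in SUPPORTED_IMAGE_MIME_TYPES:
        problems.append("unsupported MIME type: " + mime_type)
    limit = MAX_GCS_BYTES if from_gcs else MAX_INLINE_BYTES
    if size_bytes > limit:
        problems.append("file is larger than %d bytes" % limit)
    return problems


def check_image_count(count):
    """True if the number of images fits in a single prompt."""
    return 0 <= count <= MAX_IMAGES_PER_PROMPT
```

An 8 MB JPEG would fail this check as inline data but pass when `from_gcs=True`, because the Cloud Storage limit is higher.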

Video

Standard resolution: 768 x 768
Supported MIME types:
video/x-flv, video/quicktime, video/mpeg, video/mpegs, video/mpg, video/mp4, video/webm, video/wmv, video/3gpp

Audio

Maximum conversation length: 10 minutes by default; can be extended.
Required audio input format: Raw 16-bit PCM audio at 16 kHz, little-endian
Required audio output format: Raw 16-bit PCM audio at 24 kHz, little-endian
Supported MIME types:
audio/x-aac, audio/flac, audio/mp3, audio/m4a, audio/mpeg, audio/mpga, audio/mp4, audio/ogg, audio/pcm, audio/wav, audio/webm
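
As a rough illustration of the required input format, the following sketch packs floating-point samples into raw 16-bit little-endian PCM at 16 kHz using only the Python standard library. The helper names `float_to_pcm16le` and `sine_tone`, the single (mono) channel, and the 440 Hz test tone are assumptions made for this example; they are not part of the Live API.

```python
import math
import struct

INPUT_SAMPLE_RATE_HZ = 16_000  # required input rate stated above


def float_to_pcm16le(samples):
    """Pack floats in [-1.0, 1.0] as raw signed 16-bit little-endian PCM.

    Hypothetical helper; assumes a single (mono) channel.
    """
    ints = []
    for s in samples:
        s = max(-1.0, min(1.0, s))          # clip out-of-range values
        ints.append(int(round(s * 32767)))  # scale to the int16 range
    # "<" forces little-endian byte order, "h" is a signed 16-bit integer.
    return struct.pack("<%dh" % len(ints), *ints)


def sine_tone(freq_hz, seconds, rate_hz=INPUT_SAMPLE_RATE_HZ):
    """Generate a test tone as a list of floats in [-1.0, 1.0]."""
    n = int(seconds * rate_hz)
    return [math.sin(2 * math.pi * freq_hz * i / rate_hz) for i in range(n)]


# Half a second of a 440 Hz tone: 8,000 samples, 2 bytes each.
pcm = float_to_pcm16le(sine_tone(440.0, 0.5))
```

Each sample occupies two bytes, so one second of input audio in this format is 32,000 bytes per channel.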

Parameter defaults

Start of speech sensitivity: Low
End of speech sensitivity: High
Prefix padding: 0
Max context size: 128K

Supported regions

Model availability

United States

us-central1
us-east1
us-east4
us-east5
us-south1
us-west1
us-west4

Europe

europe-central2
europe-north1
europe-southwest1
europe-west1
europe-west4
europe-west8

See Deployments and endpoints for more information.
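
When a deployment has a preferred location, the region list above can be consulted before a session is opened. This is a small sketch under stated assumptions: the names `GA_REGIONS`, `is_region_supported`, and `pick_region` are hypothetical, and the table is copied by hand from the list above, so it must be updated if the list changes.

```python
GA_MODEL_ID = "gemini-live-2.5-flash-native-audio"

# Copied from the model availability list above.
GA_REGIONS = {
    "United States": [
        "us-central1", "us-east1", "us-east4", "us-east5",
        "us-south1", "us-west1", "us-west4",
    ],
    "Europe": [
        "europe-central2", "europe-north1", "europe-southwest1",
        "europe-west1", "europe-west4", "europe-west8",
    ],
}


def is_region_supported(region):
    """True if the region appears in the availability list."""
    return any(region in regions for regions in GA_REGIONS.values())


def pick_region(preferred):
    """Return the first supported region from an ordered preference list.

    Returns None when no preferred region is supported.
    """
    for region in preferred:
        if is_region_supported(region):
            return region
    return None
```

For example, `pick_region(["asia-east1", "europe-west4"])` falls through to `europe-west4` because the first entry is not in the list.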

Versions

gemini-live-2.5-flash-native-audio

Launch stage: GA
Release date: December 12, 2025
Discontinuation date: December 13, 2026
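
For planning a migration, the two dates above can be turned into a simple lifecycle check. This sketch assumes the model is treated as unavailable from the discontinuation date onward, which the dates above do not state explicitly; the function names are hypothetical.

```python
from datetime import date

RELEASE_DATE = date(2025, 12, 12)
DISCONTINUATION_DATE = date(2026, 12, 13)


def days_until_discontinuation(today):
    """Days remaining before the discontinuation date (negative if past)."""
    return (DISCONTINUATION_DATE - today).days


def is_discontinued(today):
    """Assumption: the model counts as discontinued on the date itself."""
    return today >= DISCONTINUATION_DATE


# Length of the window between release and discontinuation, in days.
lifetime_days = (DISCONTINUATION_DATE - RELEASE_DATE).days
```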

Security controls

Online prediction

Data residency (at rest): Supported
Customer-managed encryption keys (CMEK): Not supported
VPC Service Controls: Supported
Access Transparency (AXT): Supported

See Security controls for more information.

Supported languages

See Supported languages.

Pricing

See Pricing.

Live 2.5 Flash Native Audio Preview

Model ID

gemini-live-2.5-flash-preview-native-audio-09-2025

Supported inputs & outputs

Inputs:
Text, Images, Audio, Video
Outputs:
Text, Audio

Token limits

Maximum input tokens: 128K
Maximum output tokens: 64K
Context window: 32K (default), upgradable to 128K

Capabilities

Supported

Not supported

Usage types

Supported

Not supported

Technical specifications

Images

Maximum images per prompt: 3,000
Maximum file size per file for inline data or direct uploads through the console: 7 MB
Maximum file size per file from Google Cloud Storage: 30 MB
Supported MIME types:
image/png, image/jpeg, image/webp, image/heic, image/heif

Video

Standard resolution: 768 x 768
Supported MIME types:
video/x-flv, video/quicktime, video/mpeg, video/mpegs, video/mpg, video/mp4, video/webm, video/wmv, video/3gpp

Audio

Maximum conversation length: 10 minutes by default; can be extended.
Required audio input format: Raw 16-bit PCM audio at 16 kHz, little-endian
Required audio output format: Raw 16-bit PCM audio at 24 kHz, little-endian
Supported MIME types:
audio/x-aac, audio/flac, audio/mp3, audio/m4a, audio/mpeg, audio/mpga, audio/mp4, audio/ogg, audio/pcm, audio/wav, audio/webm
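
Because the output is raw PCM without a header, most players cannot open it directly. The sketch below wraps 24 kHz, 16-bit output bytes in a WAV container using the standard `wave` module and computes the clip duration. The helper names are hypothetical, and the single (mono) channel is an assumption that is not stated in the specifications above.

```python
import io
import wave

OUTPUT_SAMPLE_RATE_HZ = 24_000  # required output rate stated above
BYTES_PER_SAMPLE = 2            # 16-bit PCM


def pcm_to_wav_bytes(pcm, rate_hz=OUTPUT_SAMPLE_RATE_HZ, channels=1):
    """Wrap raw 16-bit PCM bytes in a WAV container held in memory.

    Assumes a little-endian host, where the wave module writes the
    frames through unchanged.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(BYTES_PER_SAMPLE)
        wav.setframerate(rate_hz)
        wav.writeframes(pcm)
    return buf.getvalue()


def pcm_duration_seconds(pcm, rate_hz=OUTPUT_SAMPLE_RATE_HZ, channels=1):
    """Duration of a raw PCM buffer in seconds."""
    return len(pcm) / (BYTES_PER_SAMPLE * channels * rate_hz)


# One second of silence stands in for audio received from the model.
silence = b"\x00\x00" * OUTPUT_SAMPLE_RATE_HZ
wav_bytes = pcm_to_wav_bytes(silence)
```

Writing `wav_bytes` to a file with a `.wav` extension gives a clip that ordinary audio tools can open.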

Parameter defaults

Start of speech sensitivity: Low
End of speech sensitivity: High
Prefix padding: 0
Max context size: 128K

Supported regions

Model availability

United States

us-central1

See Deployments and endpoints for more information.

Knowledge cutoff date

August 2025

Versions

gemini-live-2.5-flash-preview-native-audio-09-2025

Launch stage: Public preview
Release date: September 25, 2025

Security controls

See Security controls for more information.

Supported languages

See Supported languages.

Pricing

See Pricing.
