The latest Gemini models, like Gemini 3.1 Flash Image (Nano Banana 2), are available to use with Firebase AI Logic!
Gemini 2.0 Flash and Flash-Lite models will shut down on June 1, 2026. To avoid service disruption, update to a newer model like gemini-3.1-flash-lite.
All Imagen models will shut down on June 24, 2026. Learn about migrating your apps to use Nano Banana.
Configuration options for the Live API
Preview: Using the Firebase AI Logic SDKs with the Gemini Live API is a feature that's in Preview, which means that it isn't subject to any SLA or deprecation policy and could change in backwards-incompatible ways.
Even with the basic implementation of the Live API, you can build engaging and powerful interactions for your users. You can optionally customize the experience further by using the configuration options described on this page.
Note: Firebase AI Logic does not yet support the following when using the Live API (but they're coming soon!): adding a thinking configuration, setting the input media resolution, or configuring and disabling VAD.
Response voice and language
You can make the model respond in a specific voice and influence the model to respond in different languages.
Specify a response voice
The code samples on this page use the Gemini Developer API provider (the googleAI backend).
The Live API uses Chirp 3 to support synthesized speech responses in HD voices. If you don't specify a response voice, the default is Puck.
The following response voices are available. For demos of what each voice sounds like, see Chirp 3: HD voices.
Zephyr -- Bright
Kore -- Firm
Orus -- Firm
Autonoe -- Bright
Umbriel -- Easy-going
Erinome -- Clear
Laomedeia -- Upbeat
Schedar -- Even
Achird -- Friendly
Sadachbia -- Lively
Puck -- Upbeat
Fenrir -- Excitable
Aoede -- Breezy
Enceladus -- Breathy
Algieba -- Smooth
Algenib -- Gravelly
Achernar -- Soft
Gacrux -- Mature
Zubenelgenubi -- Casual
Sadaltager -- Knowledgeable
Charon -- Informative
Leda -- Youthful
Callirrhoe -- Easy-going
Iapetus -- Clear
Despina -- Smooth
Rasalgethi -- Informative
Alnilam -- Firm
Pulcherrima -- Forward
Vindemiatrix -- Gentle
Sulafat -- Warm
To specify a response voice, set the voice name within the speechConfig object as part of the model configuration.
Swift
// ...
let liveModel = FirebaseAI.firebaseAI(backend: .googleAI()).liveModel(
  modelName: "gemini-2.5-flash-native-audio-preview-12-2025",
  // Configure the model to use a specific voice for its audio response
  generationConfig: LiveGenerationConfig(
    responseModalities: [.audio],
    speech: SpeechConfig(voiceName: "VOICE_NAME")
  )
)
// ...
Kotlin
// ...
val model = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
    modelName = "gemini-2.5-flash-native-audio-preview-12-2025",
    // Configure the model to use a specific voice for its audio response
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.AUDIO
        speechConfig = SpeechConfig(voice = Voice("VOICE_NAME"))
    }
)
// ...
Java
// ...
LiveGenerativeModel lm = FirebaseAI.getInstance(GenerativeBackend.googleAI()).liveModel(
        "gemini-2.5-flash-native-audio-preview-12-2025",
        // Configure the model to use a specific voice for its audio response
        new LiveGenerationConfig.Builder()
                .setResponseModality(ResponseModality.AUDIO)
                .setSpeechConfig(new SpeechConfig(new Voice("VOICE_NAME")))
                .build()
);
// ...
Web
// ...
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

const liveModel = getLiveGenerativeModel(ai, {
  model: "gemini-2.5-flash-native-audio-preview-12-2025",
  // Configure the model to use a specific voice for its audio response
  generationConfig: {
    responseModalities: [ResponseModality.AUDIO],
    speechConfig: {
      voiceConfig: {
        prebuiltVoiceConfig: { voiceName: "VOICE_NAME" },
      },
    },
  },
});
// ...
Dart
// ...
final _liveModel = FirebaseAI.googleAI().liveGenerativeModel(
  model: 'gemini-2.5-flash-native-audio-preview-12-2025',
  // Configure the model to use a specific voice for its audio response
  liveGenerationConfig: LiveGenerationConfig(
    responseModalities: [ResponseModalities.audio],
    speechConfig: SpeechConfig(voiceName: 'VOICE_NAME'),
  ),
);
// ...
Unity
// ...
var liveModel = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetLiveModel(
  modelName: "gemini-2.5-flash-native-audio-preview-12-2025",
  // Configure the model to use a specific voice for its audio response
  liveGenerationConfig: new LiveGenerationConfig(
    responseModalities: new[] { ResponseModality.Audio },
    speechConfig: SpeechConfig.UsePrebuiltVoice("VOICE_NAME")
  )
);
// ...
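If your app lets users choose a voice, you may want to check the name against the documented list before you connect. The following JavaScript sketch is a hypothetical helper: RESPONSE_VOICES, DEFAULT_VOICE, and buildSpeechConfig are illustrative names and are not part of the Firebase AI Logic SDK. It builds the nested speechConfig object used in the Web sample. It assumes that voice names match exactly, including case, which is a conservative choice. When no name is given, it falls back to Puck, the documented default.

```javascript
// Hypothetical helper, not part of the Firebase AI Logic SDK.
// Voice names copied from the list of response voices on this page.
const RESPONSE_VOICES = [
  "Zephyr", "Kore", "Orus", "Autonoe", "Umbriel", "Erinome", "Laomedeia",
  "Schedar", "Achird", "Sadachbia", "Puck", "Fenrir", "Aoede", "Enceladus",
  "Algieba", "Algenib", "Achernar", "Gacrux", "Zubenelgenubi", "Sadaltager",
  "Charon", "Leda", "Callirrhoe", "Iapetus", "Despina", "Rasalgethi",
  "Alnilam", "Pulcherrima", "Vindemiatrix", "Sulafat",
];

// The documented default voice when none is specified.
const DEFAULT_VOICE = "Puck";

// Builds the nested speechConfig object shape used in the Web sample.
// Assumption: names are matched exactly (case-sensitive); unknown names throw.
function buildSpeechConfig(voiceName = DEFAULT_VOICE) {
  if (!RESPONSE_VOICES.includes(voiceName)) {
    throw new Error(`Unknown response voice: ${voiceName}`);
  }
  return {
    voiceConfig: {
      prebuiltVoiceConfig: { voiceName },
    },
  };
}
```

You would then pass the result as the speechConfig field of generationConfig, for example `speechConfig: buildSpeechConfig("Kore")`. Rejecting unknown names locally catches typos before a session is opened, so you don't depend on how the service handles an unrecognized voice.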
Influence the response language
Important: The Live API models do not support explicitly setting a response language in the configuration. Instead, these models automatically choose the appropriate language for their responses.
If you want the model to respond in a non-English language or always in a specific language, you can influence the model's responses by using system instructions.
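The following hypothetical JavaScript helper is a minimal sketch that produces system instruction text asking the model to reply in a given language. languageInstruction is an illustrative name, not an SDK function, and the wording is an assumption for illustration rather than an officially recommended prompt. The Live API models choose their response language automatically, so an instruction like this influences the language but does not guarantee it.

```javascript
// Hypothetical helper, not part of the Firebase AI Logic SDK.
// The instruction wording is an illustrative assumption.
function languageInstruction(language, { always = false } = {}) {
  if (typeof language !== "string" || language.trim() === "") {
    throw new Error("language must be a non-empty string");
  }
  const name = language.trim();
  // "always" asks the model to keep the language even if the user switches.
  return always
    ? `Always respond in ${name}, even if the user speaks another language.`
    : `Respond in ${name}.`;
}
```

You would pass the returned text as the model's system instruction.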
Transcriptions for audio input and output
As part of the model's response, you can receive transcriptions of the audio input and the model's audio response. You set this configuration as part of the model configuration.
For transcription of the audio input, add inputAudioTranscription.
For transcription of the model's audio response, add outputAudioTranscription.
Note the following:
You can configure the model to return transcriptions of both input and output (as shown in the following examples), or only one or the other.
The transcripts are streamed along with the audio, so it's best to collect them per turn, the same way you collect text parts.
The transcription language is inferred from the audio input and the model's audio response.
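The following JavaScript sketch shows one way to collect transcripts per turn, as described above. TurnTranscripts is a hypothetical helper and is not part of the Firebase AI Logic SDK. It assumes messages shaped like the serverContent messages in the Web sample on this page: optional inputTranscription and outputTranscription objects that each carry a text field. The turnComplete flag that marks the end of a turn is an assumed field name.

```javascript
// Hypothetical helper, not part of the Firebase AI Logic SDK.
// Assumes serverContent messages with optional inputTranscription /
// outputTranscription ({ text }) and a turnComplete flag (assumed name).
class TurnTranscripts {
  constructor() {
    this.input = "";
    this.output = "";
    this.turns = []; // completed turns as { input, output }
  }

  // Appends any transcription fragments in the message. When the turn is
  // complete, stores the collected text, resets for the next turn, and
  // returns the completed turn; otherwise returns null.
  handle(message) {
    if (message.type !== "serverContent") return null;
    if (message.inputTranscription?.text) {
      this.input += message.inputTranscription.text;
    }
    if (message.outputTranscription?.text) {
      this.output += message.outputTranscription.text;
    }
    if (!message.turnComplete) return null;
    const turn = { input: this.input, output: this.output };
    this.turns.push(turn);
    this.input = "";
    this.output = "";
    return turn;
  }
}

// Usage with hand-built messages standing in for streamed server content.
const transcripts = new TurnTranscripts();
transcripts.handle({ type: "serverContent", inputTranscription: { text: "Hello " } });
transcripts.handle({
  type: "serverContent",
  inputTranscription: { text: "there" },
  outputTranscription: { text: "Hi!" },
});
const turn = transcripts.handle({ type: "serverContent", turnComplete: true });
// turn is { input: "Hello there", output: "Hi!" }
```

In the Web sample, you could call `transcripts.handle(message)` inside the `for await` loop and act on the returned turn once it is non-null.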
Swift
// ...
let liveModel = FirebaseAI.firebaseAI(backend: .googleAI()).liveModel(
  modelName: "gemini-2.5-flash-native-audio-preview-12-2025",
  // Configure the model to return transcriptions of the audio input and output
  generationConfig: LiveGenerationConfig(
    responseModalities: [.audio],
    inputAudioTranscription: AudioTranscriptionConfig(),
    outputAudioTranscription: AudioTranscriptionConfig()
  )
)

var inputTranscript: String = ""
var outputTranscript: String = ""

do {
  let session = try await liveModel.connect()
  for try await response in session.responses {
    if case let .content(content) = response.payload {
      if let inputText = content.inputAudioTranscription?.text {
        // Handle transcription text of the audio input
        inputTranscript += inputText
      }

      if let outputText = content.outputAudioTranscription?.text {
        // Handle transcription text of the audio output
        outputTranscript += outputText
      }

      if content.isTurnComplete {
        // Log the transcripts after the current turn is complete
        print("Input audio: \(inputTranscript)")
        print("Output audio: \(outputTranscript)")

        // Reset the transcripts for the next turn
        inputTranscript = ""
        outputTranscript = ""
      }
    }
  }
} catch {
  // Handle error
}
// ...
Kotlin
// ...
val liveModel = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
    modelName = "gemini-2.5-flash-native-audio-preview-12-2025",
    // Configure the model to return transcriptions of the audio input and output
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.AUDIO
        inputAudioTranscription = AudioTranscriptionConfig()
        outputAudioTranscription = AudioTranscriptionConfig()
    }
)

val liveSession = liveModel.connect()

fun handleTranscription(input: Transcription?, output: Transcription?) {
    input?.text?.let { text ->
        // Handle transcription text of the audio input
        println("Input Transcription: $text")
    }
    output?.text?.let { text ->
        // Handle transcription text of the audio output
        println("Output Transcription: $text")
    }
}

liveSession.startAudioConversation(null, ::handleTranscription)
// ...
Java
// ...
ExecutorService executor = Executors.newFixedThreadPool(1);
LiveGenerativeModel lm = FirebaseAI.getInstance(GenerativeBackend.googleAI()).liveModel(
        "gemini-2.5-flash-native-audio-preview-12-2025",
        // Configure the model to return transcriptions of the audio input and output
        new LiveGenerationConfig.Builder()
                .setResponseModality(ResponseModality.AUDIO)
                .setInputAudioTranscription(new AudioTranscriptionConfig())
                .setOutputAudioTranscription(new AudioTranscriptionConfig())
                .build()
);
LiveModelFutures liveModel = LiveModelFutures.from(lm);
ListenableFuture<LiveSessionFutures> sessionFuture = liveModel.connect();

Futures.addCallback(sessionFuture, new FutureCallback<LiveSessionFutures>() {
    @Override
    public void onSuccess(LiveSessionFutures ses) {
        LiveSessionFutures session = ses;
        session.startAudioConversation((Transcription input, Transcription output) -> {
            if (input != null) {
                // Handle transcription text of the audio input
                System.out.println("Input Transcription: " + input.getText());
            }
            if (output != null) {
                // Handle transcription text of the audio output
                System.out.println("Output Transcription: " + output.getText());
            }
            return null;
        });
    }

    @Override
    public void onFailure(Throwable t) {
        // Handle exceptions
        t.printStackTrace();
    }
}, executor);
// ...
Web
// ...
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

const liveModel = getLiveGenerativeModel(ai, {
  model: 'gemini-2.5-flash-native-audio-preview-12-2025',
  // Configure the model to return transcriptions of the audio input and output
  generationConfig: {
    responseModalities: [ResponseModality.AUDIO],
    inputAudioTranscription: {},
    outputAudioTranscription: {},
  },
});

const liveSession = await liveModel.connect();

// `data` holds a chunk of audio input captured by your app
liveSession.sendAudioRealtime({ data, mimeType: "audio/pcm" });

const messages = liveSession.receive();
for await (const message of messages) {
  switch (message.type) {
    case 'serverContent':
      if (message.inputTranscription) {
        // Handle transcription text of the audio input
        console.log(`Input transcription: ${message.inputTranscription.text}`);
      }
      if (message.outputTranscription) {
        // Handle transcription text of the audio output
        console.log(`Output transcription: ${message.outputTranscription.text}`);
      }
      // Handle other content of the message (for example, model turns,
      // turn completion, or interruptions)
      break;
    default:
      // Handle other message types (toolCall, toolCallCancellation)
  }
}
// ...
Dart
// ...
final _liveModel = FirebaseAI.googleAI().liveGenerativeModel(
  model: 'gemini-2.5-flash-native-audio-preview-12-2025',
  // Configure the model to return transcriptions of the audio input and output
  liveGenerationConfig: LiveGenerationConfig(
    responseModalities: [ResponseModalities.audio],
    inputAudioTranscription: AudioTranscriptionConfig(),
    outputAudioTranscription: AudioTranscriptionConfig(),
  ),
);

final LiveSession _session = await _liveModel.connect();

await for (final response in _session.receive()) {
  final message = response.message;
  // Only server content messages carry transcriptions
  if (message is! LiveServerContent) continue;

  if (message.inputTranscription?.text case final inputText?) {
    // Handle transcription text of the audio input
    print('Input: $inputText');
  }

  if (message.outputTranscription?.text case final outputText?) {
    // Handle transcription text of the audio output
    print('Output: $outputText');
  }
}
// ...
Unity
// ...
var liveModel = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetLiveModel(
  modelName: "gemini-2.5-flash-native-audio-preview-12-2025",
  // Configure the model to return transcriptions of the audio input and output
  liveGenerationConfig: new LiveGenerationConfig(
    responseModalities: new[] { ResponseModality.Audio },
    inputAudioTranscription: new AudioTranscriptionConfig(),
    outputAudioTranscription: new AudioTranscriptionConfig()
  )
);

try
{
  var session = await liveModel.ConnectAsync();
  var stream = session.ReceiveAsync();
  await foreach (var response in stream)
  {
    if (response.Message is LiveSessionContent sessionContent)
    {
      if (!string.IsNullOrEmpty(sessionContent.InputTranscription?.Text))
      {
        // Handle transcription text of the input audio
      }

      if (!string.IsNullOrEmpty(sessionContent.OutputTranscription?.Text))
      {
        // Handle transcription text of the output audio
      }
    }
  }
}
catch (Exception e)
{
  // Handle error
}
// ...
Voice activity detection (VAD)
The model automatically performs voice activity detection (VAD) on a continuous
audio input stream. VAD is enabled by default.
Note: Firebase AI Logic does not yet support disabling VAD or configuring VAD parameters. Check back soon!
Session management
Learn about the following session-related topics:
Advanced capabilities, including:
Session-related limits, including connection and session length limits, session context window limits, and rate limits.
Options for handling session limits, including:
Note: Firebase AI Logic does not yet support handling interruptions. Check back soon!
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2026-05-27 UTC.