Configuration options for the Live API
Even with the basic implementation for the Live API, you can build engaging
and powerful interactions for your users.
You can optionally customize the experience even more by using the following
configuration options:
Set a response voice
To specify a response voice, set the voice name within the speechConfig object
as part of the model configuration.
Swift
// ...

let liveModel = FirebaseAI.firebaseAI(backend: .googleAI()).liveModel(
  modelName: "gemini-2.5-flash-native-audio-preview-12-2025",
  // Configure the model to use a specific voice for its audio response
  generationConfig: LiveGenerationConfig(
    responseModalities: [.audio],
    speech: SpeechConfig(voiceName: "VOICE_NAME")
  )
)

// ...
Kotlin
// ...

val model = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
  modelName = "gemini-2.5-flash-native-audio-preview-12-2025",
  // Configure the model to use a specific voice for its audio response
  generationConfig = liveGenerationConfig {
    responseModality = ResponseModality.AUDIO
    speechConfig = SpeechConfig(voice = Voice("VOICE_NAME"))
  }
)

// ...
Java
// ...

LiveGenerativeModel lm = FirebaseAI.getInstance(GenerativeBackend.googleAI()).liveModel(
    "gemini-2.5-flash-native-audio-preview-12-2025",
    // Configure the model to use a specific voice for its audio response
    new LiveGenerationConfig.Builder()
        .setResponseModality(ResponseModality.AUDIO)
        .setSpeechConfig(new SpeechConfig(new Voice("VOICE_NAME")))
        .build()
);

// ...
Web
// ...

const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

const liveModel = getLiveGenerativeModel(ai, {
  model: "gemini-2.5-flash-native-audio-preview-12-2025",
  // Configure the model to use a specific voice for its audio response
  generationConfig: {
    responseModalities: [ResponseModality.AUDIO],
    speechConfig: {
      voiceConfig: {
        prebuiltVoiceConfig: { voiceName: "VOICE_NAME" },
      },
    },
  },
});

// ...
Dart
// ...

final _liveModel = FirebaseAI.googleAI().liveGenerativeModel(
  model: 'gemini-2.5-flash-native-audio-preview-12-2025',
  // Configure the model to use a specific voice for its audio response
  liveGenerationConfig: LiveGenerationConfig(
    responseModalities: [ResponseModalities.audio],
    speechConfig: SpeechConfig(voiceName: 'VOICE_NAME'),
  ),
);

// ...
Unity
// ...

var liveModel = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetLiveModel(
  modelName: "gemini-2.5-flash-native-audio-preview-12-2025",
  // Configure the model to use a specific voice for its audio response
  liveGenerationConfig: new LiveGenerationConfig(
    responseModalities: new[] { ResponseModality.Audio },
    speechConfig: SpeechConfig.UsePrebuiltVoice("VOICE_NAME")
  )
);

// ...
Influence the response language
The Live API models automatically choose the appropriate language for their
responses. The following languages (listed with their BCP-47 codes) are supported:
Arabic (Egyptian): ar-EG
German (Germany): de-DE
English (US): en-US
Spanish (US): es-US
French (France): fr-FR
Hindi (India): hi-IN
Indonesian (Indonesia): id-ID
Italian (Italy): it-IT
Japanese (Japan): ja-JP
Korean (Korea): ko-KR
Portuguese (Brazil): pt-BR
Russian (Russia): ru-RU
Dutch (Netherlands): nl-NL
Polish (Poland): pl-PL
Thai (Thailand): th-TH
Turkish (Turkey): tr-TR
Vietnamese (Vietnam): vi-VN
Romanian (Romania): ro-RO
Ukrainian (Ukraine): uk-UA
Bengali (Bangladesh): bn-BD
English (India): en-IN & hi-IN bundle
Marathi (India): mr-IN
Tamil (India): ta-IN
Telugu (India): te-IN
If you want the model to respond in a non-English language, or always in a
specific language, you can influence the model's responses by using system
instructions like these examples:
Reinforce to the model that a non-English language may be appropriate
Listen to the speaker carefully. If you detect a non-English language, respond
in the language you hear from the speaker. You must respond unmistakably in the
speaker's language.
Tell the model to always respond in a specific language
RESPOND IN LANGUAGE. YOU MUST RESPOND UNMISTAKABLY IN LANGUAGE.
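For example, you can pass an instruction like the one above as the system instruction when you create the live model. The following is a minimal Kotlin sketch, not a definitive implementation: it assumes the liveModel builder accepts a systemInstruction argument built with the content { } DSL, which is an assumption about the SDK surface rather than something shown elsewhere on this page.

// Sketch: steer the response language with a system instruction.
// Assumption: liveModel() accepts a systemInstruction parameter, like the
// standard generative model builders in the Kotlin SDK.
val spanishLiveModel = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
  modelName = "gemini-2.5-flash-native-audio-preview-12-2025",
  generationConfig = liveGenerationConfig {
    responseModality = ResponseModality.AUDIO
  },
  systemInstruction = content {
    text("RESPOND IN Spanish. YOU MUST RESPOND UNMISTAKABLY IN Spanish.")
  }
)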
Transcriptions for audio input and output
As part of the model's response, you can receive transcriptions of the
audio input and the model's audio response. You set this configuration as part
of the model configuration.

For transcription of the audio input, add inputAudioTranscription.

For transcription of the model's audio response, add outputAudioTranscription.
Note the following:
You can configure the model to return transcriptions of both input and output
(as shown in the following example), or you can configure it to return only
one or the other.
The transcripts are streamed along with the audio, so it's best to accumulate
them across each turn, just as you collect streamed text parts.
The transcription language is inferred from the audio input and the model's
audio response.
Swift
// ...

let liveModel = FirebaseAI.firebaseAI(backend: .googleAI()).liveModel(
  modelName: "gemini-2.5-flash-native-audio-preview-12-2025",
  // Configure the model to return transcriptions of the audio input and output
  generationConfig: LiveGenerationConfig(
    responseModalities: [.audio],
    inputAudioTranscription: AudioTranscriptionConfig(),
    outputAudioTranscription: AudioTranscriptionConfig()
  )
)

var inputTranscript: String = ""
var outputTranscript: String = ""

do {
  let session = try await liveModel.connect()
  for try await response in session.responses {
    if case let .content(content) = response.payload {
      if let inputText = content.inputAudioTranscription?.text {
        // Handle transcription text of the audio input
        inputTranscript += inputText
      }
      if let outputText = content.outputAudioTranscription?.text {
        // Handle transcription text of the audio output
        outputTranscript += outputText
      }
      if content.isTurnComplete {
        // Log the transcripts after the current turn is complete
        print("Input audio: \(inputTranscript)")
        print("Output audio: \(outputTranscript)")
        // Reset the transcripts for the next turn
        inputTranscript = ""
        outputTranscript = ""
      }
    }
  }
} catch {
  // Handle error
}

// ...
Kotlin
// ...

val liveModel = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
  modelName = "gemini-2.5-flash-native-audio-preview-12-2025",
  // Configure the model to return transcriptions of the audio input and output
  generationConfig = liveGenerationConfig {
    responseModality = ResponseModality.AUDIO
    inputAudioTranscription = AudioTranscriptionConfig()
    outputAudioTranscription = AudioTranscriptionConfig()
  }
)

val liveSession = liveModel.connect()

fun handleTranscription(input: Transcription?, output: Transcription?) {
  input?.text?.let { text ->
    // Handle transcription text of the audio input
    println("Input Transcription: $text")
  }
  output?.text?.let { text ->
    // Handle transcription text of the audio output
    println("Output Transcription: $text")
  }
}

liveSession.startAudioConversation(null, ::handleTranscription)

// ...
Java
// ...

ExecutorService executor = Executors.newFixedThreadPool(1);

LiveGenerativeModel lm = FirebaseAI.getInstance(GenerativeBackend.googleAI()).liveModel(
    "gemini-2.5-flash-native-audio-preview-12-2025",
    // Configure the model to return transcriptions of the audio input and output
    new LiveGenerationConfig.Builder()
        .setResponseModality(ResponseModality.AUDIO)
        .setInputAudioTranscription(new AudioTranscriptionConfig())
        .setOutputAudioTranscription(new AudioTranscriptionConfig())
        .build()
);

LiveModelFutures liveModel = LiveModelFutures.from(lm);

ListenableFuture<LiveSessionFutures> sessionFuture = liveModel.connect();

Futures.addCallback(sessionFuture, new FutureCallback<LiveSessionFutures>() {
  @Override
  public void onSuccess(LiveSessionFutures ses) {
    LiveSessionFutures session = ses;
    session.startAudioConversation((Transcription input, Transcription output) -> {
      if (input != null) {
        // Handle transcription text of the audio input
        System.out.println("Input Transcription: " + input.getText());
      }
      if (output != null) {
        // Handle transcription text of the audio output
        System.out.println("Output Transcription: " + output.getText());
      }
      return null;
    });
  }

  @Override
  public void onFailure(Throwable t) {
    // Handle exceptions
    t.printStackTrace();
  }
}, executor);

// ...
Web
// ...

const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

const liveModel = getLiveGenerativeModel(ai, {
  model: 'gemini-2.5-flash-native-audio-preview-12-2025',
  // Configure the model to return transcriptions of the audio input and output
  generationConfig: {
    responseModalities: [ResponseModality.AUDIO],
    inputAudioTranscription: {},
    outputAudioTranscription: {},
  },
});

const liveSession = await liveModel.connect();

liveSession.sendAudioRealtime({ data, mimeType: "audio/pcm" });

const messages = liveSession.receive();

for await (const message of messages) {
  switch (message.type) {
    case 'serverContent':
      if (message.inputTranscription) {
        // Handle transcription text of the audio input
        console.log(`Input transcription: ${message.inputTranscription.text}`);
      }
      if (message.outputTranscription) {
        // Handle transcription text of the audio output
        console.log(`Output transcription: ${message.outputTranscription.text}`);
      } else {
        // Handle other message types (modelTurn, turnComplete, interruption)
      }
      break;
    default:
      // Handle other message types (toolCall, toolCallCancellation)
  }
}

// ...
Dart
// ...

final _liveModel = FirebaseAI.googleAI().liveGenerativeModel(
  model: 'gemini-2.5-flash-native-audio-preview-12-2025',
  // Configure the model to return transcriptions of the audio input and output
  liveGenerationConfig: LiveGenerationConfig(
    responseModalities: [ResponseModalities.audio],
    inputAudioTranscription: AudioTranscriptionConfig(),
    outputAudioTranscription: AudioTranscriptionConfig(),
  ),
);

final LiveSession _session = await _liveModel.connect();

await for (final response in _session.receive()) {
  LiveServerContent message = response.message;
  if (message.inputTranscription?.text case final inputText?) {
    // Handle transcription text of the audio input
    print('Input: $inputText');
  }
  if (message.outputTranscription?.text case final outputText?) {
    // Handle transcription text of the audio output
    print('Output: $outputText');
  }
}

// ...
Unity
// ...

var liveModel = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetLiveModel(
  modelName: "gemini-2.5-flash-native-audio-preview-12-2025",
  // Configure the model to return transcriptions of the audio input and output
  liveGenerationConfig: new LiveGenerationConfig(
    responseModalities: new[] { ResponseModality.Audio },
    inputAudioTranscription: new AudioTranscriptionConfig(),
    outputAudioTranscription: new AudioTranscriptionConfig()
  )
);

try {
  var session = await liveModel.ConnectAsync();

  var stream = session.ReceiveAsync();
  await foreach (var response in stream) {
    if (response.Message is LiveSessionContent sessionContent) {
      if (!string.IsNullOrEmpty(sessionContent.InputTranscription?.Text)) {
        // Handle transcription text of the audio input
      }
      if (!string.IsNullOrEmpty(sessionContent.OutputTranscription?.Text)) {
        // Handle transcription text of the audio output
      }
    }
  }
} catch (Exception e) {
  // Handle error
}

// ...
Voice activity detection (VAD)
The model automatically performs voice activity detection (VAD) on a continuous
audio input stream. VAD is enabled by default.
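Because VAD runs automatically, you don't signal turn boundaries yourself when streaming microphone audio: the model detects when the user starts and stops speaking and responds accordingly. The following minimal Kotlin sketch reuses only the connect() and startAudioConversation() calls already shown on this page, and assumes it runs inside a coroutine with microphone permission granted; it is an illustration of the default behavior, not additional configuration.

// Sketch: nothing VAD-specific is configured on the model or the session.
// The default automatic VAD segments the continuous microphone stream into
// turns, and the model decides when to start responding.
val session = liveModel.connect()
session.startAudioConversation()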
Session management
Learn about the following session-related topics: