You can ask a Gemini model to analyze image files that you provide
either inline (base64-encoded) or via URL. When you use Firebase AI Logic,
you can make this request directly from your app.
With this capability, you can do things like:
Create captions or answer questions about images
Write a short story or a poem about an image
Detect objects in an image and return bounding box coordinates for them
Label or categorize a set of images by sentiment, style, or other characteristics
Click your Gemini API provider to view provider-specific content
and code on this page.
If you haven't already, complete the getting started guide, which describes how to
set up your Firebase project, connect your app to Firebase, add the SDK,
initialize the backend service for your chosen Gemini API provider, and
create a GenerativeModel instance.
For testing and iterating on your prompts, and even getting a generated code
snippet, we recommend using Google AI Studio.
Need a sample image file?
You can use this publicly available file with a MIME type of image/jpeg:
https://storage.googleapis.com/cloud-samples-data/generative-ai/image/scones.jpg
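If you want to try that file from code, here is a minimal Kotlin sketch (assuming an Android app with the INTERNET permission, called from a coroutine; the loadSampleImage name is just illustrative) that downloads the image and decodes it into a Bitmap you can pass to the model as in the examples below:

import android.graphics.Bitmap
import android.graphics.BitmapFactory
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.withContext
import java.net.URL

// Downloads the sample image over HTTPS and decodes it into a Bitmap that can
// be passed to the model as inline image data.
suspend fun loadSampleImage(): Bitmap = withContext(Dispatchers.IO) {
    val url = URL("https://storage.googleapis.com/cloud-samples-data/generative-ai/image/scones.jpg")
    url.openStream().use { stream ->
        BitmapFactory.decodeStream(stream)
    }
}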
Generate text from image files (base64-encoded)
Before trying this sample, complete the Before you begin section of this guide
to set up your project and app. In that section, you'll also click a button for
your chosen Gemini API provider so that you see provider-specific content
on this page.
You can ask a Gemini model to
generate text by prompting with text and images, providing each
input file's mimeType and the file itself. Find requirements and recommendations
for input files later on this page.
Swift
You can call generateContent() to generate text from multimodal input of text and images.
Single file input
import FirebaseAI

// Initialize the Gemini Developer API backend service
let ai = FirebaseAI.firebaseAI(backend: .googleAI())

// Create a `GenerativeModel` instance with a model that supports your use case
let model = ai.generativeModel(modelName: "gemini-2.5-flash")

guard let image = UIImage(systemName: "bicycle") else { fatalError() }

// Provide a text prompt to include with the image
let prompt = "What's in this picture?"

// To generate text output, call generateContent and pass in the prompt
let response = try await model.generateContent(image, prompt)
print(response.text ?? "No text in response.")
Multiple file input
import FirebaseAI

// Initialize the Gemini Developer API backend service
let ai = FirebaseAI.firebaseAI(backend: .googleAI())

// Create a `GenerativeModel` instance with a model that supports your use case
let model = ai.generativeModel(modelName: "gemini-2.5-flash")

guard let image1 = UIImage(systemName: "car") else { fatalError() }
guard let image2 = UIImage(systemName: "car.2") else { fatalError() }

// Provide a text prompt to include with the images
let prompt = "What's different between these pictures?"

// To generate text output, call generateContent and pass in the prompt
let response = try await model.generateContent(image1, image2, prompt)
print(response.text ?? "No text in response.")
Kotlin
You can call generateContent() to generate text from multimodal input of text and images.
For Kotlin, the methods in this SDK are suspend functions and need to be called
from a coroutine scope.
Single file input
// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
val model = Firebase.ai(backend = GenerativeBackend.googleAI())
    .generativeModel("gemini-2.5-flash")

// Loads an image from the app/res/drawable/ directory
val bitmap: Bitmap = BitmapFactory.decodeResource(resources, R.drawable.sparky)

// Provide a prompt that includes the image specified above and text
val prompt = content {
    image(bitmap)
    text("What developer tool is this mascot from?")
}

// To generate text output, call generateContent with the prompt
val response = model.generateContent(prompt)
print(response.text)
Multiple file input
For Kotlin, the methods in this SDK are suspend functions and need to be called
from a coroutine scope.
// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
val model = Firebase.ai(backend = GenerativeBackend.googleAI())
    .generativeModel("gemini-2.5-flash")

// Loads images from the app/res/drawable/ directory
val bitmap1: Bitmap = BitmapFactory.decodeResource(resources, R.drawable.sparky)
val bitmap2: Bitmap = BitmapFactory.decodeResource(resources, R.drawable.sparky_eats_pizza)

// Provide a prompt that includes the images specified above and text
val prompt = content {
    image(bitmap1)
    image(bitmap2)
    text("What is different between these pictures?")
}

// To generate text output, call generateContent with the prompt
val response = model.generateContent(prompt)
print(response.text)
Java
You can call generateContent() to generate text from multimodal input of text and images.
Single file input
// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
GenerativeModel ai = FirebaseAI.getInstance(GenerativeBackend.googleAI())
        .generativeModel("gemini-2.5-flash");

// Use the GenerativeModelFutures Java compatibility layer which offers
// support for ListenableFuture and Publisher APIs
GenerativeModelFutures model = GenerativeModelFutures.from(ai);

Bitmap bitmap = BitmapFactory.decodeResource(getResources(), R.drawable.sparky);

// Provide a prompt that includes the image specified above and text
Content content = new Content.Builder()
        .addImage(bitmap)
        .addText("What developer tool is this mascot from?")
        .build();

// To generate text output, call generateContent with the prompt
ListenableFuture<GenerateContentResponse> response = model.generateContent(content);
Futures.addCallback(response, new FutureCallback<GenerateContentResponse>() {
    @Override
    public void onSuccess(GenerateContentResponse result) {
        String resultText = result.getText();
        System.out.println(resultText);
    }

    @Override
    public void onFailure(Throwable t) {
        t.printStackTrace();
    }
}, executor);
Multiple file input
// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
GenerativeModel ai = FirebaseAI.getInstance(GenerativeBackend.googleAI())
        .generativeModel("gemini-2.5-flash");

// Use the GenerativeModelFutures Java compatibility layer which offers
// support for ListenableFuture and Publisher APIs
GenerativeModelFutures model = GenerativeModelFutures.from(ai);

Bitmap bitmap1 = BitmapFactory.decodeResource(getResources(), R.drawable.sparky);
Bitmap bitmap2 = BitmapFactory.decodeResource(getResources(), R.drawable.sparky_eats_pizza);

// Provide a prompt that includes the images specified above and text
Content prompt = new Content.Builder()
        .addImage(bitmap1)
        .addImage(bitmap2)
        .addText("What's different between these pictures?")
        .build();

// To generate text output, call generateContent with the prompt
ListenableFuture<GenerateContentResponse> response = model.generateContent(prompt);
Futures.addCallback(response, new FutureCallback<GenerateContentResponse>() {
    @Override
    public void onSuccess(GenerateContentResponse result) {
        String resultText = result.getText();
        System.out.println(resultText);
    }

    @Override
    public void onFailure(Throwable t) {
        t.printStackTrace();
    }
}, executor);
Web
You can call generateContent() to generate text from multimodal input of text and images.
Single file input
import { initializeApp } from "firebase/app";
import { getAI, getGenerativeModel, GoogleAIBackend } from "firebase/ai";

// TODO(developer) Replace the following with your app's Firebase configuration
// See: https://firebase.google.com/docs/web/learn-more#config-object
const firebaseConfig = {
  // ...
};

// Initialize FirebaseApp
const firebaseApp = initializeApp(firebaseConfig);

// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `GenerativeModel` instance with a model that supports your use case
const model = getGenerativeModel(ai, { model: "gemini-2.5-flash" });

// Converts a File object to a Part object.
async function fileToGenerativePart(file) {
  const base64EncodedDataPromise = new Promise((resolve) => {
    const reader = new FileReader();
    reader.onloadend = () => resolve(reader.result.split(',')[1]);
    reader.readAsDataURL(file);
  });
  return {
    inlineData: { data: await base64EncodedDataPromise, mimeType: file.type },
  };
}

async function run() {
  // Provide a text prompt to include with the image
  const prompt = "What do you see?";

  const fileInputEl = document.querySelector("input[type=file]");
  const imagePart = await fileToGenerativePart(fileInputEl.files[0]);

  // To generate text output, call generateContent with the text and image
  const result = await model.generateContent([prompt, imagePart]);

  const response = result.response;
  const text = response.text();
  console.log(text);
}

run();
Multiple file input
import { initializeApp } from "firebase/app";
import { getAI, getGenerativeModel, GoogleAIBackend } from "firebase/ai";

// TODO(developer) Replace the following with your app's Firebase configuration
// See: https://firebase.google.com/docs/web/learn-more#config-object
const firebaseConfig = {
  // ...
};

// Initialize FirebaseApp
const firebaseApp = initializeApp(firebaseConfig);

// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `GenerativeModel` instance with a model that supports your use case
const model = getGenerativeModel(ai, { model: "gemini-2.5-flash" });

// Converts a File object to a Part object.
async function fileToGenerativePart(file) {
  const base64EncodedDataPromise = new Promise((resolve) => {
    const reader = new FileReader();
    reader.onloadend = () => resolve(reader.result.split(',')[1]);
    reader.readAsDataURL(file);
  });
  return {
    inlineData: { data: await base64EncodedDataPromise, mimeType: file.type },
  };
}

async function run() {
  // Provide a text prompt to include with the images
  const prompt = "What's different between these pictures?";

  // Prepare images for input
  const fileInputEl = document.querySelector("input[type=file]");
  const imageParts = await Promise.all(
    [...fileInputEl.files].map(fileToGenerativePart)
  );

  // To generate text output, call generateContent with the text and images
  const result = await model.generateContent([prompt, ...imageParts]);

  const response = result.response;
  const text = response.text();
  console.log(text);
}

run();
Dart
You can call generateContent() to generate text from multimodal input of text and images.
Single file input
import 'dart:io';
import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';

// Initialize FirebaseApp
await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
final model = FirebaseAI.googleAI().generativeModel(model: 'gemini-2.5-flash');

// Provide a text prompt to include with the image
final prompt = TextPart("What's in the picture?");

// Prepare images for input
final image = await File('image0.jpg').readAsBytes();
final imagePart = InlineDataPart('image/jpeg', image);

// To generate text output, call generateContent with the text and image
final response = await model.generateContent([
  Content.multi([prompt, imagePart])
]);
print(response.text);
Multiple file input
import 'dart:io';
import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';

// Initialize FirebaseApp
await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
final model = FirebaseAI.googleAI().generativeModel(model: 'gemini-2.5-flash');

final (firstImage, secondImage) = await (
  File('image0.jpg').readAsBytes(),
  File('image1.jpg').readAsBytes()
).wait;

// Provide a text prompt to include with the images
final prompt = TextPart("What's different between these pictures?");

// Prepare images for input
final imageParts = [
  InlineDataPart('image/jpeg', firstImage),
  InlineDataPart('image/jpeg', secondImage),
];

// To generate text output, call generateContent with the text and images
final response = await model.generateContent([
  Content.multi([prompt, ...imageParts])
]);
print(response.text);
Unity
You can call GenerateContentAsync() to generate text from multimodal input of text and images.
Single file input
using Firebase;
using Firebase.AI;

// Initialize the Gemini Developer API backend service
var ai = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI());

// Create a `GenerativeModel` instance with a model that supports your use case
var model = ai.GetGenerativeModel(modelName: "gemini-2.5-flash");

// Convert a Texture2D into InlineDataParts
var grayImage = ModelContent.InlineData("image/png",
    UnityEngine.ImageConversion.EncodeToPNG(UnityEngine.Texture2D.grayTexture));

// Provide a text prompt to include with the image
var prompt = ModelContent.Text("What's in this picture?");

// To generate text output, call GenerateContentAsync and pass in the prompt
var response = await model.GenerateContentAsync(new[] { grayImage, prompt });
UnityEngine.Debug.Log(response.Text ?? "No text in response.");
Multiple file input
using Firebase;
using Firebase.AI;

// Initialize the Gemini Developer API backend service
var ai = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI());

// Create a `GenerativeModel` instance with a model that supports your use case
var model = ai.GetGenerativeModel(modelName: "gemini-2.5-flash");

// Convert Texture2Ds into InlineDataParts
var blackImage = ModelContent.InlineData("image/png",
    UnityEngine.ImageConversion.EncodeToPNG(UnityEngine.Texture2D.blackTexture));
var whiteImage = ModelContent.InlineData("image/png",
    UnityEngine.ImageConversion.EncodeToPNG(UnityEngine.Texture2D.whiteTexture));

// Provide a text prompt to include with the images
var prompt = ModelContent.Text("What's different between these pictures?");

// To generate text output, call GenerateContentAsync and pass in the prompt
var response = await model.GenerateContentAsync(new[] { blackImage, whiteImage, prompt });
UnityEngine.Debug.Log(response.Text ?? "No text in response.");
Learn how to choose a model appropriate for your use case and app.
Stream the response
Before trying this sample, complete the Before you begin section of this guide
to set up your project and app. In that section, you'll also click a button for
your chosen Gemini API provider so that you see provider-specific content
on this page.
You can achieve faster interactions by not waiting for the entire result from
model generation and instead using streaming to handle partial results.
To stream the response, call generateContentStream.
Example: Stream generated text from image files
Swift
You can call generateContentStream() to stream generated text from multimodal input of text and images.
Single file input
import FirebaseAI

// Initialize the Gemini Developer API backend service
let ai = FirebaseAI.firebaseAI(backend: .googleAI())

// Create a `GenerativeModel` instance with a model that supports your use case
let model = ai.generativeModel(modelName: "gemini-2.5-flash")

guard let image = UIImage(systemName: "bicycle") else { fatalError() }

// Provide a text prompt to include with the image
let prompt = "What's in this picture?"

// To stream generated text output, call generateContentStream and pass in the prompt
let contentStream = try model.generateContentStream(image, prompt)
for try await chunk in contentStream {
  if let text = chunk.text {
    print(text)
  }
}
Multiple file input
import FirebaseAI

// Initialize the Gemini Developer API backend service
let ai = FirebaseAI.firebaseAI(backend: .googleAI())

// Create a `GenerativeModel` instance with a model that supports your use case
let model = ai.generativeModel(modelName: "gemini-2.5-flash")

guard let image1 = UIImage(systemName: "car") else { fatalError() }
guard let image2 = UIImage(systemName: "car.2") else { fatalError() }

// Provide a text prompt to include with the images
let prompt = "What's different between these pictures?"

// To stream generated text output, call generateContentStream and pass in the prompt
let contentStream = try model.generateContentStream(image1, image2, prompt)
for try await chunk in contentStream {
  if let text = chunk.text {
    print(text)
  }
}
Kotlin
You can call generateContentStream() to stream generated text from multimodal input of text and images.
For Kotlin, the methods in this SDK are suspend functions and need to be called
from a coroutine scope.
Single file input
// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
val model = Firebase.ai(backend = GenerativeBackend.googleAI())
    .generativeModel("gemini-2.5-flash")

// Loads an image from the app/res/drawable/ directory
val bitmap: Bitmap = BitmapFactory.decodeResource(resources, R.drawable.sparky)

// Provide a prompt that includes the image specified above and text
val prompt = content {
    image(bitmap)
    text("What developer tool is this mascot from?")
}

// To stream generated text output, call generateContentStream with the prompt
var fullResponse = ""
model.generateContentStream(prompt).collect { chunk ->
    print(chunk.text)
    fullResponse += chunk.text
}
Multiple file input
// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
val model = Firebase.ai(backend = GenerativeBackend.googleAI())
    .generativeModel("gemini-2.5-flash")

// Loads images from the app/res/drawable/ directory
val bitmap1: Bitmap = BitmapFactory.decodeResource(resources, R.drawable.sparky)
val bitmap2: Bitmap = BitmapFactory.decodeResource(resources, R.drawable.sparky_eats_pizza)

// Provide a prompt that includes the images specified above and text
val prompt = content {
    image(bitmap1)
    image(bitmap2)
    text("What's different between these pictures?")
}

// To stream generated text output, call generateContentStream with the prompt
var fullResponse = ""
model.generateContentStream(prompt).collect { chunk ->
    print(chunk.text)
    fullResponse += chunk.text
}
Java
You can call generateContentStream() to stream generated text from multimodal input of text and images.
For Java, the streaming methods in this SDK return a Publisher type from the Reactive Streams library.
Single file input
// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
GenerativeModel ai = FirebaseAI.getInstance(GenerativeBackend.googleAI())
        .generativeModel("gemini-2.5-flash");

// Use the GenerativeModelFutures Java compatibility layer which offers
// support for ListenableFuture and Publisher APIs
GenerativeModelFutures model = GenerativeModelFutures.from(ai);

Bitmap bitmap = BitmapFactory.decodeResource(getResources(), R.drawable.sparky);

// Provide a prompt that includes the image specified above and text
Content prompt = new Content.Builder()
        .addImage(bitmap)
        .addText("What developer tool is this mascot from?")
        .build();

// To stream generated text output, call generateContentStream with the prompt
Publisher<GenerateContentResponse> streamingResponse = model.generateContentStream(prompt);

final String[] fullResponse = {""};

streamingResponse.subscribe(new Subscriber<GenerateContentResponse>() {
    @Override
    public void onNext(GenerateContentResponse generateContentResponse) {
        String chunk = generateContentResponse.getText();
        fullResponse[0] += chunk;
    }

    @Override
    public void onComplete() {
        System.out.println(fullResponse[0]);
    }

    @Override
    public void onError(Throwable t) {
        t.printStackTrace();
    }

    @Override
    public void onSubscribe(Subscription s) {
    }
});
Multiple file input
// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
GenerativeModel ai = FirebaseAI.getInstance(GenerativeBackend.googleAI())
        .generativeModel("gemini-2.5-flash");

// Use the GenerativeModelFutures Java compatibility layer which offers
// support for ListenableFuture and Publisher APIs
GenerativeModelFutures model = GenerativeModelFutures.from(ai);

Bitmap bitmap1 = BitmapFactory.decodeResource(getResources(), R.drawable.sparky);
Bitmap bitmap2 = BitmapFactory.decodeResource(getResources(), R.drawable.sparky_eats_pizza);

// Provide a prompt that includes the images specified above and text
Content prompt = new Content.Builder()
        .addImage(bitmap1)
        .addImage(bitmap2)
        .addText("What's different between these pictures?")
        .build();

// To stream generated text output, call generateContentStream with the prompt
Publisher<GenerateContentResponse> streamingResponse = model.generateContentStream(prompt);

final String[] fullResponse = {""};

streamingResponse.subscribe(new Subscriber<GenerateContentResponse>() {
    @Override
    public void onNext(GenerateContentResponse generateContentResponse) {
        String chunk = generateContentResponse.getText();
        fullResponse[0] += chunk;
    }

    @Override
    public void onComplete() {
        System.out.println(fullResponse[0]);
    }

    @Override
    public void onError(Throwable t) {
        t.printStackTrace();
    }

    @Override
    public void onSubscribe(Subscription s) {
    }
});
Web
You can call generateContentStream() to stream generated text from multimodal input of text and images.
Single file input
import { initializeApp } from "firebase/app";
import { getAI, getGenerativeModel, GoogleAIBackend } from "firebase/ai";

// TODO(developer) Replace the following with your app's Firebase configuration
// See: https://firebase.google.com/docs/web/learn-more#config-object
const firebaseConfig = {
  // ...
};

// Initialize FirebaseApp
const firebaseApp = initializeApp(firebaseConfig);

// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `GenerativeModel` instance with a model that supports your use case
const model = getGenerativeModel(ai, { model: "gemini-2.5-flash" });

// Converts a File object to a Part object.
async function fileToGenerativePart(file) {
  const base64EncodedDataPromise = new Promise((resolve) => {
    const reader = new FileReader();
    reader.onloadend = () => resolve(reader.result.split(',')[1]);
    reader.readAsDataURL(file);
  });
  return {
    inlineData: { data: await base64EncodedDataPromise, mimeType: file.type },
  };
}

async function run() {
  // Provide a text prompt to include with the image
  const prompt = "What do you see?";

  // Prepare image for input
  const fileInputEl = document.querySelector("input[type=file]");
  const imagePart = await fileToGenerativePart(fileInputEl.files[0]);

  // To stream generated text output, call generateContentStream with the text and image
  const result = await model.generateContentStream([prompt, imagePart]);

  for await (const chunk of result.stream) {
    const chunkText = chunk.text();
    console.log(chunkText);
  }
}

run();
Multiple file input
import { initializeApp } from "firebase/app";
import { getAI, getGenerativeModel, GoogleAIBackend } from "firebase/ai";

// TODO(developer) Replace the following with your app's Firebase configuration
// See: https://firebase.google.com/docs/web/learn-more#config-object
const firebaseConfig = {
  // ...
};

// Initialize FirebaseApp
const firebaseApp = initializeApp(firebaseConfig);

// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `GenerativeModel` instance with a model that supports your use case
const model = getGenerativeModel(ai, { model: "gemini-2.5-flash" });

// Converts a File object to a Part object.
async function fileToGenerativePart(file) {
  const base64EncodedDataPromise = new Promise((resolve) => {
    const reader = new FileReader();
    reader.onloadend = () => resolve(reader.result.split(',')[1]);
    reader.readAsDataURL(file);
  });
  return {
    inlineData: { data: await base64EncodedDataPromise, mimeType: file.type },
  };
}

async function run() {
  // Provide a text prompt to include with the images
  const prompt = "What's different between these pictures?";

  const fileInputEl = document.querySelector("input[type=file]");
  const imageParts = await Promise.all(
    [...fileInputEl.files].map(fileToGenerativePart)
  );

  // To stream generated text output, call generateContentStream with the text and images
  const result = await model.generateContentStream([prompt, ...imageParts]);

  for await (const chunk of result.stream) {
    const chunkText = chunk.text();
    console.log(chunkText);
  }
}

run();
Dart
You can call generateContentStream() to stream generated text from multimodal input of text and images.
Single file input
import 'dart:io';
import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';

// Initialize FirebaseApp
await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
final model = FirebaseAI.googleAI().generativeModel(model: 'gemini-2.5-flash');

// Provide a text prompt to include with the image
final prompt = TextPart("What's in the picture?");

// Prepare images for input
final image = await File('image0.jpg').readAsBytes();
final imagePart = InlineDataPart('image/jpeg', image);

// To stream generated text output, call generateContentStream with the text and image
final response = model.generateContentStream([
  Content.multi([prompt, imagePart])
]);
await for (final chunk in response) {
  print(chunk.text);
}
Multiple file input
import 'dart:io';
import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';

// Initialize FirebaseApp
await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
final model = FirebaseAI.googleAI().generativeModel(model: 'gemini-2.5-flash');

final (firstImage, secondImage) = await (
  File('image0.jpg').readAsBytes(),
  File('image1.jpg').readAsBytes()
).wait;

// Provide a text prompt to include with the images
final prompt = TextPart("What's different between these pictures?");

// Prepare images for input
final imageParts = [
  InlineDataPart('image/jpeg', firstImage),
  InlineDataPart('image/jpeg', secondImage),
];

// To stream generated text output, call generateContentStream with the text and images
final response = model.generateContentStream([
  Content.multi([prompt, ...imageParts])
]);
await for (final chunk in response) {
  print(chunk.text);
}
Unity
You can call GenerateContentStreamAsync() to stream generated text from multimodal input of text and images.
Single file input
using Firebase;
using Firebase.AI;

// Initialize the Gemini Developer API backend service
var ai = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI());

// Create a `GenerativeModel` instance with a model that supports your use case
var model = ai.GetGenerativeModel(modelName: "gemini-2.5-flash");

// Convert a Texture2D into InlineDataParts
var gray = ModelContent.InlineData("image/png",
    UnityEngine.ImageConversion.EncodeToPNG(UnityEngine.Texture2D.grayTexture));

// Provide a text prompt to include with the image
var prompt = ModelContent.Text("What's in this picture?");

// To stream generated text output, call GenerateContentStreamAsync and pass in the prompt
var responseStream = model.GenerateContentStreamAsync(new[] { gray, prompt });
await foreach (var response in responseStream) {
  if (!string.IsNullOrWhiteSpace(response.Text)) {
    UnityEngine.Debug.Log(response.Text);
  }
}
Multiple file input
using Firebase;
using Firebase.AI;

// Initialize the Gemini Developer API backend service
var ai = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI());

// Create a `GenerativeModel` instance with a model that supports your use case
var model = ai.GetGenerativeModel(modelName: "gemini-2.5-flash");

// Convert Texture2Ds into InlineDataParts
var black = ModelContent.InlineData("image/png",
    UnityEngine.ImageConversion.EncodeToPNG(UnityEngine.Texture2D.blackTexture));
var white = ModelContent.InlineData("image/png",
    UnityEngine.ImageConversion.EncodeToPNG(UnityEngine.Texture2D.whiteTexture));

// Provide a text prompt to include with the images
var prompt = ModelContent.Text("What's different between these pictures?");

// To stream generated text output, call GenerateContentStreamAsync and pass in the prompt
var responseStream = model.GenerateContentStreamAsync(new[] { black, white, prompt });
await foreach (var response in responseStream) {
  if (!string.IsNullOrWhiteSpace(response.Text)) {
    UnityEngine.Debug.Log(response.Text);
  }
}
Learn how to choose a model appropriate for your use case and app.
Requirements and recommendations for input image files
Note that a file provided as inline data is encoded to base64 in transit, which
increases the size of the request. You get an HTTP 413 error if a request is
too large.
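As a rough guide, base64 encoding grows data by about a third, so you can estimate the encoded size of an image before building the request. A minimal Kotlin sketch of that arithmetic; the 20 MB default below is an illustrative placeholder, not a documented limit, so check the current request-size limits for your chosen provider:

// Base64 encodes every 3 input bytes as 4 output characters, so inline image
// data grows by roughly a third in the request body.
fun estimatedBase64SizeBytes(rawSizeBytes: Long): Long =
    (rawSizeBytes + 2) / 3 * 4

// maxRequestBytes is an illustrative placeholder, not a documented limit.
fun fitsInRequest(rawSizeBytes: Long, maxRequestBytes: Long = 20L * 1024 * 1024): Boolean =
    estimatedBase64SizeBytes(rawSizeBytes) <= maxRequestBytes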
See "Supported input files and requirements" page to learn detailed information
about the following:
Gemini multimodal models support the following image MIME types (a sketch for mapping file extensions to these types follows the list):
PNG - image/png
JPEG - image/jpeg
WebP - image/webp
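For example, if your app accepts arbitrary files, you might map file extensions to these MIME types and reject anything else before building the request. A minimal Kotlin sketch (the helper name is illustrative):

// Maps a file extension to one of the image MIME types listed above;
// returns null for unsupported types so the caller can reject the file early.
fun imageMimeTypeFor(fileName: String): String? =
    when (fileName.substringAfterLast('.', "").lowercase()) {
        "png" -> "image/png"
        "jpg", "jpeg" -> "image/jpeg"
        "webp" -> "image/webp"
        else -> null
    }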
Limits per request
There isn't a specific limit on the number of pixels in an image. However,
larger images are scaled down and padded to fit a maximum resolution of 3072 x
3072 while preserving their original aspect ratio (a client-side downscaling
sketch follows these limits).
Maximum files per request: 3,000 image files
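Because large images are scaled down on the service side anyway, you can optionally downscale them in your app to keep the base64 payload smaller. A minimal Kotlin (Android) sketch of that idea; this is an optional optimization, not a required step:

import android.graphics.Bitmap

// Downscales a bitmap so its longest side is at most maxSide (3072 matches the
// maximum resolution mentioned above) while preserving the aspect ratio.
fun downscaleIfNeeded(source: Bitmap, maxSide: Int = 3072): Bitmap {
    val longest = maxOf(source.width, source.height)
    if (longest <= maxSide) return source
    val scale = maxSide.toFloat() / longest
    return Bitmap.createScaledBitmap(
        source,
        (source.width * scale).toInt(),
        (source.height * scale).toInt(),
        /* filter = */ true
    )
}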
What else can you do?
Learn how to count tokens before sending long prompts to the model (see the sketch after this list).
Set up Cloud Storage for Firebase so that you can include large files in your
multimodal requests and have a more managed solution for providing files in
prompts. Files can include images, PDFs, video, and audio.
Start thinking about preparing for production (see the production checklist).
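For the token-counting item above, here is a minimal Kotlin sketch, assuming the countTokens() suspend method described in the token-counting guide; treat the response field name as illustrative:

// Checks prompt size before a multimodal request. Assumes the countTokens()
// method from the token-counting guide; totalTokens is illustrative.
suspend fun logPromptTokens(model: GenerativeModel, bitmap: Bitmap) {
    val prompt = content {
        image(bitmap)
        text("What developer tool is this mascot from?")
    }
    val countResponse = model.countTokens(prompt)
    println("Prompt uses ${countResponse.totalTokens} tokens")
}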
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Missing the information I need","missingTheInformationINeed","thumb-down"],["Too complicated / too many steps","tooComplicatedTooManySteps","thumb-down"],["Out of date","outOfDate","thumb-down"],["Samples / code issue","samplesCodeIssue","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2025-09-04 UTC."],[],[],null,[]]