Gemini 3 Pro & Flash, Gemini 3 Pro Image (nano banana pro), and the latest Gemini Live API native audio models are now available to use with Firebase AI Logic on all platforms!

Analyze image files using the Gemini API

You can ask a Gemini model to analyze image files that you provide either inline (base64-encoded) or via URL. When you use Firebase AI Logic, you can make this request directly from your app.

With this capability, you can do things like:

Create captions or answer questions about images
Write a short story or a poem about an image
Detect objects in an image and return bounding box coordinates for them
Label or categorize a set of images for sentiment, style, or other characteristics
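For the bounding-box use case, Gemini's documentation describes object-detection output as boxes in `[ymin, xmin, ymax, xmax]` order, normalized to a 0-1000 scale. Treat that format as an assumption to verify against your own prompts and responses. Under that assumption, mapping a box back to pixel coordinates is a simple rescale, sketched here in JavaScript with a hypothetical `toPixelBox` helper.

```javascript
// Hypothetical helper. Assumes the model was asked to return each box as
// [ymin, xmin, ymax, xmax] normalized to a 0-1000 scale; confirm this against
// the format your prompt actually requests.
function toPixelBox(box2d, imageWidth, imageHeight) {
  const [ymin, xmin, ymax, xmax] = box2d;
  // Rescale each normalized coordinate to the image's pixel dimensions.
  return {
    left: Math.round((xmin / 1000) * imageWidth),
    top: Math.round((ymin / 1000) * imageHeight),
    right: Math.round((xmax / 1000) * imageWidth),
    bottom: Math.round((ymax / 1000) * imageHeight),
  };
}

// A box on a 1000x500 (width x height) image:
console.log(toPixelBox([100, 200, 500, 800], 1000, 500));
// { left: 200, top: 50, right: 800, bottom: 250 }
```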


See other guides for additional options for working with images: generate structured output, multi-turn chat, analyze images on-device, and generate images.

Before you begin

Click your Gemini API provider to view provider-specific content and code on this page.

If you haven't already, complete the getting started guide, which describes how to set up your Firebase project, connect your app to Firebase, add the SDK, initialize the backend service for your chosen Gemini API provider, and create a GenerativeModel instance.

For testing and iterating on your prompts, we recommend using Google AI Studio.

Need a sample image file?

You can use this publicly available file, which has a MIME type of image/jpeg: https://storage.googleapis.com/cloud-samples-data/generative-ai/image/scones.jpg
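Whichever image you use, sending it inline means wrapping its base64-encoded bytes together with its MIME type in a part of the form `{ inlineData: { data, mimeType } }`, the same shape the Web samples on this page build with `fileToGenerativePart`. The following minimal sketch for Node.js assumes the image bytes are already in memory; `toInlineDataPart` is a hypothetical helper, not part of the Firebase SDK.

```javascript
// Hypothetical helper (not part of the Firebase SDK): wraps raw image bytes
// in the { inlineData: { data, mimeType } } part shape used by the Web samples.
function toInlineDataPart(bytes, mimeType) {
  // This guide is about images, so reject anything that is not an image type.
  if (typeof mimeType !== "string" || !mimeType.startsWith("image/")) {
    throw new Error(`Expected an image MIME type, got: ${mimeType}`);
  }
  return {
    inlineData: {
      // Buffer (built into Node.js) performs the base64 encoding.
      data: Buffer.from(bytes).toString("base64"),
      mimeType,
    },
  };
}

// Illustrative bytes only (the ASCII text "hello"), not a real JPEG file.
const examplePart = toInlineDataPart(
  new Uint8Array([0x68, 0x65, 0x6c, 0x6c, 0x6f]),
  "image/jpeg"
);
console.log(examplePart.inlineData.data); // aGVsbG8=
```

In a Web app, a part like this would be passed next to the text prompt, for example `model.generateContent([prompt, examplePart])`.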

Generate text from image files (base64-encoded)

Before trying this sample, complete the Before you begin section of this guide to set up your project and app.
In that section, you'll also click a button for your chosen Gemini API provider so that you see provider-specific content on this page.

You can ask a Gemini model to generate text by prompting with text and images, providing each input file's mimeType and the file itself. Find requirements and recommendations for input files later on this page.
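Every inline file needs an accurate `mimeType`. When you can't trust a file extension or a browser-reported type, one option is to check the file's leading "magic" bytes. The following JavaScript sketch uses a hypothetical `sniffImageMimeType` helper that recognizes only PNG, JPEG, and WebP signatures; it says nothing about which types the model accepts, so check the input-file requirements for that.

```javascript
// Hypothetical helper: infers an image MIME type from a file's leading
// "magic" bytes. Only PNG, JPEG, and WebP are recognized; anything else
// returns null so the caller can decide what to do.
function sniffImageMimeType(bytes) {
  // Only the first 12 bytes are needed for these three signatures.
  const head = Uint8Array.from(bytes.slice(0, 12));
  const matchesAt = (offset, signature) =>
    head.length >= offset + signature.length &&
    signature.every((value, i) => head[offset + i] === value);

  // PNG: 89 50 4E 47 0D 0A 1A 0A
  if (matchesAt(0, [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) {
    return "image/png";
  }
  // JPEG: FF D8 FF
  if (matchesAt(0, [0xff, 0xd8, 0xff])) {
    return "image/jpeg";
  }
  // WebP: "RIFF" at offset 0 and "WEBP" at offset 8
  if (matchesAt(0, [0x52, 0x49, 0x46, 0x46]) && matchesAt(8, [0x57, 0x45, 0x42, 0x50])) {
    return "image/webp";
  }
  return null;
}

console.log(sniffImageMimeType([0xff, 0xd8, 0xff, 0xe0])); // image/jpeg
```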

Swift

You can call generateContent() to generate text from multimodal input of text and images.

Single file input

import FirebaseAILogic
import UIKit // Provides UIImage

// Initialize the Gemini Developer API backend service
let ai = FirebaseAI.firebaseAI(backend: .googleAI())

// Create a `GenerativeModel` instance with a model that supports your use case
let model = ai.generativeModel(modelName: "gemini-2.5-flash")

guard let image = UIImage(systemName: "bicycle") else { fatalError() }

// Provide a text prompt to include with the image
let prompt = "What's in this picture?"

// To generate text output, call generateContent and pass in the prompt
let response = try await model.generateContent(image, prompt)
print(response.text ?? "No text in response.")

Multiple file input

import FirebaseAILogic
import UIKit // Provides UIImage

// Initialize the Gemini Developer API backend service
let ai = FirebaseAI.firebaseAI(backend: .googleAI())

// Create a `GenerativeModel` instance with a model that supports your use case
let model = ai.generativeModel(modelName: "gemini-2.5-flash")

guard let image1 = UIImage(systemName: "car") else { fatalError() }
guard let image2 = UIImage(systemName: "car.2") else { fatalError() }

// Provide a text prompt to include with the images
let prompt = "What's different between these pictures?"

// To generate text output, call generateContent and pass in the prompt
let response = try await model.generateContent(image1, image2, prompt)
print(response.text ?? "No text in response.")

Kotlin

You can call generateContent() to generate text from multimodal input of text and images.

For Kotlin, the methods in this SDK are suspend functions and need to be called from a coroutine scope.

Single file input

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
val model = Firebase.ai(backend = GenerativeBackend.googleAI())
    .generativeModel("gemini-2.5-flash")

// Loads an image from the app/res/drawable/ directory
val bitmap: Bitmap = BitmapFactory.decodeResource(resources, R.drawable.sparky)

// Provide a prompt that includes the image specified above and text
val prompt = content {
    image(bitmap)
    text("What developer tool is this mascot from?")
}

// To generate text output, call generateContent with the prompt
val response = model.generateContent(prompt)
print(response.text)

Multiple file input


// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
val model = Firebase.ai(backend = GenerativeBackend.googleAI())
    .generativeModel("gemini-2.5-flash")

// Loads an image from the app/res/drawable/ directory
val bitmap1: Bitmap = BitmapFactory.decodeResource(resources, R.drawable.sparky)
val bitmap2: Bitmap = BitmapFactory.decodeResource(resources, R.drawable.sparky_eats_pizza)

// Provide a prompt that includes the images specified above and text
val prompt = content {
    image(bitmap1)
    image(bitmap2)
    text("What is different between these pictures?")
}

// To generate text output, call generateContent with the prompt
val response = model.generateContent(prompt)
print(response.text)

Java

You can call generateContent() to generate text from multimodal input of text and images.

For Java, the methods in this SDK return a ListenableFuture.

Single file input

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
GenerativeModel ai = FirebaseAI.getInstance(GenerativeBackend.googleAI())
        .generativeModel("gemini-2.5-flash");

// Use the GenerativeModelFutures Java compatibility layer which offers
// support for ListenableFuture and Publisher APIs
GenerativeModelFutures model = GenerativeModelFutures.from(ai);

Bitmap bitmap = BitmapFactory.decodeResource(getResources(), R.drawable.sparky);

// Provide a prompt that includes the image specified above and text
Content content = new Content.Builder()
        .addImage(bitmap)
        .addText("What developer tool is this mascot from?")
        .build();

// To generate text output, call generateContent with the prompt
ListenableFuture<GenerateContentResponse> response = model.generateContent(content);
// `executor` is an Executor that you provide to run the callback
Futures.addCallback(response, new FutureCallback<GenerateContentResponse>() {
    @Override
    public void onSuccess(GenerateContentResponse result) {
        String resultText = result.getText();
        System.out.println(resultText);
    }

    @Override
    public void onFailure(Throwable t) {
        t.printStackTrace();
    }
}, executor);

Multiple file input

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
GenerativeModel ai = FirebaseAI.getInstance(GenerativeBackend.googleAI())
        .generativeModel("gemini-2.5-flash");

// Use the GenerativeModelFutures Java compatibility layer which offers
// support for ListenableFuture and Publisher APIs
GenerativeModelFutures model = GenerativeModelFutures.from(ai);

Bitmap bitmap1 = BitmapFactory.decodeResource(getResources(), R.drawable.sparky);
Bitmap bitmap2 = BitmapFactory.decodeResource(getResources(), R.drawable.sparky_eats_pizza);

// Provide a prompt that includes the images specified above and text
Content prompt = new Content.Builder()
        .addImage(bitmap1)
        .addImage(bitmap2)
        .addText("What's different between these pictures?")
        .build();

// To generate text output, call generateContent with the prompt
ListenableFuture<GenerateContentResponse> response = model.generateContent(prompt);
// `executor` is an Executor that you provide to run the callback
Futures.addCallback(response, new FutureCallback<GenerateContentResponse>() {
    @Override
    public void onSuccess(GenerateContentResponse result) {
        String resultText = result.getText();
        System.out.println(resultText);
    }

    @Override
    public void onFailure(Throwable t) {
        t.printStackTrace();
    }
}, executor);

Web

You can call generateContent() to generate text from multimodal input of text and images.

Single file input

import { initializeApp } from "firebase/app";
import { getAI, getGenerativeModel, GoogleAIBackend } from "firebase/ai";

// TODO(developer) Replace the following with your app's Firebase configuration
// See: https://firebase.google.com/docs/web/learn-more#config-object
const firebaseConfig = {
  // ...
};

// Initialize FirebaseApp
const firebaseApp = initializeApp(firebaseConfig);

// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `GenerativeModel` instance with a model that supports your use case
const model = getGenerativeModel(ai, { model: "gemini-2.5-flash" });

// Converts a File object to a Part object.
async function fileToGenerativePart(file) {
  const base64EncodedDataPromise = new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = () => resolve(reader.result.split(',')[1]);
    // Reject instead of hanging if the file can't be read
    reader.onerror = () => reject(reader.error);
    reader.readAsDataURL(file);
  });
  return {
    inlineData: { data: await base64EncodedDataPromise, mimeType: file.type },
  };
}

async function run() {
  // Provide a text prompt to include with the image
  const prompt = "What do you see?";

  const fileInputEl = document.querySelector("input[type=file]");
  const imagePart = await fileToGenerativePart(fileInputEl.files[0]);

  // To generate text output, call generateContent with the text and image
  const result = await model.generateContent([prompt, imagePart]);

  const response = result.response;
  const text = response.text();
  console.log(text);
}

run();
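The `fileToGenerativePart` helper in the Web samples reads the file with `FileReader.readAsDataURL`, which yields a string of the form `data:<mimeType>;base64,<data>`, and `split(',')[1]` keeps only the base64 payload. The JavaScript sketch below makes that parsing step explicit and also recovers the MIME type from the same string. `parseBase64DataUrl` is a hypothetical helper, not part of the Firebase SDK.

```javascript
// Hypothetical helper: turns a base64 data URL, such as the string produced by
// FileReader.readAsDataURL, into the inline-data part shape. Data URLs with
// extra parameters or without ";base64" are rejected rather than guessed at.
function parseBase64DataUrl(dataUrl) {
  // Capture the MIME type before ";base64," and the payload after the comma.
  const match = /^data:([^;,]+);base64,(.*)$/s.exec(dataUrl);
  if (!match) {
    throw new Error("Expected a data URL of the form data:<mimeType>;base64,<data>");
  }
  const [, mimeType, data] = match;
  return { inlineData: { data, mimeType } };
}

const parsed = parseBase64DataUrl("data:image/png;base64,iVBORw0KGgo=");
console.log(parsed.inlineData.mimeType); // image/png
console.log(parsed.inlineData.data); // iVBORw0KGgo=
```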

Multiple file input

import { initializeApp } from "firebase/app";
import { getAI, getGenerativeModel, GoogleAIBackend } from "firebase/ai";

// TODO(developer) Replace the following with your app's Firebase configuration
// See: https://firebase.google.com/docs/web/learn-more#config-object
const firebaseConfig = {
  // ...
};

// Initialize FirebaseApp
const firebaseApp = initializeApp(firebaseConfig);

// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `GenerativeModel` instance with a model that supports your use case
const model = getGenerativeModel(ai, { model: "gemini-2.5-flash" });

// Converts a File object to a Part object.
async function fileToGenerativePart(file) {
  const base64EncodedDataPromise = new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = () => resolve(reader.result.split(',')[1]);
    // Reject instead of hanging if the file can't be read
    reader.onerror = () => reject(reader.error);
    reader.readAsDataURL(file);
  });
  return {
    inlineData: { data: await base64EncodedDataPromise, mimeType: file.type },
  };
}

async function run() {
  // Provide a text prompt to include with the images
  const prompt = "What's different between these pictures?";

  // Prepare images for input
  const fileInputEl = document.querySelector("input[type=file]");
  const imageParts = await Promise.all(
    [...fileInputEl.files].map(fileToGenerativePart)
  );

  // To generate text output, call generateContent with the text and images
  const result = await model.generateContent([prompt, ...imageParts]);

  const response = result.response;
  const text = response.text();
  console.log(text);
}

run();
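Each image you add is sent base64-encoded, and base64 emits 4 characters for every 3 input bytes (rounded up to a whole group), so an image of n bytes contributes 4 * ceil(n / 3) characters, about a third more than its raw size. When sending several images inline, a quick pre-flight check of the combined encoded size can catch oversized requests early. In this JavaScript sketch, `base64Length` and `checkInlinePayload` are hypothetical helpers and `maxRequestBytes` is a limit you supply; check your Gemini API provider's current request-size limits instead of relying on the number used in the example. The estimate covers image data only, not the text prompt or other request overhead.

```javascript
// Number of base64 characters produced for `byteLength` raw bytes:
// 4 output characters per 3-byte group, with the last group padded.
function base64Length(byteLength) {
  return 4 * Math.ceil(byteLength / 3);
}

// Hypothetical pre-flight check: sums the encoded sizes of several images and
// compares the total with a caller-supplied limit (not a value from the SDK).
// Only image data is counted; the prompt and request overhead are ignored.
function checkInlinePayload(imageByteLengths, maxRequestBytes) {
  const encodedTotal = imageByteLengths
    .map(base64Length)
    .reduce((sum, n) => sum + n, 0);
  return { encodedTotal, fitsLimit: encodedTotal <= maxRequestBytes };
}

// Two 3 MiB images (3,145,728 bytes each) against an example 10 MiB limit:
console.log(checkInlinePayload([3145728, 3145728], 10 * 1024 * 1024));
// { encodedTotal: 8388608, fitsLimit: true }
```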

Dart

You can call generateContent() to generate text from multimodal input of text and images.

Single file input

import 'dart:io'; // Provides File (not available on Flutter web)

import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';

// Initialize FirebaseApp
await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
final model =
    FirebaseAI.googleAI().generativeModel(model: 'gemini-2.5-flash');

// Provide a text prompt to include with the image
final prompt = TextPart("What's in the picture?");

// Prepare images for input
final image = await File('image0.jpg').readAsBytes();
final imagePart = InlineDataPart('image/jpeg', image);

// To generate text output, call generateContent with the text and image
final response = await model.generateContent([
  Content.multi([prompt, imagePart])
]);
print(response.text);

Multiple file input

import 'dart:io'; // Provides File (not available on Flutter web)

import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';

// Initialize FirebaseApp
await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
final model =
    FirebaseAI.googleAI().generativeModel(model: 'gemini-2.5-flash');

// Read both image files concurrently
final (firstImage, secondImage) = await (
  File('image0.jpg').readAsBytes(),
  File('image1.jpg').readAsBytes()
).wait;

// Provide a text prompt to include with the images
final prompt = TextPart("What's different between these pictures?");

// Prepare images for input
final imageParts = [
  InlineDataPart('image/jpeg', firstImage),
  InlineDataPart('image/jpeg', secondImage),
];

// To generate text output, call generateContent with the text and images
final response = await model.generateContent([
  Content.multi([prompt, ...imageParts])
]);
print(response.text);

Unity

You can call GenerateContentAsync() to generate text from multimodal input of text and images.

Single file input

using Firebase;
using Firebase.AI;

// Initialize the Gemini Developer API backend service
var ai = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI());

// Create a `GenerativeModel` instance with a model that supports your use case
var model = ai.GetGenerativeModel(modelName: "gemini-2.5-flash");

// Convert a Texture2D into an InlineDataPart
var grayImage = ModelContent.InlineData("image/png",
    UnityEngine.ImageConversion.EncodeToPNG(UnityEngine.Texture2D.grayTexture));

// Provide a text prompt to include with the image
var prompt = ModelContent.Text("What's in this picture?");

// To generate text output, call GenerateContentAsync and pass in the prompt
var response = await model.GenerateContentAsync(new[] { grayImage, prompt });
UnityEngine.Debug.Log(response.Text ?? "No text in response.");

Multiple file input

using Firebase;
using Firebase.AI;

// Initialize the Gemini Developer API backend service
var ai = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI());

// Create a `GenerativeModel` instance with a model that supports your use case
var model = ai.GetGenerativeModel(modelName: "gemini-2.5-flash");

// Convert Texture2Ds into InlineDataParts
var blackImage = ModelContent.InlineData("image/png",
    UnityEngine.ImageConversion.EncodeToPNG(UnityEngine.Texture2D.blackTexture));
var whiteImage = ModelContent.InlineData("image/png",
    UnityEngine.ImageConversion.EncodeToPNG(UnityEngine.Texture2D.whiteTexture));

// Provide a text prompt to include with the images
var prompt = ModelContent.Text("What's different between these pictures?");

// To generate text output, call GenerateContentAsync and pass in the prompt
var response = await model.GenerateContentAsync(new[] { blackImage, whiteImage, prompt });
UnityEngine.Debug.Log(response.Text ?? "No text in response.");

Learn how to choose a model appropriate for your use case and app.

Stream the response

You can achieve faster interactions by not waiting for the model to generate the entire result and instead using streaming to handle partial results. To stream the response, call generateContentStream().
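The samples in this section share one pattern: iterate over the stream, handle each partial response as it arrives, and (in the Kotlin and Java samples) append its text to a running string. The JavaScript sketch below shows that pattern against a stand-in stream so that it runs without Firebase. `fakeStream` and `collectStream` are hypothetical; `fakeStream` only imitates the shape of the Web SDK's `result.stream`, whose chunks expose a `text()` method, and is not the SDK itself.

```javascript
// Stand-in for a model stream: yields chunk objects with a text() method,
// imitating (not reproducing) the chunks from the Web SDK's result.stream.
async function* fakeStream(pieces) {
  for (const piece of pieces) {
    yield { text: () => piece };
  }
}

// Consumes a stream of chunks, handing each piece of text to `onChunk` as soon
// as it arrives and returning the concatenated text once the stream ends.
async function collectStream(stream, onChunk = () => {}) {
  let fullText = "";
  for await (const chunk of stream) {
    const piece = chunk.text();
    onChunk(piece);
    fullText += piece;
  }
  return fullText;
}

// Example: print pieces as they arrive, then the assembled text.
collectStream(fakeStream(["A bicycle ", "leaning ", "on a wall."]), (piece) =>
  process.stdout.write(piece)
).then((text) => console.log(`\nFull response: ${text}`));
```

With a real model, the first pieces can be shown to the user while later ones are still being generated, which is where the latency benefit comes from.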


Swift

You can call generateContentStream() to stream generated text from multimodal input of text and images.

Single file input

import FirebaseAILogic
import UIKit // Provides UIImage

// Initialize the Gemini Developer API backend service
let ai = FirebaseAI.firebaseAI(backend: .googleAI())

// Create a `GenerativeModel` instance with a model that supports your use case
let model = ai.generativeModel(modelName: "gemini-2.5-flash")

guard let image = UIImage(systemName: "bicycle") else { fatalError() }

// Provide a text prompt to include with the image
let prompt = "What's in this picture?"

// To stream generated text output, call generateContentStream and pass in the prompt
let contentStream = try model.generateContentStream(image, prompt)
for try await chunk in contentStream {
  if let text = chunk.text {
    print(text)
  }
}

Multiple file input

import FirebaseAILogic
import UIKit // Provides UIImage

// Initialize the Gemini Developer API backend service
let ai = FirebaseAI.firebaseAI(backend: .googleAI())

// Create a `GenerativeModel` instance with a model that supports your use case
let model = ai.generativeModel(modelName: "gemini-2.5-flash")

guard let image1 = UIImage(systemName: "car") else { fatalError() }
guard let image2 = UIImage(systemName: "car.2") else { fatalError() }

// Provide a text prompt to include with the images
let prompt = "What's different between these pictures?"

// To stream generated text output, call generateContentStream and pass in the prompt
let contentStream = try model.generateContentStream(image1, image2, prompt)
for try await chunk in contentStream {
  if let text = chunk.text {
    print(text)
  }
}

Kotlin

You can call generateContentStream() to stream generated text from multimodal input of text and images.

For Kotlin, the methods in this SDK are suspend functions and need to be called from a coroutine scope.

Single file input

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
val model = Firebase.ai(backend = GenerativeBackend.googleAI())
    .generativeModel("gemini-2.5-flash")

// Loads an image from the app/res/drawable/ directory
val bitmap: Bitmap = BitmapFactory.decodeResource(resources, R.drawable.sparky)

// Provide a prompt that includes the image specified above and text
val prompt = content {
    image(bitmap)
    text("What developer tool is this mascot from?")
}

// To stream generated text output, call generateContentStream with the prompt
var fullResponse = ""
model.generateContentStream(prompt).collect { chunk ->
    // Skip chunks without text so "null" is never printed or appended
    chunk.text?.let {
        print(it)
        fullResponse += it
    }
}

Multiple file input

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
val model = Firebase.ai(backend = GenerativeBackend.googleAI())
    .generativeModel("gemini-2.5-flash")

// Loads an image from the app/res/drawable/ directory
val bitmap1: Bitmap = BitmapFactory.decodeResource(resources, R.drawable.sparky)
val bitmap2: Bitmap = BitmapFactory.decodeResource(resources, R.drawable.sparky_eats_pizza)

// Provide a prompt that includes the images specified above and text
val prompt = content {
    image(bitmap1)
    image(bitmap2)
    text("What's different between these pictures?")
}

// To stream generated text output, call generateContentStream with the prompt
var fullResponse = ""
model.generateContentStream(prompt).collect { chunk ->
    // Skip chunks without text so "null" is never printed or appended
    chunk.text?.let {
        print(it)
        fullResponse += it
    }
}

Java

You can call generateContentStream() to stream generated text from multimodal input of text and images.

For Java, the streaming methods in this SDK return a Publisher type from the Reactive Streams library.

Single file input

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
GenerativeModel ai = FirebaseAI.getInstance(GenerativeBackend.googleAI())
        .generativeModel("gemini-2.5-flash");

// Use the GenerativeModelFutures Java compatibility layer which offers
// support for ListenableFuture and Publisher APIs
GenerativeModelFutures model = GenerativeModelFutures.from(ai);

Bitmap bitmap = BitmapFactory.decodeResource(getResources(), R.drawable.sparky);

// Provide a prompt that includes the image specified above and text
Content prompt = new Content.Builder()
        .addImage(bitmap)
        .addText("What developer tool is this mascot from?")
        .build();

// To stream generated text output, call generateContentStream with the prompt
Publisher<GenerateContentResponse> streamingResponse = model.generateContentStream(prompt);

final String[] fullResponse = {""};

streamingResponse.subscribe(new Subscriber<GenerateContentResponse>() {
    @Override
    public void onNext(GenerateContentResponse generateContentResponse) {
        String chunk = generateContentResponse.getText();
        if (chunk != null) {
            fullResponse[0] += chunk;
        }
    }

    @Override
    public void onComplete() {
        System.out.println(fullResponse[0]);
    }

    @Override
    public void onError(Throwable t) {
        t.printStackTrace();
    }

    @Override
    public void onSubscribe(Subscription s) {
        // Signal demand: under the Reactive Streams contract, no chunks are
        // delivered until request() is called
        s.request(Long.MAX_VALUE);
    }
});

Multiple file input

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
GenerativeModel ai = FirebaseAI.getInstance(GenerativeBackend.googleAI())
        .generativeModel("gemini-2.5-flash");

// Use the GenerativeModelFutures Java compatibility layer which offers
// support for ListenableFuture and Publisher APIs
GenerativeModelFutures model = GenerativeModelFutures.from(ai);

Bitmap bitmap1 = BitmapFactory.decodeResource(getResources(), R.drawable.sparky);
Bitmap bitmap2 = BitmapFactory.decodeResource(getResources(), R.drawable.sparky_eats_pizza);

// Provide a prompt that includes the images specified above and text
Content prompt = new Content.Builder()
        .addImage(bitmap1)
        .addImage(bitmap2)
        .addText("What's different between these pictures?")
        .build();

// To stream generated text output, call generateContentStream with the prompt
Publisher<GenerateContentResponse> streamingResponse = model.generateContentStream(prompt);

final String[] fullResponse = {""};

streamingResponse.subscribe(new Subscriber<GenerateContentResponse>() {
    @Override
    public void onNext(GenerateContentResponse generateContentResponse) {
        String chunk = generateContentResponse.getText();
        if (chunk != null) {
            fullResponse[0] += chunk;
        }
    }

    @Override
    public void onComplete() {
        System.out.println(fullResponse[0]);
    }

    @Override
    public void onError(Throwable t) {
        t.printStackTrace();
    }

    @Override
    public void onSubscribe(Subscription s) {
        // Signal demand: under the Reactive Streams contract, no chunks are
        // delivered until request() is called
        s.request(Long.MAX_VALUE);
    }
});

Web

You can call generateContentStream() to stream generated text from multimodal input of text and images.

Single file input

import { initializeApp } from "firebase/app";
import { getAI, getGenerativeModel, GoogleAIBackend } from "firebase/ai";

// TODO(developer) Replace the following with your app's Firebase configuration
// See: https://firebase.google.com/docs/web/learn-more#config-object
const firebaseConfig = {
  // ...
};

// Initialize FirebaseApp
const firebaseApp = initializeApp(firebaseConfig);

// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `GenerativeModel` instance with a model that supports your use case
const model = getGenerativeModel(ai, { model: "gemini-2.5-flash" });

// Converts a File object to a Part object.
async function fileToGenerativePart(file) {
  const base64EncodedDataPromise = new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = () => resolve(reader.result.split(',')[1]);
    // Reject instead of hanging if the file can't be read
    reader.onerror = () => reject(reader.error);
    reader.readAsDataURL(file);
  });
  return {
    inlineData: { data: await base64EncodedDataPromise, mimeType: file.type },
  };
}

async function run() {
  // Provide a text prompt to include with the image
  const prompt = "What do you see?";

  // Prepare image for input
  const fileInputEl = document.querySelector("input[type=file]");
  const imagePart = await fileToGenerativePart(fileInputEl.files[0]);

  // To stream generated text output, call generateContentStream with the text and image
  const result = await model.generateContentStream([prompt, imagePart]);

  for await (const chunk of result.stream) {
    const chunkText = chunk.text();
    console.log(chunkText);
  }
}

run();

Multiple file input

import { initializeApp } from "firebase/app";
import { getAI, getGenerativeModel, GoogleAIBackend } from "firebase/ai";

// TODO(developer) Replace the following with your app's Firebase configuration
// See: https://firebase.google.com/docs/web/learn-more#config-object
const firebaseConfig = {
  // ...
};

// Initialize FirebaseApp
const firebaseApp = initializeApp(firebaseConfig);

// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `GenerativeModel` instance with a model that supports your use case
const model = getGenerativeModel(ai, { model: "gemini-2.5-flash" });

// Converts a File object to a Part object.
async function fileToGenerativePart(file) {
  const base64EncodedDataPromise = new Promise((resolve, reject) => {
    const reader = new FileReader();
    // reader.result is a data URL ("data:<mimeType>;base64,<data>"); keep only the base64 data
    reader.onload = () => resolve(reader.result.split(',')[1]);
    // Reject instead of leaving the promise pending if the file can't be read
    reader.onerror = () => reject(reader.error);
    reader.readAsDataURL(file);
  });
  return {
    inlineData: { data: await base64EncodedDataPromise, mimeType: file.type },
  };
}

async function run() {
  // Provide a text prompt to include with the images
  const prompt = "What's different between these pictures?";

  // Prepare images for input (Promise.all keeps the parts in the order the files were selected)
  const fileInputEl = document.querySelector("input[type=file]");
  const imageParts = await Promise.all(
    [...fileInputEl.files].map(fileToGenerativePart)
  );

  // To stream generated text output, call generateContentStream with the text and images
  const result = await model.generateContentStream([prompt, ...imageParts]);

  for await (const chunk of result.stream) {
    const chunkText = chunk.text();
    console.log(chunkText);
  }
}

run();
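The Web samples above log each chunk as it arrives. If you also need the complete response text (for example, to save it once the stream ends), you can accumulate the chunks while iterating. The following is a minimal sketch: `collectStreamedText` is a hypothetical helper, not part of the Firebase SDK, and it assumes only that each streamed chunk exposes a `text()` method, as the chunks of `result.stream` do in the samples. A fake async generator stands in for a real model response so the sketch runs on its own.

```javascript
// Hypothetical helper (not part of the Firebase SDK): iterates over an async
// iterable of response chunks, passes each chunk's text to onChunk as it
// arrives, and resolves with the concatenated text once the stream ends.
async function collectStreamedText(stream, onChunk = () => {}) {
  let fullText = "";
  for await (const chunk of stream) {
    const chunkText = chunk.text(); // assumes chunks expose text(), like result.stream
    onChunk(chunkText);
    fullText += chunkText;
  }
  return fullText;
}

// Fake stream standing in for result.stream, so the sketch runs without the SDK.
async function* fakeStream() {
  for (const piece of ["A plate ", "of scones ", "with blueberries."]) {
    yield { text: () => piece };
  }
}

collectStreamedText(fakeStream(), (text) => console.log(text)).then(
  (fullText) => console.log("Full response:", fullText)
);
```

With the real SDK, you would pass `result.stream` in place of `fakeStream()` and update your UI inside the `onChunk` callback.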

Dart

You can call generateContentStream() to stream generated text from multimodal input of text and images.

Single file input

import 'dart:io';

import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';

// Run the following code inside an async function (for example, an async `main`).

// Initialize FirebaseApp
await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
final model = FirebaseAI.googleAI().generativeModel(model: 'gemini-2.5-flash');

// Provide a text prompt to include with the image
final prompt = TextPart("What's in the picture?");

// Prepare the image for input
final image = await File('image0.jpg').readAsBytes();
final imagePart = InlineDataPart('image/jpeg', image);

// To stream generated text output, call generateContentStream with the text and image
final response = model.generateContentStream([
  Content.multi([prompt, imagePart])
]);
await for (final chunk in response) {
  print(chunk.text);
}

Multiple file input

import 'dart:io';

import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';

// Run the following code inside an async function (for example, an async `main`).

// Initialize FirebaseApp
await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
final model = FirebaseAI.googleAI().generativeModel(model: 'gemini-2.5-flash');

// Read both image files concurrently
final (firstImage, secondImage) = await (
  File('image0.jpg').readAsBytes(),
  File('image1.jpg').readAsBytes()
).wait;

// Provide a text prompt to include with the images
final prompt = TextPart("What's different between these pictures?");

// Prepare images for input
final imageParts = [
  InlineDataPart('image/jpeg', firstImage),
  InlineDataPart('image/jpeg', secondImage),
];

// To stream generated text output, call generateContentStream with the text and images
final response = model.generateContentStream([
  Content.multi([prompt, ...imageParts])
]);
await for (final chunk in response) {
  print(chunk.text);
}

Unity

You can call GenerateContentStreamAsync() to stream generated text from multimodal input of text and images.

Single file input

using Firebase;
using Firebase.AI;

// Initialize the Gemini Developer API backend service
var ai = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI());

// Create a `GenerativeModel` instance with a model that supports your use case
var model = ai.GetGenerativeModel(modelName: "gemini-2.5-flash");

// Convert a Texture2D into InlineDataParts
var gray = ModelContent.InlineData("image/png",
    UnityEngine.ImageConversion.EncodeToPNG(UnityEngine.Texture2D.grayTexture));

// Provide a text prompt to include with the image
var prompt = ModelContent.Text("What's in this picture?");

// To stream generated text output, call GenerateContentStreamAsync and pass in the prompt
var responseStream = model.GenerateContentStreamAsync(new[] { gray, prompt });
await foreach (var response in responseStream) {
  if (!string.IsNullOrWhiteSpace(response.Text)) {
    UnityEngine.Debug.Log(response.Text);
  }
}

Multiple file input

using Firebase;
using Firebase.AI;

// Initialize the Gemini Developer API backend service
var ai = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI());

// Create a `GenerativeModel` instance with a model that supports your use case
var model = ai.GetGenerativeModel(modelName: "gemini-2.5-flash");

// Convert Texture2Ds into InlineDataParts
var black = ModelContent.InlineData("image/png",
    UnityEngine.ImageConversion.EncodeToPNG(UnityEngine.Texture2D.blackTexture));
var white = ModelContent.InlineData("image/png",
    UnityEngine.ImageConversion.EncodeToPNG(UnityEngine.Texture2D.whiteTexture));

// Provide a text prompt to include with the images
var prompt = ModelContent.Text("What's different between these pictures?");

// To stream generated text output, call GenerateContentStreamAsync and pass in the prompt
var responseStream = model.GenerateContentStreamAsync(new[] { black, white, prompt });
await foreach (var response in responseStream) {
  if (!string.IsNullOrWhiteSpace(response.Text)) {
    UnityEngine.Debug.Log(response.Text);
  }
}

Learn how to choose a model appropriate for your use case and app.

Requirements and recommendations for input image files

Note that a file provided as inline data is encoded to base64 in transit, which increases the size of the request. You get an HTTP 413 error if a request is too large.
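Base64 represents every 3 bytes of input as 4 ASCII characters, so an inline file grows by about a third in transit. A minimal sketch for estimating this before you send a request: both functions are hypothetical helpers, and `budgetBytes` is a value you choose for your app, not a documented API limit. In a browser, `file.size` gives the raw byte length of a `File`.

```javascript
// Length of the base64 encoding of byteLength bytes, including "=" padding:
// every started group of 3 input bytes becomes 4 output characters.
function base64EncodedLength(byteLength) {
  return 4 * Math.ceil(byteLength / 3);
}

// Hypothetical pre-flight check: sums the encoded size of several files and
// compares it with a budget you pick (not an official request-size limit).
function checkInlineBudget(byteLengths, budgetBytes) {
  const encodedTotal = byteLengths.reduce(
    (sum, n) => sum + base64EncodedLength(n),
    0
  );
  return { encodedTotal, withinBudget: encodedTotal <= budgetBytes };
}

// A 3,000,000-byte image becomes 4,000,000 characters of base64.
console.log(base64EncodedLength(3_000_000)); // 4000000
console.log(checkInlineBudget([3_000_000, 1_500_000], 5_000_000));
```

The estimate covers only the encoded file data; the prompt text and the request's JSON structure add a little more on top.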

See the "Supported input files and requirements" page for detailed information about the following:

Different options for providing a file in a request (either inline or using the file's URL)
Requirements and best practices for image files

Supported image MIME types

Gemini multimodal models support the following image MIME types:

PNG - image/png
JPEG - image/jpeg
WebP - image/webp
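You might check a file's type in your app before building the request, rather than waiting for the request to fail. A minimal sketch: `partitionByImageMimeType` is a hypothetical helper, and it relies on the browser-reported `File.type`, which can be an empty string when the browser can't infer the type.

```javascript
// Image MIME types listed above as supported by Gemini multimodal models.
const SUPPORTED_IMAGE_MIME_TYPES = new Set([
  "image/png",
  "image/jpeg",
  "image/webp",
]);

// Hypothetical helper: splits files (anything with a `type` string, such as a
// browser File) into those with a supported MIME type and those without.
function partitionByImageMimeType(files) {
  const supported = [];
  const unsupported = [];
  for (const file of files) {
    if (SUPPORTED_IMAGE_MIME_TYPES.has(file.type)) {
      supported.push(file);
    } else {
      unsupported.push(file);
    }
  }
  return { supported, unsupported };
}

// Usage with plain objects standing in for File objects from an <input type="file">.
const { supported, unsupported } = partitionByImageMimeType([
  { name: "scones.jpg", type: "image/jpeg" },
  { name: "diagram.gif", type: "image/gif" },
  { name: "unknown", type: "" },
]);
console.log(supported.map((f) => f.name)); // [ 'scones.jpg' ]
console.log(unsupported.map((f) => f.name)); // [ 'diagram.gif', 'unknown' ]
```

In the Web samples, you could apply this to `[...fileInputEl.files]` and pass only the `supported` files to `fileToGenerativePart`.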

Limits per request

There isn't a specific limit to the number of pixels in an image. However, larger images are scaled down and padded to fit a maximum resolution of 3072 x 3072 while preserving their original aspect ratio.
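Because images beyond this size are scaled down anyway, one option is to resize them in your app before encoding, which keeps the request smaller without lowering the effective resolution, assuming the service scales as described above. The sketch below only computes target dimensions for such a resize: `fitWithinSquare` is a hypothetical helper, and it assumes images that already fit are left unchanged and that rounding to whole pixels is acceptable. It doesn't reproduce the service's own resampling or padding.

```javascript
// Hypothetical helper: returns the dimensions of an image scaled down to fit
// inside a maxSide x maxSide box while keeping its aspect ratio.
// Images that already fit are returned unchanged (never scaled up).
function fitWithinSquare(width, height, maxSide = 3072) {
  const scale = Math.min(1, maxSide / width, maxSide / height);
  return {
    width: Math.round(width * scale),
    height: Math.round(height * scale),
  };
}

console.log(fitWithinSquare(6144, 4096)); // { width: 3072, height: 2048 }
console.log(fitWithinSquare(4000, 3000)); // { width: 3072, height: 2304 }
console.log(fitWithinSquare(1024, 768)); // { width: 1024, height: 768 }
```

In a browser, you could then draw the image onto a canvas of the computed size and export it with `canvas.toBlob()` before encoding.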

Maximum files per request: 3,000 image files
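If you have more images than one request allows, you'd need to split them across several requests. A minimal sketch: `splitIntoRequestBatches` is a hypothetical helper. Each batch is a separate request, so the model can't compare images across batches, and each request must still stay within the request-size limit described above.

```javascript
// Per-request limit on image files stated on this page.
const MAX_IMAGE_FILES_PER_REQUEST = 3000;

// Hypothetical helper: splits an array of files into consecutive batches of
// at most maxPerRequest items, preserving the original order.
function splitIntoRequestBatches(files, maxPerRequest = MAX_IMAGE_FILES_PER_REQUEST) {
  if (!Number.isInteger(maxPerRequest) || maxPerRequest < 1) {
    throw new RangeError("maxPerRequest must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < files.length; i += maxPerRequest) {
    batches.push(files.slice(i, i + maxPerRequest));
  }
  return batches;
}

const files = Array.from({ length: 7000 }, (_, i) => `image${i}.jpg`);
console.log(splitIntoRequestBatches(files).map((b) => b.length)); // [ 3000, 3000, 1000 ]
```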

What else can you do?

Learn how to count tokens before sending long prompts to the model.
Set up Cloud Storage for Firebase so that you can include large files in your multimodal requests and have a more managed solution for providing files in prompts. Files can include images, PDFs, video, and audio.
Start thinking about preparing for production (see the production checklist), including:
- Setting up Firebase App Check to protect the Gemini API from abuse by unauthorized clients.
- Integrating Firebase Remote Config to update values in your app (like model name) without releasing a new app version.
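With Cloud Storage for Firebase, a prompt references a file by its URL instead of carrying the base64-encoded bytes, so the request stays small. A minimal sketch, assuming the Web SDK's file-data part shape (`fileData` with `mimeType` and `fileUri`): `storageImagePart` and the bucket path are hypothetical, and the sketch accepts only `gs://` URLs. See the Cloud Storage for Firebase guide for the setup this requires.

```javascript
// Hypothetical helper: builds a prompt part that points at a file stored in
// Cloud Storage for Firebase instead of inlining its bytes.
// Assumes the Web SDK's FileDataPart shape: { fileData: { mimeType, fileUri } }.
function storageImagePart(gsUrl, mimeType) {
  if (!gsUrl.startsWith("gs://")) {
    throw new Error(`Expected a gs:// URL, got: ${gsUrl}`);
  }
  return { fileData: { mimeType, fileUri: gsUrl } };
}

// Usage (bucket and path are placeholders):
const part = storageImagePart(
  "gs://your-bucket.appspot.com/images/scones.jpg",
  "image/jpeg"
);
console.log(JSON.stringify(part));
// With the SDK, you would pass it alongside the text prompt, for example:
// const result = await model.generateContent([prompt, part]);
```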

Try out other capabilities

Build multi-turn conversations (chat).
Generate text from text-only prompts.
Generate structured output (like JSON) from both text and multimodal prompts.
Generate images from text prompts (Gemini or Imagen).
Use tools (like function calling and grounding with Google Search) to connect a Gemini model to other parts of your app and external systems and information.

Learn how to control content generation

Understand prompt design, including best practices, strategies, and example prompts.
Configure model parameters like temperature and maximum output tokens (for Gemini) or aspect ratio and person generation (for Imagen).
Use safety settings to adjust the likelihood of getting responses that may be considered harmful.

You can also experiment with prompts and model configurations and even get a generated code snippet using Google AI Studio.

Learn more about the supported models

Learn about the models available for various use cases and their quotas and pricing.


Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-12-18 UTC.
