Skip to main content
The latest Gemini models, like Gemini 3.1 Flash Image
( Nano Banana 2
), are available to use with Firebase AI Logic! Learn more.
Gemini 2.0 Flash and Flash-Lite models will shut down on June 1, 2026
. To avoid service disruption, update to a newer model like gemini-3.1-flash-lite
. Learn more.
All Imagen models will shut down on June 24, 2026
. Learn about migrating your apps to use Nano Banana.
Send feedback
Analyze image files using the Gemini API Stay organized with collections
Save and categorize content based on your preferences.
You can ask a Gemini
model to analyze image files that you provide
either inline (base64-encoded) or via URL. When you use Firebase AI Logic
,
you can make this request directly from your app.
With this capability, you can do things like:
Create captions or answer questions about images
Write a short story or a poem about an image
Detect objects in an image and return bounding box coordinates for them
Label or categorize a set of images for sentiment, style, or other
characteristic
arrow_downward
Jump to code samples
arrow_downward
Jump to code for streamed responses
Before you begin
Click your Gemini API
provider to view provider-specific content
and code on this page.
Gemini Developer API
Vertex AI Gemini API
If you haven't already, complete the getting started guide
, which describes how to
set up your Firebase project, connect your app to Firebase, add the SDK,
initialize the backend service for your chosen Gemini API
provider, and
create a GenerativeModel
instance.
All our docs assume that you're using the latest versions
of the Firebase AI Logic
SDKs.
For testing and iterating on your prompts, we recommend using Google AI Studio
.
Need a sample image file?
You can use this publicly available file with a MIME type of image/jpeg
( view or download file
). https://storage.googleapis.com/cloud-samples-data/generative-ai/image/scones.jpg
Note: Firebase AI Logic
does not
yet support configuring the input media
resolution, but that feature is coming soon!
Generate text from image files (base64-encoded)
Before trying this sample, complete the Before you begin
section of this guide
to set up your project and app. In that section, you'll also click a button for your chosen Gemini API
provider so that you see provider-specific content
on this page .
You can ask a Gemini
model to
generate text by prompting with text and images—providing each
input file's mimeType
and the file itself. Find requirements and recommendations for input files
later on this page.
Swift
You can call generateContent()
to generate text from multimodal input of text and images.
Single file input
import
FirebaseAILogic
// Initialize the Gemini Developer API backend service
let
ai
=
FirebaseAI
.
firebaseAI
(
backend
:
.
googleAI
())
// Create a `GenerativeModel` instance with a model that supports your use case
let
model
=
ai
.
generativeModel
(
modelName
:
"gemini-3.5-flash"
)
guard
let
image
=
UIImage
(
systemName
:
"bicycle"
)
else
{
fatalError
()
}
// Provide a text prompt to include with the image
let
prompt
=
"What's in this picture?"
// To generate text output, call generateContent and pass in the prompt
let
response
=
try
await
model
.
generateContent
(
image
,
prompt
)
print
(
response
.
text
??
"No text in response."
)
Multiple file input
import
FirebaseAILogic
// Initialize the Gemini Developer API backend service
let
ai
=
FirebaseAI
.
firebaseAI
(
backend
:
.
googleAI
())
// Create a `GenerativeModel` instance with a model that supports your use case
let
model
=
ai
.
generativeModel
(
modelName
:
"gemini-3.5-flash"
)
guard
let
image1
=
UIImage
(
systemName
:
"car"
)
else
{
fatalError
()
}
guard
let
image2
=
UIImage
(
systemName
:
"car.2"
)
else
{
fatalError
()
}
// Provide a text prompt to include with the images
let
prompt
=
"What's different between these pictures?"
// To generate text output, call generateContent and pass in the prompt
let
response
=
try
await
model
.
generateContent
(
image1
,
image2
,
prompt
)
print
(
response
.
text
??
"No text in response."
)
Note
: The example above takes advantage of a
simplified way to handle platform-native image types provided as inline data,
which means the MIME type doesn't need to be specified. Learn more
about this feature and options for providing images inline.
Kotlin
You can call generateContent()
to generate text from multimodal input of text and images.
For Kotlin, the methods in this SDK are suspend functions and need to be called
from a Coroutine scope
.
Single file input
// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
val
model
=
Firebase
.
ai
(
backend
=
GenerativeBackend
.
googleAI
())
.
generativeModel
(
"gemini-3.5-flash"
)
// Loads an image from the app/res/drawable/ directory
val
bitmap
:
Bitmap
=
BitmapFactory
.
decodeResource
(
resources
,
R
.
drawable
.
sparky
)
// Provide a prompt that includes the image specified above and text
val
prompt
=
content
{
image
(
bitmap
)
text
(
"What developer tool is this mascot from?"
)
}
// To generate text output, call generateContent with the prompt
val
response
=
model
.
generateContent
(
prompt
)
print
(
response
.
text
)
Multiple file input
For Kotlin, the methods in this SDK are suspend functions and need to be called
from a Coroutine scope
.
// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
val
model
=
Firebase
.
ai
(
backend
=
GenerativeBackend
.
googleAI
())
.
generativeModel
(
"gemini-3.5-flash"
)
// Loads an image from the app/res/drawable/ directory
val
bitmap1
:
Bitmap
=
BitmapFactory
.
decodeResource
(
resources
,
R
.
drawable
.
sparky
)
val
bitmap2
:
Bitmap
=
BitmapFactory
.
decodeResource
(
resources
,
R
.
drawable
.
sparky_eats_pizza
)
// Provide a prompt that includes the images specified above and text
val
prompt
=
content
{
image
(
bitmap1
)
image
(
bitmap2
)
text
(
"What is different between these pictures?"
)
}
// To generate text output, call generateContent with the prompt
val
response
=
model
.
generateContent
(
prompt
)
print
(
response
.
text
)
Note
: The example above takes advantage of a
simplified way to handle platform-native image types provided as inline data,
which means the MIME type doesn't need to be specified. Learn more
about this feature and options for providing images inline.
Java
You can call generateContent()
to generate text from multimodal input of text and images.
For Java, the methods in this SDK return a ListenableFuture
.
Single file input
// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
GenerativeModel
ai
=
FirebaseAI
.
getInstance
(
GenerativeBackend
.
googleAI
())
.
generativeModel
(
"gemini-3.5-flash"
);
// Use the GenerativeModelFutures Java compatibility layer which offers
// support for ListenableFuture and Publisher APIs
GenerativeModelFutures
model
=
GenerativeModelFutures
.
from
(
ai
);
Bitmap
bitmap
=
BitmapFactory
.
decodeResource
(
getResources
(),
R
.
drawable
.
sparky
);
// Provide a prompt that includes the image specified above and text
Content
content
=
new
Content
.
Builder
()
.
addImage
(
bitmap
)
.
addText
(
"What developer tool is this mascot from?"
)
.
build
();
// To generate text output, call generateContent with the prompt
ListenableFuture<GenerateContentResponse>
response
=
model
.
generateContent
(
content
);
Futures
.
addCallback
(
response
,
new
FutureCallback<GenerateContentResponse>
()
{
@Override
public
void
onSuccess
(
GenerateContentResponse
result
)
{
String
resultText
=
result
.
getText
();
System
.
out
.
println
(
resultText
);
}
@Override
public
void
onFailure
(
Throwable
t
)
{
t
.
printStackTrace
();
}
},
executor
);
Multiple file input
// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
GenerativeModel
ai
=
FirebaseAI
.
getInstance
(
GenerativeBackend
.
googleAI
())
.
generativeModel
(
"gemini-3.5-flash"
);
// Use the GenerativeModelFutures Java compatibility layer which offers
// support for ListenableFuture and Publisher APIs
GenerativeModelFutures
model
=
GenerativeModelFutures
.
from
(
ai
);
Bitmap
bitmap1
=
BitmapFactory
.
decodeResource
(
getResources
(),
R
.
drawable
.
sparky
);
Bitmap
bitmap2
=
BitmapFactory
.
decodeResource
(
getResources
(),
R
.
drawable
.
sparky_eats_pizza
);
// Provide a prompt that includes the images specified above and text
Content
prompt
=
new
Content
.
Builder
()
.
addImage
(
bitmap1
)
.
addImage
(
bitmap2
)
.
addText
(
"What's different between these pictures?"
)
.
build
();
// To generate text output, call generateContent with the prompt
ListenableFuture<GenerateContentResponse>
response
=
model
.
generateContent
(
prompt
);
Futures
.
addCallback
(
response
,
new
FutureCallback<GenerateContentResponse>
()
{
@Override
public
void
onSuccess
(
GenerateContentResponse
result
)
{
String
resultText
=
result
.
getText
();
System
.
out
.
println
(
resultText
);
}
@Override
public
void
onFailure
(
Throwable
t
)
{
t
.
printStackTrace
();
}
},
executor
);
Note
: The example above takes advantage of a
simplified way to handle platform-native image types provided as inline data,
which means the MIME type doesn't need to be specified. Learn more
about this feature and options for providing images inline.
Web
You can call generateContent()
to generate text from multimodal input of text and images.
Single file input
import
{
initializeApp
}
from
"firebase/app"
;
import
{
getAI
,
getGenerativeModel
,
GoogleAIBackend
}
from
"firebase/ai"
;
// TODO(developer) Replace the following with your app's Firebase configuration
// See: https://firebase.google.com/docs/web/learn-more#config-object
const
firebaseConfig
=
{
// ...
};
// Initialize FirebaseApp
const
firebaseApp
=
initializeApp
(
firebaseConfig
);
// Initialize the Gemini Developer API backend service
const
ai
=
getAI
(
firebaseApp
,
{
backend
:
new
GoogleAIBackend
()
});
// Create a `GenerativeModel` instance with a model that supports your use case
const
model
=
getGenerativeModel
(
ai
,
{
model
:
"gemini-3.5-flash"
});
// Converts a File object to a Part object.
async
function
fileToGenerativePart
(
file
)
{
const
base64EncodedDataPromise
=
new
Promise
((
resolve
)
=>
{
const
reader
=
new
FileReader
();
reader
.
onloadend
=
()
=>
resolve
(
reader
.
result
.
split
(
','
)[
1
]);
reader
.
readAsDataURL
(
file
);
});
return
{
inlineData
:
{
data
:
await
base64EncodedDataPromise
,
mimeType
:
file
.
type
},
};
}
async
function
run
()
{
// Provide a text prompt to include with the image
const
prompt
=
"What do you see?"
;
const
fileInputEl
=
document
.
querySelector
(
"input[type=file]"
);
const
imagePart
=
await
fileToGenerativePart
(
fileInputEl
.
files
[
0
]);
// To generate text output, call generateContent with the text and image
const
result
=
await
model
.
generateContent
([
prompt
,
imagePart
]);
const
response
=
result
.
response
;
const
text
=
response
.
text
();
console
.
log
(
text
);
}
run
();
Multiple file input
import
{
initializeApp
}
from
"firebase/app"
;
import
{
getAI
,
getGenerativeModel
,
GoogleAIBackend
}
from
"firebase/ai"
;
// TODO(developer) Replace the following with your app's Firebase configuration
// See: https://firebase.google.com/docs/web/learn-more#config-object
const
firebaseConfig
=
{
// ...
};
// Initialize FirebaseApp
const
firebaseApp
=
initializeApp
(
firebaseConfig
);
// Initialize the Gemini Developer API backend service
const
ai
=
getAI
(
firebaseApp
,
{
backend
:
new
GoogleAIBackend
()
});
// Create a `GenerativeModel` instance with a model that supports your use case
const
model
=
getGenerativeModel
(
ai
,
{
model
:
"gemini-3.5-flash"
});
// Converts a File object to a Part object.
async
function
fileToGenerativePart
(
file
)
{
const
base64EncodedDataPromise
=
new
Promise
((
resolve
)
=>
{
const
reader
=
new
FileReader
();
reader
.
onloadend
=
()
=>
resolve
(
reader
.
result
.
split
(
','
)[
1
]);
reader
.
readAsDataURL
(
file
);
});
return
{
inlineData
:
{
data
:
await
base64EncodedDataPromise
,
mimeType
:
file
.
type
},
};
}
async
function
run
()
{
// Provide a text prompt to include with the images
const
prompt
=
"What's different between these pictures?"
;
// Prepare images for input
const
fileInputEl
=
document
.
querySelector
(
"input[type=file]"
);
const
imageParts
=
await
Promise
.
all
(
[...
fileInputEl
.
files
].
map
(
fileToGenerativePart
)
);
// To generate text output, call generateContent with the text and images
const
result
=
await
model
.
generateContent
([
prompt
,
...
imageParts
]);
const
response
=
result
.
response
;
const
text
=
response
.
text
();
console
.
log
(
text
);
}
run
();
Dart
You can call generateContent()
to generate text from multimodal input of text and images.
Single file input
import
'package:firebase_ai/firebase_ai.dart'
;
import
'package:firebase_core/firebase_core.dart'
;
import
'firebase_options.dart'
;
// Initialize FirebaseApp
await
Firebase
.
initializeApp
(
options:
DefaultFirebaseOptions
.
currentPlatform
,
);
// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
final
model
=
FirebaseAI
.
googleAI
().
generativeModel
(
model:
'gemini-3.5-flash'
);
// Provide a text prompt to include with the image
final
prompt
=
TextPart
(
"What's in the picture?"
);
// Prepare images for input
final
image
=
await
File
(
'image0.jpg'
).
readAsBytes
();
final
imagePart
=
InlineDataPart
(
'image/jpeg'
,
image
);
// To generate text output, call generateContent with the text and image
final
response
=
await
model
.
generateContent
([
Content
.
multi
([
prompt
,
imagePart
])
]);
print
(
response
.
text
);
Multiple file input
import
'package:firebase_ai/firebase_ai.dart'
;
import
'package:firebase_core/firebase_core.dart'
;
import
'firebase_options.dart'
;
// Initialize FirebaseApp
await
Firebase
.
initializeApp
(
options:
DefaultFirebaseOptions
.
currentPlatform
,
);
// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
final
model
=
FirebaseAI
.
googleAI
().
generativeModel
(
model:
'gemini-3.5-flash'
);
final
(
firstImage
,
secondImage
)
=
await
(
File
(
'image0.jpg'
).
readAsBytes
(),
File
(
'image1.jpg'
).
readAsBytes
()
).
wait
;
// Provide a text prompt to include with the images
final
prompt
=
TextPart
(
"What's different between these pictures?"
);
// Prepare images for input
final
imageParts
=
[
InlineDataPart
(
'image/jpeg'
,
firstImage
),
InlineDataPart
(
'image/jpeg'
,
secondImage
),
];
// To generate text output, call generateContent with the text and images
final
response
=
await
model
.
generateContent
([
Content
.
multi
([
prompt
,
...
imageParts
])
]);
print
(
response
.
text
);
Unity
You can call GenerateContentAsync()
to generate text from multimodal input of text and images.
Single file input
using
Firebase
;
using
Firebase.AI
;
// Initialize the Gemini Developer API backend service
var
ai
=
FirebaseAI
.
GetInstance
(
FirebaseAI
.
Backend
.
GoogleAI
());
// Create a `GenerativeModel` instance with a model that supports your use case
var
model
=
ai
.
GetGenerativeModel
(
modelName
:
"gemini-3.5-flash"
);
// Convert a Texture2D into InlineDataParts
var
grayImage
=
ModelContent
.
InlineData
(
"image/png"
,
UnityEngine
.
ImageConversion
.
EncodeToPNG
(
UnityEngine
.
Texture2D
.
grayTexture
));
// Provide a text prompt to include with the image
var
prompt
=
ModelContent
.
Text
(
"What's in this picture?"
);
// To generate text output, call GenerateContentAsync and pass in the prompt
var
response
=
await
model
.
GenerateContentAsync
(
new
[]
{
grayImage
,
prompt
});
UnityEngine
.
Debug
.
Log
(
response
.
Text
??
"No text in response."
);
Multiple file input
using
Firebase
;
using
Firebase.AI
;
// Initialize the Gemini Developer API backend service
var
ai
=
FirebaseAI
.
GetInstance
(
FirebaseAI
.
Backend
.
GoogleAI
());
// Create a `GenerativeModel` instance with a model that supports your use case
var
model
=
ai
.
GetGenerativeModel
(
modelName
:
"gemini-3.5-flash"
);
// Convert Texture2Ds into InlineDataParts
var
blackImage
=
ModelContent
.
InlineData
(
"image/png"
,
UnityEngine
.
ImageConversion
.
EncodeToPNG
(
UnityEngine
.
Texture2D
.
blackTexture
));
var
whiteImage
=
ModelContent
.
InlineData
(
"image/png"
,
UnityEngine
.
ImageConversion
.
EncodeToPNG
(
UnityEngine
.
Texture2D
.
whiteTexture
));
// Provide a text prompt to include with the images
var
prompt
=
ModelContent
.
Text
(
"What's different between these pictures?"
);
// To generate text output, call GenerateContentAsync and pass in the prompt
var
response
=
await
model
.
GenerateContentAsync
(
new
[]
{
blackImage
,
whiteImage
,
prompt
});
UnityEngine
.
Debug
.
Log
(
response
.
Text
??
"No text in response."
);
Learn how to choose a model
appropriate for your use case and app.
Stream the response
Before trying this sample, complete the Before you begin
section of this guide
to set up your project and app. In that section, you'll also click a button for your chosen Gemini API
provider so that you see provider-specific content
on this page .
You can achieve faster interactions by not waiting for the entire result from
the model generation, and instead use streaming to handle partial results.
To stream the response, call generateContentStream
.
View example: Stream generated text from image files
Swift
You can call generateContentStream()
to stream generated text from multimodal input of text and images.
Single file input
import
FirebaseAILogic
// Initialize the Gemini Developer API backend service
let
ai
=
FirebaseAI
.
firebaseAI
(
backend
:
.
googleAI
())
// Create a `GenerativeModel` instance with a model that supports your use case
let
model
=
ai
.
generativeModel
(
modelName
:
"gemini-3.5-flash"
)
guard
let
image
=
UIImage
(
systemName
:
"bicycle"
)
else
{
fatalError
()
}
// Provide a text prompt to include with the image
let
prompt
=
"What's in this picture?"
// To stream generated text output, call generateContentStream and pass in the prompt
let
contentStream
=
try
model
.
generateContentStream
(
image
,
prompt
)
for
try
await
chunk
in
contentStream
{
if
let
text
=
chunk
.
text
{
print
(
text
)
}
}
Multiple file input
import
FirebaseAILogic
// Initialize the Gemini Developer API backend service
let
ai
=
FirebaseAI
.
firebaseAI
(
backend
:
.
googleAI
())
// Create a `GenerativeModel` instance with a model that supports your use case
let
model
=
ai
.
generativeModel
(
modelName
:
"gemini-3.5-flash"
)
guard
let
image1
=
UIImage
(
systemName
:
"car"
)
else
{
fatalError
()
}
guard
let
image2
=
UIImage
(
systemName
:
"car.2"
)
else
{
fatalError
()
}
// Provide a text prompt to include with the images
let
prompt
=
"What's different between these pictures?"
// To stream generated text output, call generateContentStream and pass in the prompt
let
contentStream
=
try
model
.
generateContentStream
(
image1
,
image2
,
prompt
)
for
try
await
chunk
in
contentStream
{
if
let
text
=
chunk
.
text
{
print
(
text
)
}
}
Note
: The example above takes advantage of a
simplified way to handle platform-native image types provided as inline data,
which means the MIME type doesn't need to be specified. Learn more
about this feature and options for providing images inline.
Kotlin
You can call generateContentStream()
to stream generated text from multimodal input of text and images.
For Kotlin, the methods in this SDK are suspend functions and need to be called
from a Coroutine scope
.
Single file input
// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
val
model
=
Firebase
.
ai
(
backend
=
GenerativeBackend
.
googleAI
())
.
generativeModel
(
"gemini-3.5-flash"
)
// Loads an image from the app/res/drawable/ directory
val
bitmap
:
Bitmap
=
BitmapFactory
.
decodeResource
(
resources
,
R
.
drawable
.
sparky
)
// Provide a prompt that includes the image specified above and text
val
prompt
=
content
{
image
(
bitmap
)
text
(
"What developer tool is this mascot from?"
)
}
// To stream generated text output, call generateContentStream with the prompt
var
fullResponse
=
""
model
.
generateContentStream
(
prompt
).
collect
{
chunk
->
print
(
chunk
.
text
)
fullResponse
+=
chunk
.
text
}
Multiple file input
// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
val
model
=
Firebase
.
ai
(
backend
=
GenerativeBackend
.
googleAI
())
.
generativeModel
(
"gemini-3.5-flash"
)
// Loads an image from the app/res/drawable/ directory
val
bitmap1
:
Bitmap
=
BitmapFactory
.
decodeResource
(
resources
,
R
.
drawable
.
sparky
)
val
bitmap2
:
Bitmap
=
BitmapFactory
.
decodeResource
(
resources
,
R
.
drawable
.
sparky_eats_pizza
)
// Provide a prompt that includes the images specified above and text
val
prompt
=
content
{
image
(
bitmap1
)
image
(
bitmap2
)
text
(
"What's different between these pictures?"
)
}
// To stream generated text output, call generateContentStream with the prompt
var
fullResponse
=
""
model
.
generateContentStream
(
prompt
).
collect
{
chunk
->
print
(
chunk
.
text
)
fullResponse
+=
chunk
.
text
}
Note
: The example above takes advantage of a
simplified way to handle platform-native image types provided as inline data,
which means the MIME type doesn't need to be specified. Learn more
about this feature and options for providing images inline.
Java
You can call generateContentStream()
to stream generated text from multimodal input of text and images.
For Java, the streaming methods in this SDK return a Publisher
type from the Reactive Streams library
.
Single file input
// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
GenerativeModel
ai
=
FirebaseAI
.
getInstance
(
GenerativeBackend
.
googleAI
())
.
generativeModel
(
"gemini-3.5-flash"
);
// Use the GenerativeModelFutures Java compatibility layer which offers
// support for ListenableFuture and Publisher APIs
GenerativeModelFutures
model
=
GenerativeModelFutures
.
from
(
ai
);
Bitmap
bitmap
=
BitmapFactory
.
decodeResource
(
getResources
(),
R
.
drawable
.
sparky
);
// Provide a prompt that includes the image specified above and text
Content
prompt
=
new
Content
.
Builder
()
.
addImage
(
bitmap
)
.
addText
(
"What developer tool is this mascot from?"
)
.
build
();
// To stream generated text output, call generateContentStream with the prompt
Publisher<GenerateContentResponse>
streamingResponse
=
model
.
generateContentStream
(
prompt
);
final
String
[]
fullResponse
=
{
""
};
streamingResponse
.
subscribe
(
new
Subscriber<GenerateContentResponse>
()
{
@Override
public
void
onNext
(
GenerateContentResponse
generateContentResponse
)
{
String
chunk
=
generateContentResponse
.
getText
();
fullResponse
[
0
]
+=
chunk
;
}
@Override
public
void
onComplete
()
{
System
.
out
.
println
(
fullResponse
[
0
]
);
}
@Override
public
void
onError
(
Throwable
t
)
{
t
.
printStackTrace
();
}
@Override
public
void
onSubscribe
(
Subscription
s
)
{
}
});
Multiple file input
// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
GenerativeModel
ai
=
FirebaseAI
.
getInstance
(
GenerativeBackend
.
googleAI
())
.
generativeModel
(
"gemini-3.5-flash"
);
// Use the GenerativeModelFutures Java compatibility layer which offers
// support for ListenableFuture and Publisher APIs
GenerativeModelFutures
model
=
GenerativeModelFutures
.
from
(
ai
);
Bitmap
bitmap1
=
BitmapFactory
.
decodeResource
(
getResources
(),
R
.
drawable
.
sparky
);
Bitmap
bitmap2
=
BitmapFactory
.
decodeResource
(
getResources
(),
R
.
drawable
.
sparky_eats_pizza
);
// Provide a prompt that includes the images specified above and text
Content
prompt
=
new
Content
.
Builder
()
.
addImage
(
bitmap1
)
.
addImage
(
bitmap2
)
.
addText
(
"What's different between these pictures?"
)
.
build
();
// To stream generated text output, call generateContentStream with the prompt
Publisher<GenerateContentResponse>
streamingResponse
=
model
.
generateContentStream
(
prompt
);
final
String
[]
fullResponse
=
{
""
};
streamingResponse
.
subscribe
(
new
Subscriber<GenerateContentResponse>
()
{
@Override
public
void
onNext
(
GenerateContentResponse
generateContentResponse
)
{
String
chunk
=
generateContentResponse
.
getText
();
fullResponse
[
0
]
+=
chunk
;
}
@Override
public
void
onComplete
()
{
System
.
out
.
println
(
fullResponse
[
0
]
);
}
@Override
public
void
onError
(
Throwable
t
)
{
t
.
printStackTrace
();
}
@Override
public
void
onSubscribe
(
Subscription
s
)
{
}
});
Note
: The example above takes advantage of a
simplified way to handle platform-native image types provided as inline data,
which means the MIME type doesn't need to be specified. Learn more
about this feature and options for providing images inline.
Web
You can call generateContentStream()
to stream generated text from multimodal input of text and images.
Single file input
import
{
initializeApp
}
from
"firebase/app"
;
import
{
getAI
,
getGenerativeModel
,
GoogleAIBackend
}
from
"firebase/ai"
;
// TODO(developer) Replace the following with your app's Firebase configuration
// See: https://firebase.google.com/docs/web/learn-more#config-object
const
firebaseConfig
=
{
// ...
};
// Initialize FirebaseApp
const
firebaseApp
=
initializeApp
(
firebaseConfig
);
// Initialize the Gemini Developer API backend service
const
ai
=
getAI
(
firebaseApp
,
{
backend
:
new
GoogleAIBackend
()
});
// Create a `GenerativeModel` instance with a model that supports your use case
const
model
=
getGenerativeModel
(
ai
,
{
model
:
"gemini-3.5-flash"
});
// Converts a File object to a Part object.
async
function
fileToGenerativePart
(
file
)
{
const
base64EncodedDataPromise
=
new
Promise
((
resolve
)
=>
{
const
reader
=
new
FileReader
();
reader
.
onloadend
=
()
=>
resolve
(
reader
.
result
.
split
(
','
)[
1
]);
reader
.
readAsDataURL
(
file
);
});
return
{
inlineData
:
{
data
:
await
base64EncodedDataPromise
,
mimeType
:
file
.
type
},
};
}
async
function
run
()
{
// Provide a text prompt to include with the image
const
prompt
=
"What do you see?"
;
// Prepare image for input
const
fileInputEl
=
document
.
querySelector
(
"input[type=file]"
);
const
imagePart
=
await
fileToGenerativePart
(
fileInputEl
.
files
[
0
]);
// To stream generated text output, call generateContentStream with the text and image
const
result
=
await
model
.
generateContentStream
([
prompt
,
imagePart
]);
for
await
(
const
chunk
of
result
.
stream
)
{
const
chunkText
=
chunk
.
text
();
console
.
log
(
chunkText
);
}
}
run
();
Multiple file input
import
{
initializeApp
}
from
"firebase/app"
;
import
{
getAI
,
getGenerativeModel
,
GoogleAIBackend
}
from
"firebase/ai"
;
// TODO(developer) Replace the following with your app's Firebase configuration
// See: https://firebase.google.com/docs/web/learn-more#config-object
const
firebaseConfig
=
{
// ...
};
// Initialize FirebaseApp
const
firebaseApp
=
initializeApp
(
firebaseConfig
);
// Initialize the Gemini Developer API backend service
const
ai
=
getAI
(
firebaseApp
,
{
backend
:
new
GoogleAIBackend
()
});
// Create a `GenerativeModel` instance with a model that supports your use case
const
model
=
getGenerativeModel
(
ai
,
{
model
:
"gemini-3.5-flash"
});
// Converts a File object to a Part object.
async
function
fileToGenerativePart
(
file
)
{
const
base64EncodedDataPromise
=
new
Promise
((
resolve
)
=>
{
const
reader
=
new
FileReader
();
reader
.
onloadend
=
()
=>
resolve
(
reader
.
result
.
split
(
','
)[
1
]);
reader
.
readAsDataURL
(
file
);
});
return
{
inlineData
:
{
data
:
await
base64EncodedDataPromise
,
mimeType
:
file
.
type
},
};
}
async
function
run
()
{
// Provide a text prompt to include with the images
const
prompt
=
"What's different between these pictures?"
;
const
fileInputEl
=
document
.
querySelector
(
"input[type=file]"
);
const
imageParts
=
await
Promise
.
all
(
[...
fileInputEl
.
files
].
map
(
fileToGenerativePart
)
);
// To stream generated text output, call generateContentStream with the text and images
const
result
=
await
model
.
generateContentStream
([
prompt
,
...
imageParts
]);
for
await
(
const
chunk
of
result
.
stream
)
{
const
chunkText
=
chunk
.
text
();
console
.
log
(
chunkText
);
}
}
run
();
Dart
You can call generateContentStream()
to stream generated text from multimodal input of text and images.
Single file input
import
'package:firebase_ai/firebase_ai.dart'
;
import
'package:firebase_core/firebase_core.dart'
;
import
'firebase_options.dart'
;
// Initialize FirebaseApp
await
Firebase
.
initializeApp
(
options:
DefaultFirebaseOptions
.
currentPlatform
,
);
// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
final
model
=
FirebaseAI
.
googleAI
().
generativeModel
(
model:
'gemini-3.5-flash'
);
// Provide a text prompt to include with the image
final
prompt
=
TextPart
(
"What's in the picture?"
);
// Prepare images for input
final
image
=
await
File
(
'image0.jpg'
).
readAsBytes
();
final
imagePart
=
InlineDataPart
(
'image/jpeg'
,
image
);
// To stream generated text output, call generateContentStream with the text and image
final
response
=
await
model
.
generateContentStream
([
Content
.
multi
([
prompt
,
imagePart
])
]);
await
for
(
final
chunk
in
response
)
{
print
(
chunk
.
text
);
}
Multiple file input
import
'package:firebase_ai/firebase_ai.dart'
;
import
'package:firebase_core/firebase_core.dart'
;
import
'firebase_options.dart'
;
// Initialize FirebaseApp
await
Firebase
.
initializeApp
(
options:
DefaultFirebaseOptions
.
currentPlatform
,
);
// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
final
model
=
FirebaseAI
.
googleAI
().
generativeModel
(
model:
'gemini-3.5-flash'
);
final
(
firstImage
,
secondImage
)
=
await
(
File
(
'image0.jpg'
).
readAsBytes
(),
File
(
'image1.jpg'
).
readAsBytes
()
).
wait
;
// Provide a text prompt to include with the images
final
prompt
=
TextPart
(
"What's different between these pictures?"
);
// Prepare images for input
final
imageParts
=
[
InlineDataPart
(
'image/jpeg'
,
firstImage
),
InlineDataPart
(
'image/jpeg'
,
secondImage
),
];
// To stream generated text output, call generateContentStream with the text and images
final
response
=
await
model
.
generateContentStream
([
Content
.
multi
([
prompt
,
...
imageParts
])
]);
await
for
(
final
chunk
in
response
)
{
print
(
chunk
.
text
);
}
Unity
You can call GenerateContentStreamAsync()
to stream generated text from multimodal input of text and images.
Single file input
using
Firebase
;
using
Firebase.AI
;
// Initialize the Gemini Developer API backend service
var
ai
=
FirebaseAI
.
GetInstance
(
FirebaseAI
.
Backend
.
GoogleAI
());
// Create a `GenerativeModel` instance with a model that supports your use case
var
model
=
ai
.
GetGenerativeModel
(
modelName
:
"gemini-3.5-flash"
);
// Convert a Texture2D into InlineDataParts
var
gray
=
ModelContent
.
InlineData
(
"image/png"
,
UnityEngine
.
ImageConversion
.
EncodeToPNG
(
UnityEngine
.
Texture2D
.
grayTexture
));
// Provide a text prompt to include with the image
var
prompt
=
ModelContent
.
Text
(
"What's in this picture?"
);
// To stream generated text output, call GenerateContentStreamAsync and pass in the prompt
var
responseStream
=
model
.
GenerateContentStreamAsync
(
new
[]
{
gray
,
prompt
});
await
foreach
(
var
response
in
responseStream
)
{
if
(
!
string
.
IsNullOrWhiteSpace
(
response
.
Text
))
{
UnityEngine
.
Debug
.
Log
(
response
.
Text
);
}
}
Multiple file input
using
Firebase
;
using
Firebase.AI
;
// Initialize the Gemini Developer API backend service
var
ai
=
FirebaseAI
.
GetInstance
(
FirebaseAI
.
Backend
.
GoogleAI
());
// Create a `GenerativeModel` instance with a model that supports your use case
var
model
=
ai
.
GetGenerativeModel
(
modelName
:
"gemini-3.5-flash"
);
// Convert Texture2Ds into InlineDataParts
var
black
=
ModelContent
.
InlineData
(
"image/png"
,
UnityEngine
.
ImageConversion
.
EncodeToPNG
(
UnityEngine
.
Texture2D
.
blackTexture
));
var
white
=
ModelContent
.
InlineData
(
"image/png"
,
UnityEngine
.
ImageConversion
.
EncodeToPNG
(
UnityEngine
.
Texture2D
.
whiteTexture
));
// Provide a text prompt to include with the images
var
prompt
=
ModelContent
.
Text
(
"What's different between these pictures?"
);
// To stream generated text output, call GenerateContentStreamAsync and pass in the prompt
var
responseStream
=
model
.
GenerateContentStreamAsync
(
new
[]
{
black
,
white
,
prompt
});
await
foreach
(
var
response
in
responseStream
)
{
if
(
!
string
.
IsNullOrWhiteSpace
(
response
.
Text
))
{
UnityEngine
.
Debug
.
Log
(
response
.
Text
);
}
}
Learn how to choose a model
appropriate for your use case and app.
Note that a file provided as inline data is encoded to base64 in transit, which
increases the size of the request. You get an HTTP 413 error if a request is
too large.
See "Supported input files and requirements" page to learn detailed information
about the following:
Supported image MIME types
Gemini
multimodal models support the following image MIME types:
PNG - image/png
JPEG - image/jpeg
WebP - image/webp
Limits per request
There isn't a specific limit to the number of pixels in an image. However,
larger images are scaled down and padded to fit a maximum resolution of 3072 x
3072 while preserving their original aspect ratio.
Maximum files per request: 3,000 image files
What else can you do?
Learn how to count tokens
before sending long prompts to the model.
Set up Cloud Storage for Firebase
so that you can include large files in your multimodal requests and have a
more managed solution for providing files in prompts.
Files can include images, PDFs, video, and audio.
Start thinking about preparing for production (see the production checklist
):
Try out other capabilities
Learn how to control content generation
You can also experiment with prompts and model configurations and even get a
generated code snippet using Google AI Studio
. Learn more about the supported models
Learn about the models available for various use cases
and their quotas
and pricing
.
Give feedback
about your experience with Firebase AI Logic
Send feedback
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License
, and code samples are licensed under the Apache 2.0 License
. For details, see the Google Developers Site Policies
. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2026-05-27 UTC.
Need to tell us more?
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Missing the information I need","missingTheInformationINeed","thumb-down"],["Too complicated / too many steps","tooComplicatedTooManySteps","thumb-down"],["Out of date","outOfDate","thumb-down"],["Samples / code issue","samplesCodeIssue","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2026-05-27 UTC."],[],[]]