The latest Gemini models, like Gemini 3.1 Flash Image (Nano Banana 2), are available to use with Firebase AI Logic on all platforms! Learn more.
Gemini 2.0 Flash and Flash-Lite models will be retired on June 1, 2026. To avoid service disruption, update to a newer model like `gemini-2.5-flash-lite`. Learn more.
Build hybrid experiences in Web apps with on-device and cloud-hosted models
Build AI-powered web apps and features with hybrid inference using Firebase AI Logic. Hybrid inference enables running inference using on-device models when available and seamlessly falling back to cloud-hosted models otherwise (and vice versa).

Reach 100% of your audience, regardless of on-device model availability or internet connectivity.
Supported capabilities and features for on-device inference: on-device inference only supports single-turn text generation (not chat), with streaming or non-streaming output.
Inference using an on-device model uses the Prompt API from Chrome, whereas inference using a cloud-hosted model uses your chosen Gemini API provider (either the Gemini Developer API or the Vertex AI Gemini API).

This page describes how to get started developing using localhost (learn more about using APIs on localhost in the Chrome documentation).
Step 1: (Optional) Download the on-device model before the first request
The Prompt API is built into Chrome; however, the on-device model isn't
available by default. If you haven't yet downloaded the model before your
first request for on-device inference, the request will automatically start
the model download in the background.
To download the on-device model:

1. Open Developer Tools > Console.

2. Run the following:

   ```javascript
   await LanguageModel.availability();
   ```

3. Make sure that the output is `available`, `downloading`, or `downloadable`.

4. If the output is `downloadable`, start the model download by running:

   ```javascript
   await LanguageModel.create();
   ```
You can also pass a `monitor` callback to `LanguageModel.create()` to listen for download progress and confirm that the model is `available` before making requests.
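As a minimal sketch of that pattern, the snippet below starts the download with a `monitor` callback, following Chrome's Prompt API shape; the `formatProgress` helper and `downloadModel` wrapper are our own names, not part of the API.

```javascript
// Format a download-progress fraction (0..1) as a percentage string.
function formatProgress(loaded) {
  return `${Math.round(loaded * 100)}%`;
}

// Start the on-device model download and log progress as it arrives.
// Guarded so it's a no-op outside a browser that exposes the Prompt API.
async function downloadModel() {
  if (typeof LanguageModel === "undefined") return;
  await LanguageModel.create({
    monitor(m) {
      m.addEventListener("downloadprogress", (e) => {
        // e.loaded is a fraction between 0 and 1
        console.log(`Downloaded ${formatProgress(e.loaded)}`);
      });
    },
  });
}
```

For example, calling `downloadModel()` from a button-click handler kicks off the download while keeping the progress log visible in the console.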
Step 2: Set up a Firebase project and connect your app to Firebase
Sign in to the Firebase console, and then select your Firebase project.
Don't already have a Firebase project?
If you don't already have a Firebase project, click the button to create a
new Firebase project, and then use either of the following options:
Option 1: Create a wholly new Firebase project (and its underlying Google Cloud project automatically) by entering a new project name in the first step of the workflow.

Option 2: "Add Firebase" to an existing Google Cloud project by clicking Add Firebase to Google Cloud project (at the bottom of the page). In the first step of the workflow, start entering the project name of the existing project, and then select the project from the displayed list.
Complete the remaining steps of the on-screen workflow to create a Firebase project. Note that when prompted, you do not need to set up Google Analytics to use the Firebase AI Logic SDKs.
Click Get started to launch a guided workflow that helps you set up the required APIs and resources for your project.
Set up your project to use a "Gemini API" provider.
We recommend getting started using the Gemini Developer API. At any point, you can always set up the Vertex AI Gemini API (and its requirement for billing).

For the Gemini Developer API, the console will enable the required APIs and create a Gemini API key in your project. Do not add this Gemini API key into your app's codebase. Learn more.
If prompted in the console's workflow, follow the on-screen instructions to
register your app and connect it to Firebase.
Continue to the next step in this guide to add the SDK to your app.
Step 3: Add the SDK
The Firebase library provides access to the APIs for interacting with generative
models. The library is included as part of the Firebase JavaScript SDK for Web.
Install the Firebase JS SDK for Web using npm:
```shell
npm install firebase
```
Initialize Firebase in your app:
```javascript
import { initializeApp } from "firebase/app";

// TODO(developer) Replace the following with your app's Firebase configuration
// See: https://firebase.google.com/docs/web/learn-more#config-object
const firebaseConfig = {
  // ...
};

// Initialize FirebaseApp
const firebaseApp = initializeApp(firebaseConfig);
```
Step 4: Initialize the service and create a model instance
Set up the following before you send a prompt request to the model.
Initialize the service for your chosen API provider.
Create a `GenerativeModel` instance. Make sure to do the following:

Call `getGenerativeModel` after or on an end-user interaction (like a button click). This is a prerequisite for `inferenceMode`.

Set the `mode` to one of:

- `PREFER_ON_DEVICE`: Use the on-device model if it's available; otherwise, fall back to the cloud-hosted model.
- `ONLY_ON_DEVICE`: Use the on-device model if it's available; otherwise, throw an exception.
- `PREFER_IN_CLOUD`: Use the cloud-hosted model if it's available; otherwise, fall back to the on-device model.
- `ONLY_IN_CLOUD`: Use the cloud-hosted model if it's available; otherwise, throw an exception.
```javascript
import { initializeApp } from "firebase/app";
import { getAI, getGenerativeModel, GoogleAIBackend, InferenceMode } from "firebase/ai";

// TODO(developer) Replace the following with your app's Firebase configuration
// See: https://firebase.google.com/docs/web/learn-more#config-object
const firebaseConfig = {
  // ...
};

// Initialize FirebaseApp
const firebaseApp = initializeApp(firebaseConfig);

// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `GenerativeModel` instance
// Call `getGenerativeModel` after or on an end-user interaction
// Set the mode (for example, use the on-device model if it's available)
const model = getGenerativeModel(ai, { mode: InferenceMode.PREFER_ON_DEVICE });
```
Step 5: Send a prompt request to a model
This section shows you how to send various types of input to generate different types of output, including generating text from text-only input and from text-and-image (multimodal) input.
Generate text from text-only input

Before trying this sample, make sure that you've completed the Get started section of this guide.
You can use `generateContent()` to generate text from a prompt that contains text:

```javascript
// Imports + initialization of FirebaseApp and backend service + creation of model instance

// Wrap in an async function so you can use await
async function run() {
  // Provide a prompt that contains text
  const prompt = "Write a story about a magic backpack.";

  // To generate text output, call `generateContent` with the text input
  const result = await model.generateContent(prompt);

  const response = result.response;
  const text = response.text();
  console.log(text);
}

run();
```
Note that Firebase AI Logic also supports streaming of text responses using `generateContentStream` (instead of `generateContent`).
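As a minimal sketch of consuming a streamed response, the helper below iterates over `result.stream` and concatenates each chunk's text; `streamText` and its `onChunk` parameter are our own names, written against the `generateContentStream` shape described above rather than taken from this guide.

```javascript
// Stream a text response chunk by chunk instead of waiting for the full
// result. Works with any model object exposing `generateContentStream`,
// such as a Firebase AI Logic `GenerativeModel` instance.
async function streamText(model, prompt, onChunk) {
  const result = await model.generateContentStream(prompt);

  let fullText = "";
  for await (const chunk of result.stream) {
    const chunkText = chunk.text();
    fullText += chunkText;
    if (onChunk) onChunk(chunkText); // e.g. append to the DOM as it arrives
  }
  return fullText;
}
```

For example, `streamText(model, "Write a story about a magic backpack.", (t) => outputEl.append(t))` would render the story incrementally as chunks arrive.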
Generate text from text-and-image (multimodal) input
Before trying this sample, make sure that you've completed the Get started section of this guide.

You can use `generateContent()` to generate text from a prompt that contains text and image files, providing each input file's `mimeType` and the file itself. The supported input image types for on-device inference are PNG and JPEG.
```javascript
// Imports + initialization of FirebaseApp and backend service + creation of model instance

// Converts a File object to a Part object.
async function fileToGenerativePart(file) {
  const base64EncodedDataPromise = new Promise((resolve) => {
    const reader = new FileReader();
    reader.onloadend = () => resolve(reader.result.split(',')[1]);
    reader.readAsDataURL(file);
  });
  return {
    inlineData: { data: await base64EncodedDataPromise, mimeType: file.type },
  };
}

async function run() {
  // Provide a text prompt to include with the image
  const prompt = "Write a poem about this picture:";

  const fileInputEl = document.querySelector("input[type=file]");
  const imagePart = await fileToGenerativePart(fileInputEl.files[0]);

  // To generate text output, call `generateContent` with the text and image
  const result = await model.generateContent([prompt, imagePart]);

  const response = result.response;
  const text = response.text();
  console.log(text);
}

run();
```
Note that Firebase AI Logic also supports streaming of text responses using `generateContentStream` (instead of `generateContent`).
Enable end-users to try your feature
For end-users to try your feature in your app, you must enroll in the Chrome Origin Trials. Note that these trials have a limited duration and usage.
Features not yet available for on-device inference
As a preview release, not all the capabilities of the Web SDK are available for on-device inference. The following features are not yet supported for on-device inference (but they are usually available for cloud-based inference):

- Generating text from image file input types other than JPEG and PNG: can fall back to the cloud-hosted model; however, `ONLY_ON_DEVICE` mode will throw an error.
- Generating text from audio, video, and document (like PDF) inputs: can fall back to the cloud-hosted model; however, `ONLY_ON_DEVICE` mode will throw an error.
- Generating images using Gemini or Imagen models: can fall back to the cloud-hosted model; however, `ONLY_ON_DEVICE` mode will throw an error.
- Providing files using URLs in multimodal requests: you must provide files as inline data to on-device models.
- Multi-turn chat: can fall back to the cloud-hosted model; however, `ONLY_ON_DEVICE` mode will throw an error.
- Bi-directional streaming with the Gemini Live API.
- Providing the model with tools to help it generate its response (like function calling, code execution, URL context, and grounding with Google Search).
- Count tokens: always throws an error. The count will differ between cloud-hosted and on-device models, so there is no intuitive fallback.
- AI monitoring in the Firebase console for on-device inference. Note that any inference using the cloud-hosted models can be monitored just like other inference using the Firebase AI Logic client SDK for Web.
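Since several of the unsupported features above cause `ONLY_ON_DEVICE` mode to throw, an app that manages two model instances itself can implement a manual fallback. The sketch below shows one way to do that; `generateWithFallback` is our own helper name, not an SDK API, and it assumes both arguments expose the `generateContent` method shown earlier in this guide.

```javascript
// Manual fallback between two model instances: try the on-device model
// first, and retry against a cloud-only instance if the request throws
// (for example, because the model is unavailable or the input type is
// not supported on-device).
async function generateWithFallback(onDeviceModel, cloudModel, prompt) {
  try {
    const result = await onDeviceModel.generateContent(prompt);
    return result.response.text();
  } catch (e) {
    // On-device inference failed; retry with the cloud-hosted model.
    const result = await cloudModel.generateContent(prompt);
    return result.response.text();
  }
}
```

Note that `PREFER_ON_DEVICE` mode already performs this kind of fallback automatically; a manual wrapper like this is only useful when you deliberately create an `ONLY_ON_DEVICE` instance and want to control the fallback behavior yourself.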
Last updated 2026-03-20 UTC.