Build hybrid experiences with on-device and cloud-hosted models


Build AI-powered apps and features with hybrid inference using Firebase AI Logic. Hybrid inference enables running inference using on-device models when available, and seamlessly falling back to cloud-hosted models otherwise.

With this release, hybrid inference is available using the Firebase AI Logic client SDK for Web with support for on-device inference for Chrome on Desktop.

Jump to the code examples

Recommended use cases and supported capabilities

Recommended use cases:

  • Using an on-device model for inference offers:

    • Enhanced privacy
    • Local context
    • Inference at no cost
    • Offline functionality
  • Using hybrid functionality offers:

    • Reaching 100% of your audience, regardless of on-device model availability

Supported capabilities and features for on-device inference:

  • Single-turn content generation, both streaming and non-streaming
  • Generating text from text-only input
  • Generating text from text-and-image input, specifically input image types of JPEG and PNG
  • Generating structured output, including JSON and enums

Get started

This guide shows you how to get started using the Firebase AI Logic SDK for Web to perform hybrid inference.

Inference using an on-device model uses the Prompt API from Chrome, whereas inference using a cloud-hosted model uses your chosen Gemini API provider (either the Gemini Developer API or the Vertex AI Gemini API).

Get started developing using localhost, as described in this section (you can also learn more about using APIs on localhost in the Chrome documentation). Then, once you've implemented your feature, you can optionally enable end-users to try out your feature.

Step 1: Set up Chrome and the Prompt API for on-device inference

  1. Make sure you're using a recent version of Chrome. Update in chrome://settings/help.
    On-device inference is available from Chrome v139 and higher.

  2. Enable the on-device multimodal model by setting the following flag to Enabled:

    • chrome://flags/#prompt-api-for-gemini-nano-multimodal-input
  3. Restart Chrome.

  4. (Optional) Download the on-device model before the first request.

    The Prompt API is built into Chrome; however, the on-device model isn't available by default. If you haven't yet downloaded the model before your first request for on-device inference, the request will automatically start the model download in the background.

    To download the model ahead of time:

    1. Open Developer Tools > Console.

    2. Run the following:

        await LanguageModel.availability();
    3. Make sure that the output is available, downloading, or downloadable.

    4. If the output is downloadable, start the model download by running:

        await LanguageModel.create();
    5. You can use the following monitor callback to listen for download progress and make sure that the model is available before making requests:

        const session = await LanguageModel.create({
          monitor(m) {
            m.addEventListener("downloadprogress", (e) => {
              console.log(`Downloaded ${e.loaded * 100}%`);
            });
          },
        });

Step 2: Set up a Firebase project and connect your app to Firebase

  1. Sign in to the Firebase console, and then select your Firebase project.

    Don't already have a Firebase project?

    If you don't already have a Firebase project, click the button to create a new Firebase project, and then use either of the following options:

    • Option 1: Create a wholly new Firebase project (and its underlying Google Cloud project automatically) by entering a new project name in the first step of the workflow.

    • Option 2: "Add Firebase" to an existing Google Cloud project by clicking Add Firebase to Google Cloud project (at the bottom of the page). In the first step of the workflow, start entering the project name of the existing project, and then select the project from the displayed list.

    Complete the remaining steps of the on-screen workflow to create a Firebase project. Note that when prompted, you do not need to set up Google Analytics to use the Firebase AI Logic SDKs.

  2. In the Firebase console, go to the Firebase AI Logic page.

  3. Click Get started to launch a guided workflow that helps you set up the required APIs and resources for your project.

  4. Select the Gemini API provider that you'd like to use with the Firebase AI Logic SDKs. Gemini Developer API is recommended for first-time users. You can always add billing or set up Vertex AI Gemini API later, if you'd like.

    • Gemini Developer API: billing optional (available on the no-cost Spark pricing plan, and you can upgrade later if desired).
      The console will enable the required APIs and create a Gemini API key in your project.
      Do not add this Gemini API key into your app's codebase. Learn more.

    • Vertex AI Gemini API: billing required (requires the pay-as-you-go Blaze pricing plan).
      The console will help you set up billing and enable the required APIs in your project.

  5. If prompted in the console's workflow, follow the on-screen instructions to register your app and connect it to Firebase.

  6. Continue to the next step in this guide to add the SDK to your app.

Step 3: Add the SDK

The Firebase library provides access to the APIs for interacting with generative models. The library is included as part of the Firebase JavaScript SDK for Web.

    1. Install the Firebase JS SDK for Web using npm:

        npm install firebase
    2. Initialize Firebase in your app:

        import { initializeApp } from "firebase/app";

        // TODO(developer) Replace the following with your app's Firebase configuration
        // See: https://firebase.google.com/docs/web/learn-more#config-object
        const firebaseConfig = {
          // ...
        };

        // Initialize FirebaseApp
        const firebaseApp = initializeApp(firebaseConfig);

Step 4: Initialize the service and create a model instance

Before sending a prompt to a Gemini model, initialize the service for your chosen API provider and create a GenerativeModel instance.

Set the mode to one of:

• PREFER_ON_DEVICE: Configures the SDK to use the on-device model if it's available, or fall back to the cloud-hosted model.

• ONLY_ON_DEVICE: Configures the SDK to use the on-device model or throw an exception.

• ONLY_IN_CLOUD: Configures the SDK to never use the on-device model.

By default when you use PREFER_ON_DEVICE or ONLY_IN_CLOUD, the cloud-hosted model is gemini-2.0-flash-lite, but you can override the default.

  import { initializeApp } from "firebase/app";
  import { getAI, getGenerativeModel, GoogleAIBackend, InferenceMode } from "firebase/ai";

  // TODO(developer) Replace the following with your app's Firebase configuration
  // See: https://firebase.google.com/docs/web/learn-more#config-object
  const firebaseConfig = {
    // ...
  };

  // Initialize FirebaseApp
  const firebaseApp = initializeApp(firebaseConfig);

  // Initialize the Gemini Developer API backend service
  const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

  // Create a `GenerativeModel` instance
  // Set the mode, for example to use on-device model when possible
  const model = getGenerativeModel(ai, { mode: InferenceMode.PREFER_ON_DEVICE });

    Send a prompt request to a model

This section provides examples of how to send various types of input to generate different types of output: generating text from text-only input, and generating text from text-and-image (multimodal) input.

If you want to generate structured output (like JSON or enums), then use one of the following "generate text" examples and additionally configure the model to respond according to a provided schema.

    Generate text from text-only input

    Before trying this sample, make sure that you've completed the Get started section of this guide.

    You can use generateContent() to generate text from a prompt that contains text:

  // Imports + initialization of FirebaseApp and backend service + creation of model instance

  // Wrap in an async function so you can use await
  async function run() {
    // Provide a prompt that contains text
    const prompt = "Write a story about a magic backpack.";

    // To generate text output, call `generateContent` with the text input
    const result = await model.generateContent(prompt);

    const response = result.response;
    const text = response.text();
    console.log(text);
  }

  run();
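The SDK also supports streaming for single-turn generation. As a minimal sketch (reusing the model instance created above, and using the SDK's generateContentStream() as the streaming counterpart to generateContent()), you can log output incrementally as the model produces it:

  // A minimal streaming sketch, reusing the `model` instance from above
  async function runStreaming() {
    const prompt = "Write a story about a magic backpack.";

    // `generateContentStream` yields response chunks as they are produced
    const result = await model.generateContentStream(prompt);

    for await (const chunk of result.stream) {
      // Log each chunk of text as it arrives
      console.log(chunk.text());
    }
  }

  runStreaming();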

    Generate text from text-and-image (multimodal) input

    Before trying this sample, make sure that you've completed the Get started section of this guide.

    You can use generateContent() to generate text from a prompt that contains text and image files—providing each input file's mimeType and the file itself.

    The supported input image types for on-device inference are PNG and JPEG.

  // Imports + initialization of FirebaseApp and backend service + creation of model instance

  // Converts a File object to a Part object.
  async function fileToGenerativePart(file) {
    const base64EncodedDataPromise = new Promise((resolve) => {
      const reader = new FileReader();
      reader.onloadend = () => resolve(reader.result.split(',')[1]);
      reader.readAsDataURL(file);
    });
    return {
      inlineData: { data: await base64EncodedDataPromise, mimeType: file.type },
    };
  }

  async function run() {
    // Provide a text prompt to include with the image
    const prompt = "Write a poem about this picture:";

    const fileInputEl = document.querySelector("input[type=file]");
    const imagePart = await fileToGenerativePart(fileInputEl.files[0]);

    // To generate text output, call `generateContent` with the text and image
    const result = await model.generateContent([prompt, imagePart]);

    const response = result.response;
    const text = response.text();
    console.log(text);
  }

  run();

What else can you do?

In addition to the examples above, you can also enable end-users to try out your feature, use alternative inference modes, override the default fallback model, and use model configuration to control responses.

    Enable end-users to try out your feature

To enable end-users to try out your feature, you can enroll in the Chrome Origin Trials. Note that these trials have a limited duration and usage.

1. Register for the Prompt API Chrome Origin Trial. You'll be given a token.

    2. Provide the token on every web page for which you want the trial feature to be enabled. Use one of the following options:

  • Provide the token as a meta tag in the <head> tag: <meta http-equiv="origin-trial" content="TOKEN">

      • Provide the token as an HTTP header: Origin-Trial: TOKEN

  • Provide the token programmatically (see the sketch below).
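For the programmatic option, the common pattern from the Chrome Origin Trials documentation is to inject the token as a meta tag at runtime. A minimal sketch (TOKEN is a placeholder for your registration token):

  // Inject an origin trial token at runtime (TOKEN is a placeholder)
  const otMeta = document.createElement('meta');
  otMeta.httpEquiv = 'origin-trial';
  otMeta.content = 'TOKEN';
  document.head.append(otMeta);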

    Use alternative inference modes

The examples above used the PREFER_ON_DEVICE mode to configure the SDK to use an on-device model if it's available, or fall back to a cloud-hosted model. The SDK offers two alternative inference modes: ONLY_ON_DEVICE and ONLY_IN_CLOUD.

• Use ONLY_ON_DEVICE mode so that the SDK can only use an on-device model. In this configuration, the API will throw an error if an on-device model is not available (a handling sketch follows this list).

    const model = getGenerativeModel(ai, { mode: InferenceMode.ONLY_ON_DEVICE });
    • Use ONLY_IN_CLOUD mode so that the SDK can only use a cloud-hosted model.

    const model = getGenerativeModel(ai, { mode: InferenceMode.ONLY_IN_CLOUD });
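Because ONLY_ON_DEVICE throws when no on-device model is available, it's worth guarding requests. A brief sketch (the prompt is illustrative, and the exact error surface may vary):

  // Sketch: guard against on-device inference being unavailable
  const onDeviceModel = getGenerativeModel(ai, { mode: InferenceMode.ONLY_ON_DEVICE });

  async function runOnDeviceOnly() {
    try {
      const result = await onDeviceModel.generateContent("Summarize my day in one sentence.");
      console.log(result.response.text());
    } catch (e) {
      // Thrown when on-device inference isn't available (for example, an
      // unsupported browser or a model that hasn't been downloaded yet)
      console.error("On-device inference unavailable:", e);
    }
  }

  runOnDeviceOnly();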

    Override the default fallback model

When you use the PREFER_ON_DEVICE mode, the SDK will fall back to using a cloud-hosted model if an on-device model is unavailable. The default fallback cloud-hosted model is gemini-2.0-flash-lite. This cloud-hosted model is also the default when you use the ONLY_IN_CLOUD mode.

    You can use the inCloudParams configuration option to specify an alternative default cloud-hosted model:

  const model = getGenerativeModel(ai, {
    mode: InferenceMode.PREFER_ON_DEVICE,
    inCloudParams: {
      model: "gemini-2.5-flash"
    }
  });

Find model names for all supported Gemini models.

    Use model configuration to control responses

    In each request to a model, you can send along a model configuration to control how the model generates a response. Cloud-hosted models and on-device models offer different configuration options.

    The configuration is maintained for the lifetime of the instance. If you want to use a different config, create a new GenerativeModel instance with that config.
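For example (a sketch; the instance names and parameter values are illustrative), you might keep two instances side by side, one tuned for more creative output and one for more deterministic output:

  // Sketch: separate `GenerativeModel` instances for separate configurations
  const creativeModel = getGenerativeModel(ai, {
    mode: InferenceMode.PREFER_ON_DEVICE,
    inCloudParams: { model: "gemini-2.5-flash", temperature: 1.0 }
  });

  const preciseModel = getGenerativeModel(ai, {
    mode: InferenceMode.PREFER_ON_DEVICE,
    inCloudParams: { model: "gemini-2.5-flash", temperature: 0.1 }
  });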

    Set the configuration for a cloud-hosted model

Use the inCloudParams option to configure a cloud-hosted Gemini model. Learn about available parameters.

  const model = getGenerativeModel(ai, {
    mode: InferenceMode.PREFER_ON_DEVICE,
    inCloudParams: {
      model: "gemini-2.5-flash",
      temperature: 0.8,
      topK: 10
    }
  });

    Set the configuration for an on-device model

Note that inference using an on-device model uses the Prompt API from Chrome.

Use the onDeviceParams option to configure an on-device model. Learn about available parameters.

  const model = getGenerativeModel(ai, {
    mode: InferenceMode.PREFER_ON_DEVICE,
    onDeviceParams: {
      createOptions: {
        temperature: 0.8,
        topK: 8
      }
    }
  });

    Set the configuration for structured output

    Generating structured output (like JSON and enums) is supported for inference using both cloud-hosted and on-device models.

    For hybrid inference, use both inCloudParams and onDeviceParams to configure the model to respond with structured output. For the other modes, use only the applicable configuration.

• For inCloudParams: Specify the appropriate responseMimeType (in this example, application/json) as well as the responseSchema that you want the model to use.

• For onDeviceParams: Specify the responseConstraint that you want the model to use.

    JSON output

    The following example adapts the general JSON output example for hybrid inference:

  import { getAI, getGenerativeModel, Schema, InferenceMode } from "firebase/ai";

  const jsonSchema = Schema.object({
    properties: {
      characters: Schema.array({
        items: Schema.object({
          properties: {
            name: Schema.string(),
            accessory: Schema.string(),
            age: Schema.number(),
            species: Schema.string(),
          },
          optionalProperties: ["accessory"],
        }),
      }),
    }
  });

  const model = getGenerativeModel(ai, {
    mode: InferenceMode.PREFER_ON_DEVICE,
    inCloudParams: {
      model: "gemini-2.5-flash",
      generationConfig: {
        responseMimeType: "application/json",
        responseSchema: jsonSchema
      },
    },
    onDeviceParams: {
      promptOptions: {
        responseConstraint: jsonSchema
      }
    }
  });
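With this configuration, the response text is a JSON string conforming to the schema, so it can be parsed directly. A brief usage sketch (the prompt is illustrative):

  // Sketch: request and parse schema-constrained JSON output
  async function runJson() {
    const result = await model.generateContent(
      "Invent three characters for a children's story."
    );

    // The response text is a JSON string that conforms to `jsonSchema`
    const { characters } = JSON.parse(result.response.text());
    console.log(characters[0].name);
  }

  runJson();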
    Enum output

    As above, but adapting the documentation on enum output for hybrid inference:

  // ...

  const enumSchema = Schema.enumString({
    enum: ["drama", "comedy", "documentary"],
  });

  const model = getGenerativeModel(ai, {
    // ...
    generationConfig: {
      responseMimeType: "text/x.enum",
      responseSchema: enumSchema
    },
    // ...
  });

  // ...
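With this configuration, the model replies with exactly one of the enum values, so the response text can be used directly without parsing. A brief usage sketch (the prompt is illustrative):

  // Sketch: classify input against the enum values
  async function runEnum() {
    const result = await model.generateContent(
      "Classify this synopsis: A filmmaker follows deep-sea researchers for a year."
    );

    // The response text is exactly one of: "drama", "comedy", "documentary"
    console.log(result.response.text());
  }

  runEnum();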

    Features not yet available for on-device inference

As an experimental release, the Web SDK doesn't yet support all of its capabilities for on-device inference. The following features are not yet supported for on-device inference (but they are usually available for cloud-based inference).

• Generating text from image file input types other than JPEG and PNG

  • Can fall back to the cloud-hosted model; however, ONLY_ON_DEVICE mode will throw an error.

• Generating text from audio, video, and document (like PDF) inputs

  • Can fall back to the cloud-hosted model; however, ONLY_ON_DEVICE mode will throw an error.

• Generating images using Gemini or Imagen models

  • Can fall back to the cloud-hosted model; however, ONLY_ON_DEVICE mode will throw an error.

• Providing files using URLs in multimodal requests. You must provide files as inline data to on-device models.

• Multi-turn chat

  • Can fall back to the cloud-hosted model; however, ONLY_ON_DEVICE mode will throw an error.

• Bi-directional streaming with the Gemini Live API

  • Note that this isn't supported by the Firebase AI Logic client SDK for Web even for cloud-hosted models.

• Using "tools", including function calling and grounding with Google Search

  • Coming soon!

• Counting tokens

  • Always throws an error. The count will differ between cloud-hosted and on-device models, so there is no intuitive fallback.

• AI monitoring in the Firebase console for on-device inference

  • Note that any inference using cloud-hosted models can be monitored just like other inference using the Firebase AI Logic client SDK for Web.


    Give feedback about your experience with Firebase AI Logic

