Build hybrid experiences in Web apps with on-device and cloud-hosted models


Build AI-powered web apps and features with hybrid inference using Firebase AI Logic. Hybrid inference runs inference with an on-device model when it's available and seamlessly falls back to a cloud-hosted model otherwise (and vice versa).

This page describes how to get started using the client SDK. After completing this standard setup, check out the additional configuration options and capabilities (like structured output).

Note that on-device inference is supported only for web apps running in Chrome on desktop.


Recommended use cases and supported capabilities

Recommended use cases:

  • Using an on-device model for inference offers:

    • Enhanced privacy
    • Local context
    • Inference at no cost
    • Offline functionality
  • Using hybrid functionality lets you:

    • Reach 100% of your audience, regardless of on-device model availability or internet connectivity

Supported capabilities and features for on-device inference:

On-device inference supports single-turn text generation only (not multi-turn chat), with streaming or non-streaming output. You can also generate structured output, including JSON and enums.

Before you begin

Take note of the following:

Get started on localhost

These steps describe the general setup required for any supported prompt request that you want to send.

Step 1: Set up Chrome and the Prompt API for on-device inference

  1. Make sure you're using a recent version of Chrome. Update in chrome://settings/help.
    On-device inference is available from Chrome v139 and higher.

  2. Enable the on-device multimodal model by setting the following flag to Enabled:

    • chrome://flags/#prompt-api-for-gemini-nano-multimodal-input
  3. Restart Chrome.

  4. (Optional) Download the on-device model before the first request.

    The Prompt API is built into Chrome; however, the on-device model isn't available by default. If you haven't yet downloaded the model before your first request for on-device inference, the request will automatically start the model download in the background.
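If you'd rather trigger the download ahead of the first request, you can check the model's state first. Below is a minimal sketch, assuming the current shape of Chrome's Prompt API (a global `LanguageModel` with `availability()` and `create()`); the `describeAvailability` helper is a hypothetical name used only for illustration:

```javascript
// Hypothetical helper: maps a Prompt API availability state to a message.
// The four states are assumed from Chrome's Prompt API.
function describeAvailability(state) {
  const messages = {
    unavailable: "On-device inference is not supported on this device.",
    downloadable: "The on-device model can be downloaded.",
    downloading: "The on-device model is downloading.",
    available: "The on-device model is ready.",
  };
  return messages[state] ?? `Unknown availability state: ${state}`;
}

// Guarded so this sketch is a no-op outside Chrome.
async function warmUpOnDeviceModel() {
  if (typeof LanguageModel === "undefined") return;

  const state = await LanguageModel.availability();
  console.log(describeAvailability(state));

  if (state === "downloadable") {
    // Creating a session starts the background download; the monitor
    // callback reports progress so you can surface it in your UI.
    await LanguageModel.create({
      monitor(m) {
        m.addEventListener("downloadprogress", (e) => {
          console.log(`Downloaded ${Math.round(e.loaded * 100)}%`);
        });
      },
    });
  }
}

warmUpOnDeviceModel();
```

Downloading ahead of time keeps the first user-facing request from stalling while the model downloads.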

Step 2: Set up a Firebase project and connect your app to Firebase

  1. Sign in to the Firebase console, and then select your Firebase project.

  2. In the Firebase console, go to the Firebase AI Logic page.

  3. Click Get started to launch a guided workflow that helps you set up the required APIs and resources for your project.

  4. Set up your project to use a "Gemini API" provider.

    We recommend getting started with the Gemini Developer API. You can set up the Vertex AI Gemini API (which requires billing) at any time.

    For the Gemini Developer API, the console will enable the required APIs and create a Gemini API key in your project.
    Do not add this Gemini API key to your app's codebase. Learn more.

  5. If prompted in the console's workflow, follow the on-screen instructions to register your app and connect it to Firebase.

  6. Continue to the next step in this guide to add the SDK to your app.

Step 3: Add the SDK

The Firebase library provides access to the APIs for interacting with generative models. The library is included as part of the Firebase JavaScript SDK for Web.

  1. Install the Firebase JS SDK for Web using npm:

    npm install firebase
    
  2. Initialize Firebase in your app:

    import { initializeApp } from "firebase/app";
    
    // TODO(developer) Replace the following with your app's Firebase configuration
    // See: https://firebase.google.com/docs/web/learn-more#config-object
    const firebaseConfig = {
      // ...
    };
    
    // Initialize FirebaseApp
    const firebaseApp = initializeApp(firebaseConfig);
    

Step 4: Initialize the service and create a model instance


Set up the following before you send a prompt request to the model.

  1. Initialize the service for your chosen API provider.

  2. Create a GenerativeModel instance. Make sure to do the following:

    1. Call getGenerativeModel on or after an end-user interaction (like a button click). This is a prerequisite for using inferenceMode.

    2. Set the mode to one of:

      • PREFER_ON_DEVICE: Use the on-device model if it's available; otherwise, fall back to the cloud-hosted model.

      • ONLY_ON_DEVICE: Use the on-device model if it's available; otherwise, throw an exception.

      • PREFER_IN_CLOUD: Use the cloud-hosted model if it's available; otherwise, fall back to the on-device model.

      • ONLY_IN_CLOUD: Use the cloud-hosted model if it's available; otherwise, throw an exception.

import { initializeApp } from "firebase/app";
import { getAI, getGenerativeModel, GoogleAIBackend, InferenceMode } from "firebase/ai";

// TODO(developer) Replace the following with your app's Firebase configuration
// See: https://firebase.google.com/docs/web/learn-more#config-object
const firebaseConfig = {
  // ...
};

// Initialize FirebaseApp
const firebaseApp = initializeApp(firebaseConfig);

// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `GenerativeModel` instance
// Call `getGenerativeModel` after or on an end-user interaction
// Set the mode (for example, use the on-device model if it's available)
const model = getGenerativeModel(ai, { mode: InferenceMode.PREFER_ON_DEVICE });
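With the ONLY_ON_DEVICE and ONLY_IN_CLOUD modes, a request throws when the chosen model isn't available, so you may want an explicit fallback of your own. Here's a minimal sketch of that pattern; `generateWithFallback` is a hypothetical helper, and the two stand-in functions take the place of real SDK calls:

```javascript
// Hypothetical wrapper: try a primary inference call and fall back to a
// secondary one if the first throws (for example, when ONLY_ON_DEVICE
// fails because the on-device model is unavailable).
async function generateWithFallback(primary, fallback) {
  try {
    return await primary();
  } catch (e) {
    console.warn("Primary inference failed, falling back:", e.message);
    return await fallback();
  }
}

// Usage with stand-ins for an on-device call and a cloud call:
const onDevice = async () => {
  throw new Error("On-device model unavailable");
};
const inCloud = async () => "cloud response";

generateWithFallback(onDevice, inCloud).then((text) => console.log(text));
```

Note that PREFER_ON_DEVICE and PREFER_IN_CLOUD already fall back automatically; this pattern is only useful when you want to control the fallback yourself.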

Step 5: Send a prompt request to a model

This section shows you how to send various types of input to generate different types of output.

If you want to generate structured output (like JSON or enums), then use one of the following "generate text" examples and additionally configure the model to respond according to a provided schema.
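When structured output is configured with a response schema, the model's reply comes back as JSON text that you parse like any other JSON string. The sketch below uses a plain-object description of the schema and a hard-coded sample reply; the actual schema-builder API and `generationConfig` options are covered in the structured output documentation:

```javascript
// Illustrative only: a plain-object description of the JSON shape you want.
// (The SDK provides its own schema-builder helpers; this object just
// documents the expected fields for the sketch.)
const recipeSchema = {
  type: "object",
  properties: {
    name: { type: "string" },
    servings: { type: "integer" },
  },
};

// With structured output configured, the response text is JSON that
// conforms to the schema. A hard-coded sample stands in for a real
// model response here.
const sampleResponseText = '{"name": "Pancakes", "servings": 4}';
const recipe = JSON.parse(sampleResponseText);

console.log(recipe.name, recipe.servings);
```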

Generate text from text-only input

Before trying this sample, make sure that you've completed the Get started section of this guide.

You can use generateContent() to generate text from a prompt that contains text:

// Imports + initialization of FirebaseApp and backend service + creation of model instance

// Wrap in an async function so you can use await
async function run() {
  // Provide a prompt that contains text
  const prompt = "Write a story about a magic backpack.";

  // To generate text output, call `generateContent` with the text input
  const result = await model.generateContent(prompt);

  const response = result.response;
  const text = response.text();
  console.log(text);
}

run();

Note that Firebase AI Logic also supports streaming of text responses using generateContentStream (instead of generateContent).
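The streaming variant returns a result whose `stream` property is an async iterable of response chunks, each exposing a `text()` method. Here's a minimal sketch of the consumption pattern, with a mock async generator standing in for the real `model.generateContentStream(...)` result:

```javascript
// Mock chunk stream standing in for the result of `generateContentStream`.
// Each chunk exposes a `text()` method, like the real response chunks.
async function* mockStream() {
  for (const piece of ["Once upon ", "a time, ", "there was a backpack."]) {
    yield { text: () => piece };
  }
}

// Consume chunks as they arrive and assemble the full text.
async function readStream(stream) {
  let fullText = "";
  for await (const chunk of stream) {
    const chunkText = chunk.text();
    console.log("chunk:", chunkText);
    fullText += chunkText;
  }
  return fullText;
}

readStream(mockStream()).then((text) => console.log("full:", text));
```

Rendering each chunk as it arrives lets you show partial output to the user instead of waiting for the complete response.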

Generate text from text-and-image (multimodal) input

Before trying this sample, make sure that you've completed the Get started section of this guide.

You can use generateContent() to generate text from a prompt that contains both text and image files, providing each input file's mimeType and the file itself.

The supported input image types for on-device inference are PNG and JPEG.

// Imports + initialization of FirebaseApp and backend service + creation of model instance

// Converts a File object to a Part object.
async function fileToGenerativePart(file) {
  const base64EncodedDataPromise = new Promise((resolve) => {
    const reader = new FileReader();
    reader.onloadend = () => resolve(reader.result.split(',')[1]);
    reader.readAsDataURL(file);
  });
  return {
    inlineData: { data: await base64EncodedDataPromise, mimeType: file.type },
  };
}

async function run() {
  // Provide a text prompt to include with the image
  const prompt = "Write a poem about this picture:";

  const fileInputEl = document.querySelector("input[type=file]");
  const imagePart = await fileToGenerativePart(fileInputEl.files[0]);

  // To generate text output, call `generateContent` with the text and image
  const result = await model.generateContent([prompt, imagePart]);

  const response = result.response;
  const text = response.text();
  console.log(text);
}

run();

Note that Firebase AI Logic also supports streaming of text responses using generateContentStream (instead of generateContent).

Enable end-users to try your feature

For end-users to try your feature in your app, you must enroll in the Chrome Origin Trials. Note that these trials are limited in duration and usage.

  1. Register for the Prompt API Chrome Origin Trial. You'll be given a token.

  2. Provide the token on every web page for which you want the trial feature to be enabled. Use one of the following options:

    • Provide the token as a meta tag in the <head> tag: <meta http-equiv="origin-trial" content="TOKEN">

    • Provide the token as an HTTP header: Origin-Trial: TOKEN

    • Provide the token programmatically.
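The programmatic option can be as simple as appending the meta tag from a script. Below is a minimal sketch; both helper names are hypothetical, and the tag attributes match the meta-tag option above:

```javascript
// Hypothetical helper: builds the origin-trial meta tag markup.
function originTrialMetaHtml(token) {
  return `<meta http-equiv="origin-trial" content="${token}">`;
}

// Hypothetical helper: injects the token into the current page.
// Guarded so the sketch is a no-op outside a browser.
function addOriginTrialToken(token) {
  if (typeof document === "undefined") return;
  const meta = document.createElement("meta");
  meta.httpEquiv = "origin-trial";
  meta.content = token;
  document.head.appendChild(meta);
}

console.log(originTrialMetaHtml("TOKEN"));
```

Injecting the token from a script is handy when your pages are rendered by a framework and you can't easily edit the static `<head>` markup.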

What else can you do?

You can use various additional configuration options and capabilities for your hybrid experiences:

Features not yet available for on-device inference

Because this is a preview release, not all capabilities of the Web SDK are available for on-device inference. The following features are not yet supported for on-device inference (but they are usually available for cloud-based inference).

  • Generating text from image file input types other than JPEG and PNG

    • Can fall back to the cloud-hosted model; however, ONLY_ON_DEVICE mode will throw an error.
  • Generating text from audio, video, and document (like PDF) inputs

    • Can fall back to the cloud-hosted model; however, ONLY_ON_DEVICE mode will throw an error.
  • Generating images using Gemini or Imagen models

    • Can fall back to the cloud-hosted model; however, ONLY_ON_DEVICE mode will throw an error.
  • Providing files using URLs in multimodal requests. You must provide files as inline data to on-device models.

  • Multi-turn chat

    • Can fall back to the cloud-hosted model; however, ONLY_ON_DEVICE mode will throw an error.
  • Bi-directional streaming with the Gemini Live API

  • Providing the model with tools to help it generate its response (like function calling, code execution, URL context, and grounding with Google Search)

  • Counting tokens

    • Always throws an error. The count will differ between cloud-hosted and on-device models, so there is no intuitive fallback.
  • AI monitoring in the Firebase console for on-device inference.

    • Note that any inference using the cloud-hosted models can be monitored just like other inference using the Firebase AI Logic client SDK for Web.


Give feedback about your experience with Firebase AI Logic