Understand and manage your quotas

Vertex AI in Firebase requires two different APIs, each with its own quota: the Vertex AI API and the Vertex AI in Firebase API.

Each of these APIs has a quota measured in requests per minute (RPM), specifically for "generate content" requests (both streaming and non-streaming). The Vertex AI API also has a quota for input tokens per minute.

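This page describes the following:

  • Understand the quotas for each API

  • View the quotas for each API

  • Edit quota or request a quota increase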

You can learn general information about quotas in the Google Cloud documentation.

Understand the quotas for each API

Each API's quota is measured slightly differently, so each one serves a different purpose.

Understand the Vertex AI API quotas

The Vertex AI API quotas are based on "Generate content requests" on a per-model per-region per-minute basis.

Here are some important details about these quotas (specifically, requests per minute and input tokens per minute):

  • They apply at the project level and are shared across all applications and IP addresses that use that Firebase project.

  • They apply to any call to the Vertex AI Gemini API, whether it's made through the Vertex AI in Firebase client SDKs, the Vertex AI server SDKs, Firebase Genkit, the Gemini Firebase Extensions, REST calls, Vertex AI Studio, or other API clients.

  • They apply to a base model and all versions, identifiers, and tuned versions of that model. Here are some examples:

    • A request to gemini-1.0-pro and a request to gemini-1.0-pro-001 are counted as two requests toward the RPM quota of the base model, gemini-1.0-pro.

    • A request to gemini-1.0-pro-001 and a request to a tuned model that's based on gemini-1.0-pro-001 are counted as two requests toward the RPM quota of the base model, gemini-1.0-pro.

  • The default quotas for each model and for each region can be found in the Google Cloud documentation.
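To make the counting rule concrete, here's a small Python sketch that attributes each request to its base model and tallies requests per base model. It assumes that a stable version ID is the base model name plus a three-digit suffix (such as `-001`). Because a tuned model's ID doesn't reveal its base model, the sketch also takes a hypothetical `tuned_bases` mapping that you would maintain yourself. The function names and the tuned model ID are illustrative only; they aren't part of any Firebase or Google Cloud SDK.

```python
import re
from collections import Counter

# Assumption: stable versions are named "<base>-NNN", e.g. "gemini-1.0-pro-001".
_VERSION_SUFFIX = re.compile(r"-\d{3}$")


def base_model_for(model_id: str, tuned_bases: dict[str, str]) -> str:
    """Return the base model whose RPM quota a request to model_id counts toward."""
    # Tuned model IDs don't encode their base model, so resolve them first
    # using a hypothetical mapping of tuned model ID -> model it was tuned from.
    if model_id in tuned_bases:
        model_id = tuned_bases[model_id]
    # Strip a trailing version suffix such as "-001".
    return _VERSION_SUFFIX.sub("", model_id)


def tally_by_base_model(requests: list[str], tuned_bases: dict[str, str]) -> Counter:
    """Count requests per base model, mirroring how the shared RPM quota is consumed."""
    return Counter(base_model_for(m, tuned_bases) for m in requests)


tuned = {"my-support-bot": "gemini-1.0-pro-001"}  # hypothetical tuned model
counts = tally_by_base_model(
    ["gemini-1.0-pro", "gemini-1.0-pro-001", "my-support-bot"], tuned
)
print(counts)
```

All three requests land on the same counter, which is the behavior the examples above describe: versions and tuned variants don't get a quota of their own.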

Think of this API's quotas as your "total" quotas for all your users of the AI features in your app that rely on a specific model in a specific region.

These quotas need to be high enough to reasonably accommodate all the end users in a specific region who might use the AI features that rely on a specific model. Because these are per-minute quotas, it's relatively unlikely that all your users in a region will use the same features in the same minute and deplete these quotas. But every app is different, so adjust these quotas to match your app's usage.
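As a rough worked example, the following Python sketch estimates the RPM quota one model needs in one region from three inputs you'd have to estimate for your own app: peak active users in that region within the same minute, average generate-content requests per active user per minute, and a headroom margin. The function, its parameters, and the numbers in the usage line are hypothetical and only illustrate the arithmetic.

```python
import math
from fractions import Fraction


def estimate_regional_rpm(
    peak_active_users: int,
    requests_per_user_per_minute: float,
    headroom: float = 0.2,
) -> int:
    """Estimate the per-model, per-region RPM quota needed at peak load."""
    # Convert via str() so decimal inputs like 0.5 and 0.2 are handled exactly,
    # avoiding floating-point error before rounding up.
    rate = Fraction(str(requests_per_user_per_minute))
    margin = 1 + Fraction(str(headroom))
    # Round up, because quotas are whole requests per minute.
    return math.ceil(peak_active_users * rate * margin)


# Example: 2,000 users active in the same minute, each averaging one request
# every two minutes, plus 20% headroom.
print(estimate_regional_rpm(2_000, 0.5))  # 1200
```

Because the quota is per model and per region, you'd run an estimate like this separately for each model and region combination your app uses.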

Understand the Vertex AI in Firebase API quota

The Vertex AI in Firebase API quota is based on "Generate content requests" on a per-user per-region per-minute basis.

Here are some important details about this quota (specifically, requests per minute):

  • It applies at the project level and to all applications and IP addresses that use that Firebase project.

  • It applies to any call that goes through a Vertex AI in Firebase SDK.

  • The default quota is 100 RPM per user.
    Note that you still need to consider the quota limits for the Vertex AI API, especially if they're lower than this 100 RPM.

Think of this API's quota as your "per-user" quota for the AI features that rely on Vertex AI in Firebase.

This quota needs to be high enough to reasonably accommodate a single user accessing the AI features that rely on Vertex AI in Firebase. Since this API acts as the gateway to the Vertex AI API, you can use the Vertex AI in Firebase API quota to ensure that no single user depletes your Vertex AI API quota (which is meant to be shared by all your users).
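The following Python sketch models how the two quotas interact, using one-minute fixed windows: a request goes through only if both the user's counter (the Vertex AI in Firebase API quota) and the shared project counter (the Vertex AI API quota for that model and region) still have room. This is a simplified illustration, not how Google actually enforces quotas; the class name and the fixed-window approach are assumptions made for the example.

```python
from collections import defaultdict


class TwoTierQuotaModel:
    """Simplified model: a per-user RPM cap in front of a shared project RPM cap."""

    def __init__(self, project_rpm: int, per_user_rpm: int = 100):
        self.project_rpm = project_rpm    # shared by all users (Vertex AI API)
        self.per_user_rpm = per_user_rpm  # per user (Vertex AI in Firebase API)
        self.window = None                # index of the current minute
        self.project_used = 0
        self.user_used = defaultdict(int)

    def _roll(self, minute: int) -> None:
        # Reset all counters when a new one-minute window starts.
        if minute != self.window:
            self.window = minute
            self.project_used = 0
            self.user_used.clear()

    def try_request(self, user: str, minute: int) -> bool:
        """Return True if admitted; False stands in for a 429 quota-exceeded error."""
        self._roll(minute)
        if self.user_used[user] >= self.per_user_rpm:
            return False  # this user has used up the per-user quota
        if self.project_used >= self.project_rpm:
            return False  # the shared project quota is used up
        self.user_used[user] += 1
        self.project_used += 1
        return True

    def effective_user_ceiling(self) -> int:
        # A single user can never exceed either limit.
        return min(self.per_user_rpm, self.project_rpm)


model = TwoTierQuotaModel(project_rpm=150, per_user_rpm=100)
heavy = sum(model.try_request("alice", minute=0) for _ in range(120))
other = sum(model.try_request("bob", minute=0) for _ in range(60))
print(heavy, other)  # 100 50
```

In this run, the 100 RPM per-user cap stops one heavy user at 100 of the 150 shared requests, leaving the rest for other users. If the project quota were lower than the per-user quota, `effective_user_ceiling()` would return the project quota instead, which is why the Vertex AI API limits still matter even with a per-user quota in place.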

View the quotas for each API

You can view the quotas for each API in the Google Cloud console.

  1. In the Google Cloud console, go to the page for the API of interest: Vertex AI API or Vertex AI in Firebase API.

  2. Click Manage.

  3. Lower on the page, click the Quotas & System Limits tab.

  4. Filter the table to show the quotas of interest.

    Note that to create a Dimension filter, you need to use the filter tooling, rather than just copy-pasting the values in the following examples.

    • For the Vertex AI API: Specify the capability (requests for generating content), model name, and region.

      For example, to view the quotas for generating content requests with Gemini 1.5 Flash in any of the supported EU regions, your filter would look like this:
      Generate content requests + Dimension:base_model:gemini-1.5-flash + Dimension:region:eu

    • For the Vertex AI in Firebase API: Specify the capability (requests for generating content) and region.

      For example, to view the per-user quotas for generating content requests in any of the supported Asian regions, your filter would look like this:
      Generate content requests + Dimension:region:asia

      Note that the Vertex AI in Firebase API quotas aren't based on a particular model. Also, the (default) quota row doesn't apply to Vertex AI in Firebase.

Edit quota or request a quota increase

Before you go to production or if you're getting 429 quota-exceeded errors, you may need to edit your quota or request a quota increase. Make sure you adjust each API's quota accordingly (see Understand the quotas for each API earlier on this page for considerations).

To edit a quota, you must have the serviceusage.quotas.update permission, which is included by default in the Owner and Editor roles.

Here's how to edit your quota or request a quota increase:

  1. Follow the instructions in the previous section to view the quotas for each API.

  2. Select the checkbox to the left of each quota of interest.

  3. At the end of the quota's row, click the three-dot menu, and then select Edit quota.

  4. In the Quota changes form, do the following:

    1. Enter the increased quota in the New value field.

      This quota applies at the project level and is shared across all applications and IP addresses that use that Firebase project.

    2. Complete any additional fields in the form, and then click Done.

    3. Click Submit request.