Count tokens and billable characters for Gemini models

Generative models break down data into units called tokens for processing. Each model has a maximum number of tokens that it can handle in a prompt and response.

This page shows you how to use the Count Tokens API to get an estimate of the token count and the number of billable characters for a request to a Gemini model. There isn't an API for estimating the token count of a response.

Note that the Count Tokens API cannot be used for Imagen models.

What information is provided in the count?

Note the following about counting tokens and billable characters:

  • Counting the total tokens

    • This count helps you make sure that your requests don't exceed the model's context window (see the sketch after this list).

    • The token count will reflect the size of all files (for example, images) that are provided as part of the request input. It will not count the number of images or the number of seconds in a video.

    • For all Gemini models, a token is equivalent to about 4 characters, and 100 tokens correspond to roughly 60-80 English words.

  • Counting the total billable characters

    • This count is helpful for understanding and controlling your costs, since the number of characters is part of the Vertex AI pricing calculation.

    • The billable character count will reflect the number of characters in the text that's provided as part of the request input.
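
For example, the following Kotlin sketch pairs the rough characters-divided-by-4 estimate with an authoritative CountTokens call to check whether a prompt fits within a model's input limit. The fitsInContextWindow helper and the 8,192-token limit are illustrative assumptions rather than part of the SDK, and the sketch reuses the generativeModel instance from the Kotlin samples below.

// A minimal sketch, assuming the `generativeModel` instance from the Kotlin samples
// below. The 8,192-token default is illustrative; use the documented input token
// limit for your model.
suspend fun fitsInContextWindow(prompt: String, maxInputTokens: Int = 8_192): Boolean {
    // Quick offline estimate using the ~4-characters-per-token heuristic.
    val estimatedTokens = prompt.length / 4
    println("Estimated tokens (characters / 4): $estimatedTokens")

    // Authoritative count from the CountTokens API.
    val response = generativeModel.countTokens(prompt)
    println("Counted tokens: ${response.totalTokens}")

    return response.totalTokens <= maxInputTokens
}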

For older Gemini models, tokens are not part of the pricing calculation; for Gemini 2.0 models, however, pricing is based on tokens. Learn more about token limits per model and pricing per model.

Pricing and quota for counting tokens and billable characters

There's no charge for using the CountTokens API. The maximum quota for the CountTokens API is 3000 requests per minute (RPM).

Code samples

Text-only input

Swift

let response = try await model.countTokens("Write a story about a magic backpack.")
print("Total Tokens: \(response.totalTokens)")
print("Total Billable Characters: \(response.totalBillableCharacters)")

Kotlin

val response = generativeModel.countTokens("Write a story about a magic backpack.")
println("Total Tokens: ${response.totalTokens}")
println("Total Billable Characters: ${response.totalBillableCharacters}")

Java

Content prompt = new Content.Builder()
    .addText("Write a story about a magic backpack.")
    .build();

GenerativeModelFutures modelFutures = GenerativeModelFutures.from(model);
ListenableFuture<CountTokensResponse> countTokensResponse =
    modelFutures.countTokens(prompt);

Futures.addCallback(countTokensResponse, new FutureCallback<CountTokensResponse>() {
    @Override
    public void onSuccess(CountTokensResponse response) {
        System.out.println("Total Tokens = " + response.getTotalTokens());
        System.out.println("Total Billable Characters: = " +
          response.getTotalBillableCharacters());
    }

    @Override
    public void onFailure(Throwable t) {
        t.printStackTrace();
    }
}, executor);

Web

const { totalTokens, totalBillableCharacters } = await model.countTokens("Write a story about a magic backpack.");
console.log(`Total tokens: ${totalTokens}, total billable characters: ${totalBillableCharacters}`);

Dart

final tokenCount = await model.countTokens(Content.text("Write a story about a magic backpack."));
print('Token count: ${tokenCount.totalTokens}, billable characters: ${tokenCount.totalBillableCharacters}');

Multimodal input

Swift

let response = try await model.countTokens(image, "What's in this picture?")
print("Total Tokens: \(response.totalTokens)")
print("Total Billable Characters: \(response.totalBillableCharacters)")

Kotlin

val prompt = content {
  image(bitmap)
  text("What's in this picture?")
}
val response = generativeModel.countTokens(prompt)
println("Total Tokens: ${response.totalTokens}")
println("Total Billable Characters: ${response.totalBillableCharacters}")

Java

Content prompt = new Content.Builder()
    .addImage(bitmap)
    .addText("What's in this picture?")
    .build();

GenerativeModelFutures modelFutures = GenerativeModelFutures.from(model);
ListenableFuture<CountTokensResponse> countTokensResponse =
    modelFutures.countTokens(prompt);

Futures.addCallback(countTokensResponse, new FutureCallback<CountTokensResponse>() {
    @Override
    public void onSuccess(CountTokensResponse response) {
        System.out.println("Total Tokens = " + response.getTotalTokens());
        System.out.println("Total Billable Characters: = " +
          response.getTotalBillableCharacters());
    }

    @Override
    public void onFailure(Throwable t) {
        t.printStackTrace();
    }
}, executor);

Web

const prompt = "What's in this picture?";
const imagePart = { inlineData: { mimeType: 'image/jpeg', data: imageAsBase64 }};

const { totalTokens, totalBillableCharacters } = await model.countTokens([prompt, imagePart]);
console.log(`Total tokens: ${totalTokens}, total billable characters: ${totalBillableCharacters}`);

Dart

final prompt = TextPart("What's in this picture?");
// `imageBytes` is your image data as a Uint8List (for example, read from a file).
final imagePart = DataPart('image/jpeg', imageBytes);
final tokenCount = await model.countTokens([
  Content.multi([prompt, imagePart])
]);
print('Token count: ${tokenCount.totalTokens}, billable characters: ${tokenCount.totalBillableCharacters}');