Generative models break down data into units called tokens for processing. Each model has a maximum number of tokens that it can handle in a prompt and response.
This page shows you how to use the Count Tokens API to get an estimate of token count and the number of billable characters for a request to a Gemini model. There isn't an API for getting the estimate of tokens in a response.
Note that the Count Tokens API cannot be used for Imagen models.
What information is provided in the count?
Note the following about counting tokens and billable characters:
Counting the total tokens
This count is helpful to make sure your requests don't go over the allowable context window.
The token count will reflect the size of all files (for example, images) that are provided as part of the request input. It will not count the number of images or the number of seconds in a video.
For all Gemini models, a token is equivalent to about 4 characters. 100 tokens are about 60-80 English words.
Counting the total billable characters
This count is helpful for understanding and controlling your costs, since for Vertex AI, number of characters is part of the pricing calculation.
The billable character count will reflect the number of characters in the text that's provided as part of the request input.
For older Gemini models, tokens are not part of the pricing calculation; however, for Gemini 2.0 models, tokens are used in the pricing calculation. Learn more about token limits per model and pricing per model.
Pricing and quota for counting tokens and billable characters
There's no charge or quota restriction for using the CountTokens
API. The
maximum quota for the CountTokens
API is 3000 requests per minute (RPM).
Code samples
Text-only input
Swift
let response = try await model.countTokens("Write a story about a magic backpack.")
print("Total Tokens: \(response.totalTokens)")
print("Total Billable Characters: \(response.totalBillableCharacters)")
Kotlin
val response = generativeModel.countTokens("Write a story about a magic backpack.")
println("Total Tokens: ${response.totalTokens}")
println("Total Billable Characters: ${response.totalBillableCharacters}")
Java
Content prompt = new Content.Builder()
.addText("Write a story about a magic backpack.")
.build();
GenerativeModelFutures modelFutures = GenerativeModelFutures.from(model);
ListenableFuture<CountTokensResponse> countTokensResponse =
modelFutures.countTokens(prompt);
Futures.addCallback(countTokensResponse, new FutureCallback<CountTokensResponse>() {
@Override
public void onSuccess(CountTokensResponse response) {
System.out.println("Total Tokens = " + response.getTotalTokens());
System.out.println("Total Billable Characters: = " +
response.getTotalBillableCharacters());
}
@Override
public void onFailure(Throwable t) {
t.printStackTrace();
}
}, executor);
Web
const { totalTokens, totalBillableCharacters } = await model.countTokens("Write a story about a magic backpack.");
console.log(`Total tokens: ${totalTokens}, total billable characters: ${totalBillableCharacters}`);
Dart
final tokenCount = await model.countTokens(Content.text("Write a story about a magic backpack."));
print('Token count: ${tokenCount.totalTokens}, billable characters: ${tokenCount.totalBillableCharacters}');
Multimodal input
Swift
let response = try await model.countTokens(image, "What's in this picture?")
print("Total Tokens: \(response.totalTokens)")
print("Total Billable Characters: \(response.totalBillableCharacters)")
Kotlin
val prompt = content {
image(bitmap)
text("What's in this picture?")
}
val response = generativeModel.countTokens(prompt)
println("Total Tokens: ${response.totalTokens}")
println("Total Billable Characters: ${response.totalBillableCharacters}")
Java
Content prompt = new Content.Builder()
.addImage(bitmap)
.addText("What's in this picture?")
.build();
GenerativeModelFutures modelFutures = GenerativeModelFutures.from(model);
ListenableFuture<CountTokensResponse> countTokensResponse =
modelFutures.countTokens(prompt);
Futures.addCallback(countTokensResponse, new FutureCallback<CountTokensResponse>() {
@Override
public void onSuccess(CountTokensResponse response) {
System.out.println("Total Tokens = " + response.getTotalTokens());
System.out.println("Total Billable Characters: = " +
response.getTotalBillableCharacters());
}
@Override
public void onFailure(Throwable t) {
t.printStackTrace();
}
}, executor);
Web
const prompt = "What's in this picture?";
const imagePart = { inlineData: { mimeType: 'image/jpeg', data: imageAsBase64 }};
const { totalTokens, totalBillableCharacters } = await model.countTokens([prompt, imagePart]);
console.log(`Total tokens: ${totalTokens}, total billable characters: ${totalBillableCharacters}`);
Dart
final prompt = TextPart("What's in the picture?");
final tokenCount = await model.countTokens([
Content.multi([prompt, imagePart])
]);
print('Token count: ${tokenCount.totalTokens}, billable characters: ${tokenCount.totalBillableCharacters}');