使用 Gemini API 分析圖片檔案

您可以要求 Gemini 模型分析您提供的內嵌 (base64 編碼) 或透過網址提供的圖片檔案。使用 Firebase AI Logic 時,您可以直接透過應用程式提出這項要求。

這項功能可讓您執行下列操作:

  • 建立圖片說明或回答圖片相關問題
  • 寫一篇關於圖片的短篇故事或詩
  • 偵測圖片中的物件,並傳回物件的定界框座標
  • 根據情緒、風格或其他特徵,為一組圖片加上標籤或分類

跳至程式碼範例 跳至串流回應程式碼


參閱其他指南,瞭解處理圖片的其他選項
產生結構化輸出內容 多回合聊天 在裝置上分析圖片 產生圖片

事前準備

按一下您的 Gemini API 供應商,即可在本頁查看供應商專屬內容和程式碼。

如果您尚未完成,請參閱入門指南,瞭解如何設定 Firebase 專案、將應用程式連結至 Firebase、新增 SDK、為所選 Gemini API 供應器初始化後端服務,以及建立 GenerativeModel 例項。

如要測試並重複提示,甚至取得產生的程式碼片段,建議您使用 Google AI Studio

從圖片檔案 (以 Base64 編碼) 生成文字

在嘗試這個範例前,請先完成本指南的「開始前」一節,設定專案和應用程式。
在該部分,您也需要點選所選Gemini API供應商的按鈕,才能在本頁面上看到供應商專屬內容

您可以要求 Gemini 模型使用文字和圖片進行提示,藉此生成文字,方法是提供每個輸入檔案的 mimeType 和檔案本身。請參閱本頁後續的輸入檔案相關規定和建議

Swift

您可以呼叫 generateContent(),根據多模態輸入的文字和圖片生成文字。

單一檔案輸入


import FirebaseAI

// Initialize the Gemini Developer API backend service
let ai = FirebaseAI.firebaseAI(backend: .googleAI())

// Create a `GenerativeModel` instance with a model that supports your use case
let model = ai.generativeModel(modelName: "gemini-2.0-flash")


guard let image = UIImage(systemName: "bicycle") else { fatalError() }

// Provide a text prompt to include with the image
let prompt = "What's in this picture?"

// To generate text output, call generateContent and pass in the prompt
let response = try await model.generateContent(image, prompt)
print(response.text ?? "No text in response.")

多個檔案輸入


import FirebaseAI

// Initialize the Gemini Developer API backend service
let ai = FirebaseAI.firebaseAI(backend: .googleAI())

// Create a `GenerativeModel` instance with a model that supports your use case
let model = ai.generativeModel(modelName: "gemini-2.0-flash")


guard let image1 = UIImage(systemName: "car") else { fatalError() }
guard let image2 = UIImage(systemName: "car.2") else { fatalError() }

// Provide a text prompt to include with the images
let prompt = "What's different between these pictures?"

// To generate text output, call generateContent and pass in the prompt
let response = try await model.generateContent(image1, image2, prompt)
print(response.text ?? "No text in response.")

Kotlin

您可以呼叫 generateContent(),根據多模態輸入的文字和圖片生成文字。

對於 Kotlin,這個 SDK 中的函式為暫停函式,需要從 協同程式範圍中呼叫。

單一檔案輸入


// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
val model = Firebase.ai(backend = GenerativeBackend.googleAI())
                        .generativeModel("gemini-2.0-flash")


// Loads an image from the app/res/drawable/ directory
val bitmap: Bitmap = BitmapFactory.decodeResource(resources, R.drawable.sparky)

// Provide a prompt that includes the image specified above and text
val prompt = content {
  image(bitmap)
  text("What developer tool is this mascot from?")
}

// To generate text output, call generateContent with the prompt
val response = generativeModel.generateContent(prompt)
print(response.text)

多個檔案輸入

對於 Kotlin,這個 SDK 中的函式為暫停函式,需要從 協同程式範圍中呼叫。

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
val model = Firebase.ai(backend = GenerativeBackend.googleAI())
                        .generativeModel("gemini-2.0-flash")


// Loads an image from the app/res/drawable/ directory
val bitmap1: Bitmap = BitmapFactory.decodeResource(resources, R.drawable.sparky)
val bitmap2: Bitmap = BitmapFactory.decodeResource(resources, R.drawable.sparky_eats_pizza)

// Provide a prompt that includes the images specified above and text
val prompt = content {
  image(bitmap1)
  image(bitmap2)
  text("What is different between these pictures?")
}

// To generate text output, call generateContent with the prompt
val response = generativeModel.generateContent(prompt)
print(response.text)

Java

您可以呼叫 generateContent(),根據多模態輸入的文字和圖片生成文字。

對於 Java,這個 SDK 中的各個方法會傳回 ListenableFuture

單一檔案輸入


// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
GenerativeModel ai = FirebaseAI.getInstance(GenerativeBackend.googleAI())
        .generativeModel("gemini-2.0-flash");

// Use the GenerativeModelFutures Java compatibility layer which offers
// support for ListenableFuture and Publisher APIs
GenerativeModelFutures model = GenerativeModelFutures.from(ai);


Bitmap bitmap = BitmapFactory.decodeResource(getResources(), R.drawable.sparky);

// Provide a prompt that includes the image specified above and text
Content content = new Content.Builder()
        .addImage(bitmap)
        .addText("What developer tool is this mascot from?")
        .build();

// To generate text output, call generateContent with the prompt
ListenableFuture<GenerateContentResponse> response = model.generateContent(content);
Futures.addCallback(response, new FutureCallback<GenerateContentResponse>() {
    @Override
    public void onSuccess(GenerateContentResponse result) {
        String resultText = result.getText();
        System.out.println(resultText);
    }

    @Override
    public void onFailure(Throwable t) {
        t.printStackTrace();
    }
}, executor);

多個檔案輸入


// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
GenerativeModel ai = FirebaseAI.getInstance(GenerativeBackend.googleAI())
        .generativeModel("gemini-2.0-flash");

// Use the GenerativeModelFutures Java compatibility layer which offers
// support for ListenableFuture and Publisher APIs
GenerativeModelFutures model = GenerativeModelFutures.from(ai);


Bitmap bitmap1 = BitmapFactory.decodeResource(getResources(), R.drawable.sparky);
Bitmap bitmap2 = BitmapFactory.decodeResource(getResources(), R.drawable.sparky_eats_pizza);

// Provide a prompt that includes the images specified above and text
Content prompt = new Content.Builder()
    .addImage(bitmap1)
    .addImage(bitmap2)
    .addText("What's different between these pictures?")
    .build();

// To generate text output, call generateContent with the prompt
ListenableFuture<GenerateContentResponse> response = model.generateContent(prompt);
Futures.addCallback(response, new FutureCallback<GenerateContentResponse>() {
    @Override
    public void onSuccess(GenerateContentResponse result) {
        String resultText = result.getText();
        System.out.println(resultText);
    }

    @Override
    public void onFailure(Throwable t) {
        t.printStackTrace();
    }
}, executor);

Web

您可以呼叫 generateContent(),根據多模態輸入的文字和圖片生成文字。

單一檔案輸入


import { initializeApp } from "firebase/app";
import { getAI, getGenerativeModel, GoogleAIBackend } from "firebase/ai";

// TODO(developer) Replace the following with your app's Firebase configuration
// See: https://firebase.google.com/docs/web/learn-more#config-object
const firebaseConfig = {
  // ...
};

// Initialize FirebaseApp
const firebaseApp = initializeApp(firebaseConfig);

// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `GenerativeModel` instance with a model that supports your use case
const model = getGenerativeModel(ai, { model: "gemini-2.0-flash" });


// Converts a File object to a Part object.
async function fileToGenerativePart(file) {
  const base64EncodedDataPromise = new Promise((resolve) => {
    const reader = new FileReader();
    reader.onloadend = () => resolve(reader.result.split(',')[1]);
    reader.readAsDataURL(file);
  });
  return {
    inlineData: { data: await base64EncodedDataPromise, mimeType: file.type },
  };
}

async function run() {
  // Provide a text prompt to include with the image
  const prompt = "What do you see?";

  const fileInputEl = document.querySelector("input[type=file]");
  const imagePart = await fileToGenerativePart(fileInputEl.files[0]);

  // To generate text output, call generateContent with the text and image
  const result = await model.generateContent([prompt, imagePart]);

  const response = result.response;
  const text = response.text();
  console.log(text);
}

run();

多個檔案輸入


import { initializeApp } from "firebase/app";
import { getAI, getGenerativeModel, GoogleAIBackend } from "firebase/ai";

// TODO(developer) Replace the following with your app's Firebase configuration
// See: https://firebase.google.com/docs/web/learn-more#config-object
const firebaseConfig = {
  // ...
};

// Initialize FirebaseApp
const firebaseApp = initializeApp(firebaseConfig);

// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `GenerativeModel` instance with a model that supports your use case
const model = getGenerativeModel(ai, { model: "gemini-2.0-flash" });


// Converts a File object to a Part object.
async function fileToGenerativePart(file) {
  const base64EncodedDataPromise = new Promise((resolve) => {
    const reader = new FileReader();
    reader.onloadend = () => resolve(reader.result.split(',')[1]);
    reader.readAsDataURL(file);
  });
  return {
    inlineData: { data: await base64EncodedDataPromise, mimeType: file.type },
  };
}

async function run() {
  // Provide a text prompt to include with the images
  const prompt = "What's different between these pictures?";

  // Prepare images for input
  const fileInputEl = document.querySelector("input[type=file]");
  const imageParts = await Promise.all(
    [...fileInputEl.files].map(fileToGenerativePart)
  );

  // To generate text output, call generateContent with the text and images
  const result = await model.generateContent([prompt, ...imageParts]);

  const response = result.response;
  const text = response.text();
  console.log(text);
}

run();

Dart

您可以呼叫 generateContent(),根據文字和圖片的多模態輸入內容生成文字。

單一檔案輸入


import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';

// Initialize FirebaseApp
await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
final model =
      FirebaseAI.googleAI().generativeModel(model: 'gemini-2.0-flash');


// Provide a text prompt to include with the image
final prompt = TextPart("What's in the picture?");
// Prepare images for input
final image = await File('image0.jpg').readAsBytes();
final imagePart = InlineDataPart('image/jpeg', image);

// To generate text output, call generateContent with the text and image
final response = await model.generateContent([
  Content.multi([prompt,imagePart])
]);
print(response.text);

多個檔案輸入


import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';

// Initialize FirebaseApp
await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
final model =
      FirebaseAI.googleAI().generativeModel(model: 'gemini-2.0-flash');


final (firstImage, secondImage) = await (
  File('image0.jpg').readAsBytes(),
  File('image1.jpg').readAsBytes()
).wait;
// Provide a text prompt to include with the images
final prompt = TextPart("What's different between these pictures?");
// Prepare images for input
final imageParts = [
  InlineDataPart('image/jpeg', firstImage),
  InlineDataPart('image/jpeg', secondImage),
];

// To generate text output, call generateContent with the text and images
final response = await model.generateContent([
  Content.multi([prompt, ...imageParts])
]);
print(response.text);

Unity

您可以呼叫 GenerateContentAsync(),根據多模態輸入的文字和圖片生成文字。

單一檔案輸入


using Firebase;
using Firebase.AI;

// Initialize the Gemini Developer API backend service
var ai = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI());

// Create a `GenerativeModel` instance with a model that supports your use case
var model = ai.GetGenerativeModel(modelName: "gemini-2.0-flash");


// Convert a Texture2D into InlineDataParts
var grayImage = ModelContent.InlineData("image/png",
      UnityEngine.ImageConversion.EncodeToPNG(UnityEngine.Texture2D.grayTexture));

// Provide a text prompt to include with the image
var prompt = ModelContent.Text("What's in this picture?");

// To generate text output, call GenerateContentAsync and pass in the prompt
var response = await model.GenerateContentAsync(new [] { grayImage, prompt });
UnityEngine.Debug.Log(response.Text ?? "No text in response.");

多個檔案輸入


using Firebase;
using Firebase.AI;

// Initialize the Gemini Developer API backend service
var ai = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI());

// Create a `GenerativeModel` instance with a model that supports your use case
var model = ai.GetGenerativeModel(modelName: "gemini-2.0-flash");


// Convert Texture2Ds into InlineDataParts
var blackImage = ModelContent.InlineData("image/png",
      UnityEngine.ImageConversion.EncodeToPNG(UnityEngine.Texture2D.blackTexture));
var whiteImage = ModelContent.InlineData("image/png",
      UnityEngine.ImageConversion.EncodeToPNG(UnityEngine.Texture2D.whiteTexture));

// Provide a text prompt to include with the images
var prompt = ModelContent.Text("What's different between these pictures?");

// To generate text output, call GenerateContentAsync and pass in the prompt
var response = await model.GenerateContentAsync(new [] { blackImage, whiteImage, prompt });
UnityEngine.Debug.Log(response.Text ?? "No text in response.");

瞭解如何選擇適合用途和應用程式的模型

逐句顯示回應

在嘗試這個範例前,請先完成本指南的「開始前」一節,設定專案和應用程式。
在該部分,您也需要點選所選Gemini API供應商的按鈕,才能在本頁面上看到供應商專屬內容

您可以不等待模型產生的完整結果,改用串流處理部分結果,以便加快互動速度。如要串流回應,請呼叫 generateContentStream



輸入圖片檔案的規定和建議

請注意,以內嵌資料形式提供的檔案會在傳輸過程中編碼為 base64,因此會增加要求的大小。如果要求過大,您會收到 HTTP 413 錯誤。

請參閱「支援的 Vertex AI Gemini API 輸入檔案和相關規定」一文,瞭解下列項目的詳細資訊:

支援的圖片 MIME 類型

Gemini 多模態模型支援下列圖片 MIME 類型:

圖片 MIME 類型 Gemini 2.0 Flash Gemini 2.0 Flash‑Lite
PNG - image/png
JPEG - image/jpeg
WebP - image/webp

每項要求的限制

圖片的像素數量沒有特定限制。不過,較大的圖片會縮小並填滿,以便符合 3072 x 3072 的最大解析度,同時保留原始顯示比例。

以下是提示要求中允許的圖片檔案數量上限:

  • Gemini 2.0 FlashGemini 2.0 Flash‑Lite:3,000 張圖片



你還可以做些什麼?

試用其他功能

瞭解如何控管內容產生作業

您也可以嘗試使用提示和模型設定,甚至使用 Google AI Studio 取得產生的程式碼片段。

進一步瞭解支援的型號

瞭解可用於各種用途的模型,以及相關配額價格


針對使用 Firebase AI Logic 的體驗提供意見回饋