Analyze video files using the Gemini API

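You can ask a Gemini model to analyze video files that you provide either inline (base64-encoded) or via URL. When you use Firebase AI Logic, you can make this request directly from your app.

With this capability, you can do things like:

Caption videos and answer questions about them
Analyze specific segments of a video using timestamps
Transcribe video content by processing both the audio track and the visual frames
Describe, segment, and extract information from videos, including both the audio track and the visual frames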
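
As a rough illustration of the timestamp-based analysis listed above, the following sketch builds a text prompt that points the model at one segment of a video. The helper names `toTimestamp` and `buildSegmentPrompt` are hypothetical and not part of the Firebase AI Logic SDK, and the MM:SS notation is an assumption about how you might phrase timestamps in a prompt.

```javascript
// Hypothetical helper: format a number of seconds as MM:SS.
function toTimestamp(totalSeconds) {
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = Math.floor(totalSeconds % 60);
  return `${String(minutes).padStart(2, "0")}:${String(seconds).padStart(2, "0")}`;
}

// Hypothetical helper: build a text prompt that asks about one segment.
function buildSegmentPrompt(startSeconds, endSeconds) {
  if (startSeconds < 0 || endSeconds <= startSeconds) {
    throw new RangeError("Expected 0 <= startSeconds < endSeconds");
  }
  const start = toTimestamp(startSeconds);
  const end = toTimestamp(endSeconds);
  return `Describe what happens in the video between ${start} and ${end}.`;
}

console.log(buildSegmentPrompt(65, 90));
// prints "Describe what happens in the video between 01:05 and 01:30."
```

The resulting string can be passed as the text part of a request alongside the video part, in the same way as the prompts in the samples on this page.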


See other guides for additional options for working with video:
Generate structured output
Multi-turn chat

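Before you begin

Click your Gemini API provider to view provider-specific content and code on this page.

If you haven't already, complete the getting started guide, which describes how to set up your Firebase project, connect your app to Firebase, add the SDK, initialize the backend service for your chosen Gemini API provider, and create a GenerativeModel instance.

For testing and iterating on your prompts, and even getting a generated code snippet, we recommend using Google AI Studio.

Need a sample video file?

You can use this publicly available file with a MIME type of video/mp4 (view or download the file): https://storage.googleapis.com/cloud-samples-data/video/animals.mp4

Generate text from video files (base64-encoded)

Before trying this sample, complete the "Before you begin" section of this guide to set up your project and app.
In that section, you'll also click a button for your chosen Gemini API provider so that you see provider-specific content on this page.

You can ask a Gemini model to generate text by prompting with text and video, providing each input file's mimeType and the file itself. See the requirements and recommendations for input files later on this page.

Note that this example shows how to provide the file inline, but the SDKs also support providing a YouTube URL.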

Swift

You can call generateContent() to generate text from multimodal input of text and video files.


import FirebaseAI

// Initialize the Gemini Developer API backend service
let ai = FirebaseAI.firebaseAI(backend: .googleAI())

// Create a `GenerativeModel` instance with a model that supports your use case
let model = ai.generativeModel(modelName: "gemini-2.0-flash")


// Provide the video as `Data` with the appropriate MIME type.
let video = InlineDataPart(data: try Data(contentsOf: videoURL), mimeType: "video/mp4")

// Provide a text prompt to include with the video
let prompt = "What is in the video?"

// To generate text output, call generateContent with the text and video
let response = try await model.generateContent(video, prompt)
print(response.text ?? "No text in response.")

Kotlin

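You can call generateContent() to generate text from multimodal input of text and video files.

Note: For Kotlin, the methods in this SDK are suspend functions and need to be called from a coroutine scope.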


// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
val model = Firebase.ai(backend = GenerativeBackend.googleAI())
                        .generativeModel("gemini-2.0-flash")


val contentResolver = applicationContext.contentResolver
contentResolver.openInputStream(videoUri).use { stream ->
  stream?.let {
    val bytes = stream.readBytes()

    // Provide a prompt that includes the video specified above and text
    val prompt = content {
        inlineData(bytes, "video/mp4")
        text("What is in the video?")
    }

    // To generate text output, call generateContent with the prompt
    val response = model.generateContent(prompt)
    Log.d(TAG, response.text ?: "")
  }
}

Java

You can call generateContent() to generate text from multimodal input of text and video files.

Note: For Java, the methods in this SDK return a ListenableFuture.


// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
GenerativeModel ai = FirebaseAI.getInstance(GenerativeBackend.googleAI())
        .generativeModel("gemini-2.0-flash");

// Use the GenerativeModelFutures Java compatibility layer which offers
// support for ListenableFuture and Publisher APIs
GenerativeModelFutures model = GenerativeModelFutures.from(ai);


ContentResolver resolver = getApplicationContext().getContentResolver();
try (InputStream stream = resolver.openInputStream(videoUri)) {
    if (stream != null) {
        // Read the entire stream; a single read() call may return fewer bytes than requested
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] readBuffer = new byte[8192];
        int bytesRead;
        while ((bytesRead = stream.read(readBuffer)) != -1) {
            buffer.write(readBuffer, 0, bytesRead);
        }
        byte[] videoBytes = buffer.toByteArray();

        // Provide a prompt that includes the video specified above and text
        Content prompt = new Content.Builder()
                .addInlineData(videoBytes, "video/mp4")
                .addText("What is in the video?")
                .build();

        // To generate text output, call generateContent with the prompt
        ListenableFuture<GenerateContentResponse> response = model.generateContent(prompt);
        Futures.addCallback(response, new FutureCallback<GenerateContentResponse>() {
            @Override
            public void onSuccess(GenerateContentResponse result) {
                String resultText = result.getText();
                System.out.println(resultText);
            }

            @Override
            public void onFailure(Throwable t) {
                t.printStackTrace();
            }
        }, executor);
    }
} catch (IOException e) {
    e.printStackTrace();
}

Web

You can call generateContent() to generate text from multimodal input of text and video files.


import { initializeApp } from "firebase/app";
import { getAI, getGenerativeModel, GoogleAIBackend } from "firebase/ai";

// TODO(developer) Replace the following with your app's Firebase configuration
// See: https://firebase.google.com/docs/web/learn-more#config-object
const firebaseConfig = {
  // ...
};

// Initialize FirebaseApp
const firebaseApp = initializeApp(firebaseConfig);

// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `GenerativeModel` instance with a model that supports your use case
const model = getGenerativeModel(ai, { model: "gemini-2.0-flash" });


// Converts a File object to a Part object.
async function fileToGenerativePart(file) {
  const base64EncodedDataPromise = new Promise((resolve) => {
    const reader = new FileReader();
    reader.onloadend = () => resolve(reader.result.split(',')[1]);
    reader.readAsDataURL(file);
  });
  return {
    inlineData: { data: await base64EncodedDataPromise, mimeType: file.type },
  };
}

async function run() {
  // Provide a text prompt to include with the video
  const prompt = "What do you see?";

  const fileInputEl = document.querySelector("input[type=file]");
  const videoPart = await fileToGenerativePart(fileInputEl.files[0]);

  // To generate text output, call generateContent with the text and video
  const result = await model.generateContent([prompt, videoPart]);

  const response = result.response;
  const text = response.text();
  console.log(text);
}

run();

Dart

You can call generateContent() to generate text from multimodal input of text and video files.


import 'dart:io'; // Needed for File

import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';

// Initialize FirebaseApp
await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
final model =
      FirebaseAI.googleAI().generativeModel(model: 'gemini-2.0-flash');


// Provide a text prompt to include with the video
final prompt = TextPart("What's in the video?");

// Prepare video for input
final video = await File('video0.mp4').readAsBytes();

// Provide the video as `Data` with the appropriate mimetype
final videoPart = InlineDataPart('video/mp4', video);

// To generate text output, call generateContent with the text and video
final response = await model.generateContent([
  Content.multi([prompt, videoPart])
]);
print(response.text);

Unity

You can call GenerateContentAsync() to generate text from multimodal input of text and video files.


using Firebase;
using Firebase.AI;

// Initialize the Gemini Developer API backend service
var ai = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI());

// Create a `GenerativeModel` instance with a model that supports your use case
var model = ai.GetGenerativeModel(modelName: "gemini-2.0-flash");


// Provide the video as `data` with the appropriate MIME type.
var video = ModelContent.InlineData("video/mp4",
      System.IO.File.ReadAllBytes(System.IO.Path.Combine(
          UnityEngine.Application.streamingAssetsPath, "yourVideo.mp4")));

// Provide a text prompt to include with the video
var prompt = ModelContent.Text("What is in the video?");

// To generate text output, call GenerateContentAsync with the text and video
var response = await model.GenerateContentAsync(new [] { video, prompt });
UnityEngine.Debug.Log(response.Text ?? "No text in response.");

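Learn how to choose a model appropriate for your use case and app.

Stream the response

Before trying this sample, complete the "Before you begin" section of this guide to set up your project and app.
In that section, you'll also click a button for your chosen Gemini API provider so that you see provider-specific content on this page.

You can achieve faster interactions by not waiting for the entire result from the model, and instead using streaming to handle partial results. To stream the response, call generateContentStream.

Sample: stream generated text from video files

Swift

You can call generateContentStream() to stream generated text from multimodal input of text and a single video.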


import FirebaseAI

// Initialize the Gemini Developer API backend service
let ai = FirebaseAI.firebaseAI(backend: .googleAI())

// Create a `GenerativeModel` instance with a model that supports your use case
let model = ai.generativeModel(modelName: "gemini-2.0-flash")


// Provide the video as `Data` with the appropriate MIME type
let video = InlineDataPart(data: try Data(contentsOf: videoURL), mimeType: "video/mp4")

// Provide a text prompt to include with the video
let prompt = "What is in the video?"

// To stream generated text output, call generateContentStream with the text and video
let contentStream = try model.generateContentStream(video, prompt)
for try await chunk in contentStream {
  if let text = chunk.text {
    print(text)
  }
}

Kotlin

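You can call generateContentStream() to stream generated text from multimodal input of text and a single video.

Note: For Kotlin, the methods in this SDK are suspend functions and need to be called from a coroutine scope.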


// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
val model = Firebase.ai(backend = GenerativeBackend.googleAI())
                        .generativeModel("gemini-2.0-flash")


val contentResolver = applicationContext.contentResolver
contentResolver.openInputStream(videoUri).use { stream ->
  stream?.let {
    val bytes = stream.readBytes()

    // Provide a prompt that includes the video specified above and text
    val prompt = content {
        inlineData(bytes, "video/mp4")
        text("What is in the video?")
    }

    // To stream generated text output, call generateContentStream with the prompt
    var fullResponse = ""
    model.generateContentStream(prompt).collect { chunk ->
        // chunk.text is nullable; fall back to an empty string so "null" is never appended
        val text = chunk.text ?: ""
        Log.d(TAG, text)
        fullResponse += text
    }
  }
}

Java

You can call generateContentStream() to stream generated text from multimodal input of text and a single video.

Note: For Java, the streaming methods in this SDK return a Publisher type from the Reactive Streams library.


// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
GenerativeModel ai = FirebaseAI.getInstance(GenerativeBackend.googleAI())
        .generativeModel("gemini-2.0-flash");

// Use the GenerativeModelFutures Java compatibility layer which offers
// support for ListenableFuture and Publisher APIs
GenerativeModelFutures model = GenerativeModelFutures.from(ai);


ContentResolver resolver = getApplicationContext().getContentResolver();
try (InputStream stream = resolver.openInputStream(videoUri)) {
    if (stream != null) {
        // Read the entire stream; a single read() call may return fewer bytes than requested
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] readBuffer = new byte[8192];
        int bytesRead;
        while ((bytesRead = stream.read(readBuffer)) != -1) {
            buffer.write(readBuffer, 0, bytesRead);
        }
        byte[] videoBytes = buffer.toByteArray();

        // Provide a prompt that includes the video specified above and text
        Content prompt = new Content.Builder()
                .addInlineData(videoBytes, "video/mp4")
                .addText("What is in the video?")
                .build();

        // To stream generated text output, call generateContentStream with the prompt
        Publisher<GenerateContentResponse> streamingResponse =
                model.generateContentStream(prompt);

        final String[] fullResponse = {""};

        streamingResponse.subscribe(new Subscriber<GenerateContentResponse>() {
            @Override
            public void onNext(GenerateContentResponse generateContentResponse) {
                // getText() can return null; skip such chunks so "null" is never appended
                String chunk = generateContentResponse.getText();
                if (chunk != null) {
                    fullResponse[0] += chunk;
                }
            }

            @Override
            public void onComplete() {
                System.out.println(fullResponse[0]);
            }

            @Override
            public void onError(Throwable t) {
                t.printStackTrace();
            }

            @Override
            public void onSubscribe(Subscription s) {
                // Reactive Streams publishers emit nothing until demand is signaled
                s.request(Long.MAX_VALUE);
            }
        });
    }
} catch (IOException e) {
    e.printStackTrace();
}

Web

You can call generateContentStream() to stream generated text from multimodal input of text and a single video.


import { initializeApp } from "firebase/app";
import { getAI, getGenerativeModel, GoogleAIBackend } from "firebase/ai";

// TODO(developer) Replace the following with your app's Firebase configuration
// See: https://firebase.google.com/docs/web/learn-more#config-object
const firebaseConfig = {
  // ...
};

// Initialize FirebaseApp
const firebaseApp = initializeApp(firebaseConfig);

// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `GenerativeModel` instance with a model that supports your use case
const model = getGenerativeModel(ai, { model: "gemini-2.0-flash" });


// Converts a File object to a Part object.
async function fileToGenerativePart(file) {
  const base64EncodedDataPromise = new Promise((resolve) => {
    const reader = new FileReader();
    reader.onloadend = () => resolve(reader.result.split(',')[1]);
    reader.readAsDataURL(file);
  });
  return {
    inlineData: { data: await base64EncodedDataPromise, mimeType: file.type },
  };
}

async function run() {
  // Provide a text prompt to include with the video
  const prompt = "What do you see?";

  const fileInputEl = document.querySelector("input[type=file]");
  const videoPart = await fileToGenerativePart(fileInputEl.files[0]);

  // To stream generated text output, call generateContentStream with the text and video
  const result = await model.generateContentStream([prompt, videoPart]);

  for await (const chunk of result.stream) {
    const chunkText = chunk.text();
    console.log(chunkText);
  }
}

run();

Dart

You can call generateContentStream() to stream generated text from multimodal input of text and a single video.


import 'dart:io'; // Needed for File

import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';

// Initialize FirebaseApp
await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
final model =
      FirebaseAI.googleAI().generativeModel(model: 'gemini-2.0-flash');


// Provide a text prompt to include with the video
final prompt = TextPart("What's in the video?");

// Prepare video for input
final video = await File('video0.mp4').readAsBytes();

// Provide the video as `Data` with the appropriate mimetype
final videoPart = InlineDataPart('video/mp4', video);

// To stream generated text output, call generateContentStream with the text and video
final response = model.generateContentStream([
  Content.multi([prompt, videoPart])
]);
await for (final chunk in response) {
  print(chunk.text);
}

Unity

You can call GenerateContentStreamAsync() to stream generated text from multimodal input of text and a single video.


using Firebase;
using Firebase.AI;

// Initialize the Gemini Developer API backend service
var ai = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI());

// Create a `GenerativeModel` instance with a model that supports your use case
var model = ai.GetGenerativeModel(modelName: "gemini-2.0-flash");


// Provide the video as `data` with the appropriate MIME type.
var video = ModelContent.InlineData("video/mp4",
      System.IO.File.ReadAllBytes(System.IO.Path.Combine(
          UnityEngine.Application.streamingAssetsPath, "yourVideo.mp4")));

// Provide a text prompt to include with the video
var prompt = ModelContent.Text("What is in the video?");

// To stream generated text output, call GenerateContentStreamAsync with the text and video
var responseStream = model.GenerateContentStreamAsync(new [] { video, prompt });
await foreach (var response in responseStream) {
  if (!string.IsNullOrWhiteSpace(response.Text)) {
    UnityEngine.Debug.Log(response.Text);
  }
}

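Requirements and recommendations for input video files

Note that a file provided as inline data is encoded to base64 in transit, which increases the size of the request. You get an HTTP 413 error if a request is too large.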
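
To get a feel for how much base64 encoding inflates an inline file, the sketch below estimates the encoded size from the raw byte count. Base64 maps every 3 input bytes to 4 output characters, so the payload grows by roughly a third. The function names and the `maxRequestBytes` parameter are hypothetical; this page does not state the actual request size limit, so supply your own value.

```javascript
// Base64 turns every 3 input bytes into 4 output characters (with padding).
function base64EncodedLength(byteLength) {
  return 4 * Math.ceil(byteLength / 3);
}

// Hypothetical guard: `maxRequestBytes` is a placeholder, not a documented limit.
function fitsInRequest(byteLength, maxRequestBytes) {
  return base64EncodedLength(byteLength) <= maxRequestBytes;
}

console.log(base64EncodedLength(3000000)); // prints 4000000
```

A check like this only covers the video payload; the rest of the request (prompt text, JSON structure) adds to the total size.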

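See "Supported input files and requirements for the Vertex AI Gemini API" to learn detailed information about the following:

Different options for providing a file in a request (either inline or using the file's URL or URI)
Requirements and best practices for video files

Supported video MIME types

Gemini multimodal models support the following video MIME types:

Video MIME type	Gemini 2.0 Flash	Gemini 2.0 Flash‑Lite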
FLV - `video/x-flv`
MOV - `video/quicktime`
MPEG - `video/mpeg`
MPEGPS - `video/mpegps`
MPG - `video/mpg`
MP4 - `video/mp4`
WEBM - `video/webm`
WMV - `video/wmv`
3GPP - `video/3gpp`
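
A client-side check against this list can catch an unsupported file before the request is sent. The following is a minimal sketch; the constant and function names are hypothetical, and the set simply mirrors the MIME types listed above.

```javascript
// MIME types copied from the list above.
const SUPPORTED_VIDEO_MIME_TYPES = new Set([
  "video/x-flv",
  "video/quicktime",
  "video/mpeg",
  "video/mpegps",
  "video/mpg",
  "video/mp4",
  "video/webm",
  "video/wmv",
  "video/3gpp",
]);

// Hypothetical helper: returns true when the MIME type appears in the list.
function isSupportedVideoMimeType(mimeType) {
  return SUPPORTED_VIDEO_MIME_TYPES.has(String(mimeType).toLowerCase());
}

console.log(isSupportedVideoMimeType("video/mp4")); // prints true
console.log(isSupportedVideoMimeType("video/avi")); // prints false
```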

Limits per request

This is the maximum number of video files allowed in a prompt request:

Gemini 2.0 Flash and Gemini 2.0 Flash‑Lite: 10 video files
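
The limit above can be enforced before sending a request. The sketch below counts inline video parts in an array shaped like the one passed to generateContent in the Web samples on this page (strings for text, objects with `inlineData` for files). The constant and function names are hypothetical.

```javascript
// Limit stated above for Gemini 2.0 Flash and Gemini 2.0 Flash-Lite.
const MAX_VIDEO_FILES_PER_REQUEST = 10;

// Hypothetical helper: counts inline video parts and throws if over the limit.
function assertVideoCountWithinLimit(parts) {
  const videoCount = parts.filter(
    (part) => part?.inlineData?.mimeType?.startsWith("video/") === true
  ).length;
  if (videoCount > MAX_VIDEO_FILES_PER_REQUEST) {
    throw new RangeError(
      `Request has ${videoCount} video files; the limit is ${MAX_VIDEO_FILES_PER_REQUEST}.`
    );
  }
  return videoCount;
}

const exampleParts = [
  "What is in the video?",
  { inlineData: { data: "", mimeType: "video/mp4" } },
];
console.log(assertVideoCountWithinLimit(exampleParts)); // prints 1
```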

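What else can you do?

Learn how to count tokens before sending long prompts to the model.
Set up Cloud Storage for Firebase so that you can include large files in your multimodal requests and have a more managed solution for providing files in prompts. Files can include images, PDFs, video, and audio.
Start thinking about preparing for production (see the production checklist), including:
- Setting up Firebase App Check to protect the Gemini API from abuse by unauthorized clients.
- Integrating Firebase Remote Config to update values in your app (like the model name) without releasing a new app version.

Try out other capabilities

Build multi-turn conversations (chat).
Generate text from text-only prompts.
Generate structured output (like JSON) from both text and multimodal prompts.
Generate images from text prompts.
Use function calling to connect generative models to external systems and information.

Learn how to control content generation

Understand prompt design, including best practices, strategies, and example prompts.
Configure model parameters like temperature and maximum output tokens (for Gemini) or aspect ratio and person generation (for Imagen).
Use safety settings to adjust the likelihood of getting responses that may be considered harmful.

You can also experiment with prompts and model configurations, and even get a generated code snippet, using Google AI Studio.

Learn more about the supported models

Learn about the models available for various use cases and their quotas and pricing.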


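Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-05-22 UTC.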