Bidirectional streaming using the Gemini Live API

The Gemini Live API enables low-latency, bidirectional text and voice interactions with Gemini. Using the Live API, you can offer end users the experience of natural, human-like voice conversations, including the ability to interrupt the model's responses with text or voice commands. The model can process text and audio input (video coming soon!), and it can provide text and audio output.

You can prototype with prompts and the Live API in Vertex AI Studio.

The Live API is a stateful API that creates a WebSocket connection to establish a session between the client and the Gemini server. For details, see the Live API reference documentation.

Before you begin

This feature is available only when using the Vertex AI Gemini API as your API provider.

If you haven't already, complete the getting started guide, which describes how to set up your Firebase project, connect your app to Firebase, add the SDK, initialize the backend service for the Vertex AI Gemini API, and create a LiveModel instance.

Models that support this capability

The Live API is supported only by gemini-2.0-flash-live-preview-04-09 (not gemini-2.0-flash).

Use the standard features of the Live API

This section describes how to use the standard features of the Live API, specifically to stream various types of inputs and outputs:

Generate streamed text from streamed text input

Before trying this sample, make sure that you've completed the Before you begin section of this guide to set up your project and app.
In that section, you'll also click a button for your chosen Gemini API provider so that you see provider-specific content on this page.

You can send streamed text input and receive streamed text output. Make sure to create a liveModel instance and set the response modality to Text.

Swift

The Live API isn't supported yet for Apple platform apps, but check back soon!

Kotlin

// Initialize the Vertex AI Gemini API backend service
// Create a `LiveModel` instance with the model that supports the Live API
val model = Firebase.vertexAI.liveModel(
    modelName = "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to respond with text
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.TEXT
    }
)

val session = model.connect()

// Provide a text prompt
val text = "tell a short story"

session.send(text)

var outputText = ""
session.receive().collect {
    if (it.status == Status.TURN_COMPLETE) {
        // Optional: stop receiving if you don't need to send more requests.
        session.stopReceiving()
    }
    outputText += it.text
}

// Output received from the server.
println(outputText)

Java

ExecutorService executor = Executors.newFixedThreadPool(1);
// Initialize the Vertex AI Gemini API backend service
// Create a `LiveModel` instance with the model that supports the Live API
LiveGenerativeModel lm = FirebaseAI.getInstance(GenerativeBackend.vertexAI()).liveModel(
        "gemini-2.0-flash-live-preview-04-09",
        // Configure the model to respond with text
        new LiveGenerationConfig.Builder()
                .setResponseModalities(ResponseModality.TEXT)
                .build()
);
LiveModelFutures model = LiveModelFutures.from(lm);
ListenableFuture<LiveSession> sessionFuture = model.connect();
class LiveContentResponseSubscriber implements Subscriber<LiveContentResponse> {
    @Override
    public void onSubscribe(Subscription s) {
        s.request(Long.MAX_VALUE); // Request an unlimited number of items
    }
    @Override
    public void onNext(LiveContentResponse liveContentResponse) {
        // Handle the response from the server.
        System.out.println(liveContentResponse.getText());
    }
    @Override
    public void onError(Throwable t) {
        System.err.println("Error: " + t.getMessage());
    }
    @Override
    public void onComplete() {
        System.out.println("Done receiving messages!");
    }
}
Futures.addCallback(sessionFuture, new FutureCallback<LiveSession>() {
    @Override
    public void onSuccess(LiveSession ses) {
        LiveSessionFutures session = LiveSessionFutures.from(ses);
        // Provide a text prompt
        String text = "tell me a short story?";
        session.send(text);
        Publisher<LiveContentResponse> publisher = session.receive();
        publisher.subscribe(new LiveContentResponseSubscriber());
    }
    @Override
    public void onFailure(Throwable t) {
        // Handle exceptions
    }
}, executor);

Web

The Live API isn't supported yet for web apps, but check back soon!

Dart

import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';

late LiveModelSession _session;

await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Vertex AI Gemini API backend service
// Create a `LiveModel` instance with the model that supports the Live API
final model = FirebaseAI.vertexAI().liveModel(
  model: 'gemini-2.0-flash-live-preview-04-09',
  // Configure the model to respond with text
  config: LiveGenerationConfig(responseModalities: [ResponseModality.text]),
);

_session = await model.connect();

// Provide a text prompt
final prompt = Content.text('tell a short story');
await _session.send(input: prompt, turnComplete: true);

// In a separate thread, receive the response
await for (final message in _session.receive()) {
  // Process the received message
}

Unity

using System.Threading.Tasks;
using Firebase;
using Firebase.AI;

async Task SendTextReceiveText() {
  // Initialize the Vertex AI Gemini API backend service
  // Create a `LiveModel` instance with the model that supports the Live API
  var model = FirebaseAI.GetInstance(FirebaseAI.Backend.VertexAI()).GetLiveModel(
    modelName: "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to respond with text
    liveGenerationConfig: new LiveGenerationConfig(
        responseModalities: new[] { ResponseModality.Text })
  );

  LiveSession session = await model.ConnectAsync();

  // Provide a text prompt
  var prompt = ModelContent.Text("tell a short story");
  await session.SendAsync(content: prompt, turnComplete: true);

  // Receive the response
  await foreach (var message in session.ReceiveAsync()) {
    // Process the received message
    if (!string.IsNullOrEmpty(message.Text)) {
      UnityEngine.Debug.Log("Received message: " + message.Text);
    }
  }
}

Learn how to choose a model appropriate for your use case and app.

Generate streamed audio from streamed audio input

Before trying this sample, make sure that you've completed the Before you begin section of this guide to set up your project and app.
In that section, you'll also click a button for your chosen Gemini API provider so that you see provider-specific content on this page.

You can send streamed audio input and receive streamed audio output. Make sure to create a LiveModel instance and set the response modality to Audio.

Learn how to configure and customize the response voice (later on this page).

Swift

The Live API isn't supported yet for Apple platform apps, but check back soon!

Kotlin

// Initialize the Vertex AI Gemini API backend service
// Create a `LiveModel` instance with the model that supports the Live API
val model = Firebase.vertexAI.liveModel(
    modelName = "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to respond with audio
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.AUDIO
    }
)

val session = model.connect()

// This is the recommended way.
// However, you can create your own recorder and handle the stream.
session.startAudioConversation()

Java

ExecutorService executor = Executors.newFixedThreadPool(1);
// Initialize the Vertex AI Gemini API backend service
// Create a `LiveModel` instance with the model that supports the Live API
LiveGenerativeModel lm = FirebaseAI.getInstance(GenerativeBackend.vertexAI()).liveModel(
        "gemini-2.0-flash-live-preview-04-09",
        // Configure the model to respond with audio
        new LiveGenerationConfig.Builder()
                .setResponseModalities(ResponseModality.AUDIO)
                .build()
);
LiveModelFutures model = LiveModelFutures.from(lm);
ListenableFuture<LiveSession> sessionFuture = model.connect();

Futures.addCallback(sessionFuture, new FutureCallback<LiveSession>() {
    @Override
    public void onSuccess(LiveSession ses) {
        LiveSessionFutures session = LiveSessionFutures.from(ses);
        session.startAudioConversation();
    }
    @Override
    public void onFailure(Throwable t) {
        // Handle exceptions
    }
}, executor);

Web

The Live API isn't supported yet for web apps, but check back soon!

Dart

import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';
import 'package:your_audio_recorder_package/your_audio_recorder_package.dart';

late LiveModelSession _session;
final _audioRecorder = YourAudioRecorder();

await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Vertex AI Gemini API backend service
// Create a `LiveModel` instance with the model that supports the Live API
final model = FirebaseAI.vertexAI().liveModel(
  model: 'gemini-2.0-flash-live-preview-04-09',
  // Configure the model to respond with audio
  config: LiveGenerationConfig(responseModalities: [ResponseModality.audio]),
);

_session = await model.connect();

final audioRecordStream = _audioRecorder.startRecordingStream();
// Map the Uint8List stream to InlineDataPart stream
final mediaChunkStream = audioRecordStream.map((data) {
  return InlineDataPart('audio/pcm', data);
});
await _session.startMediaStream(mediaChunkStream);

// In a separate thread, receive the audio response from the model
await for (final message in _session.receive()) {
  // Process the received message
}

Unity

using System.Collections;
using System.Collections.Generic;
using System.Threading.Tasks;
using Firebase;
using Firebase.AI;
using UnityEngine;

async Task SendTextReceiveAudio() {
  // Initialize the Vertex AI Gemini API backend service
  // Create a `LiveModel` instance with the model that supports the Live API
  var model = FirebaseAI.GetInstance(FirebaseAI.Backend.VertexAI()).GetLiveModel(
    modelName: "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to respond with audio
    liveGenerationConfig: new LiveGenerationConfig(
        responseModalities: new[] { ResponseModality.Audio })
  );

  LiveSession session = await model.ConnectAsync();

  // Start a coroutine to send audio from the Microphone
  var recordingCoroutine = StartCoroutine(SendAudio(session));

  // Start receiving the response
  await ReceiveAudio(session);
}

IEnumerator SendAudio(LiveSession liveSession) {
  string microphoneDeviceName = null;
  int recordingFrequency = 16000;
  int recordingBufferSeconds = 2;

  var recordingClip = Microphone.Start(microphoneDeviceName, true,
                                       recordingBufferSeconds, recordingFrequency);

  int lastSamplePosition = 0;
  while (true) {
    if (!Microphone.IsRecording(microphoneDeviceName)) {
      yield break;
    }

    int currentSamplePosition = Microphone.GetPosition(microphoneDeviceName);

    if (currentSamplePosition != lastSamplePosition) {
      // The Microphone uses a circular buffer, so we need to check if the
      // current position wrapped around to the beginning, and handle it
      // accordingly.
      int sampleCount;
      if (currentSamplePosition > lastSamplePosition) {
        sampleCount = currentSamplePosition - lastSamplePosition;
      } else {
        sampleCount = recordingClip.samples - lastSamplePosition + currentSamplePosition;
      }

      if (sampleCount > 0) {
        // Get the audio chunk
        float[] samples = new float[sampleCount];
        recordingClip.GetData(samples, lastSamplePosition);

        // Send the data, discarding the resulting Task to avoid the warning
        _ = liveSession.SendAudioAsync(samples);

        lastSamplePosition = currentSamplePosition;
      }
    }

    // Wait for a short delay before reading the next sample from the Microphone
    const float MicrophoneReadDelay = 0.5f;
    yield return new WaitForSeconds(MicrophoneReadDelay);
  }
}

Queue<float> audioBuffer = new();

async Task ReceiveAudio(LiveSession liveSession) {
  int sampleRate = 24000;
  int channelCount = 1;

  // Create a looping AudioClip to fill with the received audio data
  int bufferSamples = (int)(sampleRate * channelCount);
  AudioClip clip = AudioClip.Create("StreamingPCM", bufferSamples, channelCount,
                                    sampleRate, true, OnAudioRead);

  // Attach the clip to an AudioSource and start playing it
  AudioSource audioSource = GetComponent<AudioSource>();
  audioSource.clip = clip;
  audioSource.loop = true;
  audioSource.Play();

  // Start receiving the response
  await foreach (var message in liveSession.ReceiveAsync()) {
    // Process the received message
    foreach (float[] pcmData in message.AudioAsFloat) {
      lock (audioBuffer) {
        foreach (float sample in pcmData) {
          audioBuffer.Enqueue(sample);
        }
      }
    }
  }
}

// This method is called by the AudioClip to load audio data.
private void OnAudioRead(float[] data) {
  int samplesToProvide = data.Length;
  int samplesProvided = 0;

  lock(audioBuffer) {
    while (samplesProvided < samplesToProvide && audioBuffer.Count > 0) {
      data[samplesProvided] = audioBuffer.Dequeue();
      samplesProvided++;
    }
  }

  while (samplesProvided < samplesToProvide) {
    data[samplesProvided] = 0.0f;
    samplesProvided++;
  }
}

Learn how to choose a model appropriate for your use case and app.



Build more engaging and interactive experiences

This section describes how to create and manage more engaging or interactive features of the Live API to increase user engagement.

Change the response voice

The Live API uses Chirp 3 to support synthesized speech responses. When using Firebase AI Logic, audio is available in 5 HD voices and 31 languages.

If you don't specify a voice, the default is Puck. Alternatively, you can configure the model to respond in any of the following voices:

Aoede (female)
Charon (male)
Fenrir (male)
Kore (female)
Puck (male)

For demos of what these voices sound like and for the full list of available languages, see Chirp 3: HD voices.

To specify a voice, set the voice name within the speechConfig object as part of the model configuration:

Swift

The Live API isn't supported yet for Apple platform apps, but check back soon!

Kotlin

// ...

val model = Firebase.vertexAI.liveModel(
    modelName = "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to use a specific voice for its audio response
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.AUDIO
        speechConfig = SpeechConfig(voice = Voices.FENRIR)
    }
)

// ...

Java

// ...

LiveModel model = Firebase.getVertexAI().liveModel(
    "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to use a specific voice for its audio response
    new LiveGenerationConfig.Builder()
        .setResponseModalities(ResponseModality.AUDIO)
        .setSpeechConfig(new SpeechConfig(Voices.FENRIR))
        .build()
);

// ...

Web

The Live API isn't supported yet for web apps, but check back soon!

Dart

// ...

final model = FirebaseVertexAI.instance.liveModel(
  model: 'gemini-2.0-flash-live-preview-04-09',
  // Configure the model to use a specific voice for its audio response
  config: LiveGenerationConfig(
    responseModality: ResponseModality.audio,
    speechConfig: SpeechConfig(voice: Voice.fenrir),
  ),
);

// ...

Unity

Snippets coming soon!

When prompting and requiring the model to respond in a non-English language, include the following in your system instructions:

RESPOND IN LANGUAGE. YOU MUST RESPOND UNMISTAKABLY IN LANGUAGE.
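
For example, here's a minimal Kotlin sketch of wiring this instruction into the model configuration. It assumes the liveModel builder accepts a systemInstruction parameter like the standard generativeModel builder does (verify against your SDK version); Spanish is a hypothetical target language:

// A minimal sketch, assuming `liveModel` accepts a `systemInstruction`
// parameter like the standard `generativeModel` builder does.
// Spanish is a hypothetical example language.
val model = Firebase.vertexAI.liveModel(
    modelName = "gemini-2.0-flash-live-preview-04-09",
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.AUDIO
    },
    systemInstruction = content {
        text("RESPOND IN SPANISH. YOU MUST RESPOND UNMISTAKABLY IN SPANISH.")
    }
)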

Maintain context across sessions and requests

You can use a chat structure to maintain context across sessions and requests. Note that this only works for text input and text output.

This approach is best for short contexts; you can send turn-by-turn interactions to represent the exact sequence of events. For longer contexts, we recommend providing a single message summary to free up the context window for subsequent interactions, as sketched below.
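
For example, here's a minimal Kotlin sketch of the summary approach. The summarize helper is hypothetical (you might implement it with a separate generateContent call); only connect and send come from the snippets in this guide:

// A minimal sketch of seeding a new session with a summary.
// `summarize` is a hypothetical helper you would implement yourself.
suspend fun resumeWithSummary(model: LiveGenerativeModel, transcript: List<String>) {
    val summary = summarize(transcript) // hypothetical: condense prior turns
    val session = model.connect()
    // Send a single summary message instead of replaying every previous
    // turn, which frees up the context window for the new interaction.
    session.send("Summary of our conversation so far: $summary")
}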

Handle interruptions

Firebase AI Logic doesn't yet support handling interruptions. Check back soon!

Use function calling (tools)

You can define tools (like available functions) to use with the Live API just like you can with the standard content generation methods. This section describes some nuances when using the Live API with function calling. For a complete description and examples of function calling, see the function calling guide.

From a single prompt, the model can generate multiple function calls and the code necessary to chain their outputs. This code executes in a sandbox environment, generating subsequent BidiGenerateContentToolCall messages. Execution pauses until the results of each function call are available, which ensures sequential processing.

Additionally, using the Live API with function calling is particularly powerful because the model can request follow-up or clarifying information from the user. For example, if the model doesn't have enough information to provide a parameter value to a function it wants to call, it can ask the user to provide more or clarifying information.

The client should respond with a BidiGenerateContentToolResponse message.
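
For example, here's a minimal Kotlin sketch of declaring a tool. The fetchWeather function and its schema are hypothetical, and it's assumed that liveModel accepts a tools parameter like the standard generativeModel builder does (see the function calling guide for the full pattern):

// A minimal sketch; `fetchWeather` and its schema are hypothetical, and
// the `tools` parameter is assumed to mirror the generativeModel builder.
val fetchWeatherTool = FunctionDeclaration(
    name = "fetchWeather",
    description = "Get the current weather conditions for a specific city.",
    parameters = mapOf("city" to Schema.string("The name of the city")),
)

val model = Firebase.vertexAI.liveModel(
    modelName = "gemini-2.0-flash-live-preview-04-09",
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.TEXT
    },
    tools = listOf(Tool.functionDeclarations(listOf(fetchWeatherTool)))
)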



Limitations and requirements

Keep in mind the following limitations and requirements of the Live API.

Transcription

Firebase AI Logic doesn't yet support transcription. Check back soon!

Languages

Audio formats

The Live API supports the following audio formats (a capture sketch follows the list):

  • Input audio format: Raw 16-bit PCM audio at 16 kHz little-endian
  • Output audio format: Raw 16-bit PCM audio at 24 kHz little-endian
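
For example, here's a minimal Android (Kotlin) sketch of an AudioRecord configured to match the input format, assuming you capture audio yourself instead of using startAudioConversation(); it requires the RECORD_AUDIO permission:

import android.media.AudioFormat
import android.media.AudioRecord
import android.media.MediaRecorder

// A minimal sketch, assuming you manage your own capture instead of using
// startAudioConversation(). Requires the RECORD_AUDIO runtime permission.
// On Android, 16-bit PCM buffers are little-endian, matching the input format.
val sampleRate = 16_000 // input format: 16 kHz
val bufferSize = AudioRecord.getMinBufferSize(
    sampleRate,
    AudioFormat.CHANNEL_IN_MONO,
    AudioFormat.ENCODING_PCM_16BIT
)
val recorder = AudioRecord(
    MediaRecorder.AudioSource.VOICE_COMMUNICATION,
    sampleRate,
    AudioFormat.CHANNEL_IN_MONO,
    AudioFormat.ENCODING_PCM_16BIT,
    bufferSize
)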

Rate limits

The following rate limits apply:

  • 10 concurrent sessions per Firebase project
  • 4 million tokens per minute

Session length

The default length for a session is 30 minutes. When the session duration exceeds the limit, the connection is terminated.

The model is also limited by the context size. Sending large chunks of input may result in earlier session termination.

Voice activity detection (VAD)

The model automatically performs voice activity detection (VAD) on a continuous audio input stream. VAD is enabled by default.

Token counting

You can't use the CountTokens API with the Live API.

