The Gemini Live API enables low-latency, bidirectional text and voice interactions with Gemini. With the Live API, you can provide end users with natural, human-like voice conversations, along with the ability to interrupt the model's responses using text or voice commands. The model can process text and audio input (video coming soon!), and it can provide text and audio output.
You can prototype with prompts and the Live API in Vertex AI Studio.
The Live API is a stateful API that creates a WebSocket connection to establish a session between the client and the Gemini server. For details, see the Live API reference documentation.
Before you begin
This capability is only available when you use the Vertex AI Gemini API as your API provider.
If you haven't already, complete the getting started guide, which describes how to set up your Firebase project, connect your app to Firebase, add the SDK, initialize the backend service for the Vertex AI Gemini API, and create a LiveModel instance.
Models that support this capability
Only gemini-2.0-flash-live-preview-04-09 (not gemini-2.0-flash) supports the Live API.
Use the standard features of the Live API
This section describes how to use the standard features of the Live API, specifically how to stream various types of input and output:
Generate streamed text from streamed text input
Before trying this sample, complete the Before you begin section of this guide to set up your project and app. In that section, you'll also click a button for your chosen Gemini API provider so that you see provider-specific content on this page.
You can send streamed text input and receive streamed text output. Make sure to create a liveModel instance and set the response modality to Text.
Swift
The Live API isn't supported for Apple platform apps yet, but check back soon!
Kotlin
// Initialize the Vertex AI Gemini API backend service
// Create a `LiveModel` instance with the model that supports the Live API
val model = Firebase.vertexAI.liveModel(
    modelName = "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to respond with text
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.TEXT
    }
)

val session = model.connect()

// Provide a text prompt
val text = "tell a short story"
session.send(text)

var outputText = ""
session.receive().collect {
    if (it.status == Status.TURN_COMPLETE) {
        // Optional: stop receiving if you don't need to send more requests.
        session.stopReceiving()
    }
    outputText = outputText + it.text
}

// Output the text received from the server.
println(outputText)
Java
ExecutorService executor = Executors.newFixedThreadPool(1);

// Initialize the Vertex AI Gemini API backend service
// Create a `LiveModel` instance with the model that supports the Live API
LiveGenerativeModel lm = FirebaseAI.getInstance(GenerativeBackend.vertexAI()).liveModel(
        "gemini-2.0-flash-live-preview-04-09",
        // Configure the model to respond with text
        new LiveGenerationConfig.Builder()
                .setResponseModalities(ResponseModality.TEXT)
                .build()
);
LiveModelFutures model = LiveModelFutures.from(lm);
ListenableFuture<LiveSession> sessionFuture = model.connect();

class LiveContentResponseSubscriber implements Subscriber<LiveContentResponse> {
    @Override
    public void onSubscribe(Subscription s) {
        s.request(Long.MAX_VALUE); // Request an unlimited number of items
    }

    @Override
    public void onNext(LiveContentResponse liveContentResponse) {
        // Handle the response from the server.
        System.out.println(liveContentResponse.getText());
    }

    @Override
    public void onError(Throwable t) {
        System.err.println("Error: " + t.getMessage());
    }

    @Override
    public void onComplete() {
        System.out.println("Done receiving messages!");
    }
}

Futures.addCallback(sessionFuture, new FutureCallback<LiveSession>() {
    @Override
    public void onSuccess(LiveSession ses) {
        LiveSessionFutures session = LiveSessionFutures.from(ses);

        // Provide a text prompt
        String text = "tell me a short story?";
        session.send(text);

        Publisher<LiveContentResponse> publisher = session.receive();
        publisher.subscribe(new LiveContentResponseSubscriber());
    }

    @Override
    public void onFailure(Throwable t) {
        // Handle exceptions
    }
}, executor);
Web
The Live API isn't supported for Web apps yet, but check back soon!
Dart
import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';

late LiveModelSession _session;

await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Vertex AI Gemini API backend service
// Create a `LiveModel` instance with the model that supports the Live API
final model = FirebaseAI.vertexAI().liveModel(
  model: 'gemini-2.0-flash-live-preview-04-09',
  // Configure the model to respond with text
  config: LiveGenerationConfig(responseModalities: [ResponseModality.text]),
);

_session = await model.connect();

// Provide a text prompt
final prompt = Content.text('tell a short story');
await _session.send(input: prompt, turnComplete: true);

// In a separate thread, receive the response
await for (final message in _session.receive()) {
  // Process the received message
}
Unity
using Firebase;
using Firebase.AI;

async Task SendTextReceiveText() {
  // Initialize the Vertex AI Gemini API backend service
  // Create a `LiveModel` instance with the model that supports the Live API
  var model = FirebaseAI.GetInstance(FirebaseAI.Backend.VertexAI()).GetLiveModel(
    modelName: "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to respond with text
    liveGenerationConfig: new LiveGenerationConfig(
        responseModalities: new[] { ResponseModality.Text })
  );

  LiveSession session = await model.ConnectAsync();

  // Provide a text prompt
  var prompt = ModelContent.Text("tell a short story");
  await session.SendAsync(content: prompt, turnComplete: true);

  // Receive the response
  await foreach (var message in session.ReceiveAsync()) {
    // Process the received message
    if (!string.IsNullOrEmpty(message.Text)) {
      UnityEngine.Debug.Log("Received message: " + message.Text);
    }
  }
}
Learn how to choose a model appropriate for your use case and app.
Generate streamed audio from streamed audio input
Before trying this sample, complete the Before you begin section of this guide to set up your project and app. In that section, you'll also click a button for your chosen Gemini API provider so that you see provider-specific content on this page.
You can send streamed audio input and receive streamed audio output. Make sure to create a LiveModel instance and set the response modality to Audio.
Learn how to configure and customize the response voice (later on this page).
Swift
The Live API isn't supported for Apple platform apps yet, but check back soon!
Kotlin
// Initialize the Vertex AI Gemini API backend service
// Create a `LiveModel` instance with the model that supports the Live API
val model = Firebase.vertexAI.liveModel(
    modelName = "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to respond with audio
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.AUDIO
    }
)

val session = model.connect()

// This is the recommended way to start an audio conversation.
// However, you can create your own recorder and handle the stream yourself.
session.startAudioConversation()
Java
ExecutorService executor = Executors.newFixedThreadPool(1);

// Initialize the Vertex AI Gemini API backend service
// Create a `LiveModel` instance with the model that supports the Live API
LiveGenerativeModel lm = FirebaseAI.getInstance(GenerativeBackend.vertexAI()).liveModel(
        "gemini-2.0-flash-live-preview-04-09",
        // Configure the model to respond with audio
        new LiveGenerationConfig.Builder()
                .setResponseModalities(ResponseModality.AUDIO)
                .build()
);
LiveModelFutures model = LiveModelFutures.from(lm);
ListenableFuture<LiveSession> sessionFuture = model.connect();

Futures.addCallback(sessionFuture, new FutureCallback<LiveSession>() {
    @Override
    public void onSuccess(LiveSession ses) {
        LiveSessionFutures session = LiveSessionFutures.from(ses);
        session.startAudioConversation();
    }

    @Override
    public void onFailure(Throwable t) {
        // Handle exceptions
    }
}, executor);
Web
The Live API isn't supported for Web apps yet, but check back soon!
Dart
import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';
import 'package:your_audio_recorder_package/your_audio_recorder_package.dart';

late LiveModelSession _session;
final _audioRecorder = YourAudioRecorder();

await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Vertex AI Gemini API backend service
// Create a `LiveModel` instance with the model that supports the Live API
final model = FirebaseAI.vertexAI().liveModel(
  model: 'gemini-2.0-flash-live-preview-04-09',
  // Configure the model to respond with audio
  config: LiveGenerationConfig(responseModalities: [ResponseModality.audio]),
);

_session = await model.connect();

final audioRecordStream = _audioRecorder.startRecordingStream();
// Map the Uint8List stream to InlineDataPart stream
final mediaChunkStream = audioRecordStream.map((data) {
  return InlineDataPart('audio/pcm', data);
});
await _session.startMediaStream(mediaChunkStream);

// In a separate thread, receive the audio response from the model
await for (final message in _session.receive()) {
  // Process the received message
}
Unity
using Firebase;
using Firebase.AI;

async Task SendTextReceiveAudio() {
  // Initialize the Vertex AI Gemini API backend service
  // Create a `LiveModel` instance with the model that supports the Live API
  var model = FirebaseAI.GetInstance(FirebaseAI.Backend.VertexAI()).GetLiveModel(
    modelName: "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to respond with audio
    liveGenerationConfig: new LiveGenerationConfig(
        responseModalities: new[] { ResponseModality.Audio })
  );

  LiveSession session = await model.ConnectAsync();

  // Start a coroutine to send audio from the Microphone
  var recordingCoroutine = StartCoroutine(SendAudio(session));

  // Start receiving the response
  await ReceiveAudio(session);
}

IEnumerator SendAudio(LiveSession liveSession) {
  string microphoneDeviceName = null;
  int recordingFrequency = 16000;
  int recordingBufferSeconds = 2;

  var recordingClip = Microphone.Start(microphoneDeviceName, true,
      recordingBufferSeconds, recordingFrequency);

  int lastSamplePosition = 0;
  while (true) {
    if (!Microphone.IsRecording(microphoneDeviceName)) {
      yield break;
    }

    int currentSamplePosition = Microphone.GetPosition(microphoneDeviceName);
    if (currentSamplePosition != lastSamplePosition) {
      // The Microphone uses a circular buffer, so we need to check if the
      // current position wrapped around to the beginning, and handle it
      // accordingly.
      int sampleCount;
      if (currentSamplePosition > lastSamplePosition) {
        sampleCount = currentSamplePosition - lastSamplePosition;
      } else {
        sampleCount = recordingClip.samples - lastSamplePosition + currentSamplePosition;
      }

      if (sampleCount > 0) {
        // Get the audio chunk
        float[] samples = new float[sampleCount];
        recordingClip.GetData(samples, lastSamplePosition);

        // Send the data, discarding the resulting Task to avoid the warning
        _ = liveSession.SendAudioAsync(samples);

        lastSamplePosition = currentSamplePosition;
      }
    }

    // Wait for a short delay before reading the next sample from the Microphone
    const float MicrophoneReadDelay = 0.5f;
    yield return new WaitForSeconds(MicrophoneReadDelay);
  }
}

Queue<float> audioBuffer = new();

async Task ReceiveAudio(LiveSession liveSession) {
  int sampleRate = 24000;
  int channelCount = 1;

  // Create a looping AudioClip to fill with the received audio data
  int bufferSamples = (int)(sampleRate * channelCount);
  AudioClip clip = AudioClip.Create("StreamingPCM", bufferSamples, channelCount,
      sampleRate, true, OnAudioRead);

  // Attach the clip to an AudioSource and start playing it
  AudioSource audioSource = GetComponent<AudioSource>();
  audioSource.clip = clip;
  audioSource.loop = true;
  audioSource.Play();

  // Start receiving the response
  await foreach (var message in liveSession.ReceiveAsync()) {
    // Process the received message
    foreach (float[] pcmData in message.AudioAsFloat) {
      lock (audioBuffer) {
        foreach (float sample in pcmData) {
          audioBuffer.Enqueue(sample);
        }
      }
    }
  }
}

// This method is called by the AudioClip to load audio data.
private void OnAudioRead(float[] data) {
  int samplesToProvide = data.Length;
  int samplesProvided = 0;

  // Drain buffered samples from the model into the clip's buffer.
  lock (audioBuffer) {
    while (samplesProvided < samplesToProvide && audioBuffer.Count > 0) {
      data[samplesProvided] = audioBuffer.Dequeue();
      samplesProvided++;
    }
  }

  // Fill any remainder with silence.
  while (samplesProvided < samplesToProvide) {
    data[samplesProvided] = 0.0f;
    samplesProvided++;
  }
}
Learn how to choose a model appropriate for your use case and app.
Build more engaging and interactive experiences
This section describes how to create and manage more engaging or interactive features of the Live API.
Change the response voice
The Live API uses Chirp 3 to support synthesized speech responses. When you use Firebase AI Logic, audio is available in 5 HD voices and 31 languages.
If you don't specify a voice, the default is Puck. Alternatively, you can configure the model to respond in any of the following voices:
- Aoede (female)
- Charon (male)
- Fenrir (male)
- Kore (female)
- Puck (male)
To hear samples of these voices and to see the full list of available languages, see Chirp 3: HD voices.
To specify a voice, set the voice name within the speechConfig object as part of the model configuration:
Swift
The Live API isn't supported for Apple platform apps yet, but check back soon!
Kotlin
// ...

val model = Firebase.vertexAI.liveModel(
    modelName = "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to use a specific voice for its audio response
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.AUDIO
        speechConfig = SpeechConfig(voice = Voices.FENRIR)
    }
)

// ...
Java
// ...

LiveGenerativeModel model = FirebaseAI.getInstance(GenerativeBackend.vertexAI()).liveModel(
        "gemini-2.0-flash-live-preview-04-09",
        // Configure the model to use a specific voice for its audio response
        new LiveGenerationConfig.Builder()
                .setResponseModalities(ResponseModality.AUDIO)
                .setSpeechConfig(new SpeechConfig(Voices.FENRIR))
                .build()
);

// ...
Web
The Live API isn't supported for Web apps yet, but check back soon!
Dart
// ...

final model = FirebaseAI.vertexAI().liveModel(
  model: 'gemini-2.0-flash-live-preview-04-09',
  // Configure the model to use a specific voice for its audio response
  config: LiveGenerationConfig(
    responseModalities: [ResponseModality.audio],
    speechConfig: SpeechConfig(voice: Voice.fenrir),
  ),
);

// ...
Unity
Snippets coming soon!
To prompt and require the model to respond in a non-English language, include the following in your system instructions:
RESPOND IN LANGUAGE. YOU MUST RESPOND UNMISTAKABLY IN LANGUAGE.
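For example, the language requirement can be passed as a system instruction when the model is created. The following is a minimal Kotlin sketch, not the documented API: it assumes the liveModel builder accepts a systemInstruction parameter (as the standard generative model builder does), and the Spanish wording is only an illustration.
// Sketch only: assumes `liveModel` accepts a `systemInstruction` parameter,
// mirroring the standard generative model builder.
val model = Firebase.vertexAI.liveModel(
    modelName = "gemini-2.0-flash-live-preview-04-09",
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.AUDIO
    },
    // Hypothetical example instruction requiring Spanish responses
    systemInstruction = content {
        text("RESPOND IN SPANISH. YOU MUST RESPOND UNMISTAKABLY IN SPANISH.")
    }
)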
Maintain context across sessions and requests
You can use a chat structure to maintain context across sessions and requests. Note that this only works for text input and text output.
This approach is best for short contexts; you can send turn-by-turn interactions to represent the exact sequence of events. For longer contexts, we recommend providing a single summary message to free up the context window for subsequent interactions.
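As an illustration, a new session can be primed with a summary of earlier interactions before the next user turn. This minimal Kotlin sketch uses only the connect/send calls shown earlier on this page; the summary text is hypothetical and would be produced by your app.
val session = model.connect()

// Hypothetical summary of a previous conversation, built by your app
val contextSummary = """
    Summary of the conversation so far:
    - The user is planning a three-day trip to Kyoto.
    - They prefer vegetarian restaurants.
""".trimIndent()
session.send(contextSummary)

// Continue with the next user request in the same session
session.send("Suggest a dinner spot near the hotel.")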
Handle interruptions
Firebase AI Logic doesn't yet support handling interruptions. Check back soon!
Use function calling (tools)
You can define tools (like available functions) to use with the Live API, just like you can with the standard content generation methods. This section describes some nuances of using the Live API with function calling. For a complete description and examples of function calling, see the function calling guide.
From a single prompt, the model can generate multiple function calls and the code necessary to chain their outputs. This code executes in a sandbox environment, generating subsequent BidiGenerateContentToolCall messages. Execution pauses until the results of each function call are available, which ensures sequential processing.
Additionally, using the Live API with function calling is particularly powerful because the model can request follow-up or clarifying information from the user. For example, if the model doesn't have enough information to provide a parameter value to a function it wants to call, it can ask the user to provide more or clarifying information.
The client should respond with BidiGenerateContentToolResponse.
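To make this concrete, here is a minimal Kotlin sketch of declaring a function for a live session. It assumes the liveModel builder accepts a tools parameter like the standard model builder described in the function calling guide; fetchWeather and its schema are hypothetical.
// Hypothetical function declaration; see the function calling guide for details
val fetchWeatherTool = FunctionDeclaration(
    name = "fetchWeather",
    description = "Get the weather conditions for a specific city.",
    parameters = mapOf(
        "city" to Schema.string("The city to look up.")
    )
)

// Sketch only: assumes `liveModel` accepts a `tools` parameter,
// mirroring the standard generative model builder.
val model = Firebase.vertexAI.liveModel(
    modelName = "gemini-2.0-flash-live-preview-04-09",
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.TEXT
    },
    tools = listOf(Tool.functionDeclarations(listOf(fetchWeatherTool)))
)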
Limitations and requirements
Keep in mind the following limitations and requirements of the Live API.
Transcription
Firebase AI Logic doesn't yet support transcriptions. Check back soon!
Languages
- Input languages: See the full list of supported input languages for Gemini models
- Output languages: See Chirp 3: HD voices for the full list of available output languages
Audio formats
The Live API supports the following audio formats (a matching Android configuration sketch follows the list):
- Input audio format: Raw 16-bit PCM audio at 16kHz little-endian
- Output audio format: Raw 16-bit PCM audio at 24kHz little-endian
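For reference, on Android these formats correspond to AudioFormat configurations like the sketch below. This is illustrative only; the Kotlin SDK's startAudioConversation() handles capture and playback for you.
import android.media.AudioFormat

// Capture input as raw 16-bit PCM at 16 kHz, mono
val inputFormat = AudioFormat.Builder()
    .setSampleRate(16000)
    .setEncoding(AudioFormat.ENCODING_PCM_16BIT)
    .setChannelMask(AudioFormat.CHANNEL_IN_MONO)
    .build()

// Play back output as raw 16-bit PCM at 24 kHz, mono
val outputFormat = AudioFormat.Builder()
    .setSampleRate(24000)
    .setEncoding(AudioFormat.ENCODING_PCM_16BIT)
    .setChannelMask(AudioFormat.CHANNEL_OUT_MONO)
    .build()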
Rate limits
The following rate limits apply:
- 10 concurrent sessions per Firebase project
- 4 million tokens per minute
Session length
The default length for a session is 30 minutes. When the session duration exceeds the limit, the connection is terminated.
The model is also limited by the context size. Sending large chunks of input may result in earlier session termination.
Voice activity detection (VAD)
The model automatically performs voice activity detection (VAD) on a continuous audio input stream. VAD is enabled by default.
Token counting
You can't use the CountTokens API with the Live API.