All Gemini 1.0 and Gemini 1.5 models are now retired.
To avoid service disruption, update to a newer model (for example, gemini-2.5-flash-lite). Learn more.

Trang này được dịch bởi Cloud Translation API.

Truyền trực tuyến hai chiều bằng Gemini Live API

Gemini Live API cho phép tương tác hai chiều bằng văn bản và giọng nói với độ trễ thấp với Gemini. Khi sử dụng Live API, bạn có thể mang đến cho người dùng cuối trải nghiệm đàm thoại tự nhiên như giọng nói của con người, đồng thời có thể ngắt lời phản hồi của mô hình bằng lệnh thoại hoặc văn bản. Mô hình này có thể xử lý dữ liệu đầu vào là văn bản và âm thanh (sắp có video!) và có thể cung cấp dữ liệu đầu ra là văn bản và âm thanh.

Bạn có thể tạo mẫu bằng câu lệnh và Live API trong Google AI Studio hoặc Vertex AI Studio.

Live API là một API có trạng thái, tạo kết nối WebSocket để thiết lập một phiên giữa ứng dụng và máy chủ Gemini. Để biết thông tin chi tiết, hãy xem tài liệu tham khảo về Live API (Gemini Developer API | Vertex AI Gemini API).

Trước khi bắt đầu

Nhấp vào nhà cung cấp Gemini API để xem nội dung và mã dành riêng cho nhà cung cấp trên trang này.

Nếu bạn chưa thực hiện, hãy hoàn tất hướng dẫn bắt đầu sử dụng. Hướng dẫn này mô tả cách thiết lập dự án Firebase, kết nối ứng dụng với Firebase, thêm SDK, khởi chạy dịch vụ phụ trợ cho nhà cung cấp Gemini API mà bạn chọn và tạo một phiên bản LiveModel.

Các mô hình hỗ trợ tính năng này

Các mô hình hỗ trợ Live API phụ thuộc vào nhà cung cấp Gemini API mà bạn chọn.

Gemini Developer API
- gemini-live-2.5-flash (GA riêng tư^*)
- gemini-live-2.5-flash-preview
- gemini-2.0-flash-live-001
- gemini-2.0-flash-live-preview-04-09
Vertex AI Gemini API
- gemini-live-2.5-flash (GA riêng tư^*)
- gemini-2.0-flash-live-preview-04-09 (chỉ có thể truy cập trong us-central1)

Xin lưu ý rằng đối với tên mô hình 2.5 cho Live API, phân đoạn live sẽ ngay lập tức theo sau phân đoạn gemini.

^{* Liên hệ với người đại diện của nhóm quản lý tài khoản Google Cloud để yêu cầu cấp quyền truy cập.}

Sử dụng các tính năng tiêu chuẩn của Live API

Phần này mô tả cách sử dụng các tính năng tiêu chuẩn của Live API, cụ thể là để truyền phát trực tiếp nhiều loại đầu vào và đầu ra:

Gửi và nhận tin nhắn văn bản
Gửi và nhận âm thanh
Gửi tin nhắn thoại và nhận tin nhắn văn bản
Gửi văn bản và nhận âm thanh

Tạo văn bản truyền trực tiếp từ dữ liệu đầu vào văn bản truyền trực tiếp

Trước khi dùng thử mẫu này, hãy hoàn tất phần Trước khi bắt đầu của hướng dẫn này để thiết lập dự án và ứng dụng của bạn.
Trong phần đó, bạn cũng sẽ nhấp vào một nút cho nhà cung cấp Gemini API mà bạn chọn để xem nội dung dành riêng cho nhà cung cấp trên trang này.

Bạn có thể gửi dữ liệu đầu vào văn bản truyền trực tuyến và nhận dữ liệu đầu ra văn bản truyền trực tuyến. Đảm bảo bạn tạo một phiên bản liveModel và đặt phương thức phản hồi thành Text.

Swift

Live API hiện chưa được hỗ trợ cho các ứng dụng trên nền tảng Apple, nhưng hãy sớm kiểm tra lại!

Kotlin


// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
val model = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
    modelName = "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to respond with text
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.TEXT 
   }
)

val session = model.connect()

// Provide a text prompt
val text = "tell a short story"

session.send(text)

var outputText = ""
session.receive().collect {
    if(it.turnComplete) {
        // Optional: if you don't require to send more requests.
        session.stopReceiving();
    }
    outputText = outputText + it.text
}

// Output received from the server.
println(outputText)

Java


ExecutorService executor = Executors.newFixedThreadPool(1);
// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
LiveGenerativeModel lm = FirebaseAI.getInstance(GenerativeBackend.googleAI()).liveModel(
        "gemini-2.0-flash-live-preview-04-09",
        // Configure the model to respond with text
        new LiveGenerationConfig.Builder()
                .setResponseModalities(ResponseModality.TEXT)
                .build()
);
LiveModelFutures model = LiveModelFutures.from(lm);
ListenableFuture<LiveSession> sessionFuture =  model.connect();
class LiveContentResponseSubscriber implements Subscriber<LiveContentResponse> {
    @Override
    public void onSubscribe(Subscription s) {
        s.request(Long.MAX_VALUE); // Request an unlimited number of items
    }
    @Override
    public void onNext(LiveContentResponse liveContentResponse) {
       // Handle the response from the server.
	System.out.println(liveContentResponse.getText());
    }
    @Override
    public void onError(Throwable t) {
        System.err.println("Error: " + t.getMessage());
    }
    @Override
    public void onComplete() {
        System.out.println("Done receiving messages!");
    }
}
Futures.addCallback(sessionFuture, new FutureCallback<LiveSession>() {
    @Override
    public void onSuccess(LiveSession ses) {
	  LiveSessionFutures session = LiveSessionFutures.from(ses);
        // Provide a text prompt
        String text = "tell me a short story?";
        session.send(text);
        Publisher<LiveContentResponse> publisher = session.receive();
        publisher.subscribe(new LiveContentResponseSubscriber());
    }
    @Override
    public void onFailure(Throwable t) {
        // Handle exceptions
    }
}, executor);

Web


// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `LiveGenerativeModel` instance with the flash-live model (only model that supports the Live API)
const model = getLiveGenerativeModel(ai, {
  model: "gemini-2.0-flash-live-preview-04-09",
  // Configure the model to respond with text
  generationConfig: {
    responseModalities: [ResponseModality.TEXT],
  },
});

const session = await model.connect();

// Provide a text prompt
const prompt = "tell a short story";
session.send(prompt);

// Collect text from model's turn
let text = "";
const messages = session.receive();
for await (const message of messages) {
  switch (message.type) {
    case "serverContent":
      if (message.turnComplete) {
        console.log(text);
      } else {
        const parts = message.modelTurn?.parts;
        if (parts) {
          text += parts.map((part) => part.text).join("");
        }
      }
      break;
    case "toolCall":
      // Ignore
    case "toolCallCancellation":
      // Ignore
  }
}

Dart


import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';

late LiveModelSession _session;

await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
final model = FirebaseAI.googleAI().liveModel(
  model: 'gemini-2.0-flash-live-preview-04-09',
  // Configure the model to respond with text
  config: LiveGenerationConfig(responseModalities: [ResponseModality.text]),
);

_session = await model.connect();

// Provide a text prompt
final prompt = Content.text('tell a short story');
await _session.send(input: prompt, turnComplete: true);

// In a separate thread, receive the response
await for (final message in _session.receive()) {
   // Process the received message 
}

Unity


using Firebase;
using Firebase.AI;

async Task SendTextReceiveText() {
  // Initialize the Gemini Developer API backend service
  // Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
  var model = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetLiveModel(
    modelName: "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to respond with text
    liveGenerationConfig: new LiveGenerationConfig(
        responseModalities: new[] { ResponseModality.Text })
  );

  LiveSession session = await model.ConnectAsync();

  // Provide a text prompt
  var prompt = ModelContent.Text("tell a short story");
  await session.SendAsync(content: prompt, turnComplete: true);

  // Receive the response
  await foreach (var message in session.ReceiveAsync()) {
    // Process the received message
    if (!string.IsNullOrEmpty(message.Text)) {
      UnityEngine.Debug.Log("Received message: " + message.Text);
    }
  }
}

Tạo âm thanh truyền trực tuyến từ đầu vào âm thanh truyền trực tuyến

Bạn có thể gửi dữ liệu đầu vào âm thanh truyền trực tuyến và nhận dữ liệu đầu ra âm thanh truyền trực tuyến. Đảm bảo tạo một thực thể LiveModel và đặt phương thức phản hồi thành Audio.

Tìm hiểu cách định cấu hình và tuỳ chỉnh giọng nói phản hồi (ở phần sau trên trang này).

Swift

Live API hiện chưa được hỗ trợ cho các ứng dụng trên nền tảng Apple, nhưng hãy sớm kiểm tra lại!

Kotlin


// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
val model = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
    modelName = "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to respond with text
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.AUDIO 
   }
)

val session = model.connect()

// This is the recommended way.
// However, you can create your own recorder and handle the stream.
session.startAudioConversation()

Java


ExecutorService executor = Executors.newFixedThreadPool(1);
// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
LiveGenerativeModel lm = FirebaseAI.getInstance(GenerativeBackend.googleAI()).liveModel(
        "gemini-2.0-flash-live-preview-04-09",
        // Configure the model to respond with text
        new LiveGenerationConfig.Builder()
                .setResponseModalities(ResponseModality.TEXT)
                .build()
);
LiveModelFutures model = LiveModelFutures.from(lm);
ListenableFuture<LiveSession> sessionFuture =  model.connect();

Futures.addCallback(sessionFuture, new FutureCallback<LiveSession>() {
    @Override
    public void onSuccess(LiveSession ses) {
	 LiveSessionFutures session = LiveSessionFutures.from(ses);
        session.startAudioConversation();
    }
    @Override
    public void onFailure(Throwable t) {
        // Handle exceptions
    }
}, executor);

Web


// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `LiveGenerativeModel` instance with the flash-live model (only model that supports the Live API)
const model = getLiveGenerativeModel(ai, {
  model: "gemini-2.0-flash-live-preview-04-09",
  // Configure the model to respond with audio
  generationConfig: {
    responseModalities: [ResponseModality.AUDIO],
  },
});

const session = await model.connect();

// Start the audio conversation
const audioConversationController = await startAudioConversation(session);

// ... Later, to stop the audio conversation
// await audioConversationController.stop()

Dart


import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';
import 'package:your_audio_recorder_package/your_audio_recorder_package.dart';

late LiveModelSession _session;
final _audioRecorder = YourAudioRecorder();

await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
final model = FirebaseAI.googleAI().liveModel(
  model: 'gemini-2.0-flash-live-preview-04-09',
   // Configure the model to respond with audio
   config: LiveGenerationConfig(responseModalities: [ResponseModality.audio]),
);

_session = await model.connect();

final audioRecordStream = _audioRecorder.startRecordingStream();
// Map the Uint8List stream to InlineDataPart stream
final mediaChunkStream = audioRecordStream.map((data) {
  return InlineDataPart('audio/pcm', data);
});
await _session.startMediaStream(mediaChunkStream);

// In a separate thread, receive the audio response from the model
await for (final message in _session.receive()) {
   // Process the received message 
}

Unity


using Firebase;
using Firebase.AI;

async Task SendTextReceiveAudio() {
  // Initialize the Gemini Developer API backend service
  // Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
  var model = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetLiveModel(
    modelName: "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to respond with audio
    liveGenerationConfig: new LiveGenerationConfig(
        responseModalities: new[] { ResponseModality.Audio })
  );

  LiveSession session = await model.ConnectAsync();

  // Start a coroutine to send audio from the Microphone
  var recordingCoroutine = StartCoroutine(SendAudio(session));

  // Start receiving the response
  await ReceiveAudio(session);
}

IEnumerator SendAudio(LiveSession liveSession) {
  string microphoneDeviceName = null;
  int recordingFrequency = 16000;
  int recordingBufferSeconds = 2;

  var recordingClip = Microphone.Start(microphoneDeviceName, true,
                                       recordingBufferSeconds, recordingFrequency);

  int lastSamplePosition = 0;
  while (true) {
    if (!Microphone.IsRecording(microphoneDeviceName)) {
      yield break;
    }

    int currentSamplePosition = Microphone.GetPosition(microphoneDeviceName);

    if (currentSamplePosition != lastSamplePosition) {
      // The Microphone uses a circular buffer, so we need to check if the
      // current position wrapped around to the beginning, and handle it
      // accordingly.
      int sampleCount;
      if (currentSamplePosition > lastSamplePosition) {
        sampleCount = currentSamplePosition - lastSamplePosition;
      } else {
        sampleCount = recordingClip.samples - lastSamplePosition + currentSamplePosition;
      }

      if (sampleCount > 0) {
        // Get the audio chunk
        float[] samples = new float[sampleCount];
        recordingClip.GetData(samples, lastSamplePosition);

        // Send the data, discarding the resulting Task to avoid the warning
        _ = liveSession.SendAudioAsync(samples);

        lastSamplePosition = currentSamplePosition;
      }
    }

    // Wait for a short delay before reading the next sample from the Microphone
    const float MicrophoneReadDelay = 0.5f;
    yield return new WaitForSeconds(MicrophoneReadDelay);
  }
}

Queue audioBuffer = new();

async Task ReceiveAudio(LiveSession liveSession) {
  int sampleRate = 24000;
  int channelCount = 1;

  // Create a looping AudioClip to fill with the received audio data
  int bufferSamples = (int)(sampleRate * channelCount);
  AudioClip clip = AudioClip.Create("StreamingPCM", bufferSamples, channelCount,
                                    sampleRate, true, OnAudioRead);

  // Attach the clip to an AudioSource and start playing it
  AudioSource audioSource = GetComponent();
  audioSource.clip = clip;
  audioSource.loop = true;
  audioSource.Play();

  // Start receiving the response
  await foreach (var message in liveSession.ReceiveAsync()) {
    // Process the received message
    foreach (float[] pcmData in message.AudioAsFloat) {
      lock (audioBuffer) {
        foreach (float sample in pcmData) {
          audioBuffer.Enqueue(sample);
        }
      }
    }
  }
}

// This method is called by the AudioClip to load audio data.
private void OnAudioRead(float[] data) {
  int samplesToProvide = data.Length;
  int samplesProvided = 0;

  lock(audioBuffer) {
    while (samplesProvided < samplesToProvide && audioBuffer.Count > 0) {
      data[samplesProvided] = audioBuffer.Dequeue();
      samplesProvided++;
    }
  }

  while (samplesProvided < samplesToProvide) {
    data[samplesProvided] = 0.0f;
    samplesProvided++;
  }
}

Tạo văn bản truyền trực tiếp từ đầu vào âm thanh truyền trực tiếp

Bạn có thể gửi dữ liệu đầu vào âm thanh được truyền trực tuyến và nhận dữ liệu đầu ra văn bản được truyền trực tuyến. Đảm bảo bạn tạo một phiên bản LiveModel và đặt phương thức phản hồi thành Text.

Swift

Live API hiện chưa được hỗ trợ cho các ứng dụng trên nền tảng Apple, nhưng hãy sớm kiểm tra lại!

Kotlin

Java


ExecutorService executor = Executors.newFixedThreadPool(1);
// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
LiveGenerativeModel lm = FirebaseAI.getInstance(GenerativeBackend.googleAI()).liveModel(
        "gemini-2.0-flash-live-preview-04-09",
        // Configure the model to respond with text
        new LiveGenerationConfig.Builder()
                .setResponseModalities(ResponseModality.TEXT)
                .build()
);
LiveModelFutures model = LiveModelFutures.from(lm);
ListenableFuture<LiveSession> sessionFuture =  model.connect();
class LiveContentResponseSubscriber implements Subscriber<LiveContentResponse> {
    @Override
    public void onSubscribe(Subscription s) {
        s.request(Long.MAX_VALUE); // Request an unlimited number of items
    }
    @Override
    public void onNext(LiveContentResponse liveContentResponse) {
        // Handle the response from the server.
	System.out.println(liveContentResponse.getText());
    }
    @Override
    public void onError(Throwable t) {
        System.err.println("Error: " + t.getMessage());
    }
    @Override
    public void onComplete() {
        System.out.println("Done receiving messages!");
    }
}
Futures.addCallback(sessionFuture, new FutureCallback<LiveSession>() {
    @Override
    public void onSuccess(LiveSession ses) {
	 LiveSessionFutures session = LiveSessionFutures.from(ses);
        // Send Audio data
	 session.send(new Content.Builder().addInlineData(audioData, "audio/pcm").build());

        session.send(text);
        Publisher<LiveContentResponse> publisher = session.receive();
        publisher.subscribe(new LiveContentResponseSubscriber());
    }
    @Override
    public void onFailure(Throwable t) {
        // Handle exceptions
    }
}, executor);

Web


// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `LiveGenerativeModel` instance with the flash-live model (only model that supports the Live API)
const model = getLiveGenerativeModel(ai, {
  model: "gemini-2.0-flash-live-preview-04-09",
  // Configure the model to respond with text
  generationConfig: {
    responseModalities: [ResponseModality.TEXT],
  },
});

const session = await model.connect();

// TODO(developer): Collect audio data (16-bit 16kHz PCM)
// const audioData = ...

// Send audio
const audioPart = {
  inlineData: { data: audioData, mimeType: "audio/pcm" },
};
session.send([audioPart]);

// Collect text from model's turn
let text = "";
const messages = session.receive();
for await (const message of messages) {
  switch (message.type) {
    case "serverContent":
      if (message.turnComplete) {
        console.log(text);
      } else {
        const parts = message.modelTurn?.parts;
        if (parts) {
          text += parts.map((part) => part.text).join("");
        }
      }
      break;
    case "toolCall":
      // Ignore
    case "toolCallCancellation":
      // Ignore
  }
}

Dart


import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';
import 'package:your_audio_recorder_package/your_audio_recorder_package.dart';
import 'dart:async';

late LiveModelSession _session;
final _audioRecorder = YourAudioRecorder();

await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
final model = FirebaseAI.googleAI().liveModel(
  model: 'gemini-2.0-flash-live-preview-04-09',
  // Configure the model to respond with text
  config: LiveGenerationConfig(responseModality: ResponseModality.text),
);

_session = await model.connect();

final audioRecordStream = _audioRecorder.startRecordingStream();
final mediaChunkStream = audioRecordStream.map((data) {
  return InlineDataPart('audio/pcm', data);
});

await _session.startMediaStream(mediaChunkStream);

final responseStream = _session.receive();

return responseStream.asyncMap((response) async {
  if (response.parts.isNotEmpty && response.parts.first.text != null) {
    return response.parts.first.text!;
  } else {
    throw Exception('Text response not found.');
  }
});

Future main() async {
  try {
    final textStream = await audioToText();

    await for (final text in textStream) {
      print('Received text: $text');
      // Handle the text response
    }
  } catch (e) {
    print('Error: $e');
  }
}

Unity


using Firebase;
using Firebase.AI;

async Task SendAudioReceiveText() {
  // Initialize the Gemini Developer API backend service
  // Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
  var model = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetLiveModel(
    modelName: "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to respond with text
    liveGenerationConfig: new LiveGenerationConfig(
        responseModalities: new[] { ResponseModality.Text })
  );

  LiveSession session = await model.ConnectAsync();

  // Start a coroutine to send audio from the Microphone
  var recordingCoroutine = StartCoroutine(SendAudio(session));

  // Receive the response
  await foreach (var message in session.ReceiveAsync()) {
    // Process the received message
    if (!string.IsNullOrEmpty(message.Text)) {
      UnityEngine.Debug.Log("Received message: " + message.Text);
    }
  }

  StopCoroutine(recordingCoroutine);
}

IEnumerator SendAudio(LiveSession liveSession) {
  string microphoneDeviceName = null;
  int recordingFrequency = 16000;
  int recordingBufferSeconds = 2;

  var recordingClip = Microphone.Start(microphoneDeviceName, true,
                                       recordingBufferSeconds, recordingFrequency);

  int lastSamplePosition = 0;
  while (true) {
    if (!Microphone.IsRecording(microphoneDeviceName)) {
      yield break;
    }

    int currentSamplePosition = Microphone.GetPosition(microphoneDeviceName);

    if (currentSamplePosition != lastSamplePosition) {
      // The Microphone uses a circular buffer, so we need to check if the
      // current position wrapped around to the beginning, and handle it
      // accordingly.
      int sampleCount;
      if (currentSamplePosition > lastSamplePosition) {
        sampleCount = currentSamplePosition - lastSamplePosition;
      } else {
        sampleCount = recordingClip.samples - lastSamplePosition + currentSamplePosition;
      }

      if (sampleCount > 0) {
        // Get the audio chunk
        float[] samples = new float[sampleCount];
        recordingClip.GetData(samples, lastSamplePosition);

        // Send the data, discarding the resulting Task to avoid the warning
        _ = liveSession.SendAudioAsync(samples);

        lastSamplePosition = currentSamplePosition;
      }
    }

    // Wait for a short delay before reading the next sample from the Microphone
    const float MicrophoneReadDelay = 0.5f;
    yield return new WaitForSeconds(MicrophoneReadDelay);
  }
}

Tạo âm thanh truyền trực tuyến từ văn bản đầu vào truyền trực tuyến

Bạn có thể gửi dữ liệu đầu vào văn bản được truyền trực tuyến và nhận dữ liệu đầu ra âm thanh được truyền trực tuyến. Đảm bảo bạn tạo một phiên bản LiveModel và đặt phương thức phản hồi thành Audio.

Tìm hiểu cách định cấu hình và tuỳ chỉnh giọng nói phản hồi (ở phần sau trên trang này).

Swift

Live API hiện chưa được hỗ trợ cho các ứng dụng trên nền tảng Apple, nhưng hãy sớm kiểm tra lại!

Kotlin


// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
val model = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
    modelName = "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to respond with text
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.AUDIO
   }
)

val session = model.connect()

// Provide a text prompt
val text = "tell a short story"

session.send(text)

session.receive().collect {
    if(it.turnComplete) {
        // Optional: if you don't require to send more requests.
        session.stopReceiving();
    }
    // Handle 16bit pcm audio data at 24khz
    playAudio(it.data)
}

Java


ExecutorService executor = Executors.newFixedThreadPool(1);
// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
LiveGenerativeModel lm = FirebaseAI.getInstance(GenerativeBackend.googleAI()).liveModel(
        "gemini-2.0-flash-live-preview-04-09",
        // Configure the model to respond with text
        new LiveGenerationConfig.Builder()
                .setResponseModalities(ResponseModality.AUDIO)
                .build()
);
LiveModelFutures model = LiveModelFutures.from(lm);
ListenableFuture<LiveSession> sessionFuture =  model.connect();
class LiveContentResponseSubscriber implements Subscriber<LiveContentResponse> {
    @Override
    public void onSubscribe(Subscription s) {
        s.request(Long.MAX_VALUE); // Request an unlimited number of items
    }
    @Override
    public void onNext(LiveContentResponse liveContentResponse) {
        // Handle 16bit pcm audio data at 24khz
	liveContentResponse.getData();
    }
    @Override
    public void onError(Throwable t) {
        System.err.println("Error: " + t.getMessage());
    }
    @Override
    public void onComplete() {
        System.out.println("Done receiving messages!");
    }
}
Futures.addCallback(sessionFuture, new FutureCallback<LiveSession>() {
    @Override
    public void onSuccess(LiveSession ses) {
	 LiveSessionFutures session = LiveSessionFutures.from(ses);
        // Provide a text prompt
        String text = "tell me a short story?";
        session.send(text);
        Publisher<LiveContentResponse> publisher = session.receive();
        publisher.subscribe(new LiveContentResponseSubscriber());
    }
    @Override
    public void onFailure(Throwable t) {
        // Handle exceptions
    }
}, executor);

Web


// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `LiveGenerativeModel` instance with the flash-live model (only model that supports the Live API)
const model = getLiveGenerativeModel(ai, {
  model: "gemini-2.0-flash-live-preview-04-09",
  // Configure the model to respond with audio
  generationConfig: {
    responseModalities: [ResponseModality.AUDIO],
  },
});

const session = await model.connect();

// Provide a text prompt
const prompt = "tell a short story";
session.send(prompt);

// Handle the model's audio output
const messages = session.receive();
for await (const message of messages) {
  switch (message.type) {
    case "serverContent":
      if (message.turnComplete) {
        // TODO(developer): Handle turn completion
      } else if (message.interrupted) {
        // TODO(developer): Handle the interruption
        break;
      } else if (message.modelTurn) {
        const parts = message.modelTurn?.parts;
        parts?.forEach((part) => {
          if (part.inlineData) {
            // TODO(developer): Play the audio chunk
          }
        });
      }
      break;
    case "toolCall":
      // Ignore
    case "toolCallCancellation":
      // Ignore
  }
}

Dart


import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';
import 'dart:async';
import 'dart:typed_data';

late LiveModelSession _session;

Future<Stream<Uint8List>> textToAudio(String textPrompt) async {
  WidgetsFlutterBinding.ensureInitialized();

  await Firebase.initializeApp(
    options: DefaultFirebaseOptions.currentPlatform,
  );

  // Initialize the Gemini Developer API backend service
  // Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
  final model = FirebaseAI.googleAI().liveModel(
    model: 'gemini-2.0-flash-live-preview-04-09',
    // Configure the model to respond with audio
    config: LiveGenerationConfig(responseModality: ResponseModality.audio),
  );

  _session = await model.connect();

  final prompt = Content.text(textPrompt);

  await _session.send(input: prompt);

  return _session.receive().asyncMap((response) async {
    if (response is LiveServerContent && response.modelTurn?.parts != null) {
       for (final part in response.modelTurn!.parts) {
         if (part is InlineDataPart) {
           return part.bytes;
         }
       }
    }
    throw Exception('Audio data not found');
  });
}

Future<void> main() async {
  try {
    final audioStream = await textToAudio('Convert this text to audio.');

    await for (final audioData in audioStream) {
      // Process the audio data (e.g., play it using an audio player package)
      print('Received audio data: ${audioData.length} bytes');
      // Example using flutter_sound (replace with your chosen package):
      // await _flutterSoundPlayer.startPlayer(fromDataBuffer: audioData);
    }
  } catch (e) {
    print('Error: $e');
  }
}

Unity


using Firebase;
using Firebase.AI;

async Task SendTextReceiveAudio() {
  // Initialize the Gemini Developer API backend service
  // Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
  var model = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetLiveModel(
    modelName: "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to respond with audio
    liveGenerationConfig: new LiveGenerationConfig(
        responseModalities: new[] { ResponseModality.Audio })
  );

  LiveSession session = await model.ConnectAsync();

  // Provide a text prompt
  var prompt = ModelContent.Text("Convert this text to audio.");
  await session.SendAsync(content: prompt, turnComplete: true);

  // Start receiving the response
  await ReceiveAudio(session);
}

Queue<float> audioBuffer = new();

async Task ReceiveAudio(LiveSession session) {
  int sampleRate = 24000;
  int channelCount = 1;

  // Create a looping AudioClip to fill with the received audio data
  int bufferSamples = (int)(sampleRate * channelCount);
  AudioClip clip = AudioClip.Create("StreamingPCM", bufferSamples, channelCount,
                                    sampleRate, true, OnAudioRead);

  // Attach the clip to an AudioSource and start playing it
  AudioSource audioSource = GetComponent<AudioSource>();
  audioSource.clip = clip;
  audioSource.loop = true;
  audioSource.Play();

  // Start receiving the response
  await foreach (var message in session.ReceiveAsync()) {
    // Process the received message
    foreach (float[] pcmData in message.AudioAsFloat) {
      lock (audioBuffer) {
        foreach (float sample in pcmData) {
          audioBuffer.Enqueue(sample);
        }
      }
    }
  }
}

// This method is called by the AudioClip to load audio data.
private void OnAudioRead(float[] data) {
  int samplesToProvide = data.Length;
  int samplesProvided = 0;

  lock(audioBuffer) {
    while (samplesProvided < samplesToProvide && audioBuffer.Count > 0) {
      data[samplesProvided] = audioBuffer.Dequeue();
      samplesProvided++;
    }
  }

  while (samplesProvided < samplesToProvide) {
    data[samplesProvided] = 0.0f;
    samplesProvided++;
  }
}

Tạo trải nghiệm hấp dẫn và có tính tương tác hơn

Phần này mô tả cách tạo và quản lý các tính năng hấp dẫn hoặc mang tính tương tác hơn của Live API.

Thay đổi giọng nói phản hồi

Live API sử dụng Chirp 3 để hỗ trợ các câu trả lời bằng giọng nói được tạo. Khi sử dụng Firebase AI Logic, bạn có thể gửi âm thanh bằng nhiều ngôn ngữ có giọng nói HD. Để xem danh sách đầy đủ và bản minh hoạ âm thanh của từng giọng nói, hãy xem Chirp 3: Giọng nói chất lượng cao.

Để chỉ định giọng nói, hãy đặt tên giọng nói trong đối tượng speechConfig trong phần cấu hình mô hình. Nếu bạn không chỉ định giọng nói, giá trị mặc định sẽ là Puck.

Swift

Live API hiện chưa được hỗ trợ cho các ứng dụng trên nền tảng Apple, nhưng hãy sớm kiểm tra lại!

Kotlin


// ...

val model = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
    modelName = "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to use a specific voice for its audio response
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.AUDIO
        speechConfig = SpeechConfig(voice = Voice("VOICE_NAME"))
    }
)

// ...

Java


// ...

LiveModel model = FirebaseAI.getInstance(GenerativeBackend.googleAI()).liveModel(
    "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to use a specific voice for its audio response
    new LiveGenerationConfig.Builder()
        .setResponseModalities(ResponseModality.AUDIO)
        .setSpeechConfig(new SpeechConfig(new Voice("VOICE_NAME")))
        .build()
);

// ...

Web


// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
const model = getLiveGenerativeModel(ai, {
  model: "gemini-2.0-flash-live-preview-04-09",
  // Configure the model to use a specific voice for its audio response
  generationConfig: {
    responseModalities: [ResponseModality.AUDIO],
    speechConfig: {
      voiceConfig: {
        prebuiltVoiceConfig: { voiceName: "VOICE_NAME" },
      },
    },
  },
});

Dart


// ...

final model = FirebaseAI.googleAI().liveGenerativeModel(
  model: 'gemini-2.0-flash-live-preview-04-09',
  // Configure the model to use a specific voice for its audio response
  config: LiveGenerationConfig(
    responseModality: ResponseModality.audio,
    speechConfig: SpeechConfig(voiceName: 'VOICE_NAME'),
  ),
);

// ...

Unity


var model = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetLiveModel(
  modelName: "gemini-2.0-flash-live-preview-04-09",
  liveGenerationConfig: new LiveGenerationConfig(
    responseModalities: new[] { ResponseModality.Audio },
    speechConfig: SpeechConfig.UsePrebuiltVoice("VOICE_NAME"))
);

Để có kết quả tốt nhất khi nhắc và yêu cầu mô hình phản hồi bằng một ngôn ngữ không phải tiếng Anh, hãy thêm những nội dung sau vào hướng dẫn hệ thống:

RESPOND IN LANGUAGE. YOU MUST RESPOND UNMISTAKABLY IN LANGUAGE.

Duy trì bối cảnh trong các phiên và yêu cầu

Bạn có thể sử dụng cấu trúc trò chuyện để duy trì ngữ cảnh trong các phiên và yêu cầu. Xin lưu ý rằng tính năng này chỉ hoạt động đối với dữ liệu đầu vào và đầu ra là văn bản.

Phương pháp này phù hợp nhất với các ngữ cảnh ngắn; bạn có thể gửi các lượt tương tác từng bước để thể hiện chính xác chuỗi sự kiện . Đối với những ngữ cảnh dài hơn, bạn nên cung cấp một bản tóm tắt thông báo duy nhất để giải phóng cửa sổ ngữ cảnh cho các hoạt động tương tác tiếp theo.

Xử lý các hoạt động gây gián đoạn

Firebase AI Logic hiện chưa hỗ trợ xử lý các gián đoạn. Vui lòng kiểm tra lại sớm!

Sử dụng tính năng gọi hàm (công cụ)

Bạn có thể xác định các công cụ (chẳng hạn như các hàm có sẵn) để sử dụng với Live API giống như khi sử dụng các phương thức tạo nội dung tiêu chuẩn. Phần này mô tả một số sắc thái khi sử dụng Live API với tính năng gọi hàm. Để xem nội dung mô tả đầy đủ và ví dụ về tính năng gọi hàm, hãy xem hướng dẫn gọi hàm.

Từ một câu lệnh duy nhất, mô hình có thể tạo nhiều lệnh gọi hàm và mã cần thiết để liên kết các đầu ra của chúng. Mã này thực thi trong môi trường hộp cát, tạo ra các thông báo BidiGenerateContentToolCall tiếp theo. Quá trình thực thi sẽ tạm dừng cho đến khi có kết quả của từng lệnh gọi hàm, điều này đảm bảo quá trình xử lý tuần tự.

Ngoài ra, việc sử dụng Live API với tính năng gọi hàm đặc biệt hiệu quả vì mô hình có thể yêu cầu người dùng cung cấp thông tin làm rõ hoặc thông tin bổ sung. Ví dụ: nếu không có đủ thông tin để cung cấp giá trị tham số cho một hàm mà mô hình muốn gọi, thì mô hình có thể yêu cầu người dùng cung cấp thêm thông tin hoặc thông tin làm rõ.

Máy khách sẽ phản hồi bằng BidiGenerateContentToolResponse.

Giới hạn và yêu cầu

Hãy lưu ý những hạn chế và yêu cầu sau đây của Live API.

Bản ghi âm

Firebase AI Logic hiện chưa hỗ trợ bản chép lời. Vui lòng kiểm tra lại sớm!

Ngôn ngữ

Ngôn ngữ nhập: Xem danh sách đầy đủ các ngôn ngữ nhập được hỗ trợ cho các mô hình Gemini
Ngôn ngữ đầu ra: Xem danh sách đầy đủ các ngôn ngữ đầu ra hiện có trong phần Chirp 3: Giọng nói chất lượng cao

Định dạng âm thanh

Công cụ Live API hỗ trợ các định dạng âm thanh sau:

Định dạng âm thanh đầu vào: Âm thanh PCM 16 bit thô ở chế độ little-endian 16 kHz
Định dạng âm thanh đầu ra: Âm thanh PCM 16 bit thô ở chế độ little-endian 24 kHz

Giới hạn về tốc độ

Live API có giới hạn về tốc độ cho cả số phiên đồng thời trên mỗi dự án Firebase cũng như số mã thông báo mỗi phút (TPM).

Gemini Developer API:
- Hạn mức sẽ khác nhau tuỳ thuộc vào Gemini Developer API"bậc sử dụng" của dự án (xem tài liệu về hạn mức sử dụng của họ)
Vertex AI Gemini API:
- 5.000 phiên hoạt động đồng thời cho mỗi dự án Firebase
- 4 triệu mã thông báo mỗi phút

Thời lượng phiên

Thời lượng mặc định cho một phiên là 10 phút. Khi thời lượng phiên vượt quá giới hạn, kết nối sẽ bị chấm dứt.

Mô hình này cũng bị giới hạn về kích thước ngữ cảnh. Việc gửi các khối đầu vào lớn có thể dẫn đến việc phiên kết thúc sớm hơn.

Phát hiện hoạt động thoại (VAD)

Mô hình này tự động thực hiện tính năng phát hiện hoạt động thoại (VAD) trên luồng đầu vào âm thanh liên tục. VAD được bật theo mặc định.

Đếm mã thông báo

Bạn không thể sử dụng API CountTokens với Live API.

Gửi ý kiến phản hồi về trải nghiệm của bạn với Firebase AI Logic

Truyền trực tuyến hai chiều bằng Gemini Live API Sử dụng bộ sưu tập để sắp xếp ngăn nắp các trang Lưu và phân loại nội dung dựa trên lựa chọn ưu tiên của bạn.

Trước khi bắt đầu

Các mô hình hỗ trợ tính năng này

Sử dụng các tính năng tiêu chuẩn của Live API

Tạo văn bản truyền trực tiếp từ dữ liệu đầu vào văn bản truyền trực tiếp

Swift

Kotlin

Java

Web

Dart

Unity

Tạo âm thanh truyền trực tuyến từ đầu vào âm thanh truyền trực tuyến

Swift

Kotlin

Java

Web

Dart

Unity

Tạo văn bản truyền trực tiếp từ đầu vào âm thanh truyền trực tiếp

Swift

Kotlin

Java

Web

Dart

Unity

Tạo âm thanh truyền trực tuyến từ văn bản đầu vào truyền trực tuyến

Swift

Kotlin

Java

Web

Dart

Unity

Tạo trải nghiệm hấp dẫn và có tính tương tác hơn

Thay đổi giọng nói phản hồi

Swift

Kotlin

Java

Web

Dart

Unity

Duy trì bối cảnh trong các phiên và yêu cầu

Xử lý các hoạt động gây gián đoạn

Sử dụng tính năng gọi hàm (công cụ)

Giới hạn và yêu cầu

Bản ghi âm

Ngôn ngữ

Định dạng âm thanh

Giới hạn về tốc độ

Thời lượng phiên

Phát hiện hoạt động thoại (VAD)

Đếm mã thông báo

Truyền trực tuyến hai chiều bằng Gemini Live API