All Gemini 1.0 and Gemini 1.5 models are now retired.
To avoid service disruption, update to a newer model (for example, gemini-2.5-flash-lite). Learn more.

דף זה תורגם על ידי Cloud Translation API.

סטרימינג דו-כיווני באמצעות Gemini Live API

ה-Gemini Live API מאפשר אינטראקציות דו-כיווניות של טקסט וקול עם Gemini עם זמן אחזור נמוך. באמצעות Live API, אתם יכולים לספק למשתמשי הקצה חוויה של שיחות קוליות טבעיות ודמויות-אנוש, עם אפשרות לקטוע את התשובות של המודל באמצעות פקודות טקסט או פקודות קוליות. המודל יכול לעבד קלט של טקסט ואודיו (בקרוב גם וידאו!), והוא יכול לספק פלט של טקסט ואודיו.

אפשר ליצור אב טיפוס באמצעות הנחיות והדגל Live API ב-Google AI Studio או ב-Vertex AI Studio.

‫Live API הוא API עם שמירת מצב שיוצר חיבור WebSocket כדי ליצור סשן בין הלקוח לבין שרת Gemini. פרטים נוספים זמינים במאמרי העזרה בנושא Live API (Gemini Developer API | Vertex AI Gemini API).

לפני שמתחילים

לוחצים על הספק Gemini API כדי לראות את התוכן והקוד הספציפיים לספק בדף הזה.

אם עדיין לא עשיתם את זה, כדאי לעיין במדריך לתחילת העבודה. במדריך הזה מוסבר איך להגדיר את פרויקט Firebase, לקשר את האפליקציה ל-Firebase, להוסיף את ה-SDK, לאתחל את שירות ה-Backend עבור ספק Gemini API שבחרתם וליצור מופע LiveModel.

מודלים שתומכים ביכולת הזו

המודלים שתומכים ב-Live API תלויים בספק Gemini API שבחרתם.

Gemini Developer API
- gemini-live-2.5-flash (פרטי GA^*)
- gemini-live-2.5-flash-preview
- gemini-2.0-flash-live-001
- gemini-2.0-flash-live-preview-04-09
Vertex AI Gemini API
- gemini-live-2.5-flash (פרטי GA^*)
- gemini-2.0-flash-live-preview-04-09 (זמין לגישה רק ב-us-central1)

שימו לב שבשמות המודלים של 2.5 עבור Live API, הפלח live מופיע מיד אחרי הפלח gemini.

^{* כדי לבקש גישה, צריך לפנות לנציג צוות ניהול החשבון שלך ב-Google Cloud.}

שימוש בתכונות הרגילות של Live API

בקטע הזה מוסבר איך להשתמש בתכונות הרגילות של Live API, במיוחד כדי להזרים סוגים שונים של נתוני קלט ופלט:

שליחת הודעות טקסט וקבלת הודעות טקסט
שליחת אודיו וקבלת אודיו
שליחת אודיו וקבלת טקסט
שליחת הודעות טקסט וקבלת אודיו

יצירת טקסט בסטרימינג מקלט טקסט בסטרימינג

לפני שמנסים את הדוגמה הזו, צריך להשלים את השלבים שבקטע לפני שמתחילים במדריך הזה כדי להגדיר את הפרויקט והאפליקציה.
בקטע הזה, צריך גם ללחוץ על לחצן של ספק Gemini API שבחרתם כדי שיוצג בדף הזה תוכן שספציפי לספק.

אתם יכולים לשלוח קלט טקסט בסטרימינג ולקבל פלט טקסט בסטרימינג. חשוב ליצור מופע liveModel ולהגדיר את אופן התגובה לערך Text.

Swift


import FirebaseAILogic

// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
let model = FirebaseAI.firebaseAI(backend: .googleAI()).liveModel(
  modelName: "gemini-2.0-flash-live-preview-04-09",
  // Configure the model to respond with text
  generationConfig: LiveGenerationConfig(
    responseModalities: [.text]
  )
)

do {
  let session = try await model.connect()

  // Provide a text prompt
  let text = "tell a short story"

  await session.sendTextRealtime(text)

  var outputText = ""
  for try await message in session.responses {
    if case let .content(content) = message.payload {
      content.modelTurn?.parts.forEach { part in
        if let part = part as? TextPart {
          outputText += part.text
        }
      }
      // Optional: if you don't require to send more requests.
      if content.isTurnComplete {
        await session.close()
      }
    }
  }

  // Output received from the server.
  print(outputText)
} catch {
  fatalError(error.localizedDescription)
}

Kotlin


// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
val model = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
    modelName = "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to respond with text
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.TEXT 
   }
)

val session = model.connect()

// Provide a text prompt
val text = "tell a short story"

session.send(text)

var outputText = ""
session.receive().collect {
    if(it.turnComplete) {
        // Optional: if you don't require to send more requests.
        session.stopReceiving();
    }
    outputText = outputText + it.text
}

// Output received from the server.
println(outputText)

Java


ExecutorService executor = Executors.newFixedThreadPool(1);
// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
LiveGenerativeModel lm = FirebaseAI.getInstance(GenerativeBackend.googleAI()).liveModel(
        "gemini-2.0-flash-live-preview-04-09",
        // Configure the model to respond with text
        new LiveGenerationConfig.Builder()
                .setResponseModalities(ResponseModality.TEXT)
                .build()
);
LiveModelFutures model = LiveModelFutures.from(lm);
ListenableFuture<LiveSession> sessionFuture =  model.connect();
class LiveContentResponseSubscriber implements Subscriber<LiveContentResponse> {
    @Override
    public void onSubscribe(Subscription s) {
        s.request(Long.MAX_VALUE); // Request an unlimited number of items
    }
    @Override
    public void onNext(LiveContentResponse liveContentResponse) {
       // Handle the response from the server.
	System.out.println(liveContentResponse.getText());
    }
    @Override
    public void onError(Throwable t) {
        System.err.println("Error: " + t.getMessage());
    }
    @Override
    public void onComplete() {
        System.out.println("Done receiving messages!");
    }
}
Futures.addCallback(sessionFuture, new FutureCallback<LiveSession>() {
    @Override
    public void onSuccess(LiveSession ses) {
	  LiveSessionFutures session = LiveSessionFutures.from(ses);
        // Provide a text prompt
        String text = "tell me a short story?";
        session.send(text);
        Publisher<LiveContentResponse> publisher = session.receive();
        publisher.subscribe(new LiveContentResponseSubscriber());
    }
    @Override
    public void onFailure(Throwable t) {
        // Handle exceptions
    }
}, executor);

Web


// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `LiveGenerativeModel` instance with the flash-live model (only model that supports the Live API)
const model = getLiveGenerativeModel(ai, {
  model: "gemini-2.0-flash-live-preview-04-09",
  // Configure the model to respond with text
  generationConfig: {
    responseModalities: [ResponseModality.TEXT],
  },
});

const session = await model.connect();

// Provide a text prompt
const prompt = "tell a short story";
session.send(prompt);

// Collect text from model's turn
let text = "";
const messages = session.receive();
for await (const message of messages) {
  switch (message.type) {
    case "serverContent":
      if (message.turnComplete) {
        console.log(text);
      } else {
        const parts = message.modelTurn?.parts;
        if (parts) {
          text += parts.map((part) => part.text).join("");
        }
      }
      break;
    case "toolCall":
      // Ignore
    case "toolCallCancellation":
      // Ignore
  }
}

Dart


import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';

late LiveModelSession _session;

await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
final model = FirebaseAI.googleAI().liveGenerativeModel(
  model: 'gemini-2.0-flash-live-preview-04-09',
  // Configure the model to respond with text
  liveGenerationConfig: LiveGenerationConfig(responseModalities: [ResponseModalities.text]),
);

_session = await model.connect();

// Provide a text prompt
final prompt = Content.text('tell a short story');
await _session.send(input: prompt, turnComplete: true);

// In a separate thread, receive the response
await for (final message in _session.receive()) {
   // Process the received message
}

Unity


using Firebase;
using Firebase.AI;

async Task SendTextReceiveText() {
  // Initialize the Gemini Developer API backend service
  // Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
  var model = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetLiveModel(
    modelName: "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to respond with text
    liveGenerationConfig: new LiveGenerationConfig(
        responseModalities: new[] { ResponseModality.Text })
  );

  LiveSession session = await model.ConnectAsync();

  // Provide a text prompt
  var prompt = ModelContent.Text("tell a short story");
  await session.SendAsync(content: prompt, turnComplete: true);

  // Receive the response
  await foreach (var message in session.ReceiveAsync()) {
    // Process the received message
    if (!string.IsNullOrEmpty(message.Text)) {
      UnityEngine.Debug.Log("Received message: " + message.Text);
    }
  }
}

יצירת אודיו בסטרימינג מקלט אודיו בסטרימינג

אתם יכולים לשלוח קלט אודיו בסטרימינג ולקבל פלט אודיו בסטרימינג. חשוב ליצור מכונת LiveModel ולהגדיר את אופן התגובה לערך Audio.

בהמשך הדף מוסבר איך להגדיר ולשנות את הקול של התשובה.

Swift


import FirebaseAILogic

// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
let model = FirebaseAI.firebaseAI(backend: .googleAI()).liveModel(
  modelName: "gemini-2.0-flash-live-preview-04-09",
  // Configure the model to respond with audio
  generationConfig: LiveGenerationConfig(
    responseModalities: [.audio]
  )
)

do {
  let session = try await model.connect()

  // Load the audio file, or tap a microphone
  guard let audioFile = NSDataAsset(name: "audio.pcm") else {
    fatalError("Failed to load audio file")
  }

  // Provide the audio data
  await session.sendAudioRealtime(audioFile.data)

  var outputText = ""
  for try await message in session.responses {
    if case let .content(content) = message.payload {
      content.modelTurn?.parts.forEach { part in
        if let part = part as? InlineDataPart, part.mimeType.starts(with: "audio/pcm") {
          // Handle 16bit pcm audio data at 24khz
          playAudio(part.data)
        }
      }
      // Optional: if you don't require to send more requests.
      if content.isTurnComplete {
        await session.close()
      }
    }
  }
} catch {
  fatalError(error.localizedDescription)
}

Kotlin


// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
val model = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
    modelName = "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to respond with audio
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.AUDIO 
   }
)

val session = model.connect()

// This is the recommended way.
// However, you can create your own recorder and handle the stream.
session.startAudioConversation()

Java


ExecutorService executor = Executors.newFixedThreadPool(1);
// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
LiveGenerativeModel lm = FirebaseAI.getInstance(GenerativeBackend.googleAI()).liveModel(
        "gemini-2.0-flash-live-preview-04-09",
        // Configure the model to respond with audio
        new LiveGenerationConfig.Builder()
                .setResponseModalities(ResponseModality.AUDIO)
                .build()
);
LiveModelFutures model = LiveModelFutures.from(lm);
ListenableFuture<LiveSession> sessionFuture =  model.connect();

Futures.addCallback(sessionFuture, new FutureCallback<LiveSession>() {
    @Override
    public void onSuccess(LiveSession ses) {
	 LiveSessionFutures session = LiveSessionFutures.from(ses);
        session.startAudioConversation();
    }
    @Override
    public void onFailure(Throwable t) {
        // Handle exceptions
    }
}, executor);

Web


// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `LiveGenerativeModel` instance with the flash-live model (only model that supports the Live API)
const model = getLiveGenerativeModel(ai, {
  model: "gemini-2.0-flash-live-preview-04-09",
  // Configure the model to respond with audio
  generationConfig: {
    responseModalities: [ResponseModality.AUDIO],
  },
});

const session = await model.connect();

// Start the audio conversation
const audioConversationController = await startAudioConversation(session);

// ... Later, to stop the audio conversation
// await audioConversationController.stop()

Dart


import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';
import 'package:your_audio_recorder_package/your_audio_recorder_package.dart';

late LiveModelSession _session;
final _audioRecorder = YourAudioRecorder();

await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
final model = FirebaseAI.googleAI().liveGenerativeModel(
  model: 'gemini-2.0-flash-live-preview-04-09',
   // Configure the model to respond with audio
   liveGenerationConfig: LiveGenerationConfig(responseModalities: [ResponseModalities.audio]),
);

_session = await model.connect();

final audioRecordStream = _audioRecorder.startRecordingStream();
// Map the Uint8List stream to InlineDataPart stream
final mediaChunkStream = audioRecordStream.map((data) {
  return InlineDataPart('audio/pcm', data);
});
await _session.startMediaStream(mediaChunkStream);

// In a separate thread, receive the audio response from the model
await for (final message in _session.receive()) {
   // Process the received message
}

Unity


using Firebase;
using Firebase.AI;

async Task SendTextReceiveAudio() {
  // Initialize the Gemini Developer API backend service
  // Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
  var model = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetLiveModel(
    modelName: "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to respond with audio
    liveGenerationConfig: new LiveGenerationConfig(
        responseModalities: new[] { ResponseModality.Audio })
  );

  LiveSession session = await model.ConnectAsync();

  // Start a coroutine to send audio from the Microphone
  var recordingCoroutine = StartCoroutine(SendAudio(session));

  // Start receiving the response
  await ReceiveAudio(session);
}

IEnumerator SendAudio(LiveSession liveSession) {
  string microphoneDeviceName = null;
  int recordingFrequency = 16000;
  int recordingBufferSeconds = 2;

  var recordingClip = Microphone.Start(microphoneDeviceName, true,
                                       recordingBufferSeconds, recordingFrequency);

  int lastSamplePosition = 0;
  while (true) {
    if (!Microphone.IsRecording(microphoneDeviceName)) {
      yield break;
    }

    int currentSamplePosition = Microphone.GetPosition(microphoneDeviceName);

    if (currentSamplePosition != lastSamplePosition) {
      // The Microphone uses a circular buffer, so we need to check if the
      // current position wrapped around to the beginning, and handle it
      // accordingly.
      int sampleCount;
      if (currentSamplePosition > lastSamplePosition) {
        sampleCount = currentSamplePosition - lastSamplePosition;
      } else {
        sampleCount = recordingClip.samples - lastSamplePosition + currentSamplePosition;
      }

      if (sampleCount > 0) {
        // Get the audio chunk
        float[] samples = new float[sampleCount];
        recordingClip.GetData(samples, lastSamplePosition);

        // Send the data, discarding the resulting Task to avoid the warning
        _ = liveSession.SendAudioAsync(samples);

        lastSamplePosition = currentSamplePosition;
      }
    }

    // Wait for a short delay before reading the next sample from the Microphone
    const float MicrophoneReadDelay = 0.5f;
    yield return new WaitForSeconds(MicrophoneReadDelay);
  }
}

Queue audioBuffer = new();

async Task ReceiveAudio(LiveSession liveSession) {
  int sampleRate = 24000;
  int channelCount = 1;

  // Create a looping AudioClip to fill with the received audio data
  int bufferSamples = (int)(sampleRate * channelCount);
  AudioClip clip = AudioClip.Create("StreamingPCM", bufferSamples, channelCount,
                                    sampleRate, true, OnAudioRead);

  // Attach the clip to an AudioSource and start playing it
  AudioSource audioSource = GetComponent();
  audioSource.clip = clip;
  audioSource.loop = true;
  audioSource.Play();

  // Start receiving the response
  await foreach (var message in liveSession.ReceiveAsync()) {
    // Process the received message
    foreach (float[] pcmData in message.AudioAsFloat) {
      lock (audioBuffer) {
        foreach (float sample in pcmData) {
          audioBuffer.Enqueue(sample);
        }
      }
    }
  }
}

// This method is called by the AudioClip to load audio data.
private void OnAudioRead(float[] data) {
  int samplesToProvide = data.Length;
  int samplesProvided = 0;

  lock(audioBuffer) {
    while (samplesProvided < samplesToProvide && audioBuffer.Count > 0) {
      data[samplesProvided] = audioBuffer.Dequeue();
      samplesProvided++;
    }
  }

  while (samplesProvided < samplesToProvide) {
    data[samplesProvided] = 0.0f;
    samplesProvided++;
  }
}

יצירת טקסט בסטרימינג מקלט אודיו בסטרימינג

אתם יכולים לשלוח קלט אודיו בסטרימינג ולקבל פלט טקסט בסטרימינג. חשוב ליצור מופע LiveModel ולהגדיר את אופן התגובה לערך Text.

Swift


import FirebaseAILogic

// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
let model = FirebaseAI.firebaseAI(backend: .googleAI()).liveModel(
  modelName: "gemini-2.0-flash-live-preview-04-09",
  // Configure the model to respond with text
  generationConfig: LiveGenerationConfig(
    responseModalities: [.text]
  )
)

do {
  let session = try await model.connect()

  // Load the audio file, or tap a microphone
  guard let audioFile = NSDataAsset(name: "audio.pcm") else {
    fatalError("Failed to load audio file")
  }

  // Provide the audio data
  await session.sendAudioRealtime(audioFile.data)

  var outputText = ""
  for try await message in session.responses {
    if case let .content(content) = message.payload {
      content.modelTurn?.parts.forEach { part in
        if let part = part as? TextPart {
          outputText += part.text
        }
      }
      // Optional: if you don't require to send more requests.
      if content.isTurnComplete {
        await session.close()
      }
    }
  }

  // Output received from the server.
  print(outputText)
} catch {
  fatalError(error.localizedDescription)
}

Kotlin


// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
val model = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
    modelName = "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to respond with text
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.TEXT
   }
)

val session = model.connect()

// Provide a text prompt
val audioContent = content("user") { audioData }

session.send(audioContent)

var outputText = ""
session.receive().collect {
    if(it.status == Status.TURN_COMPLETE) {
        // Optional: if you don't require to send more requests.
        session.stopReceiving();
    }
    outputText = outputText + it.text
}

// Output received from the server.
println(outputText)

Java


ExecutorService executor = Executors.newFixedThreadPool(1);
// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
LiveGenerativeModel lm = FirebaseAI.getInstance(GenerativeBackend.googleAI()).liveModel(
        "gemini-2.0-flash-live-preview-04-09",
        // Configure the model to respond with text
        new LiveGenerationConfig.Builder()
                .setResponseModalities(ResponseModality.TEXT)
                .build()
);
LiveModelFutures model = LiveModelFutures.from(lm);
ListenableFuture<LiveSession> sessionFuture =  model.connect();
class LiveContentResponseSubscriber implements Subscriber<LiveContentResponse> {
    @Override
    public void onSubscribe(Subscription s) {
        s.request(Long.MAX_VALUE); // Request an unlimited number of items
    }
    @Override
    public void onNext(LiveContentResponse liveContentResponse) {
        // Handle the response from the server.
	System.out.println(liveContentResponse.getText());
    }
    @Override
    public void onError(Throwable t) {
        System.err.println("Error: " + t.getMessage());
    }
    @Override
    public void onComplete() {
        System.out.println("Done receiving messages!");
    }
}
Futures.addCallback(sessionFuture, new FutureCallback<LiveSession>() {
    @Override
    public void onSuccess(LiveSession ses) {
	 LiveSessionFutures session = LiveSessionFutures.from(ses);
        // Send Audio data
	 session.send(new Content.Builder().addInlineData(audioData, "audio/pcm").build());

        session.send(text);
        Publisher<LiveContentResponse> publisher = session.receive();
        publisher.subscribe(new LiveContentResponseSubscriber());
    }
    @Override
    public void onFailure(Throwable t) {
        // Handle exceptions
    }
}, executor);

Web


// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `LiveGenerativeModel` instance with the flash-live model (only model that supports the Live API)
const model = getLiveGenerativeModel(ai, {
  model: "gemini-2.0-flash-live-preview-04-09",
  // Configure the model to respond with text
  generationConfig: {
    responseModalities: [ResponseModality.TEXT],
  },
});

const session = await model.connect();

// TODO(developer): Collect audio data (16-bit 16kHz PCM)
// const audioData = ...

// Send audio
const audioPart = {
  inlineData: { data: audioData, mimeType: "audio/pcm" },
};
session.send([audioPart]);

// Collect text from model's turn
let text = "";
const messages = session.receive();
for await (const message of messages) {
  switch (message.type) {
    case "serverContent":
      if (message.turnComplete) {
        console.log(text);
      } else {
        const parts = message.modelTurn?.parts;
        if (parts) {
          text += parts.map((part) => part.text).join("");
        }
      }
      break;
    case "toolCall":
      // Ignore
    case "toolCallCancellation":
      // Ignore
  }
}

Dart


import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';
import 'package:your_audio_recorder_package/your_audio_recorder_package.dart';
import 'dart:async';

late LiveModelSession _session;
final _audioRecorder = YourAudioRecorder();

await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
final model = FirebaseAI.googleAI().liveGenerativeModel(
  model: 'gemini-2.0-flash-live-preview-04-09',
  // Configure the model to respond with text
  liveGenerationConfig: LiveGenerationConfig(responseModalities: ResponseModalities.text),
);

_session = await model.connect();

final audioRecordStream = _audioRecorder.startRecordingStream();
final mediaChunkStream = audioRecordStream.map((data) {
  return InlineDataPart('audio/pcm', data);
});

await _session.startMediaStream(mediaChunkStream);

final responseStream = _session.receive();

return responseStream.asyncMap((response) async {
  if (response.parts.isNotEmpty && response.parts.first.text != null) {
    return response.parts.first.text!;
  } else {
    throw Exception('Text response not found.');
  }
});

Future main() async {
  try {
    final textStream = await audioToText();

    await for (final text in textStream) {
      print('Received text: $text');
      // Handle the text response
    }
  } catch (e) {
    print('Error: $e');
  }
}

Unity


using Firebase;
using Firebase.AI;

async Task SendAudioReceiveText() {
  // Initialize the Gemini Developer API backend service
  // Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
  var model = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetLiveModel(
    modelName: "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to respond with text
    liveGenerationConfig: new LiveGenerationConfig(
        responseModalities: new[] { ResponseModality.Text })
  );

  LiveSession session = await model.ConnectAsync();

  // Start a coroutine to send audio from the Microphone
  var recordingCoroutine = StartCoroutine(SendAudio(session));

  // Receive the response
  await foreach (var message in session.ReceiveAsync()) {
    // Process the received message
    if (!string.IsNullOrEmpty(message.Text)) {
      UnityEngine.Debug.Log("Received message: " + message.Text);
    }
  }

  StopCoroutine(recordingCoroutine);
}

IEnumerator SendAudio(LiveSession liveSession) {
  string microphoneDeviceName = null;
  int recordingFrequency = 16000;
  int recordingBufferSeconds = 2;

  var recordingClip = Microphone.Start(microphoneDeviceName, true,
                                       recordingBufferSeconds, recordingFrequency);

  int lastSamplePosition = 0;
  while (true) {
    if (!Microphone.IsRecording(microphoneDeviceName)) {
      yield break;
    }

    int currentSamplePosition = Microphone.GetPosition(microphoneDeviceName);

    if (currentSamplePosition != lastSamplePosition) {
      // The Microphone uses a circular buffer, so we need to check if the
      // current position wrapped around to the beginning, and handle it
      // accordingly.
      int sampleCount;
      if (currentSamplePosition > lastSamplePosition) {
        sampleCount = currentSamplePosition - lastSamplePosition;
      } else {
        sampleCount = recordingClip.samples - lastSamplePosition + currentSamplePosition;
      }

      if (sampleCount > 0) {
        // Get the audio chunk
        float[] samples = new float[sampleCount];
        recordingClip.GetData(samples, lastSamplePosition);

        // Send the data, discarding the resulting Task to avoid the warning
        _ = liveSession.SendAudioAsync(samples);

        lastSamplePosition = currentSamplePosition;
      }
    }

    // Wait for a short delay before reading the next sample from the Microphone
    const float MicrophoneReadDelay = 0.5f;
    yield return new WaitForSeconds(MicrophoneReadDelay);
  }
}

יצירת אודיו בסטרימינג מקלט טקסט בסטרימינג

אתם יכולים לשלוח קלט טקסט בסטרימינג ולקבל פלט אודיו בסטרימינג. חשוב ליצור מופע LiveModel ולהגדיר את אופן התגובה לערך Audio.

בהמשך הדף מוסבר איך להגדיר ולשנות את הקול של התשובה.

Swift


import FirebaseAILogic

// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
let model = FirebaseAI.firebaseAI(backend: .googleAI()).liveModel(
  modelName: "gemini-2.0-flash-live-preview-04-09",
  // Configure the model to respond with audio
  generationConfig: LiveGenerationConfig(
    responseModalities: [.audio]
  )
)

do {
  let session = try await model.connect()

  // Provide a text prompt
  let text = "tell a short story"

  await session.sendTextRealtime(text)

  var outputText = ""
  for try await message in session.responses {
    if case let .content(content) = message.payload {
      content.modelTurn?.parts.forEach { part in
        if let part = part as? InlineDataPart, part.mimeType.starts(with: "audio/pcm") {
          // Handle 16bit pcm audio data at 24khz
          playAudio(part.data)
        }
      }
      // Optional: if you don't require to send more requests.
      if content.isTurnComplete {
        await session.close()
      }
    }
  }
} catch {
  fatalError(error.localizedDescription)
}

Kotlin


// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
val model = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
    modelName = "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to respond with audio
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.AUDIO
   }
)

val session = model.connect()

// Provide a text prompt
val text = "tell a short story"

session.send(text)

session.receive().collect {
    if(it.turnComplete) {
        // Optional: if you don't require to send more requests.
        session.stopReceiving();
    }
    // Handle 16bit pcm audio data at 24khz
    playAudio(it.data)
}

Java


ExecutorService executor = Executors.newFixedThreadPool(1);
// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
LiveGenerativeModel lm = FirebaseAI.getInstance(GenerativeBackend.googleAI()).liveModel(
        "gemini-2.0-flash-live-preview-04-09",
        // Configure the model to respond with text
        new LiveGenerationConfig.Builder()
                .setResponseModalities(ResponseModality.AUDIO)
                .build()
);
LiveModelFutures model = LiveModelFutures.from(lm);
ListenableFuture<LiveSession> sessionFuture =  model.connect();
class LiveContentResponseSubscriber implements Subscriber<LiveContentResponse> {
    @Override
    public void onSubscribe(Subscription s) {
        s.request(Long.MAX_VALUE); // Request an unlimited number of items
    }
    @Override
    public void onNext(LiveContentResponse liveContentResponse) {
        // Handle 16bit pcm audio data at 24khz
	liveContentResponse.getData();
    }
    @Override
    public void onError(Throwable t) {
        System.err.println("Error: " + t.getMessage());
    }
    @Override
    public void onComplete() {
        System.out.println("Done receiving messages!");
    }
}
Futures.addCallback(sessionFuture, new FutureCallback<LiveSession>() {
    @Override
    public void onSuccess(LiveSession ses) {
	 LiveSessionFutures session = LiveSessionFutures.from(ses);
        // Provide a text prompt
        String text = "tell me a short story?";
        session.send(text);
        Publisher<LiveContentResponse> publisher = session.receive();
        publisher.subscribe(new LiveContentResponseSubscriber());
    }
    @Override
    public void onFailure(Throwable t) {
        // Handle exceptions
    }
}, executor);

Web


// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `LiveGenerativeModel` instance with the flash-live model (only model that supports the Live API)
const model = getLiveGenerativeModel(ai, {
  model: "gemini-2.0-flash-live-preview-04-09",
  // Configure the model to respond with audio
  generationConfig: {
    responseModalities: [ResponseModality.AUDIO],
  },
});

const session = await model.connect();

// Provide a text prompt
const prompt = "tell a short story";
session.send(prompt);

// Handle the model's audio output
const messages = session.receive();
for await (const message of messages) {
  switch (message.type) {
    case "serverContent":
      if (message.turnComplete) {
        // TODO(developer): Handle turn completion
      } else if (message.interrupted) {
        // TODO(developer): Handle the interruption
        break;
      } else if (message.modelTurn) {
        const parts = message.modelTurn?.parts;
        parts?.forEach((part) => {
          if (part.inlineData) {
            // TODO(developer): Play the audio chunk
          }
        });
      }
      break;
    case "toolCall":
      // Ignore
    case "toolCallCancellation":
      // Ignore
  }
}

Dart


import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';
import 'dart:async';
import 'dart:typed_data';

late LiveModelSession _session;

Future<Stream<Uint8List>> textToAudio(String textPrompt) async {
  WidgetsFlutterBinding.ensureInitialized();

  await Firebase.initializeApp(
    options: DefaultFirebaseOptions.currentPlatform,
  );

  // Initialize the Gemini Developer API backend service
  // Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
  final model = FirebaseAI.googleAI().liveGenerativeModel(
    model: 'gemini-2.0-flash-live-preview-04-09',
    // Configure the model to respond with audio
    liveGenerationConfig: LiveGenerationConfig(responseModalities: ResponseModalities.audio),
  );

  _session = await model.connect();

  final prompt = Content.text(textPrompt);

  await _session.send(input: prompt);

  return _session.receive().asyncMap((response) async {
    if (response is LiveServerContent && response.modelTurn?.parts != null) {
       for (final part in response.modelTurn!.parts) {
         if (part is InlineDataPart) {
           return part.bytes;
         }
       }
    }
    throw Exception('Audio data not found');
  });
}

Future<void> main() async {
  try {
    final audioStream = await textToAudio('Convert this text to audio.');

    await for (final audioData in audioStream) {
      // Process the audio data (e.g., play it using an audio player package)
      print('Received audio data: ${audioData.length} bytes');
      // Example using flutter_sound (replace with your chosen package):
      // await _flutterSoundPlayer.startPlayer(fromDataBuffer: audioData);
    }
  } catch (e) {
    print('Error: $e');
  }
}

Unity


using Firebase;
using Firebase.AI;

async Task SendTextReceiveAudio() {
  // Initialize the Gemini Developer API backend service
  // Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
  var model = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetLiveModel(
    modelName: "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to respond with audio
    liveGenerationConfig: new LiveGenerationConfig(
        responseModalities: new[] { ResponseModality.Audio })
  );

  LiveSession session = await model.ConnectAsync();

  // Provide a text prompt
  var prompt = ModelContent.Text("Convert this text to audio.");
  await session.SendAsync(content: prompt, turnComplete: true);

  // Start receiving the response
  await ReceiveAudio(session);
}

Queue<float> audioBuffer = new();

async Task ReceiveAudio(LiveSession session) {
  int sampleRate = 24000;
  int channelCount = 1;

  // Create a looping AudioClip to fill with the received audio data
  int bufferSamples = (int)(sampleRate * channelCount);
  AudioClip clip = AudioClip.Create("StreamingPCM", bufferSamples, channelCount,
                                    sampleRate, true, OnAudioRead);

  // Attach the clip to an AudioSource and start playing it
  AudioSource audioSource = GetComponent<AudioSource>();
  audioSource.clip = clip;
  audioSource.loop = true;
  audioSource.Play();

  // Start receiving the response
  await foreach (var message in session.ReceiveAsync()) {
    // Process the received message
    foreach (float[] pcmData in message.AudioAsFloat) {
      lock (audioBuffer) {
        foreach (float sample in pcmData) {
          audioBuffer.Enqueue(sample);
        }
      }
    }
  }
}

// This method is called by the AudioClip to load audio data.
private void OnAudioRead(float[] data) {
  int samplesToProvide = data.Length;
  int samplesProvided = 0;

  lock(audioBuffer) {
    while (samplesProvided < samplesToProvide && audioBuffer.Count > 0) {
      data[samplesProvided] = audioBuffer.Dequeue();
      samplesProvided++;
    }
  }

  while (samplesProvided < samplesToProvide) {
    data[samplesProvided] = 0.0f;
    samplesProvided++;
  }
}

יצירת חוויות אינטראקטיביות ומושכות יותר

בקטע הזה מוסבר איך ליצור ולנהל תכונות יותר מושכות או אינטראקטיביות של Live API.

שינוי הקול של התשובה

‫Live API משתמש ב-Chirp 3 כדי לתמוך בתשובות של דיבור מסונתז. כשמשתמשים ב-Firebase AI Logic, אפשר לשלוח אודיו במגוון שפות וקולות באיכות HD. רשימה מלאה והדגמות של כל קול זמינות במאמר Chirp 3: HD voices.

כדי לציין קול, מגדירים את שם הקול באובייקט speechConfig כחלק מהגדרת המודל. אם לא מציינים קול, ברירת המחדל היא Puck.

Swift


import FirebaseAILogic

// ...

let model = FirebaseAI.firebaseAI(backend: .googleAI()).liveModel(
  modelName: "gemini-2.0-flash-live-preview-04-09",
  // Configure the model to use a specific voice for its audio response
  generationConfig: LiveGenerationConfig(
    responseModalities: [.audio],
    speech: SpeechConfig(voiceName: "VOICE_NAME")
  )
)

// ...

Kotlin


// ...

val model = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
    modelName = "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to use a specific voice for its audio response
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.AUDIO
        speechConfig = SpeechConfig(voice = Voice("VOICE_NAME"))
    }
)

// ...

Java


// ...

LiveModel model = FirebaseAI.getInstance(GenerativeBackend.googleAI()).liveModel(
    "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to use a specific voice for its audio response
    new LiveGenerationConfig.Builder()
        .setResponseModalities(ResponseModality.AUDIO)
        .setSpeechConfig(new SpeechConfig(new Voice("VOICE_NAME")))
        .build()
);

// ...

Web


// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
const model = getLiveGenerativeModel(ai, {
  model: "gemini-2.0-flash-live-preview-04-09",
  // Configure the model to use a specific voice for its audio response
  generationConfig: {
    responseModalities: [ResponseModality.AUDIO],
    speechConfig: {
      voiceConfig: {
        prebuiltVoiceConfig: { voiceName: "VOICE_NAME" },
      },
    },
  },
});

Dart


// ...

final model = FirebaseAI.googleAI().liveGenerativeModel(
  model: 'gemini-2.0-flash-live-preview-04-09',
  // Configure the model to use a specific voice for its audio response
  liveGenerationConfig: LiveGenerationConfig(
    responseModalities: ResponseModalities.audio,
    speechConfig: SpeechConfig(voiceName: 'VOICE_NAME'),
  ),
);

// ...

Unity


var model = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetLiveModel(
  modelName: "gemini-2.0-flash-live-preview-04-09",
  liveGenerationConfig: new LiveGenerationConfig(
    responseModalities: new[] { ResponseModality.Audio },
    speechConfig: SpeechConfig.UsePrebuiltVoice("VOICE_NAME"))
);

כדי לקבל את התוצאות הטובות ביותר כשמזינים הנחיה למודל ומבקשים ממנו להגיב בשפה שאינה אנגלית, צריך לכלול את ההנחיות הבאות כחלק מההוראות למערכת:

RESPOND IN LANGUAGE. YOU MUST RESPOND UNMISTAKABLY IN LANGUAGE.

שמירה על ההקשר בין סשנים ובקשות

אתם יכולים להשתמש במבנה צ'אט כדי לשמור על ההקשר בין סשנים ובקשות. חשוב לזכור שהאפשרות הזו פועלת רק עם קלט ופלט של טקסט.

הגישה הזו מתאימה ביותר להקשרים קצרים. אפשר לשלוח אינטראקציות של תורות כדי לייצג את הרצף המדויק של האירועים. בהקשרים ארוכים יותר, מומלץ לספק סיכום של ההודעה כדי לפנות מקום בחלון ההקשר לאינטראקציות הבאות.

איך מתמודדים עם הפרעות

‫Firebase AI Logic עדיין לא תומך בטיפול בהפרעות. בדוק שוב בקרוב!

שימוש בהפעלת פונקציות (כלים)

אפשר להגדיר כלים, כמו פונקציות זמינות, לשימוש עם Live API בדיוק כמו שאפשר עם שיטות רגילות ליצירת תוכן. בקטע הזה מתוארים כמה ניואנסים שקשורים לשימוש ב-Live API עם קריאה לפונקציות. תיאור מלא ודוגמאות לשימוש בפונקציות זמינים במדריך לשימוש בפונקציות.

מתוך הנחיה אחת, המודל יכול ליצור כמה קריאות לפונקציות ואת הקוד שנדרש כדי לשרשר את הפלטים שלהן. הקוד הזה מופעל בסביבת ארגז חול, ויוצר הודעות BidiGenerateContentToolCall נוספות. הביצוע מושהה עד שהתוצאות של כל קריאה לפונקציה זמינות, וכך מובטח עיבוד רציף.

בנוסף, שימוש ב-Live API עם קריאה לפונקציה הוא יעיל במיוחד כי המודל יכול לבקש מהמשתמש מידע נוסף או הבהרות. לדוגמה, אם למודל אין מספיק מידע כדי לספק ערך פרמטר לפונקציה שהוא רוצה להפעיל, הוא יכול לבקש מהמשתמש לספק מידע נוסף או מידע שיבהיר את המצב.

הלקוח צריך להגיב עם BidiGenerateContentToolResponse.

הגבלות ודרישות

חשוב לזכור את המגבלות והדרישות הבאות של Live API.

תעתיק

‫Firebase AI Logic עדיין לא תומך בתמלילים. בדוק שוב בקרוב!

שפות

שפות קלט: רשימה מלאה של שפות קלט נתמכות למודלים של Gemini
שפות פלט: רשימת שפות הפלט המלאה זמינה במאמר Chirp 3: HD voices

פורמטים של אודיו

הפונקציה Live API תומכת בפורמטים הבאים של אודיו:

פורמט אודיו של הקלט: אודיו PCM גולמי של 16 ביט ב-16kHz little-endian
פורמט פלט האודיו: אודיו PCM גולמי של 16 ביט ב-24kHz little-endian

מגבלות קצב

ל-Live API יש מגבלות קצב גם על מספר הסשנים בו-זמנית לכל פרויקט Firebase וגם על מספר האסימונים לדקה (TPM).

Gemini Developer API:
- המגבלות משתנות בהתאם לGemini Developer API'רמת השימוש' בפרויקט (אפשר לעיין במסמכי המגבלות על קצב הבקשות).
Vertex AI Gemini API:
- ‫5,000 סשנים בו-זמניים לכל פרויקט Firebase
- ‫4 מיליון טוקנים בדקה

משך הסשן

ברירת המחדל למשך סשן היא 10 דקות. אם משך הסשן חורג מהמגבלה, החיבור מסתיים.

המודל מוגבל גם על ידי גודל ההקשר. שליחת נתונים בכמויות גדולות מדי עלולה לגרום לסיום מוקדם של הסשן.

זיהוי פעילות קולית (VAD)

המודל מבצע באופן אוטומטי זיהוי פעילות קולית (VAD) בזרם קלט אודיו רציף. התכונה VAD מופעלת כברירת מחדל.

ספירת טוקנים

אי אפשר להשתמש ב-CountTokens API עם Live API.

רוצה לתת משוב על חוויית השימוש ב-Firebase AI Logic?

סטרימינג דו-כיווני באמצעות Gemini Live API קל לארגן דפים בעזרת אוספים אפשר לשמור ולסווג תוכן על סמך ההעדפות שלך.

לפני שמתחילים

מודלים שתומכים ביכולת הזו

שימוש בתכונות הרגילות של Live API

יצירת טקסט בסטרימינג מקלט טקסט בסטרימינג

Swift

Kotlin

Java

Web

Dart

Unity

יצירת אודיו בסטרימינג מקלט אודיו בסטרימינג

Swift

Kotlin

Java

Web

Dart

Unity

יצירת טקסט בסטרימינג מקלט אודיו בסטרימינג

Swift

Kotlin

Java

Web

Dart

Unity

יצירת אודיו בסטרימינג מקלט טקסט בסטרימינג

Swift

Kotlin

Java

Web

Dart

Unity

יצירת חוויות אינטראקטיביות ומושכות יותר

שינוי הקול של התשובה

Swift

Kotlin

Java

Web

Dart

Unity

שמירה על ההקשר בין סשנים ובקשות

איך מתמודדים עם הפרעות

שימוש בהפעלת פונקציות (כלים)

הגבלות ודרישות

תעתיק

שפות

פורמטים של אודיו

מגבלות קצב

משך הסשן

זיהוי פעילות קולית (VAD)

ספירת טוקנים

סטרימינג דו-כיווני באמצעות Gemini Live API