All Gemini 1.0 and Gemini 1.5 models are now retired.
To avoid service disruption, update to a newer model (for example, gemini-2.5-flash-lite). Learn more.

이 페이지는 Cloud Translation API를 통해 번역되었습니다.

Gemini Live API를 사용한 양방향 스트리밍

Gemini Live API를 사용하면 Gemini와 지연 시간이 짧은 양방향 텍스트 및 음성 상호작용을 할 수 있습니다. Live API를 사용하면 최종 사용자에게 사람처럼 자연스러운 음성 대화 경험과 텍스트 또는 음성 명령을 사용하여 모델의 응답을 중단할 수 있는 기능을 제공할 수 있습니다. 모델은 텍스트 및 오디오 입력을 처리할 수 있으며 (동영상은 곧 지원 예정) 텍스트 및 오디오 출력을 제공할 수 있습니다.

Google AI Studio 또는 Vertex AI Studio에서 프롬프트와 Live API를 사용하여 프로토타입을 제작할 수 있습니다.

Live API는 클라이언트와 Gemini 서버 간에 세션을 설정하기 위해 WebSocket 연결을 만드는 스테이트풀(Stateful) API입니다. 자세한 내용은 Live API 참조 문서(Gemini Developer API | Vertex AI Gemini API)를 참고하세요.

시작하기 전에

Gemini API 제공업체를 클릭하여 이 페이지에서 제공업체별 콘텐츠와 코드를 확인합니다.

아직 완료하지 않았다면 Firebase 프로젝트를 설정하고, 앱을 Firebase에 연결하고, SDK를 추가하고, 선택한 Gemini API 제공업체의 백엔드 서비스를 초기화하고, LiveModel 인스턴스를 만드는 방법을 설명하는 시작 가이드를 완료합니다.

이 기능을 지원하는 모델

Live API를 지원하는 모델은 선택한 Gemini API 제공업체에 따라 다릅니다.

Gemini Developer API
- gemini-live-2.5-flash (비공개 GA^*)
- gemini-live-2.5-flash-preview
- gemini-2.0-flash-live-001
- gemini-2.0-flash-live-preview-04-09
Vertex AI Gemini API
- gemini-live-2.5-flash (비공개 GA^*)
- gemini-2.0-flash-live-preview-04-09 (us-central1에서만 액세스 가능)

Live API의 2.5 모델 이름의 경우 live 세그먼트가 gemini 세그먼트 바로 뒤에 옵니다.

^{* Google Cloud 계정팀 담당자에게 문의하여 액세스 권한을 요청하세요.}

Live API의 표준 기능 사용

이 섹션에서는 Live API의 표준 기능을 사용하여 다양한 유형의 입력과 출력을 스트리밍하는 방법을 설명합니다.

문자 보내기 및 문자 받기
오디오 전송 및 오디오 수신
오디오 보내기 및 텍스트 받기
텍스트 보내기 및 오디오 수신

스트리밍된 텍스트 입력에서 스트리밍된 텍스트 생성

이 샘플을 사용해 보기 전에 이 가이드의 시작하기 전에 섹션을 완료하여 프로젝트와 앱을 설정하세요.
이 섹션에서는 선택한 Gemini API 제공업체의 버튼을 클릭하여 이 페이지에 제공업체별 콘텐츠가 표시되도록 합니다.

스트리밍된 텍스트 입력을 전송하고 스트리밍된 텍스트 출력을 수신할 수 있습니다. liveModel 인스턴스를 만들고 응답 모달리티를 Text로 설정해야 합니다.

Swift


import FirebaseAI

// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
let model = FirebaseAI.firebaseAI(backend: .googleAI()).liveModel(
  modelName: "gemini-2.0-flash-live-preview-04-09",
  // Configure the model to respond with text
  generationConfig: LiveGenerationConfig(
    responseModalities: [.text]
  )
)

do {
  let session = try await model.connect()

  // Provide a text prompt
  let text = "tell a short story"

  await session.sendTextRealtime(text)

  var outputText = ""
  for try await message in session.responses {
    if case let .content(content) = message.payload {
      content.modelTurn?.parts.forEach { part in
        if let part = part as? TextPart {
          outputText += part.text
        }
      }
      // Optional: if you don't require to send more requests.
      if content.isTurnComplete {
        await session.close()
      }
    }
  }

  // Output received from the server.
  print(outputText)
} catch {
  fatalError(error.localizedDescription)
}

Kotlin


// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
val model = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
    modelName = "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to respond with text
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.TEXT 
   }
)

val session = model.connect()

// Provide a text prompt
val text = "tell a short story"

session.send(text)

var outputText = ""
session.receive().collect {
    if(it.turnComplete) {
        // Optional: if you don't require to send more requests.
        session.stopReceiving();
    }
    outputText = outputText + it.text
}

// Output received from the server.
println(outputText)

Java


ExecutorService executor = Executors.newFixedThreadPool(1);
// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
LiveGenerativeModel lm = FirebaseAI.getInstance(GenerativeBackend.googleAI()).liveModel(
        "gemini-2.0-flash-live-preview-04-09",
        // Configure the model to respond with text
        new LiveGenerationConfig.Builder()
                .setResponseModalities(ResponseModality.TEXT)
                .build()
);
LiveModelFutures model = LiveModelFutures.from(lm);
ListenableFuture<LiveSession> sessionFuture =  model.connect();
class LiveContentResponseSubscriber implements Subscriber<LiveContentResponse> {
    @Override
    public void onSubscribe(Subscription s) {
        s.request(Long.MAX_VALUE); // Request an unlimited number of items
    }
    @Override
    public void onNext(LiveContentResponse liveContentResponse) {
       // Handle the response from the server.
	System.out.println(liveContentResponse.getText());
    }
    @Override
    public void onError(Throwable t) {
        System.err.println("Error: " + t.getMessage());
    }
    @Override
    public void onComplete() {
        System.out.println("Done receiving messages!");
    }
}
Futures.addCallback(sessionFuture, new FutureCallback<LiveSession>() {
    @Override
    public void onSuccess(LiveSession ses) {
	  LiveSessionFutures session = LiveSessionFutures.from(ses);
        // Provide a text prompt
        String text = "tell me a short story?";
        session.send(text);
        Publisher<LiveContentResponse> publisher = session.receive();
        publisher.subscribe(new LiveContentResponseSubscriber());
    }
    @Override
    public void onFailure(Throwable t) {
        // Handle exceptions
    }
}, executor);

Web


// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `LiveGenerativeModel` instance with the flash-live model (only model that supports the Live API)
const model = getLiveGenerativeModel(ai, {
  model: "gemini-2.0-flash-live-preview-04-09",
  // Configure the model to respond with text
  generationConfig: {
    responseModalities: [ResponseModality.TEXT],
  },
});

const session = await model.connect();

// Provide a text prompt
const prompt = "tell a short story";
session.send(prompt);

// Collect text from model's turn
let text = "";
const messages = session.receive();
for await (const message of messages) {
  switch (message.type) {
    case "serverContent":
      if (message.turnComplete) {
        console.log(text);
      } else {
        const parts = message.modelTurn?.parts;
        if (parts) {
          text += parts.map((part) => part.text).join("");
        }
      }
      break;
    case "toolCall":
      // Ignore
    case "toolCallCancellation":
      // Ignore
  }
}

Dart


import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';

late LiveModelSession _session;

await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
final model = FirebaseAI.googleAI().liveGenerativeModel(
  model: 'gemini-2.0-flash-live-preview-04-09',
  // Configure the model to respond with text
  liveGenerationConfig: LiveGenerationConfig(responseModalities: [ResponseModalities.text]),
);

_session = await model.connect();

// Provide a text prompt
final prompt = Content.text('tell a short story');
await _session.send(input: prompt, turnComplete: true);

// In a separate thread, receive the response
await for (final message in _session.receive()) {
   // Process the received message
}

Unity


using Firebase;
using Firebase.AI;

async Task SendTextReceiveText() {
  // Initialize the Gemini Developer API backend service
  // Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
  var model = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetLiveModel(
    modelName: "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to respond with text
    liveGenerationConfig: new LiveGenerationConfig(
        responseModalities: new[] { ResponseModality.Text })
  );

  LiveSession session = await model.ConnectAsync();

  // Provide a text prompt
  var prompt = ModelContent.Text("tell a short story");
  await session.SendAsync(content: prompt, turnComplete: true);

  // Receive the response
  await foreach (var message in session.ReceiveAsync()) {
    // Process the received message
    if (!string.IsNullOrEmpty(message.Text)) {
      UnityEngine.Debug.Log("Received message: " + message.Text);
    }
  }
}

스트리밍된 오디오 입력에서 스트리밍된 오디오 생성

스트리밍된 오디오 입력을 전송하고 스트리밍된 오디오 출력을 수신할 수 있습니다. LiveModel 인스턴스를 만들고 응답 모달리티를 Audio로 설정해야 합니다.

대답 음성을 구성하고 맞춤설정하는 방법을 알아보세요 (이 페이지 뒷부분 참고).

Swift


import FirebaseAI

// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
let model = FirebaseAI.firebaseAI(backend: .googleAI()).liveModel(
  modelName: "gemini-2.0-flash-live-preview-04-09",
  // Configure the model to respond with audio
  generationConfig: LiveGenerationConfig(
    responseModalities: [.audio]
  )
)

do {
  let session = try await model.connect()

  // Load the audio file, or tap a microphone
  guard let audioFile = NSDataAsset(name: "audio.pcm") else {
    fatalError("Failed to load audio file")
  }

  // Provide the audio data
  await session.sendAudioRealtime(audioFile.data)

  var outputText = ""
  for try await message in session.responses {
    if case let .content(content) = message.payload {
      content.modelTurn?.parts.forEach { part in
        if let part = part as? InlineDataPart, part.mimeType.starts(with: "audio/pcm") {
          // Handle 16bit pcm audio data at 24khz
          playAudio(part.data)
        }
      }
      // Optional: if you don't require to send more requests.
      if content.isTurnComplete {
        await session.close()
      }
    }
  }
} catch {
  fatalError(error.localizedDescription)
}

Kotlin


// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
val model = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
    modelName = "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to respond with text
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.AUDIO 
   }
)

val session = model.connect()

// This is the recommended way.
// However, you can create your own recorder and handle the stream.
session.startAudioConversation()

Java


ExecutorService executor = Executors.newFixedThreadPool(1);
// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
LiveGenerativeModel lm = FirebaseAI.getInstance(GenerativeBackend.googleAI()).liveModel(
        "gemini-2.0-flash-live-preview-04-09",
        // Configure the model to respond with text
        new LiveGenerationConfig.Builder()
                .setResponseModalities(ResponseModality.TEXT)
                .build()
);
LiveModelFutures model = LiveModelFutures.from(lm);
ListenableFuture<LiveSession> sessionFuture =  model.connect();

Futures.addCallback(sessionFuture, new FutureCallback<LiveSession>() {
    @Override
    public void onSuccess(LiveSession ses) {
	 LiveSessionFutures session = LiveSessionFutures.from(ses);
        session.startAudioConversation();
    }
    @Override
    public void onFailure(Throwable t) {
        // Handle exceptions
    }
}, executor);

Web


// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `LiveGenerativeModel` instance with the flash-live model (only model that supports the Live API)
const model = getLiveGenerativeModel(ai, {
  model: "gemini-2.0-flash-live-preview-04-09",
  // Configure the model to respond with audio
  generationConfig: {
    responseModalities: [ResponseModality.AUDIO],
  },
});

const session = await model.connect();

// Start the audio conversation
const audioConversationController = await startAudioConversation(session);

// ... Later, to stop the audio conversation
// await audioConversationController.stop()

Dart


import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';
import 'package:your_audio_recorder_package/your_audio_recorder_package.dart';

late LiveModelSession _session;
final _audioRecorder = YourAudioRecorder();

await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
final model = FirebaseAI.googleAI().liveGenerativeModel(
  model: 'gemini-2.0-flash-live-preview-04-09',
   // Configure the model to respond with audio
   liveGenerationConfig: LiveGenerationConfig(responseModalities: [ResponseModalities.audio]),
);

_session = await model.connect();

final audioRecordStream = _audioRecorder.startRecordingStream();
// Map the Uint8List stream to InlineDataPart stream
final mediaChunkStream = audioRecordStream.map((data) {
  return InlineDataPart('audio/pcm', data);
});
await _session.startMediaStream(mediaChunkStream);

// In a separate thread, receive the audio response from the model
await for (final message in _session.receive()) {
   // Process the received message
}

Unity


using Firebase;
using Firebase.AI;

async Task SendTextReceiveAudio() {
  // Initialize the Gemini Developer API backend service
  // Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
  var model = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetLiveModel(
    modelName: "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to respond with audio
    liveGenerationConfig: new LiveGenerationConfig(
        responseModalities: new[] { ResponseModality.Audio })
  );

  LiveSession session = await model.ConnectAsync();

  // Start a coroutine to send audio from the Microphone
  var recordingCoroutine = StartCoroutine(SendAudio(session));

  // Start receiving the response
  await ReceiveAudio(session);
}

IEnumerator SendAudio(LiveSession liveSession) {
  string microphoneDeviceName = null;
  int recordingFrequency = 16000;
  int recordingBufferSeconds = 2;

  var recordingClip = Microphone.Start(microphoneDeviceName, true,
                                       recordingBufferSeconds, recordingFrequency);

  int lastSamplePosition = 0;
  while (true) {
    if (!Microphone.IsRecording(microphoneDeviceName)) {
      yield break;
    }

    int currentSamplePosition = Microphone.GetPosition(microphoneDeviceName);

    if (currentSamplePosition != lastSamplePosition) {
      // The Microphone uses a circular buffer, so we need to check if the
      // current position wrapped around to the beginning, and handle it
      // accordingly.
      int sampleCount;
      if (currentSamplePosition > lastSamplePosition) {
        sampleCount = currentSamplePosition - lastSamplePosition;
      } else {
        sampleCount = recordingClip.samples - lastSamplePosition + currentSamplePosition;
      }

      if (sampleCount > 0) {
        // Get the audio chunk
        float[] samples = new float[sampleCount];
        recordingClip.GetData(samples, lastSamplePosition);

        // Send the data, discarding the resulting Task to avoid the warning
        _ = liveSession.SendAudioAsync(samples);

        lastSamplePosition = currentSamplePosition;
      }
    }

    // Wait for a short delay before reading the next sample from the Microphone
    const float MicrophoneReadDelay = 0.5f;
    yield return new WaitForSeconds(MicrophoneReadDelay);
  }
}

Queue audioBuffer = new();

async Task ReceiveAudio(LiveSession liveSession) {
  int sampleRate = 24000;
  int channelCount = 1;

  // Create a looping AudioClip to fill with the received audio data
  int bufferSamples = (int)(sampleRate * channelCount);
  AudioClip clip = AudioClip.Create("StreamingPCM", bufferSamples, channelCount,
                                    sampleRate, true, OnAudioRead);

  // Attach the clip to an AudioSource and start playing it
  AudioSource audioSource = GetComponent();
  audioSource.clip = clip;
  audioSource.loop = true;
  audioSource.Play();

  // Start receiving the response
  await foreach (var message in liveSession.ReceiveAsync()) {
    // Process the received message
    foreach (float[] pcmData in message.AudioAsFloat) {
      lock (audioBuffer) {
        foreach (float sample in pcmData) {
          audioBuffer.Enqueue(sample);
        }
      }
    }
  }
}

// This method is called by the AudioClip to load audio data.
private void OnAudioRead(float[] data) {
  int samplesToProvide = data.Length;
  int samplesProvided = 0;

  lock(audioBuffer) {
    while (samplesProvided < samplesToProvide && audioBuffer.Count > 0) {
      data[samplesProvided] = audioBuffer.Dequeue();
      samplesProvided++;
    }
  }

  while (samplesProvided < samplesToProvide) {
    data[samplesProvided] = 0.0f;
    samplesProvided++;
  }
}

스트리밍된 오디오 입력에서 스트리밍된 텍스트 생성

스트리밍된 오디오 입력을 전송하고 스트리밍된 텍스트 출력을 수신할 수 있습니다. LiveModel 인스턴스를 만들고 응답 모달리티를 Text로 설정해야 합니다.

Swift


import FirebaseAI

// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
let model = FirebaseAI.firebaseAI(backend: .googleAI()).liveModel(
  modelName: "gemini-2.0-flash-live-preview-04-09",
  // Configure the model to respond with text
  generationConfig: LiveGenerationConfig(
    responseModalities: [.text]
  )
)

do {
  let session = try await model.connect()

  // Load the audio file, or tap a microphone
  guard let audioFile = NSDataAsset(name: "audio.pcm") else {
    fatalError("Failed to load audio file")
  }

  // Provide the audio data
  await session.sendAudioRealtime(audioFile.data)

  var outputText = ""
  for try await message in session.responses {
    if case let .content(content) = message.payload {
      content.modelTurn?.parts.forEach { part in
        if let part = part as? TextPart {
          outputText += part.text
        }
      }
      // Optional: if you don't require to send more requests.
      if content.isTurnComplete {
        await session.close()
      }
    }
  }

  // Output received from the server.
  print(outputText)
} catch {
  fatalError(error.localizedDescription)
}

Kotlin


// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
val model = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
    modelName = "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to respond with text
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.TEXT
   }
)

val session = model.connect()

// Provide a text prompt
val audioContent = content("user") { audioData }

session.send(audioContent)

var outputText = ""
session.receive().collect {
    if(it.status == Status.TURN_COMPLETE) {
        // Optional: if you don't require to send more requests.
        session.stopReceiving();
    }
    outputText = outputText + it.text
}

// Output received from the server.
println(outputText)

Java


ExecutorService executor = Executors.newFixedThreadPool(1);
// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
LiveGenerativeModel lm = FirebaseAI.getInstance(GenerativeBackend.googleAI()).liveModel(
        "gemini-2.0-flash-live-preview-04-09",
        // Configure the model to respond with text
        new LiveGenerationConfig.Builder()
                .setResponseModalities(ResponseModality.TEXT)
                .build()
);
LiveModelFutures model = LiveModelFutures.from(lm);
ListenableFuture<LiveSession> sessionFuture =  model.connect();
class LiveContentResponseSubscriber implements Subscriber<LiveContentResponse> {
    @Override
    public void onSubscribe(Subscription s) {
        s.request(Long.MAX_VALUE); // Request an unlimited number of items
    }
    @Override
    public void onNext(LiveContentResponse liveContentResponse) {
        // Handle the response from the server.
	System.out.println(liveContentResponse.getText());
    }
    @Override
    public void onError(Throwable t) {
        System.err.println("Error: " + t.getMessage());
    }
    @Override
    public void onComplete() {
        System.out.println("Done receiving messages!");
    }
}
Futures.addCallback(sessionFuture, new FutureCallback<LiveSession>() {
    @Override
    public void onSuccess(LiveSession ses) {
	 LiveSessionFutures session = LiveSessionFutures.from(ses);
        // Send Audio data
	 session.send(new Content.Builder().addInlineData(audioData, "audio/pcm").build());

        session.send(text);
        Publisher<LiveContentResponse> publisher = session.receive();
        publisher.subscribe(new LiveContentResponseSubscriber());
    }
    @Override
    public void onFailure(Throwable t) {
        // Handle exceptions
    }
}, executor);

Web


// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `LiveGenerativeModel` instance with the flash-live model (only model that supports the Live API)
const model = getLiveGenerativeModel(ai, {
  model: "gemini-2.0-flash-live-preview-04-09",
  // Configure the model to respond with text
  generationConfig: {
    responseModalities: [ResponseModality.TEXT],
  },
});

const session = await model.connect();

// TODO(developer): Collect audio data (16-bit 16kHz PCM)
// const audioData = ...

// Send audio
const audioPart = {
  inlineData: { data: audioData, mimeType: "audio/pcm" },
};
session.send([audioPart]);

// Collect text from model's turn
let text = "";
const messages = session.receive();
for await (const message of messages) {
  switch (message.type) {
    case "serverContent":
      if (message.turnComplete) {
        console.log(text);
      } else {
        const parts = message.modelTurn?.parts;
        if (parts) {
          text += parts.map((part) => part.text).join("");
        }
      }
      break;
    case "toolCall":
      // Ignore
    case "toolCallCancellation":
      // Ignore
  }
}

Dart


import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';
import 'package:your_audio_recorder_package/your_audio_recorder_package.dart';
import 'dart:async';

late LiveModelSession _session;
final _audioRecorder = YourAudioRecorder();

await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
final model = FirebaseAI.googleAI().liveGenerativeModel(
  model: 'gemini-2.0-flash-live-preview-04-09',
  // Configure the model to respond with text
  liveGenerationConfig: LiveGenerationConfig(responseModalities: ResponseModalities.text),
);

_session = await model.connect();

final audioRecordStream = _audioRecorder.startRecordingStream();
final mediaChunkStream = audioRecordStream.map((data) {
  return InlineDataPart('audio/pcm', data);
});

await _session.startMediaStream(mediaChunkStream);

final responseStream = _session.receive();

return responseStream.asyncMap((response) async {
  if (response.parts.isNotEmpty && response.parts.first.text != null) {
    return response.parts.first.text!;
  } else {
    throw Exception('Text response not found.');
  }
});

Future main() async {
  try {
    final textStream = await audioToText();

    await for (final text in textStream) {
      print('Received text: $text');
      // Handle the text response
    }
  } catch (e) {
    print('Error: $e');
  }
}

Unity


using Firebase;
using Firebase.AI;

async Task SendAudioReceiveText() {
  // Initialize the Gemini Developer API backend service
  // Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
  var model = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetLiveModel(
    modelName: "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to respond with text
    liveGenerationConfig: new LiveGenerationConfig(
        responseModalities: new[] { ResponseModality.Text })
  );

  LiveSession session = await model.ConnectAsync();

  // Start a coroutine to send audio from the Microphone
  var recordingCoroutine = StartCoroutine(SendAudio(session));

  // Receive the response
  await foreach (var message in session.ReceiveAsync()) {
    // Process the received message
    if (!string.IsNullOrEmpty(message.Text)) {
      UnityEngine.Debug.Log("Received message: " + message.Text);
    }
  }

  StopCoroutine(recordingCoroutine);
}

IEnumerator SendAudio(LiveSession liveSession) {
  string microphoneDeviceName = null;
  int recordingFrequency = 16000;
  int recordingBufferSeconds = 2;

  var recordingClip = Microphone.Start(microphoneDeviceName, true,
                                       recordingBufferSeconds, recordingFrequency);

  int lastSamplePosition = 0;
  while (true) {
    if (!Microphone.IsRecording(microphoneDeviceName)) {
      yield break;
    }

    int currentSamplePosition = Microphone.GetPosition(microphoneDeviceName);

    if (currentSamplePosition != lastSamplePosition) {
      // The Microphone uses a circular buffer, so we need to check if the
      // current position wrapped around to the beginning, and handle it
      // accordingly.
      int sampleCount;
      if (currentSamplePosition > lastSamplePosition) {
        sampleCount = currentSamplePosition - lastSamplePosition;
      } else {
        sampleCount = recordingClip.samples - lastSamplePosition + currentSamplePosition;
      }

      if (sampleCount > 0) {
        // Get the audio chunk
        float[] samples = new float[sampleCount];
        recordingClip.GetData(samples, lastSamplePosition);

        // Send the data, discarding the resulting Task to avoid the warning
        _ = liveSession.SendAudioAsync(samples);

        lastSamplePosition = currentSamplePosition;
      }
    }

    // Wait for a short delay before reading the next sample from the Microphone
    const float MicrophoneReadDelay = 0.5f;
    yield return new WaitForSeconds(MicrophoneReadDelay);
  }
}

스트리밍된 텍스트 입력에서 스트리밍된 오디오 생성

스트리밍된 텍스트 입력을 전송하고 스트리밍된 오디오 출력을 수신할 수 있습니다. LiveModel 인스턴스를 만들고 응답 모달리티를 Audio로 설정해야 합니다.

대답 음성을 구성하고 맞춤설정하는 방법을 알아보세요 (이 페이지 뒷부분 참고).

Swift


import FirebaseAI

// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
let model = FirebaseAI.firebaseAI(backend: .googleAI()).liveModel(
  modelName: "gemini-2.0-flash-live-preview-04-09",
  // Configure the model to respond with audio
  generationConfig: LiveGenerationConfig(
    responseModalities: [.audio]
  )
)

do {
  let session = try await model.connect()

  // Provide a text prompt
  let text = "tell a short story"

  await session.sendTextRealtime(text)

  var outputText = ""
  for try await message in session.responses {
    if case let .content(content) = message.payload {
      content.modelTurn?.parts.forEach { part in
        if let part = part as? InlineDataPart, part.mimeType.starts(with: "audio/pcm") {
          // Handle 16bit pcm audio data at 24khz
          playAudio(part.data)
        }
      }
      // Optional: if you don't require to send more requests.
      if content.isTurnComplete {
        await session.close()
      }
    }
  }
} catch {
  fatalError(error.localizedDescription)
}

Kotlin


// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
val model = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
    modelName = "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to respond with audio
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.AUDIO
   }
)

val session = model.connect()

// Provide a text prompt
val text = "tell a short story"

session.send(text)

session.receive().collect {
    if(it.turnComplete) {
        // Optional: if you don't require to send more requests.
        session.stopReceiving();
    }
    // Handle 16bit pcm audio data at 24khz
    playAudio(it.data)
}

Java


ExecutorService executor = Executors.newFixedThreadPool(1);
// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
LiveGenerativeModel lm = FirebaseAI.getInstance(GenerativeBackend.googleAI()).liveModel(
        "gemini-2.0-flash-live-preview-04-09",
        // Configure the model to respond with text
        new LiveGenerationConfig.Builder()
                .setResponseModalities(ResponseModality.AUDIO)
                .build()
);
LiveModelFutures model = LiveModelFutures.from(lm);
ListenableFuture<LiveSession> sessionFuture =  model.connect();
class LiveContentResponseSubscriber implements Subscriber<LiveContentResponse> {
    @Override
    public void onSubscribe(Subscription s) {
        s.request(Long.MAX_VALUE); // Request an unlimited number of items
    }
    @Override
    public void onNext(LiveContentResponse liveContentResponse) {
        // Handle 16bit pcm audio data at 24khz
	liveContentResponse.getData();
    }
    @Override
    public void onError(Throwable t) {
        System.err.println("Error: " + t.getMessage());
    }
    @Override
    public void onComplete() {
        System.out.println("Done receiving messages!");
    }
}
Futures.addCallback(sessionFuture, new FutureCallback<LiveSession>() {
    @Override
    public void onSuccess(LiveSession ses) {
	 LiveSessionFutures session = LiveSessionFutures.from(ses);
        // Provide a text prompt
        String text = "tell me a short story?";
        session.send(text);
        Publisher<LiveContentResponse> publisher = session.receive();
        publisher.subscribe(new LiveContentResponseSubscriber());
    }
    @Override
    public void onFailure(Throwable t) {
        // Handle exceptions
    }
}, executor);

Web


// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `LiveGenerativeModel` instance with the flash-live model (only model that supports the Live API)
const model = getLiveGenerativeModel(ai, {
  model: "gemini-2.0-flash-live-preview-04-09",
  // Configure the model to respond with audio
  generationConfig: {
    responseModalities: [ResponseModality.AUDIO],
  },
});

const session = await model.connect();

// Provide a text prompt
const prompt = "tell a short story";
session.send(prompt);

// Handle the model's audio output
const messages = session.receive();
for await (const message of messages) {
  switch (message.type) {
    case "serverContent":
      if (message.turnComplete) {
        // TODO(developer): Handle turn completion
      } else if (message.interrupted) {
        // TODO(developer): Handle the interruption
        break;
      } else if (message.modelTurn) {
        const parts = message.modelTurn?.parts;
        parts?.forEach((part) => {
          if (part.inlineData) {
            // TODO(developer): Play the audio chunk
          }
        });
      }
      break;
    case "toolCall":
      // Ignore
    case "toolCallCancellation":
      // Ignore
  }
}

Dart


import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';
import 'dart:async';
import 'dart:typed_data';

late LiveModelSession _session;

Future<Stream<Uint8List>> textToAudio(String textPrompt) async {
  WidgetsFlutterBinding.ensureInitialized();

  await Firebase.initializeApp(
    options: DefaultFirebaseOptions.currentPlatform,
  );

  // Initialize the Gemini Developer API backend service
  // Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
  final model = FirebaseAI.googleAI().liveGenerativeModel(
    model: 'gemini-2.0-flash-live-preview-04-09',
    // Configure the model to respond with audio
    liveGenerationConfig: LiveGenerationConfig(responseModalities: ResponseModalities.audio),
  );

  _session = await model.connect();

  final prompt = Content.text(textPrompt);

  await _session.send(input: prompt);

  return _session.receive().asyncMap((response) async {
    if (response is LiveServerContent && response.modelTurn?.parts != null) {
       for (final part in response.modelTurn!.parts) {
         if (part is InlineDataPart) {
           return part.bytes;
         }
       }
    }
    throw Exception('Audio data not found');
  });
}

Future<void> main() async {
  try {
    final audioStream = await textToAudio('Convert this text to audio.');

    await for (final audioData in audioStream) {
      // Process the audio data (e.g., play it using an audio player package)
      print('Received audio data: ${audioData.length} bytes');
      // Example using flutter_sound (replace with your chosen package):
      // await _flutterSoundPlayer.startPlayer(fromDataBuffer: audioData);
    }
  } catch (e) {
    print('Error: $e');
  }
}

Unity


using Firebase;
using Firebase.AI;

async Task SendTextReceiveAudio() {
  // Initialize the Gemini Developer API backend service
  // Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
  var model = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetLiveModel(
    modelName: "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to respond with audio
    liveGenerationConfig: new LiveGenerationConfig(
        responseModalities: new[] { ResponseModality.Audio })
  );

  LiveSession session = await model.ConnectAsync();

  // Provide a text prompt
  var prompt = ModelContent.Text("Convert this text to audio.");
  await session.SendAsync(content: prompt, turnComplete: true);

  // Start receiving the response
  await ReceiveAudio(session);
}

Queue<float> audioBuffer = new();

async Task ReceiveAudio(LiveSession session) {
  int sampleRate = 24000;
  int channelCount = 1;

  // Create a looping AudioClip to fill with the received audio data
  int bufferSamples = (int)(sampleRate * channelCount);
  AudioClip clip = AudioClip.Create("StreamingPCM", bufferSamples, channelCount,
                                    sampleRate, true, OnAudioRead);

  // Attach the clip to an AudioSource and start playing it
  AudioSource audioSource = GetComponent<AudioSource>();
  audioSource.clip = clip;
  audioSource.loop = true;
  audioSource.Play();

  // Start receiving the response
  await foreach (var message in session.ReceiveAsync()) {
    // Process the received message
    foreach (float[] pcmData in message.AudioAsFloat) {
      lock (audioBuffer) {
        foreach (float sample in pcmData) {
          audioBuffer.Enqueue(sample);
        }
      }
    }
  }
}

// This method is called by the AudioClip to load audio data.
private void OnAudioRead(float[] data) {
  int samplesToProvide = data.Length;
  int samplesProvided = 0;

  lock(audioBuffer) {
    while (samplesProvided < samplesToProvide && audioBuffer.Count > 0) {
      data[samplesProvided] = audioBuffer.Dequeue();
      samplesProvided++;
    }
  }

  while (samplesProvided < samplesToProvide) {
    data[samplesProvided] = 0.0f;
    samplesProvided++;
  }
}

더욱 매력적이고 상호작용적인 환경 만들기

이 섹션에서는 Live API의 참여도와 상호작용성을 높이는 기능을 만들고 관리하는 방법을 설명합니다.

대답 음성 변경

Live API는 Chirp 3를 사용하여 합성 음성 응답을 지원합니다. Firebase AI Logic을 사용하면 다양한 HD 음성 언어로 오디오를 보낼 수 있습니다. 각 음성의 전체 목록과 데모는 Chirp 3: HD 음성을 참고하세요.

음성을 지정하려면 모델 구성의 일부로 speechConfig 객체 내에서 음성 이름을 설정합니다. 음성을 지정하지 않으면 기본값은 Puck입니다.

Swift


import FirebaseAI
// ...

let model = FirebaseAI.firebaseAI(backend: .googleAI()).liveModel(
  modelName: "gemini-2.0-flash-live-preview-04-09",
  // Configure the model to use a specific voice for its audio response
  generationConfig: LiveGenerationConfig(
    responseModalities: [.audio],
    speech: SpeechConfig(voiceName: "VOICE_NAME")
  )
)

// ...

Kotlin


// ...

val model = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
    modelName = "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to use a specific voice for its audio response
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.AUDIO
        speechConfig = SpeechConfig(voice = Voice("VOICE_NAME"))
    }
)

// ...

Java


// ...

LiveModel model = FirebaseAI.getInstance(GenerativeBackend.googleAI()).liveModel(
    "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to use a specific voice for its audio response
    new LiveGenerationConfig.Builder()
        .setResponseModalities(ResponseModality.AUDIO)
        .setSpeechConfig(new SpeechConfig(new Voice("VOICE_NAME")))
        .build()
);

// ...

Web


// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `LiveModel` instance with the flash-live model (only model that supports the Live API)
const model = getLiveGenerativeModel(ai, {
  model: "gemini-2.0-flash-live-preview-04-09",
  // Configure the model to use a specific voice for its audio response
  generationConfig: {
    responseModalities: [ResponseModality.AUDIO],
    speechConfig: {
      voiceConfig: {
        prebuiltVoiceConfig: { voiceName: "VOICE_NAME" },
      },
    },
  },
});

Dart


// ...

final model = FirebaseAI.googleAI().liveGenerativeModel(
  model: 'gemini-2.0-flash-live-preview-04-09',
  // Configure the model to use a specific voice for its audio response
  liveGenerationConfig: LiveGenerationConfig(
    responseModalities: ResponseModalities.audio,
    speechConfig: SpeechConfig(voiceName: 'VOICE_NAME'),
  ),
);

// ...

Unity


var model = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetLiveModel(
  modelName: "gemini-2.0-flash-live-preview-04-09",
  liveGenerationConfig: new LiveGenerationConfig(
    responseModalities: new[] { ResponseModality.Audio },
    speechConfig: SpeechConfig.UsePrebuiltVoice("VOICE_NAME"))
);

모델에 프롬프트를 입력하고 영어 이외의 언어로 응답하도록 요청할 때 최상의 결과를 얻으려면 시스템 요청 사항에 다음을 포함하세요.

RESPOND IN LANGUAGE. YOU MUST RESPOND UNMISTAKABLY IN LANGUAGE.

세션과 요청 전반에서 컨텍스트 유지

채팅 구조를 사용하여 세션과 요청 전반에서 컨텍스트를 유지할 수 있습니다. 이 방법은 텍스트 입력 및 텍스트 출력에만 적용됩니다.

이 접근 방식은 짧은 컨텍스트에 가장 적합합니다. 정확한 이벤트 순서를 나타내기 위해 차례대로 상호작용을 보낼 수 있습니다. 컨텍스트가 긴 경우 후속 상호작용을 위해 컨텍스트 윈도우를 확보할 수 있도록 단일 메시지 요약을 제공하는 것이 좋습니다.

방해 요소 처리

Firebase AI Logic은(는) 아직 인터럽트 처리를 지원하지 않습니다. 나중에 다시 확인해 주세요.

함수 호출 (도구) 사용

표준 콘텐츠 생성 방법과 마찬가지로 Live API에서 사용할 수 있는 함수와 같은 도구를 정의할 수 있습니다. 이 섹션에서는 함수 호출과 함께 Live API를 사용할 때의 몇 가지 미묘한 차이점을 설명합니다. 함수 호출에 대한 전체 설명과 예시는 함수 호출 가이드를 참고하세요.

모델은 단일 프롬프트에서 여러 함수 호출과 출력을 연결하는 데 필요한 코드를 생성할 수 있습니다. 이 코드는 샌드박스 환경에서 실행되어 후속 BidiGenerateContentToolCall 메시지를 생성합니다. 각 함수 호출의 결과가 나올 때까지 실행이 일시중지되므로 순차 처리가 보장됩니다.

또한 함수 호출과 함께 Live API를 사용하면 모델이 사용자에게 후속 조치 또는 명확한 정보를 요청할 수 있으므로 특히 강력합니다. 예를 들어 모델에 호출하려는 함수에 매개변수 값을 제공할 정보가 충분하지 않은 경우 모델은 사용자에게 추가 정보나 명확한 정보를 제공해 달라고 요청할 수 있습니다.

클라이언트는 BidiGenerateContentToolResponse로 응답해야 합니다.

제한사항 및 요구사항

Live API의 다음 제한사항 및 요구사항에 유의하세요.

스크립트 작성

Firebase AI Logic은(는) 아직 스크립트를 지원하지 않습니다. 나중에 다시 확인해 주세요.

언어

입력 언어: Gemini 모델에서 지원되는 입력 언어의 전체 목록을 참고하세요.
출력 언어: Chirp 3: HD 음성에서 사용 가능한 출력 언어의 전체 목록을 확인하세요.

오디오 형식

Live API는 다음 오디오 형식을 지원합니다.

입력 오디오 형식: 16kHz little-endian의 원시 16비트 PCM 오디오
출력 오디오 형식: 24kHz little-endian의 원시 16비트 PCM 오디오

비율 제한

Live API에는 Firebase 프로젝트당 동시 세션과 분당 토큰 수 (TPM)에 대한 속도 제한이 있습니다.

Gemini Developer API:
- 한도는 프로젝트의 Gemini Developer API '사용량 등급'에 따라 다릅니다(요청 빈도 제한 문서 참고).
Vertex AI Gemini API:
- Firebase 프로젝트당 동시 세션 5,000개
- 분당 토큰 4백만 개

세션 길이

세션의 기본 길이는 10분입니다. 세션 기간이 한도를 초과하면 연결이 종료됩니다.

모델은 컨텍스트 크기에 의해서도 제한됩니다. 입력의 큰 청크를 전송하면 세션이 조기에 종료될 수 있습니다.

음성 활동 감지(VAD)

이 모델은 연속 오디오 입력 스트림에서 음성 활동 감지(VAD)를 자동으로 실행합니다. VAD는 기본적으로 사용 설정되어 있습니다.

토큰 수 계산

Live API와 함께 CountTokens API를 사용할 수 없습니다.

Firebase AI Logic 사용 경험에 관한 의견 보내기

Gemini Live API를 사용한 양방향 스트리밍 컬렉션을 사용해 정리하기 내 환경설정을 기준으로 콘텐츠를 저장하고 분류하세요.

시작하기 전에

이 기능을 지원하는 모델

Live API의 표준 기능 사용

스트리밍된 텍스트 입력에서 스트리밍된 텍스트 생성

Swift

Kotlin

Java

Web

Dart

Unity

스트리밍된 오디오 입력에서 스트리밍된 오디오 생성

Swift

Kotlin

Java

Web

Dart

Unity

스트리밍된 오디오 입력에서 스트리밍된 텍스트 생성

Swift

Kotlin

Java

Web

Dart

Unity

스트리밍된 텍스트 입력에서 스트리밍된 오디오 생성

Swift

Kotlin

Java

Web

Dart

Unity

더욱 매력적이고 상호작용적인 환경 만들기

대답 음성 변경

Swift

Kotlin

Java

Web

Dart

Unity

세션과 요청 전반에서 컨텍스트 유지

방해 요소 처리

함수 호출 (도구) 사용

제한사항 및 요구사항

스크립트 작성

언어

오디오 형식

비율 제한

세션 길이

음성 활동 감지(VAD)

토큰 수 계산

Gemini Live API를 사용한 양방향 스트리밍