The latest Gemini models, like Gemini 3.1 Flash Image (Nano Banana 2), are available to use with Firebase AI Logic on all platforms!

Gemini 2.0 Flash and Flash-Lite models will be retired on June 1, 2026. To avoid service disruption, update to a newer model like gemini-2.5-flash-lite. Also, Gemini 3 Pro Preview (gemini-3-pro-preview) will be retired on March 9, 2026; update to Gemini 3.1 Pro Preview (gemini-3.1-pro-preview).

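Analyze image files using the Gemini API

You can ask a Gemini model to analyze image files that you provide either inline (base64-encoded) or via URL. When you use Firebase AI Logic, you can make this request directly from your app.

With this capability, you can do things like:

Caption images or answer questions about them
Write a short story or a poem about an image
Detect objects in an image and return bounding box coordinates for them
Label or categorize a set of images based on sentiment, style, or other characteristics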

For additional options for working with images, see the other guides: generate structured output, multi-turn chat, analyze images on-device, and generate images.

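Before you begin

Click your Gemini API provider to view provider-specific content and code on this page.

If you haven't already, complete the getting started guide, which describes how to set up your Firebase project, connect your app to Firebase, add the SDK, initialize the backend service for your chosen Gemini API provider, and create a GenerativeModel instance.

For testing and iterating on your prompts, we recommend using Google AI Studio.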

Need a sample image file?

You can use this publicly available file with a MIME type of image/jpeg (view or download the file): https://storage.googleapis.com/cloud-samples-data/generative-ai/image/scones.jpg

Note: Firebase AI Logic does not yet support configuring the input media resolution, but support is coming soon.

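Generate text from image files (base64-encoded)

Before trying this sample, complete the Before you begin section of this guide to set up your project and app.
In that section, you also click a button for your chosen Gemini API provider so that you see provider-specific content on this page.

You can ask a Gemini model to generate text by prompting it with text and images. Provide each input file's mimeType and the file itself. Find requirements and recommendations for input files later on this page.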
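As a small illustration of pairing each file with its mimeType, the sketch below builds an inline-data part from a base64 data URL. The `{ inlineData: { data, mimeType } }` shape mirrors the Web sample on this page; the helper name `dataUrlToInlinePart` is hypothetical and is not part of the SDK.

```javascript
// Hypothetical helper (not part of the Firebase SDK): splits a base64 data URL
// into the mimeType and the base64 payload used by an inlineData part.
function dataUrlToInlinePart(dataUrl) {
  const match = /^data:([^;,]+);base64,(.*)$/s.exec(dataUrl);
  if (!match) {
    throw new Error("Expected a base64 data URL such as data:image/png;base64,...");
  }
  const [, mimeType, data] = match;
  return { inlineData: { data, mimeType } };
}

const part = dataUrlToInlinePart("data:image/png;base64,iVBORw0KGgo=");
console.log(part.inlineData.mimeType); // image/png
```

Rejecting malformed input early keeps a bad string from being sent to the model as if it were image data.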

Swift

You can call generateContent() to generate text from multimodal input of text and images.

Single file input


import FirebaseAILogic
import UIKit

// Initialize the Gemini Developer API backend service
let ai = FirebaseAI.firebaseAI(backend: .googleAI())

// Create a `GenerativeModel` instance with a model that supports your use case
let model = ai.generativeModel(modelName: "gemini-2.5-flash")


guard let image = UIImage(systemName: "bicycle") else { fatalError() }

// Provide a text prompt to include with the image
let prompt = "What's in this picture?"

// To generate text output, call generateContent and pass in the prompt
let response = try await model.generateContent(image, prompt)
print(response.text ?? "No text in response.")

Multiple file input


import FirebaseAILogic
import UIKit

// Initialize the Gemini Developer API backend service
let ai = FirebaseAI.firebaseAI(backend: .googleAI())

// Create a `GenerativeModel` instance with a model that supports your use case
let model = ai.generativeModel(modelName: "gemini-2.5-flash")


guard let image1 = UIImage(systemName: "car") else { fatalError() }
guard let image2 = UIImage(systemName: "car.2") else { fatalError() }

// Provide a text prompt to include with the images
let prompt = "What's different between these pictures?"

// To generate text output, call generateContent and pass in the prompt
let response = try await model.generateContent(image1, image2, prompt)
print(response.text ?? "No text in response.")

Kotlin

You can call generateContent() to generate text from multimodal input of text and images.

For Kotlin, the methods in this SDK are suspend functions and need to be called from a coroutine scope.

Single file input


// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
val model = Firebase.ai(backend = GenerativeBackend.googleAI())
                        .generativeModel("gemini-2.5-flash")


// Loads an image from the app/res/drawable/ directory
val bitmap: Bitmap = BitmapFactory.decodeResource(resources, R.drawable.sparky)

// Provide a prompt that includes the image specified above and text
val prompt = content {
  image(bitmap)
  text("What developer tool is this mascot from?")
}

// To generate text output, call generateContent with the prompt
val response = model.generateContent(prompt)
print(response.text)

Multiple file input

For Kotlin, the methods in this SDK are suspend functions and need to be called from a coroutine scope.


// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
val model = Firebase.ai(backend = GenerativeBackend.googleAI())
                        .generativeModel("gemini-2.5-flash")


// Loads an image from the app/res/drawable/ directory
val bitmap1: Bitmap = BitmapFactory.decodeResource(resources, R.drawable.sparky)
val bitmap2: Bitmap = BitmapFactory.decodeResource(resources, R.drawable.sparky_eats_pizza)

// Provide a prompt that includes the images specified above and text
val prompt = content {
  image(bitmap1)
  image(bitmap2)
  text("What is different between these pictures?")
}

// To generate text output, call generateContent with the prompt
val response = model.generateContent(prompt)
print(response.text)

Java

You can call generateContent() to generate text from multimodal input of text and images.

For Java, the methods in this SDK return a ListenableFuture.

Single file input


// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
GenerativeModel ai = FirebaseAI.getInstance(GenerativeBackend.googleAI())
        .generativeModel("gemini-2.5-flash");

// Use the GenerativeModelFutures Java compatibility layer which offers
// support for ListenableFuture and Publisher APIs
GenerativeModelFutures model = GenerativeModelFutures.from(ai);


Bitmap bitmap = BitmapFactory.decodeResource(getResources(), R.drawable.sparky);

// Provide a prompt that includes the image specified above and text
Content content = new Content.Builder()
        .addImage(bitmap)
        .addText("What developer tool is this mascot from?")
        .build();

// To generate text output, call generateContent with the prompt
// Note: `executor` below is an Executor that your app supplies; it isn't defined in this snippet
ListenableFuture<GenerateContentResponse> response = model.generateContent(content);
Futures.addCallback(response, new FutureCallback<GenerateContentResponse>() {
    @Override
    public void onSuccess(GenerateContentResponse result) {
        String resultText = result.getText();
        System.out.println(resultText);
    }

    @Override
    public void onFailure(Throwable t) {
        t.printStackTrace();
    }
}, executor);

Multiple file input


// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
GenerativeModel ai = FirebaseAI.getInstance(GenerativeBackend.googleAI())
        .generativeModel("gemini-2.5-flash");

// Use the GenerativeModelFutures Java compatibility layer which offers
// support for ListenableFuture and Publisher APIs
GenerativeModelFutures model = GenerativeModelFutures.from(ai);


Bitmap bitmap1 = BitmapFactory.decodeResource(getResources(), R.drawable.sparky);
Bitmap bitmap2 = BitmapFactory.decodeResource(getResources(), R.drawable.sparky_eats_pizza);

// Provide a prompt that includes the images specified above and text
Content prompt = new Content.Builder()
    .addImage(bitmap1)
    .addImage(bitmap2)
    .addText("What's different between these pictures?")
    .build();

// To generate text output, call generateContent with the prompt
// Note: `executor` below is an Executor that your app supplies; it isn't defined in this snippet
ListenableFuture<GenerateContentResponse> response = model.generateContent(prompt);
Futures.addCallback(response, new FutureCallback<GenerateContentResponse>() {
    @Override
    public void onSuccess(GenerateContentResponse result) {
        String resultText = result.getText();
        System.out.println(resultText);
    }

    @Override
    public void onFailure(Throwable t) {
        t.printStackTrace();
    }
}, executor);

Web

You can call generateContent() to generate text from multimodal input of text and images.

Single file input


import { initializeApp } from "firebase/app";
import { getAI, getGenerativeModel, GoogleAIBackend } from "firebase/ai";

// TODO(developer) Replace the following with your app's Firebase configuration
// See: https://firebase.google.com/docs/web/learn-more#config-object
const firebaseConfig = {
  // ...
};

// Initialize FirebaseApp
const firebaseApp = initializeApp(firebaseConfig);

// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `GenerativeModel` instance with a model that supports your use case
const model = getGenerativeModel(ai, { model: "gemini-2.5-flash" });


// Converts a File object to a Part object.
async function fileToGenerativePart(file) {
  const base64EncodedDataPromise = new Promise((resolve) => {
    const reader = new FileReader();
    reader.onloadend = () => resolve(reader.result.split(',')[1]);
    reader.readAsDataURL(file);
  });
  return {
    inlineData: { data: await base64EncodedDataPromise, mimeType: file.type },
  };
}

async function run() {
  // Provide a text prompt to include with the image
  const prompt = "What do you see?";

  const fileInputEl = document.querySelector("input[type=file]");
  const imagePart = await fileToGenerativePart(fileInputEl.files[0]);

  // To generate text output, call generateContent with the text and image
  const result = await model.generateContent([prompt, imagePart]);

  const response = result.response;
  const text = response.text();
  console.log(text);
}

run();

Multiple file input


import { initializeApp } from "firebase/app";
import { getAI, getGenerativeModel, GoogleAIBackend } from "firebase/ai";

// TODO(developer) Replace the following with your app's Firebase configuration
// See: https://firebase.google.com/docs/web/learn-more#config-object
const firebaseConfig = {
  // ...
};

// Initialize FirebaseApp
const firebaseApp = initializeApp(firebaseConfig);

// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `GenerativeModel` instance with a model that supports your use case
const model = getGenerativeModel(ai, { model: "gemini-2.5-flash" });


// Converts a File object to a Part object.
async function fileToGenerativePart(file) {
  const base64EncodedDataPromise = new Promise((resolve) => {
    const reader = new FileReader();
    reader.onloadend = () => resolve(reader.result.split(',')[1]);
    reader.readAsDataURL(file);
  });
  return {
    inlineData: { data: await base64EncodedDataPromise, mimeType: file.type },
  };
}

async function run() {
  // Provide a text prompt to include with the images
  const prompt = "What's different between these pictures?";

  // Prepare images for input
  const fileInputEl = document.querySelector("input[type=file]");
  const imageParts = await Promise.all(
    [...fileInputEl.files].map(fileToGenerativePart)
  );

  // To generate text output, call generateContent with the text and images
  const result = await model.generateContent([prompt, ...imageParts]);

  const response = result.response;
  const text = response.text();
  console.log(text);
}

run();

Dart

You can call generateContent() to generate text from multimodal input of text and images.

Single file input


import 'dart:io';

import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';

// Initialize FirebaseApp
await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
final model =
      FirebaseAI.googleAI().generativeModel(model: 'gemini-2.5-flash');


// Provide a text prompt to include with the image
final prompt = TextPart("What's in the picture?");
// Prepare images for input
final image = await File('image0.jpg').readAsBytes();
final imagePart = InlineDataPart('image/jpeg', image);

// To generate text output, call generateContent with the text and image
final response = await model.generateContent([
  Content.multi([prompt,imagePart])
]);
print(response.text);

Multiple file input


import 'dart:io';

import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';

// Initialize FirebaseApp
await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
final model =
      FirebaseAI.googleAI().generativeModel(model: 'gemini-2.5-flash');


final (firstImage, secondImage) = await (
  File('image0.jpg').readAsBytes(),
  File('image1.jpg').readAsBytes()
).wait;
// Provide a text prompt to include with the images
final prompt = TextPart("What's different between these pictures?");
// Prepare images for input
final imageParts = [
  InlineDataPart('image/jpeg', firstImage),
  InlineDataPart('image/jpeg', secondImage),
];

// To generate text output, call generateContent with the text and images
final response = await model.generateContent([
  Content.multi([prompt, ...imageParts])
]);
print(response.text);

Unity

You can call GenerateContentAsync() to generate text from multimodal input of text and images.

Single file input


using Firebase;
using Firebase.AI;

// Initialize the Gemini Developer API backend service
var ai = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI());

// Create a `GenerativeModel` instance with a model that supports your use case
var model = ai.GetGenerativeModel(modelName: "gemini-2.5-flash");


// Convert a Texture2D into InlineDataParts
var grayImage = ModelContent.InlineData("image/png",
      UnityEngine.ImageConversion.EncodeToPNG(UnityEngine.Texture2D.grayTexture));

// Provide a text prompt to include with the image
var prompt = ModelContent.Text("What's in this picture?");

// To generate text output, call GenerateContentAsync and pass in the prompt
var response = await model.GenerateContentAsync(new [] { grayImage, prompt });
UnityEngine.Debug.Log(response.Text ?? "No text in response.");

Multiple file input


using Firebase;
using Firebase.AI;

// Initialize the Gemini Developer API backend service
var ai = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI());

// Create a `GenerativeModel` instance with a model that supports your use case
var model = ai.GetGenerativeModel(modelName: "gemini-2.5-flash");


// Convert Texture2Ds into InlineDataParts
var blackImage = ModelContent.InlineData("image/png",
      UnityEngine.ImageConversion.EncodeToPNG(UnityEngine.Texture2D.blackTexture));
var whiteImage = ModelContent.InlineData("image/png",
      UnityEngine.ImageConversion.EncodeToPNG(UnityEngine.Texture2D.whiteTexture));

// Provide a text prompt to include with the images
var prompt = ModelContent.Text("What's different between these pictures?");

// To generate text output, call GenerateContentAsync and pass in the prompt
var response = await model.GenerateContentAsync(new [] { blackImage, whiteImage, prompt });
UnityEngine.Debug.Log(response.Text ?? "No text in response.");

Learn how to choose a model, and optionally a model location, appropriate for your use case and app.

Stream the response

You can make interactions faster by using streaming to handle partial results instead of waiting for the entire result from the model. To stream the response, call generateContentStream.
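The idea of handling partial results can be sketched without calling the model at all. In the example below, `fakeStream` is a hypothetical stand-in for a streamed response; it yields chunks that expose text(), similar in shape to the chunks in the Web sample on this page. `collectStream` is also a hypothetical helper name. It passes each chunk to a callback as it arrives and returns the accumulated text at the end.

```javascript
// Stand-in for a streamed response (an assumption for illustration only):
// an async generator that yields chunks exposing text().
async function* fakeStream(pieces) {
  for (const piece of pieces) {
    yield { text: () => piece };
  }
}

// Handles each partial result as it arrives and also returns the full text.
async function collectStream(stream, onChunk) {
  let fullText = "";
  for await (const chunk of stream) {
    const chunkText = chunk.text();
    onChunk(chunkText); // e.g. append to the UI immediately
    fullText += chunkText;
  }
  return fullText;
}

collectStream(fakeStream(["A photo ", "of scones."]), (t) => console.log(t))
  .then((full) => console.log(full)); // A photo of scones.
```

In an app, the callback would typically update the UI so that the user sees text before generation finishes.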

Example: Stream generated text from image files

Swift

You can call generateContentStream() to stream generated text from multimodal input of text and images.

Single file input


import FirebaseAILogic
import UIKit

// Initialize the Gemini Developer API backend service
let ai = FirebaseAI.firebaseAI(backend: .googleAI())

// Create a `GenerativeModel` instance with a model that supports your use case
let model = ai.generativeModel(modelName: "gemini-2.5-flash")


guard let image = UIImage(systemName: "bicycle") else { fatalError() }

// Provide a text prompt to include with the image
let prompt = "What's in this picture?"

// To stream generated text output, call generateContentStream and pass in the prompt
let contentStream = try model.generateContentStream(image, prompt)
for try await chunk in contentStream {
  if let text = chunk.text {
    print(text)
  }
}

Multiple file input


import FirebaseAILogic
import UIKit

// Initialize the Gemini Developer API backend service
let ai = FirebaseAI.firebaseAI(backend: .googleAI())

// Create a `GenerativeModel` instance with a model that supports your use case
let model = ai.generativeModel(modelName: "gemini-2.5-flash")


guard let image1 = UIImage(systemName: "car") else { fatalError() }
guard let image2 = UIImage(systemName: "car.2") else { fatalError() }

// Provide a text prompt to include with the images
let prompt = "What's different between these pictures?"

// To stream generated text output, call generateContentStream and pass in the prompt
let contentStream = try model.generateContentStream(image1, image2, prompt)
for try await chunk in contentStream {
  if let text = chunk.text {
    print(text)
  }
}

Kotlin

You can call generateContentStream() to stream generated text from multimodal input of text and images.

For Kotlin, the methods in this SDK are suspend functions and need to be called from a coroutine scope.

Single file input


// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
val model = Firebase.ai(backend = GenerativeBackend.googleAI())
                        .generativeModel("gemini-2.5-flash")


// Loads an image from the app/res/drawable/ directory
val bitmap: Bitmap = BitmapFactory.decodeResource(resources, R.drawable.sparky)

// Provide a prompt that includes the image specified above and text
val prompt = content {
  image(bitmap)
  text("What developer tool is this mascot from?")
}

// To stream generated text output, call generateContentStream with the prompt
var fullResponse = ""
model.generateContentStream(prompt).collect { chunk ->
  // chunk.text is nullable; treat a chunk without text as an empty string
  val text = chunk.text.orEmpty()
  print(text)
  fullResponse += text
}

Multiple file input


// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
val model = Firebase.ai(backend = GenerativeBackend.googleAI())
                        .generativeModel("gemini-2.5-flash")


// Loads an image from the app/res/drawable/ directory
val bitmap1: Bitmap = BitmapFactory.decodeResource(resources, R.drawable.sparky)
val bitmap2: Bitmap = BitmapFactory.decodeResource(resources, R.drawable.sparky_eats_pizza)

// Provide a prompt that includes the images specified above and text
val prompt = content {
    image(bitmap1)
    image(bitmap2)
    text("What's different between these pictures?")
}

// To stream generated text output, call generateContentStream with the prompt
var fullResponse = ""
model.generateContentStream(prompt).collect { chunk ->
  // chunk.text is nullable; treat a chunk without text as an empty string
  val text = chunk.text.orEmpty()
  print(text)
  fullResponse += text
}

Java

You can call generateContentStream() to stream generated text from multimodal input of text and images.

For Java, the streaming methods in this SDK return a Publisher type from the Reactive Streams library.

Single file input


// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
GenerativeModel ai = FirebaseAI.getInstance(GenerativeBackend.googleAI())
        .generativeModel("gemini-2.5-flash");

// Use the GenerativeModelFutures Java compatibility layer which offers
// support for ListenableFuture and Publisher APIs
GenerativeModelFutures model = GenerativeModelFutures.from(ai);


Bitmap bitmap = BitmapFactory.decodeResource(getResources(), R.drawable.sparky);

// Provide a prompt that includes the image specified above and text
Content prompt = new Content.Builder()
        .addImage(bitmap)
        .addText("What developer tool is this mascot from?")
        .build();

// To stream generated text output, call generateContentStream with the prompt
Publisher<GenerateContentResponse> streamingResponse = model.generateContentStream(prompt);

final String[] fullResponse = {""};

streamingResponse.subscribe(new Subscriber<GenerateContentResponse>() {
    @Override
    public void onNext(GenerateContentResponse generateContentResponse) {
        // getText() can return null for chunks that carry no text
        String chunk = generateContentResponse.getText();
        if (chunk != null) {
            fullResponse[0] += chunk;
        }
    }

    @Override
    public void onComplete() {
        System.out.println(fullResponse[0]);
    }

    @Override
    public void onError(Throwable t) {
        t.printStackTrace();
    }

    @Override
    public void onSubscribe(Subscription s) {
        // Signal demand; a Reactive Streams publisher emits nothing until items are requested
        s.request(Long.MAX_VALUE);
    }
});

Multiple file input


// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
GenerativeModel ai = FirebaseAI.getInstance(GenerativeBackend.googleAI())
        .generativeModel("gemini-2.5-flash");

// Use the GenerativeModelFutures Java compatibility layer which offers
// support for ListenableFuture and Publisher APIs
GenerativeModelFutures model = GenerativeModelFutures.from(ai);


Bitmap bitmap1 = BitmapFactory.decodeResource(getResources(), R.drawable.sparky);
Bitmap bitmap2 = BitmapFactory.decodeResource(getResources(), R.drawable.sparky_eats_pizza);

// Provide a prompt that includes the images specified above and text
Content prompt = new Content.Builder()
    .addImage(bitmap1)
    .addImage(bitmap2)
    .addText("What's different between these pictures?")
    .build();

// To stream generated text output, call generateContentStream with the prompt
Publisher<GenerateContentResponse> streamingResponse = model.generateContentStream(prompt);

final String[] fullResponse = {""};

streamingResponse.subscribe(new Subscriber<GenerateContentResponse>() {
    @Override
    public void onNext(GenerateContentResponse generateContentResponse) {
        // getText() can return null for chunks that carry no text
        String chunk = generateContentResponse.getText();
        if (chunk != null) {
            fullResponse[0] += chunk;
        }
    }

    @Override
    public void onComplete() {
        System.out.println(fullResponse[0]);
    }

    @Override
    public void onError(Throwable t) {
        t.printStackTrace();
    }

    @Override
    public void onSubscribe(Subscription s) {
        // Signal demand; a Reactive Streams publisher emits nothing until items are requested
        s.request(Long.MAX_VALUE);
    }
});

Web

You can call generateContentStream() to stream generated text from multimodal input of text and images.

Single file input


import { initializeApp } from "firebase/app";
import { getAI, getGenerativeModel, GoogleAIBackend } from "firebase/ai";

// TODO(developer) Replace the following with your app's Firebase configuration
// See: https://firebase.google.com/docs/web/learn-more#config-object
const firebaseConfig = {
  // ...
};

// Initialize FirebaseApp
const firebaseApp = initializeApp(firebaseConfig);

// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `GenerativeModel` instance with a model that supports your use case
const model = getGenerativeModel(ai, { model: "gemini-2.5-flash" });


// Converts a File object to a Part object.
async function fileToGenerativePart(file) {
  const base64EncodedDataPromise = new Promise((resolve) => {
    const reader = new FileReader();
    reader.onloadend = () => resolve(reader.result.split(',')[1]);
    reader.readAsDataURL(file);
  });
  return {
    inlineData: { data: await base64EncodedDataPromise, mimeType: file.type },
  };
}

async function run() {
  // Provide a text prompt to include with the image
  const prompt = "What do you see?";

  // Prepare image for input
  const fileInputEl = document.querySelector("input[type=file]");
  const imagePart = await fileToGenerativePart(fileInputEl.files[0]);

  // To stream generated text output, call generateContentStream with the text and image
  const result = await model.generateContentStream([prompt, imagePart]);

  for await (const chunk of result.stream) {
    const chunkText = chunk.text();
    console.log(chunkText);
  }
}

run();

Multiple file input


import { initializeApp } from "firebase/app";
import { getAI, getGenerativeModel, GoogleAIBackend } from "firebase/ai";

// TODO(developer) Replace the following with your app's Firebase configuration
// See: https://firebase.google.com/docs/web/learn-more#config-object
const firebaseConfig = {
  // ...
};

// Initialize FirebaseApp
const firebaseApp = initializeApp(firebaseConfig);

// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `GenerativeModel` instance with a model that supports your use case
const model = getGenerativeModel(ai, { model: "gemini-2.5-flash" });


// Converts a File object to a Part object.
async function fileToGenerativePart(file) {
  const base64EncodedDataPromise = new Promise((resolve) => {
    const reader = new FileReader();
    reader.onloadend = () => resolve(reader.result.split(',')[1]);
    reader.readAsDataURL(file);
  });
  return {
    inlineData: { data: await base64EncodedDataPromise, mimeType: file.type },
  };
}

async function run() {
  // Provide a text prompt to include with the images
  const prompt = "What's different between these pictures?";

  const fileInputEl = document.querySelector("input[type=file]");
  const imageParts = await Promise.all(
    [...fileInputEl.files].map(fileToGenerativePart)
  );

  // To stream generated text output, call generateContentStream with the text and images
  const result = await model.generateContentStream([prompt, ...imageParts]);

  for await (const chunk of result.stream) {
    const chunkText = chunk.text();
    console.log(chunkText);
  }
}

run();

Dart

You can call generateContentStream() to stream generated text from multimodal input of text and images.

Single file input


import 'dart:io';

import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';

// Initialize FirebaseApp
await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
final model =
      FirebaseAI.googleAI().generativeModel(model: 'gemini-2.5-flash');


// Provide a text prompt to include with the image
final prompt = TextPart("What's in the picture?");
// Prepare images for input
final image = await File('image0.jpg').readAsBytes();
final imagePart = InlineDataPart('image/jpeg', image);

// To stream generated text output, call generateContentStream with the text and image
// (it returns a Stream, so no await is needed on the call itself)
final response = model.generateContentStream([
  Content.multi([prompt, imagePart])
]);
await for (final chunk in response) {
  print(chunk.text);
}

Multiple file input


import 'dart:io';

import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';

// Initialize FirebaseApp
await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
final model =
      FirebaseAI.googleAI().generativeModel(model: 'gemini-2.5-flash');


final (firstImage, secondImage) = await (
  File('image0.jpg').readAsBytes(),
  File('image1.jpg').readAsBytes()
).wait;
// Provide a text prompt to include with the images
final prompt = TextPart("What's different between these pictures?");
// Prepare images for input
final imageParts = [
  InlineDataPart('image/jpeg', firstImage),
  InlineDataPart('image/jpeg', secondImage),
];

// To stream generated text output, call generateContentStream with the text and images
// (it returns a Stream, so no await is needed on the call itself)
final response = model.generateContentStream([
  Content.multi([prompt, ...imageParts])
]);
await for (final chunk in response) {
  print(chunk.text);
}

Unity

You can call GenerateContentStreamAsync() to stream generated text from multimodal input of text and images.

Single file input


using Firebase;
using Firebase.AI;

// Initialize the Gemini Developer API backend service
var ai = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI());

// Create a `GenerativeModel` instance with a model that supports your use case
var model = ai.GetGenerativeModel(modelName: "gemini-2.5-flash");


// Convert a Texture2D into InlineDataParts
var gray = ModelContent.InlineData("image/png",
      UnityEngine.ImageConversion.EncodeToPNG(UnityEngine.Texture2D.grayTexture));

// Provide a text prompt to include with the image
var prompt = ModelContent.Text("What's in this picture?");

// To stream generated text output, call GenerateContentStreamAsync and pass in the prompt
var responseStream = model.GenerateContentStreamAsync(new [] { gray, prompt });
await foreach (var response in responseStream) {
  if (!string.IsNullOrWhiteSpace(response.Text)) {
    UnityEngine.Debug.Log(response.Text);
  }
}

Multiple file input


using Firebase;
using Firebase.AI;

// Initialize the Gemini Developer API backend service
var ai = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI());

// Create a `GenerativeModel` instance with a model that supports your use case
var model = ai.GetGenerativeModel(modelName: "gemini-2.5-flash");


// Convert Texture2Ds into InlineDataParts
var black = ModelContent.InlineData("image/png",
      UnityEngine.ImageConversion.EncodeToPNG(UnityEngine.Texture2D.blackTexture));
var white = ModelContent.InlineData("image/png",
      UnityEngine.ImageConversion.EncodeToPNG(UnityEngine.Texture2D.whiteTexture));

// Provide a text prompt to include with the images
var prompt = ModelContent.Text("What's different between these pictures?");

// To stream generated text output, call GenerateContentStreamAsync and pass in the prompt
var responseStream = model.GenerateContentStreamAsync(new [] { black, white, prompt });
await foreach (var response in responseStream) {
  if (!string.IsNullOrWhiteSpace(response.Text)) {
    UnityEngine.Debug.Log(response.Text);
  }
}

Learn how to choose a model, and optionally a model location, appropriate for your use case and app.

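Requirements and recommendations for input image files

A file provided as inline data is encoded to base64 in transit, which increases the size of the request. If a request is too large, you get an HTTP 413 error.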
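To get a rough feel for how much base64 encoding inflates a request, the sketch below computes the encoded length for a given file size. Base64 maps every 3 bytes to 4 characters, so the payload grows by about one third. The function names and the `maxRequestBytes` budget are hypothetical; this page does not state the exact size at which HTTP 413 is returned.

```javascript
// Length in characters of the padded base64 encoding of `byteLength` bytes:
// every 3 input bytes become 4 output characters, rounded up to a full group.
function base64EncodedLength(byteLength) {
  return 4 * Math.ceil(byteLength / 3);
}

// Hypothetical budget check. `maxRequestBytes` is an assumed, caller-chosen value,
// not a limit documented on this page.
function fitsInlineBudget(fileByteLengths, maxRequestBytes) {
  const total = fileByteLengths.reduce((sum, n) => sum + base64EncodedLength(n), 0);
  return total <= maxRequestBytes;
}

console.log(base64EncodedLength(3000000)); // 4000000
```

A check like this only estimates the image payload; the text prompt and request metadata add to the total size as well.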

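See the "Supported input files and requirements" page for detailed information about the following:

Different options for providing a file in a request (either inline or using the file's URL)
Requirements and recommendations for image files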

Supported image MIME types

Gemini multimodal models support the following image MIME types:

PNG - image/png
JPEG - image/jpeg
WebP - image/webp
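A client can check a file's MIME type against this list before sending a request. The sketch below is one way to do that; the constant and function names are hypothetical and not part of the SDK.

```javascript
// The image MIME types listed above.
const SUPPORTED_IMAGE_MIME_TYPES = new Set(["image/png", "image/jpeg", "image/webp"]);

// Hypothetical helper: returns true when `mimeType` is one of the listed types.
// MIME types are case-insensitive, so the value is normalized first.
function isSupportedImageMimeType(mimeType) {
  return SUPPORTED_IMAGE_MIME_TYPES.has(String(mimeType).trim().toLowerCase());
}

console.log(isSupportedImageMimeType("image/jpeg")); // true
console.log(isSupportedImageMimeType("image/gif"));  // false
```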

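Limits per request

There is no specific limit on the number of pixels in an image. However, larger images are scaled down and padded to fit a maximum resolution of 3072 x 3072 while preserving their original aspect ratio.

Maximum files per request: 3,000 image files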
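The limits above can be expressed as a short client-side sketch. It computes the dimensions an image would have after being scaled down to fit within 3072 x 3072 with its aspect ratio preserved, and checks a file count against the 3,000-image limit. The function names are hypothetical, and rounding to whole pixels is an assumption; the service's exact scaling and padding behavior is not specified on this page.

```javascript
const MAX_SIDE = 3072;               // maximum resolution per side, from the text above
const MAX_IMAGES_PER_REQUEST = 3000; // maximum image files per request, from the text above

// Returns the dimensions after scaling an image down to fit within
// MAX_SIDE x MAX_SIDE while preserving its aspect ratio. Images that
// already fit are returned unchanged. Rounding to whole pixels is an assumption.
function fitWithinMax(width, height) {
  const scale = Math.min(1, MAX_SIDE / width, MAX_SIDE / height);
  return { width: Math.round(width * scale), height: Math.round(height * scale) };
}

// Returns true when `count` image files stay within the per-request limit.
function withinImageCountLimit(count) {
  return count <= MAX_IMAGES_PER_REQUEST;
}

console.log(fitWithinMax(6144, 4096)); // { width: 3072, height: 2048 }
```

Since larger images are scaled down anyway, resizing on the client before upload can reduce request size.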

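What else can you do?

Learn how to count tokens before sending long prompts to the model.
Set up Cloud Storage for Firebase so that you can include large files in your multimodal requests and use a more managed solution for providing files in prompts. Files can include images, PDFs, video, and audio.
Start preparing for production (see the production checklist), including:
- Setting up Firebase App Check to protect the Gemini API from abuse by unauthorized clients.
- Integrating Firebase Remote Config to update values in your app (like the model name) without releasing a new app version.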

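Try out other capabilities

Build multi-turn conversations (chat).
Generate text from text-only prompts.
Generate structured output (like JSON) from both text and multimodal prompts.
Generate images from text prompts (Gemini or Imagen).
Use tools (like function calling and grounding with Google Search) to connect a Gemini model to other parts of your app and to external systems and information.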

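Learn how to control content generation

Understand prompt design, including best practices, strategies, and example prompts.
Configure model parameters like temperature and maximum output tokens (for Gemini) or aspect ratio and person generation (for Imagen).
Use safety settings to adjust the likelihood of getting responses that may be considered harmful.

You can also use Google AI Studio to experiment with prompts and model configurations and to get generated code snippets.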

Learn more about the supported models

Learn about the models available for various use cases and their quotas and pricing.
