Trang này được dịch bởi Cloud Translation API.

Phân tích tài liệu (chẳng hạn như tệp PDF) bằng API Gemini

Bạn có thể yêu cầu mô hình Gemini phân tích các tệp tài liệu (chẳng hạn như tệp PDF và tệp văn bản thuần tuý) mà bạn cung cấp trực tiếp (được mã hoá base64) hoặc thông qua URL. Khi sử dụng Firebase AI Logic, bạn có thể đưa ra yêu cầu này ngay từ ứng dụng của mình.

Với tính năng này, bạn có thể làm những việc như:

Phân tích sơ đồ, biểu đồ và bảng trong tài liệu
Trích xuất thông tin thành các định dạng đầu ra có cấu trúc
Trả lời câu hỏi về nội dung trực quan và nội dung văn bản trong tài liệu
Tóm tắt tài liệu
Chuyển nội dung tài liệu thành văn bản (ví dụ: thành HTML), giữ nguyên bố cục và định dạng để sử dụng trong các ứng dụng tiếp theo (chẳng hạn như trong các quy trình RAG)

Chuyển đến mã mẫu Chuyển đến mã cho các phản hồi được truyền trực tuyến

Xem các hướng dẫn khác để biết thêm các lựa chọn làm việc với tài liệu (chẳng hạn như PDF)
Tạo đầu ra có cấu trúc Cuộc trò chuyện nhiều lượt

Trước khi bắt đầu

Nhấp vào nhà cung cấp Gemini API để xem nội dung và mã dành riêng cho nhà cung cấp trên trang này.

Nếu bạn chưa thực hiện, hãy hoàn tất hướng dẫn bắt đầu sử dụng. Hướng dẫn này mô tả cách thiết lập dự án Firebase, kết nối ứng dụng với Firebase, thêm SDK, khởi động dịch vụ phụ trợ cho nhà cung cấp Gemini API mà bạn chọn và tạo một thực thể GenerativeModel.

Để kiểm thử và lặp lại các câu lệnh, thậm chí nhận được một đoạn mã được tạo, bạn nên sử dụng Google AI Studio.

Bạn cần một tệp PDF mẫu?

Bạn có thể sử dụng tệp có sẵn công khai này với loại MIME là application/pdf (xem hoặc tải tệp xuống). https://storage.googleapis.com/cloud-samples-data/generative-ai/pdf/2403.05530.pdf

Tạo văn bản từ tệp PDF (được mã hoá bằng base64)

Trước khi dùng thử mẫu này, hãy hoàn tất phần Trước khi bắt đầu của hướng dẫn này để thiết lập dự án và ứng dụng của bạn.
Trong phần đó, bạn cũng sẽ nhấp vào một nút cho nhà cung cấp Gemini API mà bạn chọn để xem nội dung dành riêng cho nhà cung cấp trên trang này.

Bạn có thể yêu cầu mô hình Gemini tạo văn bản bằng cách đưa ra lời nhắc bằng văn bản và tệp PDF, cung cấp mimeType của từng tệp đầu vào và chính tệp đó. Tìm các yêu cầu và đề xuất đối với tệp đầu vào ở phần sau của trang này.

Swift

Bạn có thể gọi generateContent() để tạo văn bản từ dữ liệu đầu vào đa phương thức gồm văn bản và tệp PDF.


import FirebaseAI

// Initialize the Gemini Developer API backend service
let ai = FirebaseAI.firebaseAI(backend: .googleAI())

// Create a `GenerativeModel` instance with a model that supports your use case
let model = ai.generativeModel(modelName: "gemini-2.5-flash")


// Provide the PDF as `Data` with the appropriate MIME type
let pdf = try InlineDataPart(data: Data(contentsOf: pdfURL), mimeType: "application/pdf")

// Provide a text prompt to include with the PDF file
let prompt = "Summarize the important results in this report."

// To generate text output, call `generateContent` with the PDF file and text prompt
let response = try await model.generateContent(pdf, prompt)

// Print the generated text, handling the case where it might be nil
print(response.text ?? "No text in response.")

Kotlin

Bạn có thể gọi generateContent() để tạo văn bản từ dữ liệu đầu vào đa phương thức gồm văn bản và tệp PDF.

^{Đối với Kotlin, các phương thức trong SDK này là hàm tạm ngưng và cần được gọi qua Phạm vi Coroutine.}


// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
val model = Firebase.ai(backend = GenerativeBackend.googleAI())
                        .generativeModel("gemini-2.5-flash")


val contentResolver = applicationContext.contentResolver

// Provide the URI for the PDF file you want to send to the model
val inputStream = contentResolver.openInputStream(pdfUri)

if (inputStream != null) {  // Check if the PDF file loaded successfully
    inputStream.use { stream ->
        // Provide a prompt that includes the PDF file specified above and text
        val prompt = content {
            inlineData(
                bytes = stream.readBytes(),
                mimeType = "application/pdf" // Specify the appropriate PDF file MIME type
            )
            text("Summarize the important results in this report.")
        }

        // To generate text output, call `generateContent` with the prompt
        val response = generativeModel.generateContent(prompt)

        // Log the generated text, handling the case where it might be null
        Log.d(TAG, response.text ?: "")
    }
} else {
    Log.e(TAG, "Error getting input stream for file.")
    // Handle the error appropriately
}

Java

Bạn có thể gọi generateContent() để tạo văn bản từ dữ liệu đầu vào đa phương thức gồm văn bản và tệp PDF.

^{Đối với Java, các phương thức trong SDK này sẽ trả về ListenableFuture.}


// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
GenerativeModel ai = FirebaseAI.getInstance(GenerativeBackend.googleAI())
        .generativeModel("gemini-2.5-flash");

// Use the GenerativeModelFutures Java compatibility layer which offers
// support for ListenableFuture and Publisher APIs
GenerativeModelFutures model = GenerativeModelFutures.from(ai);


ContentResolver resolver = getApplicationContext().getContentResolver();

// Provide the URI for the PDF file you want to send to the model
try (InputStream stream = resolver.openInputStream(pdfUri)) {
    if (stream != null) {
        byte[] audioBytes = stream.readAllBytes();
        stream.close();

        // Provide a prompt that includes the PDF file specified above and text
        Content prompt = new Content.Builder()
              .addInlineData(audioBytes, "application/pdf")  // Specify the appropriate PDF file MIME type
              .addText("Summarize the important results in this report.")
              .build();

        // To generate text output, call `generateContent` with the prompt
        ListenableFuture<GenerateContentResponse> response = model.generateContent(prompt);
        Futures.addCallback(response, new FutureCallback<GenerateContentResponse>() {
            @Override
            public void onSuccess(GenerateContentResponse result) {
                String text = result.getText();
                Log.d(TAG, (text == null) ? "" : text);
            }
            @Override
            public void onFailure(Throwable t) {
                Log.e(TAG, "Failed to generate a response", t);
            }
        }, executor);
    } else {
        Log.e(TAG, "Error getting input stream for file.");
        // Handle the error appropriately
    }
} catch (IOException e) {
    Log.e(TAG, "Failed to read the pdf file", e);
} catch (URISyntaxException e) {
    Log.e(TAG, "Invalid pdf file", e);
}

Web

Bạn có thể gọi generateContent() để tạo văn bản từ dữ liệu đầu vào đa phương thức gồm văn bản và tệp PDF.


import { initializeApp } from "firebase/app";
import { getAI, getGenerativeModel, GoogleAIBackend } from "firebase/ai";

// TODO(developer) Replace the following with your app's Firebase configuration
// See: https://firebase.google.com/docs/web/learn-more#config-object
const firebaseConfig = {
  // ...
};

// Initialize FirebaseApp
const firebaseApp = initializeApp(firebaseConfig);

// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `GenerativeModel` instance with a model that supports your use case
const model = getGenerativeModel(ai, { model: "gemini-2.5-flash" });


// Converts a File object to a Part object.
async function fileToGenerativePart(file) {
  const base64EncodedDataPromise = new Promise((resolve) => {
    const reader = new FileReader();
    reader.onloadend = () => resolve(reader.result.split(','));
    reader.readAsDataURL(file);
  });
  return {
    inlineData: { data: await base64EncodedDataPromise, mimeType: file.type },
  };
}

async function run() {
  // Provide a text prompt to include with the PDF file
  const prompt = "Summarize the important results in this report.";

  // Prepare PDF file for input
  const fileInputEl = document.querySelector("input[type=file]");
  const pdfPart = await fileToGenerativePart(fileInputEl.files);

  // To generate text output, call `generateContent` with the text and PDF file
  const result = await model.generateContent([prompt, pdfPart]);

  // Log the generated text, handling the case where it might be undefined
  console.log(result.response.text() ?? "No text in response.");
}

run();

Dart

Bạn có thể gọi generateContent() để tạo văn bản từ dữ liệu đầu vào đa phương thức gồm văn bản và tệp PDF.


import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';

// Initialize FirebaseApp
await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
final model =
      FirebaseAI.googleAI().generativeModel(model: 'gemini-2.5-flash');


// Provide a text prompt to include with the PDF file
final prompt = TextPart("Summarize the important results in this report.");

// Prepare the PDF file for input
final doc = await File('document0.pdf').readAsBytes();

// Provide the PDF file as `Data` with the appropriate PDF file MIME type
final docPart = InlineDataPart('application/pdf', doc);

// To generate text output, call `generateContent` with the text and PDF file
final response = await model.generateContent([
  Content.multi([prompt,docPart])
]);

// Print the generated text
print(response.text);

Unity

Bạn có thể gọi GenerateContentAsync() để tạo văn bản từ dữ liệu đầu vào đa phương thức gồm văn bản và tệp PDF.


using Firebase;
using Firebase.AI;

// Initialize the Gemini Developer API backend service
var ai = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI());

// Create a `GenerativeModel` instance with a model that supports your use case
var model = ai.GetGenerativeModel(modelName: "gemini-2.5-flash");


// Provide a text prompt to include with the PDF file
var prompt = ModelContent.Text("Summarize the important results in this report.");

// Provide the PDF file as `data` with the appropriate PDF file MIME type
var doc = ModelContent.InlineData("application/pdf",
      System.IO.File.ReadAllBytes(System.IO.Path.Combine(
        UnityEngine.Application.streamingAssetsPath, "document0.pdf")));

// To generate text output, call `GenerateContentAsync` with the text and PDF file
var response = await model.GenerateContentAsync(new [] { prompt, doc });

// Print the generated text
UnityEngine.Debug.Log(response.Text ?? "No text in response.");

Tìm hiểu cách chọn một mô hình phù hợp với trường hợp sử dụng và ứng dụng của bạn.

Hiện câu trả lời theo thời gian thực

Bạn có thể đạt được các lượt tương tác nhanh hơn bằng cách không đợi toàn bộ kết quả từ quá trình tạo mô hình mà thay vào đó, hãy sử dụng tính năng truyền phát trực tiếp để xử lý kết quả một phần. Để truyền trực tuyến câu trả lời, hãy gọi generateContentStream.