Sample

Description

Returns a non-deterministic sample from the results of the previous stage.

There are two supported modes:

  • documents: pick n documents randomly.
  • percent: pick n percent of documents randomly.

Examples

Web

let results;

// Get a sample of 100 documents in a database
results = await execute(db.pipeline()
  .database()
  .sample(100)
);

// Randomly shuffle a list of 3 documents
results = await execute(db.pipeline()
  .documents([
    doc(db, "cities", "SF"),
    doc(db, "cities", "NY"),
    doc(db, "cities", "DC"),
  ])
  .sample(3)
);
Swift
var results: Pipeline.Snapshot

// Get a sample of 100 documents in a database
results = try await db.pipeline()
  .database()
  .sample(count: 100)
  .execute()

// Randomly shuffle a list of 3 documents
results = try await db.pipeline()
  .documents([
    db.collection("cities").document("SF"),
    db.collection("cities").document("NY"),
    db.collection("cities").document("DC"),
  ])
  .sample(count: 3)
  .execute()

Kotlin

var results: Task<Pipeline.Snapshot>

// Get a sample of 100 documents in a database
results = db.pipeline()
    .database()
    .sample(100)
    .execute()

// Randomly shuffle a list of 3 documents
results = db.pipeline()
    .documents(
        db.collection("cities").document("SF"),
        db.collection("cities").document("NY"),
        db.collection("cities").document("DC")
    )
    .sample(3)
    .execute()

Java

Task<Pipeline.Snapshot> results;

// Get a sample of 100 documents in a database
results = db.pipeline()
    .database()
    .sample(100)
    .execute();

// Randomly shuffle a list of 3 documents
results = db.pipeline()
    .documents(
        db.collection("cities").document("SF"),
        db.collection("cities").document("NY"),
        db.collection("cities").document("DC")
    )
    .sample(3)
    .execute();
Python
# Get a sample of 100 documents in a database
results = client.pipeline().database().sample(100).execute()

# Randomly shuffle a list of 3 documents
results = (
    client.pipeline()
    .documents(
        client.collection("cities").document("SF"),
        client.collection("cities").document("NY"),
        client.collection("cities").document("DC"),
    )
    .sample(3)
    .execute()
)
Java
// Get a sample of 100 documents in a database
Pipeline.Snapshot results1 = firestore.pipeline().database().sample(100).execute().get();

// Randomly shuffle a list of 3 documents
Pipeline.Snapshot results2 =
    firestore
        .pipeline()
        .documents(
            firestore.collection("cities").document("SF"),
            firestore.collection("cities").document("NY"),
            firestore.collection("cities").document("DC"))
        .sample(3)
        .execute()
        .get();

Modes

Documents Mode

The documents mode picks up to n documents randomly from its input, where each document (along with the order of documents) is equally as likely to be chosen. To achieve this, Cloud Firestore needs to still scan and process all documents so this can still end up being an expensive operation.

For example, for the following collection:

Node.js

await db.collection("cities").doc("SF").set({name: "San Francsico", state: "California"});
await db.collection("cities").doc("NYC").set({name: "New York City", state: "New York"});
await db.collection("cities").doc("CHI").set({name: "Chicago", state: "Illinois"});

The sample stage in document mode can be used to retrieve a non-deterministic subset of results from this collection.

Node.js

const sampled = await db.pipeline()
    .collection("/cities")
    .sample(1)
    .execute();

In this example, only 1 document at random would be returned at random.

  { name: "New York City", state: "New York" }

If the supplied number is greater than the total number of documents returned, all documents are returned in a random order.

Node.js

const sampled = await db.pipeline()
    .collection("/cities")
    .sample(5)
    .execute();

This will result in the following documents:

  { name: "New York City", state: "New York" }
  { name: "Chicago", state: "Illinois" }
  { name: "San Francisco", state: "California" }

Percent Mode

The percent mode attempts to pick n percent of all documents from its input. This results in the stage producing approximately # documents * percent / 100 documents. Just like in documents mode, Cloud Firestore ensures that each document is equally as likely to be returned. This does require that Cloud Firestore needs to scan and process all documents so this can still end up being an expensive operation, even when the result set is small.

Unlike documents mode, the order here is not random and instead preserves the pre-existing document order. This percent input must be a double value between 0.0 and 1.0.

For example, for the following collection:

Node.js

await db.collection("cities").doc("SF").set({name: "San Francsico", state: "California"});
await db.collection("cities").doc("NYC").set({name: "New York City", state: "New York"});
await db.collection("cities").doc("CHI").set({name: "Chicago", state: "Illinois"});
await db.collection("cities").doc("ATL").set({name: "Atlanta", state: "Georgia"});

The sample stage in percent mode can be used to retrieve (on average) 50% of the documents from the collection stage.

Node.js

  const sampled = await db.pipeline()
    .collection("/cities")
    .sample({ percent: 0.5 })
    .execute();

This will result in a non-deterministic sample of (on average) 50% of documents from the cities collection. The following is one possible output.

  { name: "New York City", state: "New York" }
  { name: "Chicago", state: "Illinois" }

In percent mode, because each document has the same probability of being selected, it is possible for no documents or all documents to be returned.

Client examples

Web

// Get a sample of on average 50% of the documents in the database
const results = await execute(db.pipeline()
  .database()
  .sample({ percentage: 0.5 })
);
Swift
// Get a sample of on average 50% of the documents in the database
let results = try await db.pipeline()
  .database()
  .sample(percentage: 0.5)
  .execute()

Kotlin

// Get a sample of on average 50% of the documents in the database
val results = db.pipeline()
    .database()
    .sample(SampleStage.withPercentage(0.5))
    .execute()

Java

// Get a sample of on average 50% of the documents in the database
Task<Pipeline.Snapshot> results = db.pipeline()
    .database()
    .sample(SampleStage.withPercentage(0.5))
    .execute();
Python
from google.cloud.firestore_v1.pipeline_stages import SampleOptions

# Get a sample of on average 50% of the documents in the database
results = (
    client.pipeline().database().sample(SampleOptions.percentage(0.5)).execute()
)
Java
// Get a sample of on average 50% of the documents in the database
Pipeline.Snapshot results =
    firestore.pipeline().database().sample(Sample.withPercentage(0.5)).execute().get();