Description
Returns a non-deterministic sample from the results of the previous stage.
There are two supported modes:
documents: pick n documents randomly.
percent: pick n percent of documents randomly.
Examples
Web
let results;

// Get a sample of 100 documents in a database
results = await execute(db.pipeline()
  .database()
  .sample(100)
);

// Randomly shuffle a list of 3 documents
results = await execute(db.pipeline()
  .documents([
    doc(db, "cities", "SF"),
    doc(db, "cities", "NY"),
    doc(db, "cities", "DC"),
  ])
  .sample(3)
);
Swift
var results: Pipeline.Snapshot

// Get a sample of 100 documents in a database
results = try await db.pipeline()
  .database()
  .sample(count: 100)
  .execute()

// Randomly shuffle a list of 3 documents
results = try await db.pipeline()
  .documents([
    db.collection("cities").document("SF"),
    db.collection("cities").document("NY"),
    db.collection("cities").document("DC"),
  ])
  .sample(count: 3)
  .execute()
Kotlin
var results: Task<Pipeline.Snapshot>

// Get a sample of 100 documents in a database
results = db.pipeline()
    .database()
    .sample(100)
    .execute()

// Randomly shuffle a list of 3 documents
results = db.pipeline()
    .documents(
        db.collection("cities").document("SF"),
        db.collection("cities").document("NY"),
        db.collection("cities").document("DC")
    )
    .sample(3)
    .execute()
Java
Task<Pipeline.Snapshot> results;

// Get a sample of 100 documents in a database
results = db.pipeline()
    .database()
    .sample(100)
    .execute();

// Randomly shuffle a list of 3 documents
results = db.pipeline()
    .documents(
        db.collection("cities").document("SF"),
        db.collection("cities").document("NY"),
        db.collection("cities").document("DC")
    )
    .sample(3)
    .execute();
Python
# Get a sample of 100 documents in a database
results = client.pipeline().database().sample(100).execute()

# Randomly shuffle a list of 3 documents
results = (
    client.pipeline()
    .documents(
        client.collection("cities").document("SF"),
        client.collection("cities").document("NY"),
        client.collection("cities").document("DC"),
    )
    .sample(3)
    .execute()
)
Java
// Get a sample of 100 documents in a database
Pipeline.Snapshot results1 =
    firestore.pipeline().database().sample(100).execute().get();

// Randomly shuffle a list of 3 documents
Pipeline.Snapshot results2 =
    firestore
        .pipeline()
        .documents(
            firestore.collection("cities").document("SF"),
            firestore.collection("cities").document("NY"),
            firestore.collection("cities").document("DC"))
        .sample(3)
        .execute()
        .get();
Modes
Documents Mode
The documents mode picks up to n documents randomly from its input, where
each document (and each ordering of the documents) is equally likely to be
chosen. To achieve this, Cloud Firestore still needs to scan and process all
input documents, so this can be an expensive operation.
For example, for the following collection:
Node.js
await db.collection("cities").doc("SF").set({name: "San Francisco", state: "California"});
await db.collection("cities").doc("NYC").set({name: "New York City", state: "New York"});
await db.collection("cities").doc("CHI").set({name: "Chicago", state: "Illinois"});
The sample stage in documents mode can be used to retrieve a non-deterministic subset of results from this collection.
Node.js
const sampled = await db.pipeline()
  .collection("/cities")
  .sample(1)
  .execute();
In this example, a single randomly chosen document is returned. The following is one possible output.
{ name: "New York City", state: "New York" }
If the supplied number is greater than the total number of input documents, all documents are returned in a random order.
Node.js
const sampled = await db.pipeline()
  .collection("/cities")
  .sample(5)
  .execute();
This returns all three documents in a random order. The following is one possible output.
{ name: "New York City", state: "New York" }
{ name: "Chicago", state: "Illinois" }
{ name: "San Francisco", state: "California" }
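The documents-mode behavior described above can be imitated locally on an in-memory array. The following is a minimal sketch, not Cloud Firestore's implementation: `sampleDocuments` is a hypothetical helper that shuffles a copy of the input with a Fisher-Yates shuffle and keeps the first n entries, which gives every document and every ordering the same chance and returns the whole input when n exceeds its size.

```javascript
// Local illustration only: this is NOT how Cloud Firestore implements sample().
// sampleDocuments is a hypothetical helper that mimics documents-mode semantics.
function sampleDocuments(docs, n, rng = Math.random) {
  const shuffled = docs.slice(); // copy so the caller's array is not mutated

  // Fisher-Yates shuffle: every ordering of the input is equally likely.
  for (let i = shuffled.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }

  // Keep at most n documents; if n exceeds the input size, all are returned.
  return shuffled.slice(0, n);
}

const cities = ["SF", "NYC", "CHI"];
console.log(sampleDocuments(cities, 1)); // one randomly chosen city ID
console.log(sampleDocuments(cities, 5)); // all three IDs in a random order
```

Like the stage itself, this sketch has to visit every input element before it can return even a single result, which is why a small sample can still be expensive.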
Percent Mode
The percent mode attempts to pick a given fraction of all documents from its
input. The percent input must be a double value between 0.0 and 1.0 (for
example, 0.5 for 50%), so the stage produces approximately
# documents * percent documents. Just like in documents mode, Cloud Firestore
ensures that each document is equally likely to be returned. This requires
Cloud Firestore to scan and process all documents, so this can still be an
expensive operation, even when the result set is small. Unlike documents mode,
the order here is not random and instead preserves the pre-existing document
order.
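The percent-mode semantics can be sketched locally as well. The example below assumes that each document is kept independently with probability `percent`, which is consistent with the description on this page but is not confirmed as Cloud Firestore's actual implementation. `samplePercent` is a hypothetical helper operating on an in-memory array.

```javascript
// Local illustration only: this is NOT how Cloud Firestore implements sample().
// samplePercent is a hypothetical helper that mimics percent-mode semantics,
// assuming each document is kept independently with probability `percent`.
function samplePercent(docs, percent, rng = Math.random) {
  if (!(percent >= 0 && percent <= 1)) {
    throw new RangeError("percent must be between 0.0 and 1.0");
  }
  // filter() visits every element once and preserves the input order.
  // rng() is in [0, 1), so percent = 0 keeps nothing and percent = 1 keeps all.
  return docs.filter(() => rng() < percent);
}

const cityIds = ["SF", "NYC", "CHI", "ATL"];
console.log(samplePercent(cityIds, 0.5)); // a random subset, in input order
```

Because each element is decided on its own, the result size varies from run to run and only averages out to the requested fraction.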
For example, for the following collection:
Node.js
await db.collection("cities").doc("SF").set({name: "San Francisco", state: "California"});
await db.collection("cities").doc("NYC").set({name: "New York City", state: "New York"});
await db.collection("cities").doc("CHI").set({name: "Chicago", state: "Illinois"});
await db.collection("cities").doc("ATL").set({name: "Atlanta", state: "Georgia"});
The sample stage in percent mode can be used to retrieve (on average) 50% of the documents from the collection stage.
Node.js
const sampled = await db.pipeline()
  .collection("/cities")
  .sample({ percent: 0.5 })
  .execute();
This will result in a non-deterministic sample of (on average) 50% of documents
from the cities collection. The following is one possible output.
{ name: "New York City", state: "New York" }
{ name: "Chicago", state: "Illinois" }
In percent mode, because each document has the same probability of being selected, it is possible for no documents or all documents to be returned.
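To get a feel for how likely those extreme outcomes are, the sketch below computes their probabilities under the assumption that each document is selected independently with the same probability. That assumption matches the statement above but is not confirmed as the service's exact behavior. `probabilityOfNone` and `probabilityOfAll` are hypothetical helper names.

```javascript
// Assumes each of `count` documents is selected independently
// with probability `percent` (a value between 0.0 and 1.0).
function probabilityOfNone(count, percent) {
  return Math.pow(1 - percent, count); // every document is skipped
}

function probabilityOfAll(count, percent) {
  return Math.pow(percent, count); // every document is selected
}

// Four documents sampled at 50%, as in the cities example.
console.log(probabilityOfNone(4, 0.5)); // 0.0625
console.log(probabilityOfAll(4, 0.5));  // 0.0625
```

Under this assumption, both probabilities shrink quickly as the number of input documents grows, so empty or complete results are mainly a concern for small inputs.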
Client examples
Web
// Get a sample of on average 50% of the documents in the database
const results = await execute(db.pipeline()
  .database()
  .sample({ percentage: 0.5 })
);
Swift
// Get a sample of on average 50% of the documents in the database
let results = try await db.pipeline()
  .database()
  .sample(percentage: 0.5)
  .execute()
Kotlin
// Get a sample of on average 50% of the documents in the database
val results = db.pipeline()
    .database()
    .sample(SampleStage.withPercentage(0.5))
    .execute()
Java
// Get a sample of on average 50% of the documents in the database
Task<Pipeline.Snapshot> results = db.pipeline()
    .database()
    .sample(SampleStage.withPercentage(0.5))
    .execute();
Python
from google.cloud.firestore_v1.pipeline_stages import SampleOptions

# Get a sample of on average 50% of the documents in the database
results = (
    client.pipeline().database().sample(SampleOptions.percentage(0.5)).execute()
)
Java
// Get a sample of on average 50% of the documents in the database
Pipeline.Snapshot results =
    firestore.pipeline().database().sample(Sample.withPercentage(0.5)).execute().get();