Description
Performs a nearest-neighbor vector search on the given embedding field using the
requested distance_measure.
Syntax
Node.js
const results = await db.pipeline()
  .collection("cities")
  .findNearest({
    field: 'embedding',
    vectorValue: vector([1.5, 2.345]),
    distanceMeasure: 'euclidean',
  })
  .execute();
Client examples
Node.js
const results = await db.pipeline()
  .collection("cities")
  .findNearest({
    field: "embedding",
    vectorValue: [1.5, 2.345],
    distanceMeasure: "euclidean"
  })
  .execute();
Web
const results = await execute(db.pipeline()
  .collection("cities")
  .findNearest({
    field: "embedding",
    vectorValue: [1.5, 2.345],
    distanceMeasure: "euclidean"
  }));
Swift
let results = try await db.pipeline()
  .collection("cities")
  .findNearest(
    field: Field("embedding"),
    vectorValue: VectorValue([1.5, 2.345]),
    distanceMeasure: .euclidean
  )
  .execute()
Kotlin
val results = db.pipeline()
    .collection("cities")
    .findNearest(
        "embedding",
        FieldValue.vector(doubleArrayOf(1.5, 2.345)),
        FindNearestStage.DistanceMeasure.EUCLIDEAN
    )
    .execute()
Java
Task<Pipeline.Snapshot> results = db.pipeline()
    .collection("cities")
    .findNearest(
        "embedding",
        new double[] {1.5, 2.345},
        FindNearestStage.DistanceMeasure.EUCLIDEAN
    )
    .execute();
Python
from google.cloud.firestore_v1.vector import Vector
from google.cloud.firestore_v1.base_vector_query import DistanceMeasure

results = (
    client.pipeline()
    .collection("cities")
    .find_nearest(
        field="embedding",
        vector_value=Vector([1.5, 2.345]),
        distance_measure=DistanceMeasure.EUCLIDEAN,
    )
    .execute()
)
Java
Pipeline.Snapshot results = firestore
    .pipeline()
    .collection("cities")
    .findNearest(
        "embedding",
        new double[] {1.5, 2.345},
        FindNearest.DistanceMeasure.EUCLIDEAN,
        new FindNearestOptions())
    .execute()
    .get();
Behavior
Distance Measure
The find_nearest stage supports the following options for vector distance:
- euclidean: Measures the Euclidean distance between the vectors. To learn more, see Euclidean.
- cosine: Compares vectors based on the angle between them, which lets you measure similarity that isn't based on the vectors' magnitude. We recommend using dot_product with unit-normalized vectors instead of cosine distance; the two are mathematically equivalent, and dot_product performs better. To learn more, see Cosine similarity.
- dot_product: Similar to cosine, but affected by the magnitude of the vectors. To learn more, see Dot product.
Choose the distance measure
Which distance measure to use depends on whether all of your vector embeddings are normalized. A normalized vector embedding has a magnitude (length) of exactly 1.0.
In addition, if you know which distance measure your model was trained with, use that distance measure to compute the distance between your vector embeddings.
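To check whether your embeddings are normalized before you choose a measure, you can compute each vector's magnitude locally. The following is a minimal sketch in plain JavaScript; the helper names (magnitude, isNormalized, normalize) and the tolerance value are assumptions for illustration and are not part of any Firestore SDK. The tolerance is there because floating-point arithmetic rarely yields a magnitude of exactly 1.0.

```javascript
// Hypothetical helpers for illustration; not part of any Firestore SDK.

// Magnitude (length) of a vector: square root of the sum of squared components.
function magnitude(v) {
  return Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
}

// Treats a vector as normalized if its magnitude is within `tolerance` of 1.0.
// The default tolerance is an assumption; tune it for your embedding model.
function isNormalized(v, tolerance = 1e-6) {
  return Math.abs(magnitude(v) - 1.0) <= tolerance;
}

// Scales a vector to unit length.
function normalize(v) {
  const m = magnitude(v);
  if (m === 0) {
    throw new Error("Cannot normalize a zero vector");
  }
  return v.map((x) => x / m);
}

const raw = [1.5, 2.345];
console.log(isNormalized(raw));            // the raw sample vector
console.log(isNormalized(normalize(raw))); // the same vector after scaling
```

If every embedding in your dataset passes a check like this one, the guidance for normalized data applies; otherwise, treat the dataset as non-normalized.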
Normalized data
If you have a dataset where all vector embeddings are normalized, then all three
distance measures provide the same semantic search results. In essence, although each
distance measure returns a different value, those values sort the same way. When
embeddings are normalized, dot_product is usually the most computationally
efficient, but the difference is negligible in most cases. However, if your
application is highly performance sensitive, dot_product might help with
performance tuning.
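The claim that all three measures sort normalized embeddings the same way can be checked locally. The sketch below uses plain JavaScript with made-up vectors and document IDs; it doesn't call Firestore, and it assumes that for dot_product a larger value means a nearer vector.

```javascript
// Illustrative only: made-up vectors, no Firestore calls.
const dot = (a, b) => a.reduce((sum, x, i) => sum + x * b[i], 0);
const norm = (a) => Math.sqrt(dot(a, a));
const unit = (a) => a.map((x) => x / norm(a));

// Each score is arranged so that a smaller value means "nearer".
const measures = {
  euclidean: (a, b) =>
    Math.sqrt(a.reduce((sum, x, i) => sum + (x - b[i]) ** 2, 0)),
  cosine: (a, b) => 1 - dot(a, b) / (norm(a) * norm(b)),
  // Assumption: a larger dot product means nearer, so negate it.
  dot_product: (a, b) => -dot(a, b),
};

// Normalize the query and every stored embedding to unit length.
const query = unit([1.3, 2.345]);
const docs = [
  { id: "a", embedding: unit([1, 2]) },
  { id: "b", embedding: unit([1, -1]) },
  { id: "c", embedding: unit([-3, 1]) },
];

// Returns document IDs ordered from nearest to farthest under `measure`.
function rankBy(measure) {
  return docs
    .map((d) => ({ id: d.id, score: measure(d.embedding, query) }))
    .sort((x, y) => x.score - y.score)
    .map((d) => d.id);
}

const rankings = Object.fromEntries(
  Object.entries(measures).map(([name, fn]) => [name, rankBy(fn)])
);
console.log(rankings);
```

The scores differ between measures, but the resulting order of document IDs is the same for all three.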
Non-normalized data
If you have a dataset where vector embeddings aren't normalized,
then it's not mathematically correct to use dot_product as a distance
measure because dot product doesn't measure distance. Depending
on how the embeddings were generated and what type of search is preferred,
either the cosine or euclidean distance measure produces
search results that are subjectively better than the other distance measures.
Experimentation with either cosine or euclidean might
be necessary to determine which is best for your use case.
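To see why dot_product is unreliable on non-normalized data, compare it with euclidean on vectors of different magnitudes. This is a small sketch in plain JavaScript with made-up embeddings and a hypothetical query vector; it assumes that a larger dot product is treated as nearer.

```javascript
// Illustrative only: made-up, non-normalized vectors; no Firestore calls.
const dot = (a, b) => a.reduce((sum, x, i) => sum + x * b[i], 0);
const euclidean = (a, b) =>
  Math.sqrt(a.reduce((sum, x, i) => sum + (x - b[i]) ** 2, 0));

const query = [1, -2]; // hypothetical query vector
const docs = [
  { id: "SF", embedding: [1.0, -1.0] },
  { id: "TO", embedding: [5.0, -10.0] },
  { id: "AT", embedding: [2.0, -4.0] },
];

// Nearest first: smallest Euclidean distance.
const byEuclidean = [...docs]
  .sort((x, y) => euclidean(x.embedding, query) - euclidean(y.embedding, query))
  .map((d) => d.id);

// "Nearest" first under the assumption that a larger dot product is nearer.
const byDotProduct = [...docs]
  .sort((x, y) => dot(y.embedding, query) - dot(x.embedding, query))
  .map((d) => d.id);

console.log({ byEuclidean, byDotProduct });
```

Here the two orderings are reversed: the vector with the largest magnitude wins under dot_product even though it is the farthest away by euclidean distance.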
Unsure if data is normalized or non-normalized
If you're unsure whether or not your data is normalized and you want to use
dot_product, we recommend that you use cosine instead.
cosine is like dot_product with normalization built in.
Distance measured using cosine ranges from 0 to 2. A result
that is close to 0 indicates the vectors are very similar.
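The 0 to 2 range is consistent with defining cosine distance as 1 minus the cosine similarity. The sketch below assumes that definition (the exact formula the service uses isn't stated here) and uses a hypothetical cosineDistance helper to show the two endpoints, the midpoint, and the fact that magnitude doesn't affect the result.

```javascript
// Assumption: cosine distance = 1 - cosine similarity.
// Hypothetical helper for illustration; not part of any Firestore SDK.
function cosineDistance(a, b) {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  const normA = Math.sqrt(a.reduce((sum, x) => sum + x * x, 0));
  const normB = Math.sqrt(b.reduce((sum, x) => sum + x * x, 0));
  return 1 - dot / (normA * normB);
}

const sameDirection = cosineDistance([1, 0], [1, 0]);   // identical direction
const orthogonal = cosineDistance([1, 0], [0, 1]);      // 90 degrees apart
const opposite = cosineDistance([1, 0], [-1, 0]);       // opposite direction
const scaledCopy = cosineDistance([1, 2], [10, 20]);    // same direction, 10x magnitude

console.log({ sameDirection, orthogonal, opposite, scaledCopy });
```

Vectors that point the same way land at or very near 0 regardless of their length, orthogonal vectors land at 1, and opposite vectors land at 2.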
Limit the results
You can limit the number of documents returned by the query by setting the limit field.
Node.js
const results = await db.pipeline()
  .collection("cities")
  .findNearest({
    field: 'embedding',
    vectorValue: vector([1.5, 2.345]),
    distanceMeasure: 'euclidean',
    limit: 10,
  })
  .execute();
Retrieving the Calculated Vector Distance
You can retrieve the calculated vector distance by assigning a
distance_field output property name on the find_nearest stage.
For example, given the following collection:
Node.js
await db.collection('cities').doc('SF').set({name: 'San Francisco', embedding: vector([1.0, -1.0])});
await db.collection('cities').doc('TO').set({name: 'Toronto', embedding: vector([5.0, -10.0])});
await db.collection('cities').doc('AT').set({name: 'Atlantis', embedding: vector([2.0, -4.0])});
Perform a vector search with a requested output distance_field:
Node.js
const results = await db.pipeline()
  .collection("cities")
  .findNearest({
    field: 'embedding',
    vectorValue: vector([1.3, 2.345]),
    distanceMeasure: 'euclidean',
    distanceField: 'computedDistance',
  })
  .execute();
This produces the following documents:
{name: 'San Francisco', embedding: vector([1.0, -1.0]), computedDistance: 3.3584259705999178},
{name: 'Atlantis', embedding: vector([2.0, -4.0]), computedDistance: 6.383496299051172},
{name: 'Toronto', embedding: vector([5.0, -10.0]), computedDistance: 12.887553103673328}
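The computedDistance values can be reproduced locally, which is a useful sanity check when you debug a query. The following sketch recomputes the Euclidean distance between the query vector [1.3, 2.345] and each sample embedding in plain JavaScript; the euclideanDistance helper is hypothetical, and the code doesn't call Firestore.

```javascript
// Illustrative only: recompute the Euclidean distances locally.
const query = [1.3, 2.345];
const cities = [
  { name: "San Francisco", embedding: [1.0, -1.0] },
  { name: "Toronto", embedding: [5.0, -10.0] },
  { name: "Atlantis", embedding: [2.0, -4.0] },
];

// Euclidean distance: square root of the sum of squared component differences.
function euclideanDistance(a, b) {
  return Math.sqrt(a.reduce((sum, x, i) => sum + (x - b[i]) ** 2, 0));
}

// Attach the distance to each city and sort nearest first.
const ranked = cities
  .map((c) => ({
    name: c.name,
    computedDistance: euclideanDistance(c.embedding, query),
  }))
  .sort((x, y) => x.computedDistance - y.computedDistance);

console.log(ranked);
```

The order (San Francisco, Atlantis, Toronto) and the distances agree with the documents shown above, up to floating-point rounding.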
Limitations
As you work with vector embeddings, note the following limitation:
- The maximum supported embedding dimension is 2048. To store larger embeddings, use dimensionality reduction.