Find nearest

Description

Performs a nearest neighbor vector search on the given embedding field using the requested distance_measure.

Syntax

Node.js

const results = await db.pipeline()
  .collection("cities")
  .findNearest({
      field: 'embedding',
      vectorValue: vector([1.5, 2.345]),
      distanceMeasure: 'euclidean',
  })
  .execute();

Client examples

Node.js

const results = await db.pipeline()
  .collection("cities")
  .findNearest({
      field: "embedding",
      vectorValue: [1.5, 2.345],
      distanceMeasure: "euclidean"
  })
  .execute();

Web

const results = await execute(db.pipeline()
  .collection("cities")
  .findNearest({
      field: "embedding",
      vectorValue: [1.5, 2.345],
      distanceMeasure: "euclidean"
  }));

Swift

let results = try await db.pipeline()
  .collection("cities")
  .findNearest(
    field: Field("embedding"),
    vectorValue: VectorValue([1.5, 2.345]),
    distanceMeasure: .euclidean
  )
  .execute()

Kotlin

val results = db.pipeline()
    .collection("cities")
    .findNearest(
        "embedding",
        FieldValue.vector(doubleArrayOf(1.5, 2.345)),
        FindNearestStage.DistanceMeasure.EUCLIDEAN
    )
    .execute()

Java

Task<Pipeline.Snapshot> results = db.pipeline()
        .collection("cities")
        .findNearest(
            "embedding",
            new double[] {1.5, 2.345},
            FindNearestStage.DistanceMeasure.EUCLIDEAN
        )
        .execute();

Python

from google.cloud.firestore_v1.vector import Vector
from google.cloud.firestore_v1.base_vector_query import DistanceMeasure

results = (
    client.pipeline()
    .collection("cities")
    .find_nearest(
        field="embedding",
        vector_value=Vector([1.5, 2.345]),
        distance_measure=DistanceMeasure.EUCLIDEAN,
    )
    .execute()
)

Java

Pipeline.Snapshot results =
    firestore
        .pipeline()
        .collection("cities")
        .findNearest(
            "embedding",
            new double[] {1.5, 2.345},
            FindNearest.DistanceMeasure.EUCLIDEAN,
            new FindNearestOptions())
        .execute()
        .get();

Behavior

Distance measure

The find_nearest stage supports the following options for vector distance:

  • euclidean: Measures the Euclidean distance between the vectors. To learn more, see Euclidean.
  • cosine: Compares vectors based on the angle between them, which lets you measure similarity that isn't based on the vectors' magnitude. We recommend using dot_product with unit-normalized vectors instead of cosine distance; the two are mathematically equivalent, and dot_product offers better performance (see the sketch after this list). To learn more, see Cosine similarity.
  • dot_product: Similar to cosine, but affected by the magnitude of the vectors. To learn more, see Dot product.
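
As a sketch of that recommendation: if you unit-normalize embeddings before writing them, dot_product ranks results the same way cosine would. The normalizeVector helper below is illustrative rather than part of the SDK, and the 'dot_product' string mirrors the option name listed above; the findNearest call itself follows the Node.js syntax shown earlier.

Node.js

// Illustrative helper: scale an embedding to unit length (magnitude 1.0).
function normalizeVector(values) {
  const magnitude = Math.sqrt(values.reduce((sum, v) => sum + v * v, 0));
  return values.map((v) => v / magnitude);
}

// Write the normalized embedding, then search with dot_product, which ranks
// unit-length vectors the same way cosine would.
await db.collection('cities').doc('SF').set({
  name: 'San Francisco',
  embedding: vector(normalizeVector([1.0, -1.0])),
});

const results = await db.pipeline()
  .collection("cities")
  .findNearest({
      field: 'embedding',
      vectorValue: vector(normalizeVector([1.5, 2.345])),
      distanceMeasure: 'dot_product',
  })
  .execute();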

Choose the distance measure

Whether or not all of your vector embeddings are normalized determines which distance measure to use to find the distance between them. A normalized vector embedding has a magnitude (length) of exactly 1.0.

In addition, if you know which distance measure your model was trained with, use that distance measure to compute the distance between your vector embeddings.
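
If you're not sure whether your embeddings are normalized, you can check the magnitude of a sample embedding before choosing a distance measure. The following sketch uses only plain arithmetic on a raw embedding array; the sample values are hypothetical.

Node.js

// A raw embedding as produced by your model (hypothetical sample values).
const sampleEmbedding = [0.6, 0.8];

// A normalized embedding has a magnitude (length) of exactly 1.0.
const magnitude = Math.sqrt(
  sampleEmbedding.reduce((sum, v) => sum + v * v, 0)
);

// Allow a small floating-point tolerance when comparing against 1.0.
const isNormalized = Math.abs(magnitude - 1.0) < 1e-6;
console.log(isNormalized ? 'normalized' : 'not normalized');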

Normalized data

If you have a dataset where all vector embeddings are normalized, then all three distance measures provide the same semantic search results. In essence, although each distance measure returns a different value, those values sort the same way. When embeddings are normalized, dot_product is usually the most computationally efficient, but the difference is negligible in most cases. However, if your application is highly performance-sensitive, dot_product might help with performance tuning.

Non-normalized data

If you have a dataset where vector embeddings aren't normalized, then it's not mathematically correct to use dot_product as a distance measure, because the dot product doesn't measure distance. Depending on how the embeddings were generated and what type of search is preferred, either cosine or euclidean can produce search results that are subjectively better than the other. Experimentation with both might be necessary to determine which is best for your use case.
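
One way to run that experiment is to issue the same search once per candidate distance measure and compare the orderings side by side. This sketch reuses the Node.js findNearest syntax from the examples above; the computedDistance output name is illustrative.

Node.js

// Run the same search with each candidate distance measure and compare
// which ordering looks better for your data.
for (const distanceMeasure of ['cosine', 'euclidean']) {
  const results = await db.pipeline()
    .collection("cities")
    .findNearest({
        field: 'embedding',
        vectorValue: vector([1.5, 2.345]),
        distanceMeasure,
        distanceField: 'computedDistance',
        limit: 10,
    })
    .execute();
  console.log(distanceMeasure, results);
}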

Unsure if data is normalized or non-normalized

If you're unsure whether or not your data is normalized and you want to use dot_product, we recommend that you use cosine instead. cosine is like dot_product with normalization built in. Distance measured using cosine ranges from 0 to 2. A result that is close to 0 indicates the vectors are very similar.
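
For intuition, cosine distance can be computed directly from two raw embedding arrays as 1 minus the cosine similarity; the result is 0 when the vectors point in the same direction and 2 when they point in opposite directions. This is plain math, not an SDK call.

Node.js

// Cosine distance between two raw embedding arrays: 1 - cosine similarity.
// Values range from 0 (same direction) to 2 (opposite directions).
function cosineDistance(a, b) {
  const dot = a.reduce((sum, v, i) => sum + v * b[i], 0);
  const magA = Math.sqrt(a.reduce((sum, v) => sum + v * v, 0));
  const magB = Math.sqrt(b.reduce((sum, v) => sum + v * v, 0));
  return 1 - dot / (magA * magB);
}

console.log(cosineDistance([1.0, 0.0], [1.0, 0.0]));  // 0
console.log(cosineDistance([1.0, 0.0], [-1.0, 0.0])); // 2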

Limit the results

You can limit the number of documents returned by the search by setting the limit option:

Node.js

const results = await db.pipeline()
  .collection("cities")
  .findNearest({
      field: 'embedding',
      vectorValue: vector([1.5, 2.345]),
      distanceMeasure: 'euclidean',
      limit: 10,
  })
  .execute();

Retrieve the calculated vector distance

You can retrieve the calculated vector distance by assigning a distance_field output property name on the find_nearest stage.

For example, given the following collection:

Node.js

await db.collection('cities').doc('SF').set({name: 'San Francisco', embedding: vector([1.0, -1.0])});
await db.collection('cities').doc('TO').set({name: 'Toronto', embedding: vector([5.0, -10.0])});
await db.collection('cities').doc('AT').set({name: 'Atlantis', embedding: vector([2.0, -4.0])});

Perform a vector search with a requested output distance_field:

Node.js

const results = await db.pipeline()
  .collection("cities")
  .findNearest({
      field: 'embedding',
      vectorValue: vector([1.3, 2.345]),
      distanceMeasure: 'euclidean',
      distanceField: 'computedDistance',
  })
  .execute();

This produces the following documents:

{name: 'San Francisco', embedding: vector([1.0, -1.0]), computedDistance: 3.3584259705999178},
{name: 'Atlantis', embedding: vector([2.0, -4.0]), computedDistance: 6.383496299051172},
{name: 'Toronto', embedding: vector([5.0, -10.0]), computedDistance: 12.887553103673328}

Limitations

As you work with vector embeddings, note the following limitation: