Collection Indexes

To prepare your application for production scale and performance, create Collection Indexes. Without an Index, searches are slow because they fall back to a brute-force scan that compares the query against every stored vector. An Index on a vector field lets approximate nearest neighbor (ANN) searches against that field avoid the full scan and run much faster.

Choosing a distance metric

Choosing the right distance metric for your Index is crucial for achieving accurate and relevant similarity search results. The optimal choice depends primarily on the characteristics of your vector embeddings and the nature of your data. The most critical rule is to use the distance metric that your embedding model was trained on. Embedding models are optimized to produce vector representations where similarity is best captured by a specific distance calculation. Using a different metric can lead to suboptimal or incorrect search results.

Check your embedding model's documentation: This is the most reliable way to determine the intended distance metric.
Consider your use case: For finding semantically similar text or images, Cosine Similarity is often the best choice. If the "strength" or "intensity" represented by the vector's magnitude is important, consider L2 Distance.
Analyze your vectors: Determine whether your vectors are normalized. If they are, Cosine Similarity and Dot Product give the same scores, so you can use either and expect the same ranking.

By carefully considering these factors, you can select the most appropriate distance metric for your Index, leading to more accurate and meaningful similarity search results.
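
The point about normalized vectors can be checked numerically. The following is a minimal sketch that uses awk to compute both metrics for a query vector against two candidates. The helper name similarity and the sample vectors are made up for illustration and are not part of the service.

```shell
# Hypothetical helper: prints "dot_product cosine_similarity" for two
# comma-separated vectors of the same length.
similarity() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, ","); split(b, y, ",")
    for (i = 1; i <= n; i++) {
      dot += x[i] * y[i]   # running dot product
      na  += x[i] * x[i]   # squared length of the first vector
      nb  += y[i] * y[i]   # squared length of the second vector
    }
    printf "%.4f %.4f\n", dot, dot / (sqrt(na) * sqrt(nb))
  }'
}

# The query (0.6, 0.8) and both candidates are unit-length vectors.
similarity "0.6,0.8" "1.0,0.0"   # prints 0.6000 0.6000
similarity "0.6,0.8" "0.0,1.0"   # prints 0.8000 0.8000
```

Each candidate gets the same score under both metrics, so the second candidate ranks first either way. If the vectors were not unit length, the two columns could differ and the rankings could diverge.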

Creating an ANN Index

You can create an ANN Index on a specific embedding field. By default, all Data Object string, numeric, and boolean fields are pushed down to the Index to allow for inline filtering.

To optimize compute costs, you can specify exactly which fields are filterable (filter_fields) and which are stored as payload only (store_fields).

The following example demonstrates how to create an Index, plot_index, in the Collection movies.

  curl -X POST \
    'https://vectorsearch.googleapis.com/v1beta/projects/PROJECT_ID/locations/LOCATION/collections/movies/indexes?indexId=plot_index' \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H 'Content-Type: application/json' \
    -d '{
      "index_field": "plot_embedding",
      "filter_fields": [
        "year",
        "genre"
      ],
      "store_fields": [
        "title"
      ]
    }'

In the example, the request specifies that year and genre are filterable (passed to the Index as filter fields), and that title is stored as payload only and is not filterable.
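
Quoting a multi-line JSON body inline is easy to get wrong. As a sketch, the same request body can be written to a file and passed with curl's -d @file form. The file name plot_index.json and the function name create_plot_index are illustrative choices, not part of the API.

```shell
# Write the request body from the example above to a file (the name is arbitrary).
cat > plot_index.json <<'EOF'
{
  "index_field": "plot_embedding",
  "filter_fields": ["year", "genre"],
  "store_fields": ["title"]
}
EOF

# Hypothetical wrapper: sends the body file to the same endpoint as above.
# PROJECT_ID and LOCATION are placeholders to replace with your own values.
create_plot_index() {
  curl -X POST \
    'https://vectorsearch.googleapis.com/v1beta/projects/PROJECT_ID/locations/LOCATION/collections/movies/indexes?indexId=plot_index' \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H 'Content-Type: application/json' \
    -d @plot_index.json
}
```

After replacing the placeholders, run create_plot_index to send the request. Keeping the body in a file also makes it easy to review or version alongside your application code.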

Getting an Index

The following example demonstrates how to get an existing Index, plot_index, stored in the Collection movies.

  curl -X GET \
    'https://vectorsearch.googleapis.com/v1beta/projects/PROJECT_ID/locations/LOCATION/collections/movies/indexes/plot_index' \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H 'Content-Type: application/json'

Listing Indexes

The following example demonstrates how to list all Indexes in the Collection movies.

  curl -X GET \
    'https://vectorsearch.googleapis.com/v1beta/projects/PROJECT_ID/locations/LOCATION/collections/movies/indexes' \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H 'Content-Type: application/json'

Deleting an Index

The following example demonstrates how to delete an existing Index, plot_index, from the Collection movies.

  curl -X DELETE \
    'https://vectorsearch.googleapis.com/v1beta/projects/PROJECT_ID/locations/LOCATION/collections/movies/indexes/plot_index' \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H 'Content-Type: application/json'
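
The get, list, and delete requests differ only in the HTTP method and the last segment of the URL path. As a sketch, a small shell function can build those URLs from variables so that the project and location are set in one place. The function name indexes_url and the sample values my-project and us-central1 are placeholders chosen for illustration.

```shell
# Example values only; replace with your own project and location.
PROJECT_ID="my-project"
LOCATION="us-central1"

# Hypothetical helper: prints the Indexes URL for a Collection, or the URL of
# a single Index when an Index ID is passed as the second argument.
indexes_url() {
  url="https://vectorsearch.googleapis.com/v1beta/projects/${PROJECT_ID}/locations/${LOCATION}/collections/$1/indexes"
  if [ -n "${2:-}" ]; then
    url="${url}/$2"
  fi
  printf '%s\n' "$url"
}

indexes_url movies              # URL used to list Indexes
indexes_url movies plot_index   # URL used to get or delete plot_index
```

The output can then be passed to curl, for example as "$(indexes_url movies plot_index)" in place of the literal URL in the requests above.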

What's next?

Learn how to query Data Objects.
Learn how to search for Data Objects using semantic search or hybrid search.