To configure an index for similarity search, set the following fields.
For instructions on how to configure an index, see Configure index parameters.
NearestNeighborSearch
Field | Description
---|---
contentsDeltaUri | Allows inserting, updating, or deleting the contents of the Vector Search index. If you set this field when calling
isCompleteOverwrite | If this field is set together with
config | The configuration of the Vector Search index.
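As an illustration, the three top-level fields can be assembled into one JSON object. The following Python sketch assumes the metadata is plain JSON with these camelCase keys; the bucket path, the 768-dimension value, and the build_index_metadata helper are hypothetical, and the config object is trimmed to two fields from NearestNeighborSearchConfig.

```python
import json

# Hypothetical Cloud Storage directory holding the embedding files; replace with your own.
CONTENTS_DELTA_URI = "gs://my-bucket/embeddings/"


def build_index_metadata(contents_delta_uri: str, config: dict,
                         is_complete_overwrite: bool = False) -> dict:
    """Hypothetical helper: collect the top-level fields into a single dict."""
    return {
        "contentsDeltaUri": contents_delta_uri,
        "isCompleteOverwrite": is_complete_overwrite,
        "config": config,
    }


# Illustrative config with only two NearestNeighborSearchConfig fields filled in.
metadata = build_index_metadata(
    CONTENTS_DELTA_URI,
    config={"dimensions": 768, "distanceMeasureType": "DOT_PRODUCT_DISTANCE"},
)

# Serialize to JSON, for example to write a metadata file.
print(json.dumps(metadata, indent=2))
```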
NearestNeighborSearchConfig
Field | Type | Description
---|---|---
dimensions | int32 | Required. The number of dimensions of the input vectors. Used for dense embeddings only.
approximateNeighborsCount | int32 | Required if the tree-AH algorithm is used. The default number of neighbors to find through approximate search before exact reordering is performed. Exact reordering is a procedure where results returned by an approximate search algorithm are reordered using a more expensive distance computation.
shardSize | ShardSize | The size of each shard. When an index is large, it is sharded based on the specified shard size. During serving, each shard is served on a separate node and scales independently.
distanceMeasureType | DistanceMeasureType | The distance measure used in nearest neighbor search.
featureNormType | FeatureNormType | The type of normalization to apply to each vector.
algorithmConfig | oneOf: TreeAhConfig or BruteForceConfig | The configuration for the algorithms that Vector Search uses for efficient search. Used for dense embeddings only. Set one of the following: TreeAhConfig, the configuration options for the tree-AH algorithm (for more information, see the blog post Scaling deep retrieval with TensorFlow Recommenders and Vector Search); or BruteForceConfig, a standard linear search in the database for each query, which has no fields to configure and is selected by passing an empty object.
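The rules stated above (dimensions is required, approximateNeighborsCount is required with tree-AH, and algorithmConfig is a oneOf) can be checked client-side before you submit a configuration. This hypothetical validate_config helper is a minimal sketch that assumes a dense-embedding index and the JSON field names treeAhConfig and bruteForceConfig; it is not part of any Google library, and the service applies its own validation.

```python
def validate_config(config: dict) -> list:
    """Hypothetical helper: check a NearestNeighborSearchConfig-style dict.

    Covers only the rules listed in the table; assumes dense embeddings and
    camelCase keys treeAhConfig / bruteForceConfig inside algorithmConfig.
    """
    problems = []

    # dimensions is required and must be a positive integer.
    dims = config.get("dimensions")
    if not isinstance(dims, int) or isinstance(dims, bool) or dims <= 0:
        problems.append("dimensions is required and must be a positive integer")

    # algorithmConfig is a oneOf, so at most one algorithm may be set.
    algorithm = config.get("algorithmConfig", {})
    chosen = [name for name in ("treeAhConfig", "bruteForceConfig") if name in algorithm]
    if len(chosen) > 1:
        problems.append("algorithmConfig must not set both treeAhConfig and bruteForceConfig")

    # approximateNeighborsCount is required when tree-AH is used.
    if "treeAhConfig" in chosen and "approximateNeighborsCount" not in config:
        problems.append("approximateNeighborsCount is required with treeAhConfig")

    return problems


tree_ah = {
    "dimensions": 768,
    "approximateNeighborsCount": 150,
    "algorithmConfig": {"treeAhConfig": {"leafNodeEmbeddingCount": 1000}},
}
print(validate_config(tree_ah))  # → []
```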
DistanceMeasureType
Enum | Description
---|---
SQUARED_L2_DISTANCE | Squared Euclidean (L2) distance.
L1_DISTANCE | Manhattan (L1) distance.
DOT_PRODUCT_DISTANCE | Default value. Defined as the negative of the dot product.
COSINE_DISTANCE | Cosine distance. We strongly suggest using DOT_PRODUCT_DISTANCE with UNIT_L2_NORM instead of COSINE_DISTANCE. Our algorithms are more optimized for DOT_PRODUCT_DISTANCE, and when it is combined with UNIT_L2_NORM, it gives the same ranking as, and is mathematically equivalent to, cosine distance.
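The following sketch computes each distance measure in plain Python and checks the recommendation for COSINE_DISTANCE: after unit L2 normalization, ranking by DOT_PRODUCT_DISTANCE matches ranking by cosine distance. It assumes cosine distance is defined as 1 minus cosine similarity; the function names and sample vectors are illustrative, not the service's implementation.

```python
import math


def squared_l2_distance(u, v):
    # SQUARED_L2_DISTANCE: sum of squared coordinate differences.
    return sum((a - b) ** 2 for a, b in zip(u, v))


def l1_distance(u, v):
    # L1_DISTANCE: sum of absolute coordinate differences.
    return sum(abs(a - b) for a, b in zip(u, v))


def dot_product_distance(u, v):
    # DOT_PRODUCT_DISTANCE: negative dot product, so a smaller value means closer.
    return -sum(a * b for a, b in zip(u, v))


def cosine_distance(u, v):
    # COSINE_DISTANCE, assumed here to be 1 - cosine similarity.
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norms


def unit_l2(v):
    # Scale v to unit L2 norm (what UNIT_L2_NORM describes).
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


query = [3.0, 4.0]
candidates = {"a": [1.0, 0.0], "b": [6.0, 8.5], "c": [0.0, 2.0]}

# Rank with cosine distance on the raw vectors.
by_cosine = sorted(candidates, key=lambda k: cosine_distance(query, candidates[k]))

# Rank with dot-product distance after unit L2 normalization.
q = unit_l2(query)
by_dot_unit = sorted(candidates, key=lambda k: dot_product_distance(q, unit_l2(candidates[k])))

print(by_cosine)    # → ['b', 'c', 'a']
print(by_dot_unit)  # → ['b', 'c', 'a']
print(squared_l2_distance([0.0, 0.0], query), l1_distance([0.0, 0.0], query))  # → 25.0 7.0
```

For unit-length vectors, the negative dot product equals cosine distance minus 1, so under this definition the two measures differ only by a constant and order candidates identically.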
ShardSize
Enum | Description
---|---
SHARD_SIZE_SMALL | 2 GiB per shard.
SHARD_SIZE_MEDIUM | 20 GiB per shard.
SHARD_SIZE_LARGE | 50 GiB per shard.
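To get a rough feel for how ShardSize affects the number of shards (and therefore serving nodes, since each shard is served on a separate node), you can divide the index data size by the per-shard capacity. This sketch assumes raw float32 storage (4 bytes per value) approximates the index size and ignores ids, metadata, and service overhead, so treat its output as an order-of-magnitude estimate only; the helper names are hypothetical.

```python
import math

# GiB per shard for each ShardSize value, from the ShardSize table.
SHARD_SIZE_GIB = {
    "SHARD_SIZE_SMALL": 2,
    "SHARD_SIZE_MEDIUM": 20,
    "SHARD_SIZE_LARGE": 50,
}


def raw_embedding_gib(num_vectors: int, dimensions: int, bytes_per_value: int = 4) -> float:
    # Assumption: float32 values only; ignores ids, metadata, and index overhead.
    return num_vectors * dimensions * bytes_per_value / 2**30


def estimate_shards(index_size_gib: float, shard_size: str) -> int:
    # Rough estimate: divide the data size by per-shard capacity and round up.
    if index_size_gib <= 0:
        raise ValueError("index_size_gib must be positive")
    return math.ceil(index_size_gib / SHARD_SIZE_GIB[shard_size])


size = raw_embedding_gib(10_000_000, 768)  # about 28.6 GiB
for name in SHARD_SIZE_GIB:
    print(name, estimate_shards(size, name))
# prints SHARD_SIZE_SMALL 15, SHARD_SIZE_MEDIUM 2, SHARD_SIZE_LARGE 1
```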
FeatureNormType
Enum | Description
---|---
UNIT_L2_NORM | Unit L2 normalization: each vector is scaled to have an L2 norm of 1.
NONE | Default value. No normalization type is specified.
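With UNIT_L2_NORM, each vector is scaled to unit length; with NONE, vectors are used as provided. The sketch below shows the scaling in Python. Rejecting the zero vector is this sketch's own choice; how the service handles such vectors is not described here, and unit_l2_normalize is a hypothetical name.

```python
import math


def unit_l2_normalize(vector):
    """Scale a vector so its L2 norm is 1, as the UNIT_L2_NORM option describes."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        # A zero vector has no direction; this sketch rejects it rather than guess.
        raise ValueError("cannot L2-normalize a zero vector")
    return [x / norm for x in vector]


v = unit_l2_normalize([3.0, 4.0])
print(v)                                  # → [0.6, 0.8]
print(math.isclose(math.hypot(*v), 1.0))  # → True
```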
TreeAhConfig
The following fields configure the tree-AH algorithm.
Field | Type | Description
---|---|---
fractionLeafNodesToSearch | double | The default fraction of leaf nodes that any query may search. Must be in the range 0.0 to 1.0, exclusive. The default value is 0.05 if not set.
leafNodeEmbeddingCount | int32 | The number of embeddings on each leaf node. The default value is 1000 if not set.
leafNodesToSearchPercent | int32 | Deprecated; use fractionLeafNodesToSearch instead. The default percentage of leaf nodes that any query may search. Must be in the range 1 to 100, inclusive. The default value is 10 (meaning 10%) if not set.
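The tree-AH fields can be related with simple arithmetic. Assuming embeddings are split evenly into leaves of leafNodeEmbeddingCount, an index of N embeddings has about N / leafNodeEmbeddingCount leaves, and a query searches about fractionLeafNodesToSearch of them. The helpers below are hypothetical, and the service's actual tree may not be perfectly balanced.

```python
import math


def leaves_searched(num_embeddings: int,
                    leaf_node_embedding_count: int = 1000,
                    fraction_leaf_nodes_to_search: float = 0.05) -> tuple:
    """Estimate (total leaves, leaves searched per query), assuming even leaves."""
    # fractionLeafNodesToSearch must lie strictly between 0.0 and 1.0.
    if not 0.0 < fraction_leaf_nodes_to_search < 1.0:
        raise ValueError("fractionLeafNodesToSearch must be in (0.0, 1.0)")
    total_leaves = math.ceil(num_embeddings / leaf_node_embedding_count)
    # Round up and search at least one leaf.
    searched = max(1, math.ceil(total_leaves * fraction_leaf_nodes_to_search))
    return total_leaves, searched


def percent_to_fraction(leaf_nodes_to_search_percent: int) -> float:
    """Convert the deprecated leafNodesToSearchPercent (1-100) to a fraction."""
    if not 1 <= leaf_nodes_to_search_percent <= 100:
        raise ValueError("leafNodesToSearchPercent must be in 1-100")
    return leaf_nodes_to_search_percent / 100


print(leaves_searched(1_000_000))  # → (1000, 50)
print(percent_to_fraction(10))     # → 0.1
```

Note that percent_to_fraction(100) yields 1.0, which falls outside the exclusive range allowed for fractionLeafNodesToSearch.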
BruteForceConfig
This option implements the standard linear search in the database for
each query. There are no fields to configure for a brute force search.
To select this algorithm, pass an empty object for BruteForceConfig
to algorithmConfig.
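A brute force search scores the query against every stored vector and keeps the closest k, so its results are exact and its cost grows linearly with the number of stored vectors. This Python sketch uses DOT_PRODUCT_DISTANCE and an in-memory dict; the function name and data are hypothetical, and it does not reflect how the service executes the search.

```python
import heapq


def brute_force_search(query, database, k):
    """Exact k-nearest-neighbor search by scanning every stored vector.

    Scores with DOT_PRODUCT_DISTANCE (negative dot product), so smaller is closer.
    """
    def distance(vector):
        return -sum(q * x for q, x in zip(query, vector))

    # Every id in the database is scored; nsmallest keeps the k closest ids.
    return heapq.nsmallest(k, database, key=lambda item_id: distance(database[item_id]))


database = {"x": [0.9, 0.1], "y": [0.2, 0.8], "z": [-1.0, 0.0]}
print(brute_force_search([1.0, 0.0], database, k=2))  # → ['x', 'y']
```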