This guide provides information about optional metadata for vector embeddings. Vector Search lets you define metadata for each embedding.
Metadata is non-filterable, arbitrary information that Vector Search can store for each embedding. This can provide embeddings with useful context such as:
- Product details, such as name, price, and an image URL.
- Descriptions, snippets, dates, and authorship for text embeddings.
- User information for user embeddings.
- Coordinates for place embeddings.
Key features and benefits
Features and benefits of using metadata include:
- Context with results: Information can be provided directly in your search results, which eliminates the need for separate lookups and reduces latency.
- Flexible structure: Metadata is provided as a JSON object, which allows the metadata to be defined as complex, nested data.
- Non-filterable: Vector embedding metadata is for storing and retrieving non-filterable information that's distinct from restricts and numeric_restricts.
- Efficient updates: The update_mask field lets you specify that APIs update only metadata, so you don't need to resubmit embedding vectors.
- Decoupled information: Non-filterable information can be separated from filterable attributes like restricts.
- Streamlined development: Search responses include the metadata associated with a vector embedding, which reduces the complexity of features such as displaying rich search results and performing context-based post-processing.
Data format
An optional embedding_metadata field holds a JSON object that flexibly associates rich, non-filterable information with embeddings in Vector Search. This can streamline applications by returning context with results, and it allows efficient metadata-only updates using update_mask with the upsertDatapoints API.
Example data point structure:
{
  "id": "movie_001",
  "embedding": [0.1, 0.2, ..., 0.3],
  "sparse_embedding": {
    "values": [-0.4, 0.2, -1.3],
    "dimensions": [10, 20, 30]
  },
  "numeric_restricts": [{"namespace": "year", "value_int": 2022}],
  "restricts": [{"namespace": "genre", "allow": ["action", "comedy"]}],
  # --- New embedding_metadata field ---
  "embedding_metadata": {
    "title": "Ballet Train",
    "runtime": {
      "hours": 2,
      "minutes": 6
    },
    "review_info": {
      "review": "This movie is fun and...",
      "rotten_potatoes_rating": 76
    }
  }
},
# ... other data points
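As a rough illustration of the structure above, the following Python sketch assembles one data point record and checks that its metadata is a JSON object before serializing it. The helper name make_datapoint is hypothetical and not part of any Vector Search client library; the field names are taken from the example above.

```python
import json


def make_datapoint(datapoint_id, embedding, metadata):
    """Build one data point record with an embedding_metadata object.

    Hypothetical helper for illustration only.
    """
    if not isinstance(metadata, dict):
        raise TypeError("embedding_metadata must be a JSON object (dict)")
    # json.dumps raises TypeError if the metadata holds non-JSON values.
    json.dumps(metadata)
    return {
        "id": datapoint_id,
        "embedding": embedding,
        "embedding_metadata": metadata,
    }


record = make_datapoint(
    "movie_001",
    [0.1, 0.2, 0.3],
    {"title": "Ballet Train", "runtime": {"hours": 2, "minutes": 6}},
)
print(json.dumps(record))
```

Validating the metadata up front surfaces a non-serializable value (for example a datetime object) before the record reaches an upload or API call.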
Ingesting data with embedding_metadata
When adding data points, you can include embedding_metadata in either of the following ways:
- Uploading a file (Cloud Storage):
  - Use JSON or AVRO formats. CSV isn't supported for embedding_metadata.
- Using the upsertDatapoints API:
  - Pass data point objects (including embedding_metadata) in the API request payload.
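For the file-upload path, a minimal sketch of preparing a JSON input file might look like the following. It assumes the input file holds one JSON object per line, which is not stated in this guide, so check the input format requirements before relying on it. The helper write_datapoints, the file name, and the second record are hypothetical.

```python
import json
import os
import tempfile


def write_datapoints(path, records):
    """Write one JSON object per line, UTF-8 encoded (assumed input layout)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


records = [
    {"id": "movie_001", "embedding": [0.1, 0.2, 0.3],
     "embedding_metadata": {"title": "Ballet Train"}},
    # embedding_metadata is optional, so a record can omit it.
    {"id": "movie_002", "embedding": [0.4, 0.5, 0.6]},
]

path = os.path.join(tempfile.mkdtemp(), "datapoints.json")
write_datapoints(path, records)

# Read the file back to confirm each line parses on its own.
with open(path, encoding="utf-8") as f:
    lines = f.read().splitlines()
print(len(lines))
```

The resulting file would then be copied to a Cloud Storage location and referenced when the index is created or updated.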
Retrieving embedding_metadata during queries
When performing a standard nearest-neighbor search using the findNeighbors API, the embedding_metadata field for each neighbor is automatically included in the response if returnFullDatapoint is set to true.
curl
# Replace <FEATURE_VECTOR> with a JSON array of floats, such as [0.1, 0.2, 0.3].
curl -X POST -H "Content-Type: application/json" -H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://${PUBLIC_ENDPOINT_DOMAIN}/v1/projects/${PROJECT_ID}/locations/${LOCATION}/indexEndpoints/${INDEX_ENDPOINT_ID}:findNeighbors" \
-d '{"deployedIndexId": "'"${DEPLOYED_INDEX_ID}"'", "queries": [{"datapoint": {"featureVector": <FEATURE_VECTOR>}}], "returnFullDatapoint": true}'
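Once the response arrives, the metadata can be read directly from each neighbor without a second lookup. The sketch below assumes the REST response nests neighbors under nearestNeighbors and neighbors, and that the metadata appears on each datapoint as embeddingMetadata; these key names and the sample response are assumptions for illustration, so compare them with an actual response. The helper extract_metadata is hypothetical.

```python
def extract_metadata(response):
    """Collect (datapoint ID, distance, metadata) for every returned neighbor."""
    results = []
    for query_result in response.get("nearestNeighbors", []):
        for neighbor in query_result.get("neighbors", []):
            datapoint = neighbor.get("datapoint", {})
            results.append((
                datapoint.get("datapointId"),
                neighbor.get("distance"),
                # Assumed key name; falls back to an empty dict if absent.
                datapoint.get("embeddingMetadata", {}),
            ))
    return results


# Hypothetical response body, shaped as assumed above.
sample_response = {
    "nearestNeighbors": [{
        "neighbors": [
            {"distance": 0.12,
             "datapoint": {"datapointId": "movie_001",
                           "embeddingMetadata": {"title": "Ballet Train"}}},
            {"distance": 0.34,
             "datapoint": {"datapointId": "movie_002"}},
        ],
    }],
}

rows = extract_metadata(sample_response)
for datapoint_id, distance, metadata in rows:
    print(datapoint_id, distance, metadata.get("title"))
```

Defaulting to an empty dict keeps the loop safe for data points that were ingested without metadata.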
Updating embedding_metadata
Update metadata using the upsertDatapoints API with an update_mask set to the value embedding_metadata. The update_mask field might also include additional mask values. For uses of a field mask, see Update embedding metadata.
The update_mask field helps to ensure that only embedding_metadata is updated, avoiding resubmission of restrict and embedding fields.
The following example demonstrates how to define and update metadata by creating a targeted IndexDatapoint, specifying update_mask, and calling upsertDatapoints.
curl
curl -H "Content-Type: application/json" -H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/indexes/${INDEX_ID}:upsertDatapoints" \
-d '{
  "datapoints": [
    {
      "datapoint_id": "'"${DATAPOINT_ID_1}"'",
      "feature_vector": [...],
      "embedding_metadata": {"title": "updated title", "rating": 4.5, "tags": ["updated", "reviewed"]}
    }
  ],
  "update_mask": "embedding_metadata"
}'
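The same request body can be assembled programmatically when several data points need new metadata. The sketch below builds a metadata-only payload and leaves out the feature vector, on the assumption (based on the update_mask description above) that the mask makes resubmitting it unnecessary. The helper metadata_update_body and the IDs are hypothetical, and the code only constructs the JSON body; it does not call the API.

```python
import json


def metadata_update_body(updates):
    """Build an upsertDatapoints body that changes only embedding_metadata.

    `updates` maps each datapoint ID to its new metadata object.
    """
    datapoints = [
        {"datapoint_id": datapoint_id, "embedding_metadata": metadata}
        for datapoint_id, metadata in updates.items()
    ]
    # The mask sits at the request level, next to the datapoints list.
    return {"datapoints": datapoints, "update_mask": "embedding_metadata"}


body = metadata_update_body({
    "movie_001": {"title": "updated title", "rating": 4.5,
                  "tags": ["updated", "reviewed"]},
})
print(json.dumps(body, indent=2))
```

The printed JSON can be passed to curl with -d or sent with any HTTP client that attaches the same authorization header.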

