This page shows how to use the schema-based metadata search feature in Vertex AI RAG Engine. You can define a metadata schema for a corpus, attach metadata to files within that corpus, and use this metadata to filter contexts during retrieval.
Before you begin
To use metadata search, you must have an existing RAG corpus and RAG files. To learn how to create a corpus and upload files, see Manage your RAG knowledge base.
When you create a corpus or upload a file, the API returns a numeric RAG_CORPUS_ID and RAG_FILE_ID (for example, 123456789). You must have these IDs on hand to manage your metadata resources.
Metadata resources
The metadata search feature uses the following resource hierarchy:
- RagCorpus: The top-level resource that contains your RAG files and their configurations.
  - RagDataSchema: A child resource of RagCorpus. It defines the structure for a single metadata field used within the corpus. Each schema specifies a key (for example, year) and a data type (INTEGER, FLOAT, STRING, DATETIME, BOOLEAN, or LIST). You must define a RagDataSchema for each metadata key that you intend to use for filtering.
  - RagFile: A child resource of RagCorpus representing a single ingested document.
    - RagMetadata: A child resource of RagFile. It represents a single key-value metadata pair attached to that specific file (for example, year=2024). The key used in a RagMetadata resource must correspond to a key defined in a RagDataSchema for the parent corpus. Multiple RagMetadata resources can be attached to a single file.
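The hierarchy above maps directly onto the resource names used in the REST samples on this page. The following is a minimal shell sketch of that mapping. The function names are hypothetical helpers introduced here for illustration; only the path patterns are taken from the samples on this page.

```shell
#!/bin/sh
# Hypothetical helpers (not part of any Google tool) that build full
# resource names for each level of the metadata hierarchy.

rag_corpus_name() {        # args: project location corpus_id
  printf 'projects/%s/locations/%s/ragCorpora/%s' "$1" "$2" "$3"
}

rag_data_schema_name() {   # args: project location corpus_id schema_key
  printf '%s/ragDataSchemas/%s' "$(rag_corpus_name "$1" "$2" "$3")" "$4"
}

rag_file_name() {          # args: project location corpus_id file_id
  printf '%s/ragFiles/%s' "$(rag_corpus_name "$1" "$2" "$3")" "$4"
}

rag_metadata_name() {      # args: project location corpus_id file_id metadata_key
  printf '%s/ragMetadata/%s' "$(rag_file_name "$1" "$2" "$3" "$4")" "$5"
}
```

Each child name is its parent's name plus a collection segment and an identifier, which is why a schema is addressed through its corpus and a metadata entry through its file.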
Manage metadata schemas
After you create a RAG corpus, define the metadata structure using the RagDataSchema resource at the corpus level. You can also view and delete the schemas you define.
Define metadata schemas
When you define a schema, you specify a key and a data type. A key can have one of the following types:
- INTEGER: Passed in metadata values as an int_value.
- FLOAT: Passed in metadata values as a float_value.
- STRING: Passed in metadata values as a str_value.
- DATETIME: Passed in metadata values as a datetime_value (an RFC 3339 formatted string, for example, "2024-01-01T00:00:00Z").
- BOOLEAN: Passed in metadata values as a bool_value.
- LIST: Passed in metadata values as a list_value.
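The type-to-field pairing in the list above can be captured in a small lookup, which is handy when a script builds metadata values from a schema. This is a minimal sketch; value_field_for_type is a hypothetical helper name, and the mapping itself is exactly the one listed above.

```shell
#!/bin/sh
# Hypothetical helper: print the value field that matches a schema type.
# Returns a non-zero status for any type not listed on this page.
value_field_for_type() {
  case "$1" in
    INTEGER)  printf 'int_value' ;;
    FLOAT)    printf 'float_value' ;;
    STRING)   printf 'str_value' ;;
    DATETIME) printf 'datetime_value' ;;
    BOOLEAN)  printf 'bool_value' ;;
    LIST)     printf 'list_value' ;;
    *)        printf 'unknown type: %s\n' "$1" >&2; return 1 ;;
  esac
}
```

For example, value_field_for_type INTEGER prints int_value, the field used for a key such as year.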
The following REST sample shows how to batch create metadata schemas that define the year and month fields for a corpus.
REST
Replace the following variables:
- PROJECT_ID: Your project ID.
- LOCATION: The region to process the request.
- RAG_CORPUS_ID: The ID of the RAG corpus.

# Replace the placeholder values before running.
PROJECT_ID=PROJECT_ID
LOCATION=LOCATION
RAG_CORPUS_ID=RAG_CORPUS_ID

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/ragCorpora/${RAG_CORPUS_ID}/ragDataSchemas:batchCreate" \
  -d '{
    "parent": "projects/'"${PROJECT_ID}"'/locations/'"${LOCATION}"'/ragCorpora/'"${RAG_CORPUS_ID}"'",
    "requests": [
      {
        "parent": "projects/'"${PROJECT_ID}"'/locations/'"${LOCATION}"'/ragCorpora/'"${RAG_CORPUS_ID}"'",
        "rag_data_schema": {
          "key": "year",
          "schema_details": {"type": "INTEGER"}
        }
      },
      {
        "parent": "projects/'"${PROJECT_ID}"'/locations/'"${LOCATION}"'/ragCorpora/'"${RAG_CORPUS_ID}"'",
        "rag_data_schema": {
          "key": "month",
          "schema_details": {"type": "STRING"}
        }
      }
    ]
  }'
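When a corpus needs more than a couple of keys, writing the request body by hand gets repetitive. The following is a minimal sketch of a helper that generates the same batchCreate body from KEY:TYPE pairs. The name schema_batch_create_body is hypothetical, and the sketch assumes keys and types are plain identifiers that need no JSON escaping.

```shell
#!/bin/sh
# Hypothetical helper: schema_batch_create_body PARENT KEY:TYPE [KEY:TYPE ...]
# Prints a compact JSON body with the same fields as the sample above.
# The function body runs in a subshell so its variables stay local.
schema_batch_create_body() (
  parent=$1
  shift
  printf '{"parent":"%s","requests":[' "$parent"
  sep=''
  for pair in "$@"; do
    key=${pair%%:*}    # text before the first colon
    type=${pair#*:}    # text after the first colon
    printf '%s{"parent":"%s","rag_data_schema":{"key":"%s","schema_details":{"type":"%s"}}}' \
      "$sep" "$parent" "$key" "$type"
    sep=','
  done
  printf ']}'
)
```

The output can be passed to curl in place of the literal body, for example -d "$(schema_batch_create_body "projects/${PROJECT_ID}/locations/${LOCATION}/ragCorpora/${RAG_CORPUS_ID}" year:INTEGER month:STRING)".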
List metadata schemas
The following REST sample shows how to list the metadata schemas defined for a corpus.
REST
Replace the following variables:
- PROJECT_ID: Your project ID.
- LOCATION: The region to process the request.
- RAG_CORPUS_ID: The ID of the RAG corpus.

# Replace the placeholder values before running.
PROJECT_ID=PROJECT_ID
LOCATION=LOCATION
RAG_CORPUS_ID=RAG_CORPUS_ID

curl -X GET \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/ragCorpora/${RAG_CORPUS_ID}/ragDataSchemas"
Delete metadata schemas
The following REST sample shows how to batch delete metadata schemas from a corpus.
REST
Replace the following variables:
- PROJECT_ID: Your project ID.
- LOCATION: The region to process the request.
- RAG_CORPUS_ID: The ID of the RAG corpus.
- RAG_DATA_SCHEMA_KEY_1: The key name of the first metadata schema to delete.
- RAG_DATA_SCHEMA_KEY_2: The key name of the second metadata schema to delete.

# Replace the placeholder values before running.
PROJECT_ID=PROJECT_ID
LOCATION=LOCATION
RAG_CORPUS_ID=RAG_CORPUS_ID
RAG_DATA_SCHEMA_KEY_1=RAG_DATA_SCHEMA_KEY_1
RAG_DATA_SCHEMA_KEY_2=RAG_DATA_SCHEMA_KEY_2

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/ragCorpora/${RAG_CORPUS_ID}/ragDataSchemas:batchDelete" \
  -d '{
    "names": [
      "projects/'"${PROJECT_ID}"'/locations/'"${LOCATION}"'/ragCorpora/'"${RAG_CORPUS_ID}"'/ragDataSchemas/'"${RAG_DATA_SCHEMA_KEY_1}"'",
      "projects/'"${PROJECT_ID}"'/locations/'"${LOCATION}"'/ragCorpora/'"${RAG_CORPUS_ID}"'/ragDataSchemas/'"${RAG_DATA_SCHEMA_KEY_2}"'"
    ]
  }'
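The batchDelete body is a list of full resource names, so it can be generated for any number of keys. The following is a minimal sketch; batch_delete_body is a hypothetical helper name, and the sketch assumes keys need no JSON escaping.

```shell
#!/bin/sh
# Hypothetical helper: batch_delete_body PARENT COLLECTION KEY [KEY ...]
# Prints {"names":[...]} with one "PARENT/COLLECTION/KEY" entry per key.
# The function body runs in a subshell so its variables stay local.
batch_delete_body() (
  parent=$1
  collection=$2
  shift 2
  printf '{"names":['
  sep=''
  for key in "$@"; do
    printf '%s"%s/%s/%s"' "$sep" "$parent" "$collection" "$key"
    sep=','
  done
  printf ']}'
)
```

With a corpus name as PARENT and ragDataSchemas as COLLECTION, the output matches the body shown above. The same shape applies to file metadata, using a file name as PARENT and ragMetadata as COLLECTION.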
Manage file metadata
After you upload or import files to your corpus, you can attach, update, view, and delete metadata associated with those files.
Attach metadata to a file
The attached metadata must conform to the schema defined for the corpus. The following REST sample shows how to attach multiple metadata values to a file.
REST
Replace the following variables:
- PROJECT_ID: Your project ID.
- LOCATION: The region to process the request.
- RAG_CORPUS_ID: The ID of the RAG corpus.
- RAG_FILE_ID: The ID of the RAG file.

# Replace the placeholder values before running.
PROJECT_ID=PROJECT_ID
LOCATION=LOCATION
RAG_CORPUS_ID=RAG_CORPUS_ID
RAG_FILE_ID=RAG_FILE_ID

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/ragCorpora/${RAG_CORPUS_ID}/ragFiles/${RAG_FILE_ID}/ragMetadata:batchCreate" \
  -d '{
    "parent": "projects/'"${PROJECT_ID}"'/locations/'"${LOCATION}"'/ragCorpora/'"${RAG_CORPUS_ID}"'/ragFiles/'"${RAG_FILE_ID}"'",
    "requests": [
      {
        "parent": "projects/'"${PROJECT_ID}"'/locations/'"${LOCATION}"'/ragCorpora/'"${RAG_CORPUS_ID}"'/ragFiles/'"${RAG_FILE_ID}"'",
        "rag_metadata": {
          "user_specified_metadata": { "key": "year", "value": { "int_value": 2024 } }
        }
      },
      {
        "parent": "projects/'"${PROJECT_ID}"'/locations/'"${LOCATION}"'/ragCorpora/'"${RAG_CORPUS_ID}"'/ragFiles/'"${RAG_FILE_ID}"'",
        "rag_metadata": {
          "user_specified_metadata": { "key": "month", "value": { "str_value": "May" } }
        }
      }
    ]
  }'
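Each entry in the requests array pairs a key with a typed value field, and string-like values must be quoted in the JSON while numbers must not. The following is a minimal sketch of a helper that builds one such entry. The name metadata_request is hypothetical; the sketch assumes the value contains no characters that need JSON escaping, and it does not attempt list_value, whose shape is not shown on this page.

```shell
#!/bin/sh
# Hypothetical helper: metadata_request PARENT KEY FIELD VALUE
# FIELD is a value field such as int_value or str_value.
# str_value and datetime_value are emitted as JSON strings; other
# fields are emitted as-is (for example 2024, 1.5, or true).
# The function body runs in a subshell so its variables stay local.
metadata_request() (
  parent=$1
  key=$2
  field=$3
  value=$4
  case "$field" in
    str_value|datetime_value) value="\"$value\"" ;;
  esac
  printf '{"parent":"%s","rag_metadata":{"user_specified_metadata":{"key":"%s","value":{"%s":%s}}}}' \
    "$parent" "$key" "$field" "$value"
)
```

Calling it with year int_value 2024 and with month str_value May yields the two request entries used in the sample above, in compact form.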
List metadata for a file
The following REST sample shows how to list the metadata attached to a specific file.
REST
Replace the following variables:
- PROJECT_ID: Your project ID.
- LOCATION: The region to process the request.
- RAG_CORPUS_ID: The ID of the RAG corpus.
- RAG_FILE_ID: The ID of the RAG file.

# Replace the placeholder values before running.
PROJECT_ID=PROJECT_ID
LOCATION=LOCATION
RAG_CORPUS_ID=RAG_CORPUS_ID
RAG_FILE_ID=RAG_FILE_ID

curl -X GET \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/ragCorpora/${RAG_CORPUS_ID}/ragFiles/${RAG_FILE_ID}/ragMetadata"
Update metadata
You can update the metadata values attached to a file. The following REST sample shows how to update the metadata value associated with the year key. RAG_METADATA_KEY is the key string that you specified when you created the metadata.
REST
Replace the following variables:
- PROJECT_ID: Your project ID.
- LOCATION: The region to process the request.
- RAG_CORPUS_ID: The ID of the RAG corpus.
- RAG_FILE_ID: The ID of the RAG file.
- RAG_METADATA_KEY: The key name of the metadata to update.

# Replace the placeholder values before running.
PROJECT_ID=PROJECT_ID
LOCATION=LOCATION
RAG_CORPUS_ID=RAG_CORPUS_ID
RAG_FILE_ID=RAG_FILE_ID
RAG_METADATA_KEY=RAG_METADATA_KEY

curl -X PATCH \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/ragCorpora/${RAG_CORPUS_ID}/ragFiles/${RAG_FILE_ID}/ragMetadata/${RAG_METADATA_KEY}" \
  -d '{
    "user_specified_metadata": { "key": "'"${RAG_METADATA_KEY}"'", "value": { "int_value": 2025 } }
  }'
Delete metadata from a file
The following REST sample shows how to batch delete metadata values from a file.
REST
Replace the following variables:
- PROJECT_ID: Your project ID.
- LOCATION: The region to process the request.
- RAG_CORPUS_ID: The ID of the RAG corpus.
- RAG_FILE_ID: The ID of the RAG file.
- RAG_METADATA_KEY_1: The key name of the first metadata value to delete.
- RAG_METADATA_KEY_2: The key name of the second metadata value to delete.

# Replace the placeholder values before running.
PROJECT_ID=PROJECT_ID
LOCATION=LOCATION
RAG_CORPUS_ID=RAG_CORPUS_ID
RAG_FILE_ID=RAG_FILE_ID
RAG_METADATA_KEY_1=RAG_METADATA_KEY_1
RAG_METADATA_KEY_2=RAG_METADATA_KEY_2

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/ragCorpora/${RAG_CORPUS_ID}/ragFiles/${RAG_FILE_ID}/ragMetadata:batchDelete" \
  -d '{
    "names": [
      "projects/'"${PROJECT_ID}"'/locations/'"${LOCATION}"'/ragCorpora/'"${RAG_CORPUS_ID}"'/ragFiles/'"${RAG_FILE_ID}"'/ragMetadata/'"${RAG_METADATA_KEY_1}"'",
      "projects/'"${PROJECT_ID}"'/locations/'"${LOCATION}"'/ragCorpora/'"${RAG_CORPUS_ID}"'/ragFiles/'"${RAG_FILE_ID}"'/ragMetadata/'"${RAG_METADATA_KEY_2}"'"
    ]
  }'
Filter contexts with metadata
To improve the relevance of your results, use file metadata to narrow the search space during context retrieval. Add a metadata_filter expression, written in CEL (Common Expression Language), to your RetrieveContexts request (for example, year == 2024 && month == "May"). Only files whose metadata matches the filter expression are considered for retrieval.
The following REST sample shows how to retrieve contexts using a metadata filter.
REST
Replace the following variables:
- PROJECT_ID: Your project ID.
- LOCATION: The region to process the request.
- RAG_CORPUS_ID: The ID of the RAG corpus.

# Replace the placeholder values before running.
PROJECT_ID=PROJECT_ID
LOCATION=LOCATION
RAG_CORPUS_ID=RAG_CORPUS_ID
# Example query text.
QUERY="except share amounts"

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://${LOCATION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}:retrieveContexts" \
  -d '{
    "vertex_rag_store": {
      "rag_resources": [
        {
          "rag_corpus": "projects/'"${PROJECT_ID}"'/locations/'"${LOCATION}"'/ragCorpora/'"${RAG_CORPUS_ID}"'"
        }
      ]
    },
    "query": {
      "text": "'"${QUERY}"'",
      "rag_retrieval_config": {
        "top_k": 10,
        "filter": {
          "vector_distance_threshold": 0.5,
          "metadata_filter": "year == 2024 && month == \"May\""
        }
      }
    }
  }'
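The filter expression contains double quotes, so it must be escaped before it is embedded in the JSON body, as the \"May\" in the sample shows. The following is a minimal sketch of two helpers for that step. Both names (and_filter, json_escape) are hypothetical, and the sketch only combines conditions with the && operator shown on this page.

```shell
#!/bin/sh
# Hypothetical helper: and_filter COND [COND ...]
# Joins one or more conditions with the CEL && operator.
# The function body runs in a subshell so its variables stay local.
and_filter() (
  out=$1
  shift
  for cond in "$@"; do
    out="$out && $cond"
  done
  printf '%s' "$out"
)

# Hypothetical helper: json_escape STRING
# Escapes backslashes and double quotes so STRING can be placed
# between double quotes in a JSON document.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}
```

For example, json_escape "$(and_filter 'year == 2024' 'month == "May"')" produces the escaped text used as the metadata_filter value in the sample above.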
What's next
- To learn more about uploading and managing files, see Manage your RAG knowledge base.
- To learn more about standard vector search in RAG, see Use Vector Search with RAG.

