You can use REST APIs or the Python SDK to reference content stored in a context cache in a generative AI application. Before you can use a context cache, you must create it.
The context cache object you use in your code includes the following properties:
- name: The context cache resource name. Its format is projects/PROJECT_NUMBER/locations/LOCATION/cachedContents/CACHE_ID. When you create a context cache, you can find its resource name in the response. The project number is a unique identifier for your project. The cache ID is an ID for your cache. When you specify a context cache in your code, you must use the full context cache resource name. The following example shows how to specify a cached content resource name in a request body: "cached_content" : "projects/123456789012/locations/us-central1/cachedContents/123456789012345678"
- model: The resource name of the model used to create the cache. Its format is projects/PROJECT_NUMBER/locations/LOCATION/publishers/PUBLISHER_NAME/models/MODEL_ID.
- createTime: A Timestamp that specifies when the context cache was created.
- updateTime: A Timestamp that specifies when the context cache was most recently updated. After a context cache is created, and before it's updated, its createTime and updateTime are the same.
- expireTime: A Timestamp that specifies when the context cache expires. The default expireTime is 60 minutes after the createTime. You can update the cache with a new expiration time. For more information, see Update the context cache. After a cache expires, it's marked for deletion, and you shouldn't assume that it can be used or updated. To use a context cache that has expired, recreate it with an appropriate expiration time.
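The naming and expiration rules above can be illustrated with a small standard-library sketch. The helper names (cache_resource_name, parse_cache_resource_name, default_expire_time, is_expired) are hypothetical and not part of any Google SDK. The sketch assumes timestamps arrive as RFC 3339 strings ending in Z, as they do in REST responses, and it only does client-side bookkeeping: the service remains the authority on whether a cache can still be used.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional

# Hypothetical helpers for illustration only; not part of any Google SDK.
CACHE_NAME_PATTERN = re.compile(
    r"^projects/(?P<project_number>[^/]+)/locations/(?P<location>[^/]+)"
    r"/cachedContents/(?P<cache_id>[^/]+)$"
)

# The default expireTime is 60 minutes after createTime.
DEFAULT_TTL = timedelta(minutes=60)


def cache_resource_name(project_number: str, location: str, cache_id: str) -> str:
    """Build the full resource name that requests must reference."""
    return f"projects/{project_number}/locations/{location}/cachedContents/{cache_id}"


def parse_cache_resource_name(name: str) -> Dict[str, str]:
    """Split a full resource name into its parts; reject partial names."""
    match = CACHE_NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a full context cache resource name: {name!r}")
    return match.groupdict()


def parse_timestamp(value: str) -> datetime:
    """Parse an RFC 3339 timestamp such as '2025-01-01T12:00:00Z'.

    Assumption: Python 3.11+ accepts any number of fractional-second digits;
    older versions accept only 3 or 6 digits.
    """
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def default_expire_time(create_time: str) -> datetime:
    """The expireTime that applies when none is set at creation."""
    return parse_timestamp(create_time) + DEFAULT_TTL


def is_expired(expire_time: str, now: Optional[datetime] = None) -> bool:
    """Client-side check only; an expired cache must be recreated."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= parse_timestamp(expire_time)


name = cache_resource_name("123456789012", "us-central1", "123456789012345678")
print(name)  # projects/123456789012/locations/us-central1/cachedContents/123456789012345678
print(parse_cache_resource_name(name)["cache_id"])  # 123456789012345678
print(default_expire_time("2025-01-01T12:00:00Z").isoformat())  # 2025-01-01T13:00:00+00:00
```

Parsing with an anchored regular expression means a name that omits the cachedContents segment is rejected up front instead of failing later in a request.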
Context cache use restrictions
The following features can be specified when you create a context cache. You shouldn't specify them again in your request:
- The GenerativeModel.system_instructions property. This property specifies instructions for the model before the model receives instructions from a user. For more information, see System instructions.
- The GenerativeModel.tool_config property. The tool_config property configures how the Gemini model uses the tools you provide, for example, the mode used by the function calling feature.
- The GenerativeModel.tools property. The GenerativeModel.tools property specifies the functions for a function calling application. For more information, see Function calling.
Use a context cache sample
The following shows how to use a context cache. When you use a context cache, you can't specify the following properties:
- GenerativeModel.system_instructions
- GenerativeModel.tool_config
- GenerativeModel.tools
Python
Install
pip install --upgrade google-genai
To learn more, see the SDK reference documentation.
Set environment variables to use the Gen AI SDK with Vertex AI:
# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=us-central1
export GOOGLE_GENAI_USE_VERTEXAI=True
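A minimal sketch of a request that references a context cache, assuming the google-genai package installed above and the environment variables set: the full cache resource name goes in the cached_content field of the generation config, and system instructions and tools are left out. The function name generate_with_cache is hypothetical, and the client is passed in as a parameter so the function can be exercised without network access.

```python
from typing import Any

# Assumption: use the same model that was used to create the cache.
MODEL_ID = "gemini-2.0-flash-001"


def generate_with_cache(client: Any, cache_name: str, prompt: str, model: str = MODEL_ID) -> str:
    """Send a prompt that reuses the cached content and return the response text.

    `client` is expected to be a google.genai.Client. System instructions,
    tools, and tool config are deliberately absent: they were fixed when the
    cache was created.
    """
    response = client.models.generate_content(
        model=model,
        contents=prompt,
        # A plain dict is accepted in place of types.GenerateContentConfig.
        config={"cached_content": cache_name},
    )
    return response.text


# Example usage (requires network access and credentials):
#   from google import genai
#   client = genai.Client()  # reads the GOOGLE_* environment variables set above
#   cache_name = "projects/PROJECT_NUMBER/locations/us-central1/cachedContents/CACHE_ID"
#   print(generate_with_cache(client, cache_name, "What are the benefits of exercise?"))
```

Injecting the client keeps the request logic separate from authentication, so the same function works with a real client or a test double.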
Go
Learn how to install or update the Gen AI SDK for Go.
To learn more, see the SDK reference documentation.
Set environment variables to use the Gen AI SDK with Vertex AI:
# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=us-central1
export GOOGLE_GENAI_USE_VERTEXAI=True
Java
Learn how to install or update the Gen AI SDK for Java.
To learn more, see the SDK reference documentation.
Set environment variables to use the Gen AI SDK with Vertex AI:
# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=us-central1
export GOOGLE_GENAI_USE_VERTEXAI=True
REST
To use a context cache with a prompt through REST, send a POST request to the publisher model endpoint of the Vertex AI API.
Before using any of the request data, make the following replacements:
- PROJECT_ID: Your project ID.
- LOCATION: The region where the request to create the context cache was processed.
- PROJECT_NUMBER: Your project number.
- CACHE_ID: The ID of your context cache.
- PROMPT_TEXT: The text prompt to submit to the model.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/gemini-2.0-flash-001:generateContent
Request JSON body:
{
  "cachedContent": "projects/PROJECT_NUMBER/locations/LOCATION/cachedContents/CACHE_ID",
  "contents": [
    {"role": "user", "parts": [{"text": "PROMPT_TEXT"}]}
  ],
  "generationConfig": {
    "maxOutputTokens": 8192,
    "temperature": 1,
    "topP": 0.95
  },
  "safetySettings": [
    {
      "category": "HARM_CATEGORY_HATE_SPEECH",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    },
    {
      "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    },
    {
      "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    },
    {
      "category": "HARM_CATEGORY_HARASSMENT",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    }
  ]
}
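The same request can be assembled from a program instead of a hand-edited request.json. The following standard-library sketch builds, but does not send, the POST request; the function name build_request is hypothetical, and the generation config and safety settings simply mirror the sample body. Serializing with json.dumps also guarantees valid JSON, with no trailing commas.

```python
import json
import urllib.request

# Mirrors the safetySettings of the sample request body.
SAFETY_CATEGORIES = (
    "HARM_CATEGORY_HATE_SPEECH",
    "HARM_CATEGORY_DANGEROUS_CONTENT",
    "HARM_CATEGORY_SEXUALLY_EXPLICIT",
    "HARM_CATEGORY_HARASSMENT",
)


def build_request(project_id: str, location: str, cache_name: str, prompt_text: str,
                  access_token: str, model_id: str = "gemini-2.0-flash-001") -> urllib.request.Request:
    """Build (without sending) a generateContent request that uses a context cache."""
    url = (
        f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}"
        f"/locations/{location}/publishers/google/models/{model_id}:generateContent"
    )
    body = {
        "cachedContent": cache_name,  # full resource name of the cache
        "contents": [{"role": "user", "parts": [{"text": prompt_text}]}],
        "generationConfig": {"maxOutputTokens": 8192, "temperature": 1, "topP": 0.95},
        "safetySettings": [
            {"category": category, "threshold": "BLOCK_MEDIUM_AND_ABOVE"}
            for category in SAFETY_CATEGORIES
        ],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )


# To send it (network access required), pass a token from
# `gcloud auth print-access-token` as access_token and then run:
#   with urllib.request.urlopen(request) as response:
#       print(json.load(response))
```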
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/gemini-2.0-flash-001:generateContent"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/gemini-2.0-flash-001:generateContent" | Select-Object -Expand Content
You should receive a JSON response similar to the following.
Example curl command
LOCATION="us-central1"
MODEL_ID="gemini-2.0-flash-001"
PROJECT_ID="test-project"
# Example values; replace them with your project number and cache ID.
PROJECT_NUMBER="123456789012"
CACHE_ID="123456789012345678"

# The body is passed through an unquoted heredoc so that the shell expands
# ${PROJECT_NUMBER}, ${LOCATION}, and ${CACHE_ID}; a single-quoted -d string would not.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:generateContent" \
  -d @- <<EOF
{
  "cachedContent": "projects/${PROJECT_NUMBER}/locations/${LOCATION}/cachedContents/${CACHE_ID}",
  "contents": [
    {"role": "user", "parts": [{"text": "What are the benefits of exercise?"}]}
  ],
  "generationConfig": {
    "maxOutputTokens": 8192,
    "temperature": 1,
    "topP": 0.95
  },
  "safetySettings": [
    {
      "category": "HARM_CATEGORY_HATE_SPEECH",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    },
    {
      "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    },
    {
      "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    },
    {
      "category": "HARM_CATEGORY_HARASSMENT",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    }
  ]
}
EOF
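To confirm that a response actually drew on the cache, you can inspect its usage metadata. The following sketch assumes the REST response shape of generateContent: generated text under candidates[0].content.parts, and a usageMetadata.cachedContentTokenCount field that counts the prompt tokens served from the cache. The helper name summarize_response is hypothetical. Because zero-valued fields are omitted from JSON responses, the sketch defaults the count to 0.

```python
import json
from typing import Any, Dict, Tuple


def summarize_response(response_json: str) -> Tuple[str, int]:
    """Return (generated text, cached token count) from a generateContent response."""
    response: Dict[str, Any] = json.loads(response_json)
    candidate = response["candidates"][0]
    # Join all text parts of the first candidate.
    text = "".join(part.get("text", "") for part in candidate["content"]["parts"])
    # An absent field means no tokens were served from the cache.
    cached_tokens = response.get("usageMetadata", {}).get("cachedContentTokenCount", 0)
    return text, cached_tokens


# Example usage, assuming the curl output was saved to response.json:
#   with open("response.json", encoding="utf-8") as f:
#       text, cached_tokens = summarize_response(f.read())
```

A cached token count of 0 on a request that set cachedContent is a signal to check the cache name and expiration time.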
What's next
- Learn how to update the expiration time of a context cache.
- Learn how to create a new context cache.
- Learn how to get information about all context caches associated with a Google Cloud project.
- Learn how to delete a context cache.