Prefix caching reduces inference time by storing and reusing the intermediate LLM state produced when processing a shared, recurring prompt prefix. To enable prefix caching, you only have to separate the static prefix from the dynamic suffix in your API request.
Prefix caching currently supports text-only input, so don't use this feature if your prompt includes an image.
Implement prefix caching
To enable prefix caching for the Prompt API, add the shared portion of the prompt to the promptPrefix field, as shown in the following code snippet:
Kotlin
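    // Static part of the prompt, shared across requests and eligible for caching.
    val promptPrefix = "Reverse the given sentence: "
    // Part of the prompt that changes from request to request.
    val dynamicSuffix = "Hello World"

    val result = generativeModel.generateContent(
        generateContentRequest(TextPart(dynamicSuffix)) {
            // "this." selects the builder property; a bare "promptPrefix"
            // would resolve to the local val declared above.
            this.promptPrefix = PromptPrefix(promptPrefix)
        }
    )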
Java
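    // Static part of the prompt, shared across requests and eligible for caching.
    String promptPrefix = "Reverse the given sentence: ";
    // Part of the prompt that changes from request to request.
    String dynamicSuffix = "Hello World";

    // get() blocks until the response is ready, so call it off the main thread.
    GenerateContentResponse response = generativeModelFutures.generateContent(
            new GenerateContentRequest.Builder(new TextPart(dynamicSuffix))
                .setPromptPrefix(new PromptPrefix(promptPrefix))
                .build())
        .get();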
In the preceding snippet, the dynamicSuffix is passed as the main content, and the promptPrefix is provided separately.
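If your app currently builds each prompt as a single string, you first need to decide where the static prefix ends. The following is a minimal sketch of one way to do that, assuming you have a sample of recurring prompts to compare. PromptSplitter and commonPrefix are hypothetical names used for illustration only; they are not part of the Prompt API.

```java
import java.util.List;

/** Hypothetical helper (not part of the Prompt API): finds the text shared by recurring prompts. */
final class PromptSplitter {

    /** Returns the longest string that every prompt in the list starts with. */
    static String commonPrefix(List<String> prompts) {
        if (prompts.isEmpty()) {
            return "";
        }
        String prefix = prompts.get(0);
        for (String prompt : prompts) {
            int max = Math.min(prefix.length(), prompt.length());
            int i = 0;
            // Advance while both strings agree, then cut the prefix at the first mismatch.
            while (i < max && prefix.charAt(i) == prompt.charAt(i)) {
                i++;
            }
            prefix = prefix.substring(0, i);
        }
        return prefix;
    }

    public static void main(String[] args) {
        List<String> prompts = List.of(
            "Reverse the given sentence: Hello World",
            "Reverse the given sentence: Good morning");

        String prefix = commonPrefix(prompts);
        for (String prompt : prompts) {
            // Everything after the shared prefix is the dynamic suffix.
            String suffix = prompt.substring(prefix.length());
            System.out.println("prefix=[" + prefix + "] suffix=[" + suffix + "]");
        }
    }
}
```

Because the comparison is character by character, two suffixes that happen to start with the same letters would extend the detected prefix into the dynamic part. In practice you would likely define the prefix as a constant once you know where the static instructions end.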
Estimated performance gains
| Device and prompt | Without prefix caching | With prefix cache hit (a prefix cache miss may occur when the prefix is used for the first time) |
|---|---|---|
| Pixel 9 with a 300-token fixed prefix and a 50-token dynamic suffix prompt | 0.82 seconds | 0.45 seconds |
| Pixel 9 with a 1,000-token fixed prefix and a 100-token dynamic suffix prompt | 2.11 seconds | 0.5 seconds |
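To put these figures in relative terms, the sketch below computes the speedup and the time saved for each row. The class and method names are hypothetical, and the inputs are only the Pixel 9 estimates listed above, so the results say nothing about other devices or prompt lengths.

```java
import java.util.Locale;

/** Illustrative arithmetic on the estimated latencies listed above. */
final class PrefixCacheGains {

    /** Ratio of latency without caching to latency with a cache hit. */
    static double speedup(double withoutCacheSec, double cacheHitSec) {
        return withoutCacheSec / cacheHitSec;
    }

    /** Absolute time saved per request on a cache hit, in seconds. */
    static double savedSeconds(double withoutCacheSec, double cacheHitSec) {
        return withoutCacheSec - cacheHitSec;
    }

    public static void main(String[] args) {
        // { latency without prefix caching, latency with a cache hit }, in seconds.
        double[][] rows = {
            {0.82, 0.45}, // 300-token prefix, 50-token suffix
            {2.11, 0.5},  // 1,000-token prefix, 100-token suffix
        };
        for (double[] row : rows) {
            System.out.println(String.format(Locale.ROOT,
                "speedup: %.2fx, saved: %.2f s",
                speedup(row[0], row[1]), savedSeconds(row[0], row[1])));
        }
    }
}
```

On these estimates the cache hit is roughly 1.8 times faster for the 300-token prefix and roughly 4.2 times faster for the 1,000-token prefix, which suggests that the benefit grows with the length of the cached prefix.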
Storage considerations
With prefix caching, cache files are saved in the client application's private storage, which increases your app's storage usage. Encrypted cache files and their associated metadata, including the original prefix text, are stored. Keep the following storage considerations in mind:
- The number of caches is managed by an LRU (least recently used) mechanism. The least recently used caches are deleted automatically when the maximum total cache amount is exceeded.
- The size of a prompt cache depends on the length of its prefix.
- To clear all caches created by prefix caching, use the generativeModel.clearCaches() method.
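The LRU behavior described in the list above can be illustrated with a small in-memory sketch. This is not the library's implementation: the class name, the entry limit of 2, and the string values are assumptions made for the example, and the real cache stores encrypted files on disk rather than map entries.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative LRU map keyed by prefix text; not how the Prompt API stores its caches. */
final class LruPrefixCacheSketch<V> extends LinkedHashMap<String, V> {
    private static final long serialVersionUID = 1L;

    private final int maxEntries; // assumed limit, chosen only for the example

    LruPrefixCacheSketch(int maxEntries) {
        // accessOrder = true keeps entries ordered from least to most recently used.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
        // Called after each insertion; evicts the least recently used entry when over the limit.
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        LruPrefixCacheSketch<String> caches = new LruPrefixCacheSketch<>(2);
        caches.put("Reverse the given sentence: ", "state-1");
        caches.put("Summarize the given text: ", "state-2");

        // Reusing the first prefix marks it as recently used.
        caches.get("Reverse the given sentence: ");

        // Adding a third prefix exceeds the limit, so the least recently used one is dropped.
        caches.put("Translate to French: ", "state-3");
        System.out.println(caches.keySet());

        // Rough analogue of clearing every cache at once.
        caches.clear();
        System.out.println(caches.isEmpty());
    }
}
```

In this sketch the "Summarize" entry is the one evicted, because the "Reverse" prefix was read more recently. The practical takeaway is that prefixes your app reuses often are the least likely to be evicted, while rarely used ones may incur a cache miss again.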


