GenerativeModel

public interface GenerativeModel 

Provides an interface for performing content generation.

It supports both standard and streaming inferences, as well as methods for preparing and cleaning up model resources.

Typical usage:

val request = generateContentRequest { text("Your input text here.") }

try {
    val result = generativeModel.generateContent(request)
    println(result.text)
} catch (e: GenAiException) {
    // Handle exception
}

Summary

Public methods

abstract @FeatureStatus int
checkStatus()

Checks the current availability status of the content generation feature.

abstract void
clearImplicitCaches()

Clears all caches created by implicit prefix caching.

abstract void
close()

Releases resources associated with the content generation engine.

abstract @NonNull CountTokensResponse
countTokens(@NonNull GenerateContentRequest request)

Counts the number of tokens in the request.

abstract @NonNull Flow<@NonNull DownloadStatus>
download()

Downloads the required model assets for the content generation feature if they are not already available.

default @NonNull GenerateContentResponse
generateContent(@NonNull String prompt)

Performs asynchronous content generation on the provided input prompt.

abstract @NonNull GenerateContentResponse
generateContent(@NonNull GenerateContentRequest request)

Performs asynchronous content generation on the provided input request.

default @NonNull GenerateContentResponse
generateContent(
    @NonNull String prompt,
    @NonNull StreamingCallback streamingCallback
)

Performs streaming content generation inference on the provided input prompt.

abstract @NonNull GenerateContentResponse
generateContent(
    @NonNull GenerateContentRequest request,
    @NonNull StreamingCallback streamingCallback
)

Performs streaming content generation inference on the provided input request.

default @NonNull Flow<@NonNull GenerateContentResponse>
generateContentStream(@NonNull String prompt)

Performs streaming content generation inference on the provided input prompt.

abstract @NonNull Flow<@NonNull GenerateContentResponse>
generateContentStream(@NonNull GenerateContentRequest request)

Performs streaming content generation inference on the provided input request.

abstract @NonNull String
getBaseModelName()

Returns the name of the base model used by this generator instance.

abstract @NonNull Caches
getCaches()

Provides explicit management of cached contexts.

abstract int
getTokenLimit()

Returns the total token limit for the API, including both input and output tokens.

abstract boolean
isCachingFeatureAvailable()

Checks if the caching feature is available.

abstract void
warmup()

Warms up the inference engine for use by loading necessary models and initializing runtime components.

Public methods

checkStatus

abstract @FeatureStatus int checkStatus()

Checks the current availability status of the content generation feature.

Returns
@FeatureStatus int

a feature status indicating the feature's readiness.
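A hedged sketch of gating on the returned status; the `FeatureStatus` constant names and helper functions below are assumptions of this sketch, not confirmed by this page:

```kotlin
// Sketch: branch on feature status before running inference.
// FeatureStatus.AVAILABLE / DOWNLOADABLE are assumed constant names.
when (generativeModel.checkStatus()) {
    FeatureStatus.AVAILABLE -> runInference()        // hypothetical helper
    FeatureStatus.DOWNLOADABLE -> startDownload()    // hypothetical helper that calls download()
    else -> showUnsupportedMessage()                 // hypothetical helper
}
```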

clearImplicitCaches

abstract void clearImplicitCaches()

Clears all caches created by implicit prefix caching.

This experimental method clears all caches created by implicit prefix caching. When promptPrefix is provided in generateContent or generateContentStream, the system implicitly caches the processed prefix to reduce inference time for subsequent requests that share the same prefix. This method clears all such caches.
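For illustration, a sketch of the implicit-cache workflow; whether promptPrefix is set through the request builder, and the builder method name, are assumptions of this sketch:

```kotlin
// Sketch: two requests sharing a prefix; the second benefits from the implicit cache.
val sharedPrefix = "You are a concise assistant."
val first = generativeModel.generateContent(
    generateContentRequest {
        promptPrefix(sharedPrefix)   // assumed builder method
        text("Summarize article A.")
    }
)
val second = generativeModel.generateContent(
    generateContentRequest {
        promptPrefix(sharedPrefix)   // reuses the cached prefix processing
        text("Summarize article B.")
    }
)
// When the prefix is no longer needed, free the associated cache memory:
generativeModel.clearImplicitCaches()
```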

close

abstract void close()

Releases resources associated with the content generation engine.

This should be called when the GenerativeModel is no longer needed. Can be safely called multiple times.

countTokens

abstract @NonNull CountTokensResponse countTokens(@NonNull GenerateContentRequest request)

Counts the number of tokens in the request.

The number of tokens counted includes only input request tokens. The result can be compared with getTokenLimit to check if the request is within the token limit.

Parameters
@NonNull GenerateContentRequest request

a non-null GenerateContentRequest containing input content.

Returns
@NonNull CountTokensResponse

a CountTokensResponse containing the number of tokens in the request.

download

abstract @NonNull Flow<@NonNull DownloadStatus> download()

Downloads the required model assets for the content generation feature if they are not already available.

Use this method to proactively download models before inference. The returned Flow emits DownloadStatus to report progress and completion status.

Returns
@NonNull Flow<@NonNull DownloadStatus>

a Flow which emits DownloadStatus values for download progress updates.
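A minimal collection sketch (logging only; the fields carried by DownloadStatus are not specified on this page):

```kotlin
// Sketch: proactively download model assets and observe progress.
generativeModel.download().collect { status ->
    println("Model download status: $status")
}
// After the flow completes, the assets should be available for inference.
```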

generateContent

default @NonNull GenerateContentResponse generateContent(@NonNull String prompt)

Performs asynchronous content generation on the provided input prompt.

This is a convenience method that wraps the input prompt in a GenerateContentRequest with default generation parameters.

Parameters
@NonNull String prompt

the input prompt text.

Returns
@NonNull GenerateContentResponse

a GenerateContentResponse containing the generated content.

Throws
GenAiException

if the inference fails.

See also
generateContent(request: GenerateContentRequest)

generateContent

abstract @NonNull GenerateContentResponse generateContent(@NonNull GenerateContentRequest request)

Performs asynchronous content generation on the provided input request.

This is the standard, non-streaming version of inference. The complete generated content is returned once the model finishes processing.

This method is non-blocking. Callers should wrap the call in a try/catch block to consume the returned GenerateContentResponse and handle a potential GenAiException.

The coroutine that runs generateContent is cancellable. If the inference is no longer needed (e.g., the user navigates away or input changes), the coroutine can be cancelled.

Note that inference requests may fail under certain conditions.

Parameters
@NonNull GenerateContentRequest request

a non-null GenerateContentRequest containing input content.

Returns
@NonNull GenerateContentResponse

a GenerateContentResponse containing the generated content.

Throws
GenAiException

if the inference fails.

generateContent

default @NonNull GenerateContentResponse generateContent(
    @NonNull String prompt,
    @NonNull StreamingCallback streamingCallback
)

Performs streaming content generation inference on the provided input prompt.

This is a convenience method that wraps the input prompt in a GenerateContentRequest with default generation parameters.

Parameters
@NonNull String prompt

the input prompt text.

@NonNull StreamingCallback streamingCallback

a non-null StreamingCallback for receiving streamed results.

Returns
@NonNull GenerateContentResponse

a GenerateContentResponse containing the final generated content.

Throws
GenAiException

if the inference fails.

See also
generateContent(request: GenerateContentRequest, streamingCallback: StreamingCallback)

generateContent

abstract @NonNull GenerateContentResponse generateContent(
    @NonNull GenerateContentRequest request,
    @NonNull StreamingCallback streamingCallback
)

Performs streaming content generation inference on the provided input request.

Partial results are delivered incrementally through the provided StreamingCallback. The function suspends until all results are received and returns the complete, final GenerateContentResponse. If streaming is interrupted by a GenAiException, consider removing any already streamed partial output from the UI.

The coroutine that runs generateContent is cancellable. If the inference is no longer needed (e.g., the user navigates away or input changes), the coroutine can be cancelled.

Note that inference requests may fail under certain conditions.

Parameters
@NonNull GenerateContentRequest request

a non-null GenerateContentRequest containing input content.

@NonNull StreamingCallback streamingCallback

a non-null StreamingCallback for receiving streamed results.

Returns
@NonNull GenerateContentResponse

a GenerateContentResponse containing the final generated content.

Throws
GenAiException

if the inference fails.
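A hedged sketch of the callback-based streaming call; showing StreamingCallback as a trailing single-function lambda that receives text chunks is an assumption of this sketch:

```kotlin
// Sketch: stream partial results through a callback; the call suspends until done.
try {
    val finalResponse = generativeModel.generateContent(request) { partialText ->
        // Assumed callback shape: receives each streamed text chunk.
        print(partialText)
    }
    println("\nFinal text length: ${finalResponse.text.length}")
} catch (e: GenAiException) {
    // Consider removing already streamed partial output from the UI.
}
```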

generateContentStream

default @NonNull Flow<@NonNull GenerateContentResponse> generateContentStream(@NonNull String prompt)

Performs streaming content generation inference on the provided input prompt.

This is a convenience method that wraps the input prompt in a GenerateContentRequest with default generation parameters.

Parameters
@NonNull String prompt

the input prompt text.

Returns
@NonNull Flow<@NonNull GenerateContentResponse>

a Flow which emits GenerateContentResponse values as they are returned from the model.

Throws
kotlin.IllegalArgumentException

if request.candidateCount is greater than 1. If you need to receive multiple candidates in the final result, please use the generateContent(prompt: String, streamingCallback: StreamingCallback) method instead.

See also
generateContentStream(request: GenerateContentRequest)

generateContentStream

abstract @NonNull Flow<@NonNull GenerateContentResponse> generateContentStream(@NonNull GenerateContentRequest request)

Performs streaming content generation inference on the provided input request.

Partial results are delivered incrementally through the returned Flow. Each GenerateContentResponse contains a single Candidate. The last emitted value contains a Candidate with a non-null FinishReason (e.g., FinishReason.STOP or FinishReason.MAX_TOKENS).

This streaming mode is useful for building a more responsive UI. Streaming can be interrupted by a GenAiException; in that case, consider removing any already streamed partial output from the UI.

The coroutine collecting the Flow is cancellable. If the inference is no longer needed (e.g., the user navigates away or input changes), the coroutine can be cancelled.

Note that inference requests may fail under certain conditions.

Important: This function currently only supports a candidateCount of 1 in the GenerateContentRequest. Providing a candidateCount greater than 1 will result in an IllegalArgumentException. If you need to receive multiple candidates in the final result, please use the generateContent(request: GenerateContentRequest, streamingCallback: StreamingCallback) method instead.

Parameters
@NonNull GenerateContentRequest request

a non-null GenerateContentRequest containing the input content.

Returns
@NonNull Flow<@NonNull GenerateContentResponse>

a Flow which emits GenerateContentResponse values as they are returned from the model.

Throws
kotlin.IllegalArgumentException

if request.candidateCount is greater than 1.
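The Flow-based collection described above might look like the following sketch; the candidates, text, and finishReason accessors are assumptions based on the Candidate and FinishReason types named above:

```kotlin
// Sketch: collect streamed responses; the last emission carries a non-null FinishReason.
try {
    generativeModel.generateContentStream(request).collect { response ->
        val candidate = response.candidates.first()  // one Candidate per emission
        print(candidate.text)                        // assumed accessor
        candidate.finishReason?.let { reason ->
            println("\nFinished with: $reason")      // e.g., STOP or MAX_TOKENS
        }
    }
} catch (e: GenAiException) {
    // Consider removing already streamed partial output from the UI.
}
```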

getBaseModelName

abstract @NonNull String getBaseModelName()

Returns the name of the base model used by this generator instance.

The model name may be used for logging, debugging, or feature gating purposes.

Returns
@NonNull String

a String representing the base model name.

getCaches

abstract @NonNull Caches getCaches()

Provides explicit management of cached contexts.

This experimental property offers direct control over cached contexts to accelerate inference.

The GenerativeModel supports two types of caching:

  1. Implicit Caching: Automatically created when a promptPrefix is provided in generateContent or generateContentStream. The system manages their lifecycle. These caches cannot be accessed directly but can be cleared using clearImplicitCaches.

  2. Explicit Caching: Managed via this caches property. Users can explicitly create, retrieve, and manage the lifecycle of cache entries.

Use isCachingFeatureAvailable to check if the caching feature is available. Attempting to use the caches property when the feature is not available will result in an exception.

getTokenLimit

abstract int getTokenLimit()

Returns the total token limit for the API, including both input and output tokens.

This limit can be used with countTokens to check if a request is within limits before running inference. The input size returned by countTokens plus the output size specified by GenerateContentRequest.maxOutputTokens should be no larger than the limit returned by this method.

Returns
int

token limit.
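The budget check described above is plain arithmetic; this self-contained helper only assumes its inputs come from countTokens and GenerateContentRequest.maxOutputTokens:

```kotlin
// Self-contained helper: true when input tokens plus the requested output
// budget fit within the model's total token limit.
fun fitsTokenLimit(inputTokens: Int, maxOutputTokens: Int, tokenLimit: Int): Boolean =
    inputTokens + maxOutputTokens <= tokenLimit

// Example: a 3,800-token input with a 256-token output budget against a 4,096 limit
// fitsTokenLimit(3800, 256, 4096) == true   (4,056 <= 4,096)
// fitsTokenLimit(4000, 256, 4096) == false  (4,256 >  4,096)
```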

isCachingFeatureAvailable

abstract boolean isCachingFeatureAvailable()

Checks if the caching feature is available.

Returns
boolean

true if the caching feature is available, false otherwise.

warmup

abstract void warmup()

Warms up the inference engine for use by loading necessary models and initializing runtime components.

While calling this method is optional, we recommend invoking it well before the first inference call to reduce the latency of the initial inference.

Throws
GenAiException

if the preparation fails.
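A lifecycle sketch tying warmup to close; where these calls live (e.g., an Activity or ViewModel) is up to the caller and not prescribed by this page:

```kotlin
// Sketch: warm up early to cut first-inference latency, release on teardown.
try {
    generativeModel.warmup()  // loads models and initializes runtime components
} catch (e: GenAiException) {
    // Preparation failed; surface an error or retry later.
}

// ...run inference while the model is needed...

generativeModel.close()  // safe to call multiple times
```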
