Use generateContent or streamGenerateContent to generate content with Gemini.
The Gemini model family includes models that work with multimodal prompt requests. The term multimodal indicates that you can use more than one modality, or type of input, in a prompt. Models that aren't multimodal accept prompts only with text. Modalities can include text, audio, video, and more.
Create a Google Cloud account to get started
To start using the Gemini API in Vertex AI, create a Google Cloud account.
After creating your account, use this document to review the Gemini model request body, model parameters, response body, and some sample requests.
When you're ready, see the Gemini API in Vertex AI quickstart to learn how to send a request to the Gemini API in Vertex AI using a programming language SDK or the REST API.
Supported models
All Gemini models support content generation.
Parameter list
See examples for implementation details.
Request body
{ "cachedContent" : s tr i n g , "contents" : [ { "role" : s tr i n g , "parts" : [ { // Union field data can be only one of the following: "text" : s tr i n g , "inlineData" : { "mimeType" : s tr i n g , "data" : s tr i n g }, "fileData" : { "mimeType" : s tr i n g , "fileUri" : s tr i n g }, // End of list of possible types for union field data. "videoMetadata" : { "startOffset" : { "seconds" : i nte ger , "nanos" : i nte ger }, "endOffset" : { "seconds" : i nte ger , "nanos" : i nte ger }, "fps" : double } } ] } ], "systemInstruction" : { "role" : s tr i n g , "parts" : [ { "text" : s tr i n g } ] }, "tools" : [ { "functionDeclarations" : [ { "name" : s tr i n g , "description" : s tr i n g , "parameters" : { objec t ( Ope n API Objec t Schema) } } ] } ], "safetySettings" : [ { "category" : e nu m (HarmCa te gory) , "threshold" : e nu m (HarmBlockThreshold) } ], "generationConfig" : { "temperature" : nu mber , "topP" : nu mber , "topK" : nu mber , "candidateCount" : i nte ger , "maxOutputTokens" : i nte ger , "presencePenalty" : fl oa t , "frequencyPenalty" : fl oa t , "stopSequences" : [ s tr i n g ], "responseMimeType" : s tr i n g , "responseSchema" : schema , "seed" : i nte ger , "responseLogprobs" : boolea n , "logprobs" : i nte ger , "audioTimestamp" : boolea n }, "labels" : { s tr i n g : s tr i n g } }
The request body contains data with the following parameters:
cachedContent
Optional: string
The name of the cached content used as context to
serve the prediction. Format: projects/{project}/locations/{location}/cachedContents/{cachedContent}
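As a minimal sketch of passing cached content with the Gen AI SDK for Python (the resource name below is a hypothetical placeholder, and the client assumes Vertex AI environment variables are set):

```python
from google import genai
from google.genai.types import GenerateContentConfig

# Assumes GOOGLE_GENAI_USE_VERTEXAI, GOOGLE_CLOUD_PROJECT, and
# GOOGLE_CLOUD_LOCATION are configured in the environment.
client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.0-flash-001",
    contents="Summarize the cached document.",
    config=GenerateContentConfig(
        # Hypothetical resource name; use the name returned when you
        # created the cached content.
        cached_content="projects/your-project/locations/us-central1/cachedContents/your-cache-id",
    ),
)
print(response.text)
```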
contents
Required: Content
The content of the current conversation with the model.
For single-turn queries, this is a single instance. For multi-turn queries, this is a repeated field that contains conversation history and the latest request.
systemInstruction
Optional: Content
Available for gemini-2.0-flash and gemini-2.0-flash-lite.
Instructions for the model to steer it toward better performance. For example, "Answer as concisely as possible" or "Don't use technical terms in your response".
The text strings count toward the token limit.
The role field of systemInstruction is ignored and doesn't affect the performance of the model.
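For illustration, a hedged sketch of passing a system instruction with the Gen AI SDK for Python:

```python
from google import genai
from google.genai.types import GenerateContentConfig

client = genai.Client()  # assumes Vertex AI environment variables are set

response = client.models.generate_content(
    model="gemini-2.0-flash-001",
    contents="Explain how transformers work.",
    config=GenerateContentConfig(
        # Maps to the systemInstruction field of the request body.
        system_instruction="Answer as concisely as possible.",
    ),
)
print(response.text)
```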
tools
Optional. A piece of code that enables the system to interact with external systems to perform an action, or set of actions, outside of the knowledge and scope of the model. See Function calling.
toolConfig
Optional. See Function calling.
safetySettings
Optional: SafetySetting
Per-request settings for blocking unsafe content. Enforced on GenerateContentResponse.candidates.
generationConfig
Optional: GenerationConfig
Generation configuration settings.
labels
Optional: string
Metadata that you can add to the API call in the format of key-value pairs.
contents
The base structured data type containing multi-part content of a message.
This class consists of two main properties: role and parts. The role property denotes the individual producing the content, while the parts property contains multiple elements, each representing a segment of data within a message.
role
string
The identity of the entity that creates the message. The following values are supported:
- user: This indicates that the message is sent by a real person, typically a user-generated message.
- model: This indicates that the message is generated by the model.
The model value is used to insert messages from the model into the conversation during multi-turn conversations.
parts
Part
A list of ordered parts that make up a single message. Different parts may have different IANA MIME types.
For limits on the inputs, such as the maximum number of tokens or the number of images, see the model specifications on the Google models page.
To compute the number of tokens in your request, see Get token count .
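As a hedged sketch, a multi-turn contents list with alternating roles using the Gen AI SDK for Python:

```python
from google import genai
from google.genai.types import Content, Part

client = genai.Client()  # assumes Vertex AI environment variables are set

# Conversation history plus the latest user request, with alternating roles.
contents = [
    Content(role="user", parts=[Part(text="Hello, who won the 2014 World Cup?")]),
    Content(role="model", parts=[Part(text="Germany won the 2014 FIFA World Cup.")]),
    Content(role="user", parts=[Part(text="Who was the top scorer?")]),
]

response = client.models.generate_content(
    model="gemini-2.0-flash-001",
    contents=contents,
)
print(response.text)
```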
parts
A data type containing media that is part of a multi-part Content
message.
text
Optional: string
A text prompt or code snippet.
inlineData
Optional: Blob
Inline data in raw bytes.
For gemini-2.0-flash-lite and gemini-2.0-flash, you can specify up to 3000 images by using inlineData.
fileData
Optional: fileData
Data stored in a file.
functionCall
Optional: FunctionCall
It contains a string representing the FunctionDeclaration.name field and a structured JSON object containing any parameters for the function call predicted by the model. See Function calling.
functionResponse
Optional: FunctionResponse
The result output of a FunctionCall that contains a string representing the FunctionDeclaration.name field and a structured JSON object containing any output from the function call. It is used as context to the model. See Function calling.
videoMetadata
Optional: VideoMetadata
For video input, the start and end offset of the video in Duration format, and the frame rate of the video. For example, to specify a 10-second clip starting at 1:00 with a frame rate of 10 frames per second, set the following:
- "startOffset": { "seconds": 60 }
- "endOffset": { "seconds": 70 }
- "fps": 10.0
The metadata should only be specified while the video data is presented in inlineData or fileData.
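To make the shape concrete, here's a hedged sketch of a fileData part carrying videoMetadata, written as a plain Python dict in the REST request-body format (the bucket path is a hypothetical placeholder):

```python
# A video part clipped to 1:00-1:10 and sampled at 10 frames per second.
video_part = {
    "fileData": {
        "mimeType": "video/mp4",
        "fileUri": "gs://your-bucket/your-video.mp4",  # hypothetical URI
    },
    "videoMetadata": {
        "startOffset": {"seconds": 60},
        "endOffset": {"seconds": 70},
        "fps": 10.0,
    },
}
```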
blob
Content blob. If possible, send as text rather than raw bytes.
mimeType
string
The media type of the file specified in the data or fileUri fields. Acceptable values include the following:
- application/pdf
- audio/mpeg
- audio/mp3
- audio/wav
- image/png
- image/jpeg
- image/webp
- text/plain
- video/mov
- video/mpeg
- video/mp4
- video/mpg
- video/avi
- video/wmv
- video/mpegps
- video/flv
For gemini-2.0-flash-lite and gemini-2.0-flash, the maximum length of an audio file is 8.4 hours and the maximum length of a video file (without audio) is one hour. For more information, see Gemini audio and video requirements.
Text files must be UTF-8 encoded. The contents of the text file count toward the token limit.
There is no limit on image resolution.
data
bytes
The base64 encoding of the image, PDF, or video to include inline in the prompt. When including media inline, you must also specify the media type (mimeType) of the data.
Size limit: 20 MB
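For example, a hedged sketch of sending inline image bytes with the Gen AI SDK for Python (the local file path is a hypothetical placeholder):

```python
from google import genai
from google.genai.types import Part

client = genai.Client()  # assumes Vertex AI environment variables are set

with open("scones.jpg", "rb") as f:  # hypothetical local file
    image_bytes = f.read()

response = client.models.generate_content(
    model="gemini-2.0-flash-001",
    contents=[
        "Describe this image.",
        # The SDK base64-encodes the raw bytes into the inlineData field.
        Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
    ],
)
print(response.text)
```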
FileData
URI or web-URL data.
mimeType
string
IANA MIME type of the data.
fileUri
string
The URI or URL of the file to include in the prompt. Acceptable values include the following:
- Cloud Storage bucket URI: The object must either be publicly readable or reside in the same Google Cloud project that's sending the request. For gemini-2.0-flash and gemini-2.0-flash-lite, the size limit is 2 GB.
- HTTP URL: The file URL must be publicly readable. You can specify one video file, one audio file, and up to 10 image files per request. Audio files, video files, and documents can't exceed 15 MB.
- YouTube video URL: The YouTube video must either be owned by the account that you used to sign in to the Google Cloud console or be public. Only one YouTube video URL is supported per request.
When specifying a fileUri, you must also specify the media type (mimeType) of the file. If VPC Service Controls is enabled, specifying a media file URL for fileUri is not supported.
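A minimal sketch of referencing a Cloud Storage object by URI with the Gen AI SDK for Python (the gs:// path shown is a publicly readable Google sample object; substitute your own):

```python
from google import genai
from google.genai.types import Part

client = genai.Client()  # assumes Vertex AI environment variables are set

response = client.models.generate_content(
    model="gemini-2.0-flash-001",
    contents=[
        "What is shown in this image?",
        # fileData part: the URI plus its required mimeType.
        Part.from_uri(
            file_uri="gs://cloud-samples-data/generative-ai/image/scones.jpg",
            mime_type="image/jpeg",
        ),
    ],
)
print(response.text)
```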
functionCall
A predicted functionCall returned from the model that contains a string representing the functionDeclaration.name and a structured JSON object containing the parameters and their values.
name
string
The name of the function to call.
args
Struct
The function parameters and values in JSON object format.
See Function calling for parameter details.
functionResponse
The resulting output from a FunctionCall that contains a string representing the FunctionDeclaration.name. It also contains a structured JSON object with the output from the function, which is used as context for the model. This should contain the result of a FunctionCall made based on model prediction.
name
string
The name of the function to call.
response
Struct
The function response in JSON object format.
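Putting functionCall and functionResponse together, here's a hedged sketch of one function-calling round trip with the Gen AI SDK for Python. The get_weather function and its result payload are hypothetical, and the sketch assumes the model actually predicts a call:

```python
from google import genai
from google.genai import types

client = genai.Client()  # assumes Vertex AI environment variables are set

# Declare a hypothetical function the model may call.
weather_tool = types.Tool(function_declarations=[
    types.FunctionDeclaration(
        name="get_weather",
        description="Get the current weather for a city.",
        parameters={
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    )
])

user_turn = types.Content(
    role="user",
    parts=[types.Part(text="What's the weather in Boston?")],
)
response = client.models.generate_content(
    model="gemini-2.0-flash-001",
    contents=[user_turn],
    config=types.GenerateContentConfig(tools=[weather_tool]),
)

call = response.function_calls[0]  # the predicted functionCall, if any
print(call.name, dict(call.args))  # e.g. get_weather {'city': 'Boston'}

# Return the (hypothetical) function result as a functionResponse part.
followup = client.models.generate_content(
    model="gemini-2.0-flash-001",
    contents=[
        user_turn,
        response.candidates[0].content,  # model turn containing the call
        types.Content(
            role="user",
            parts=[types.Part.from_function_response(
                name="get_weather",
                response={"temperature_c": 21, "condition": "sunny"},
            )],
        ),
    ],
    config=types.GenerateContentConfig(tools=[weather_tool]),
)
print(followup.text)
```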
videoMetadata
Metadata describing the input video content.
startOffset
Optional: google.protobuf.Duration
The start offset of the video.
endOffset
Optional: google.protobuf.Duration
The end offset of the video.
fps
Optional: double
The frame rate of the video sent to the model. Defaults to 1.0 if not specified. Accepted values are greater than 0.0 and at most 24.0.
safetySetting
Safety settings.
category
Optional: HarmCategory
The safety category to configure a threshold for. Acceptable values include the following:
- HARM_CATEGORY_SEXUALLY_EXPLICIT
- HARM_CATEGORY_HATE_SPEECH
- HARM_CATEGORY_HARASSMENT
- HARM_CATEGORY_DANGEROUS_CONTENT
threshold
Optional: HarmBlockThreshold
The threshold for blocking responses that could belong to the specified safety category based on probability.
- OFF
- BLOCK_NONE
- BLOCK_LOW_AND_ABOVE
- BLOCK_MEDIUM_AND_ABOVE
- BLOCK_ONLY_HIGH
method
Optional: HarmBlockMethod
Specify whether the threshold is used for the probability or severity score. If not specified, the threshold is used for the probability score.
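A hedged sketch of configuring safety settings with the Gen AI SDK for Python:

```python
from google import genai
from google.genai.types import GenerateContentConfig, SafetySetting

client = genai.Client()  # assumes Vertex AI environment variables are set

response = client.models.generate_content(
    model="gemini-2.0-flash-001",
    contents="Write a short story about a dragon.",
    config=GenerateContentConfig(
        safety_settings=[
            SafetySetting(
                category="HARM_CATEGORY_HATE_SPEECH",
                threshold="BLOCK_MEDIUM_AND_ABOVE",
            ),
            SafetySetting(
                category="HARM_CATEGORY_DANGEROUS_CONTENT",
                threshold="BLOCK_ONLY_HIGH",
            ),
        ],
    ),
)
print(response.text)
```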
harmCategory
Harm categories that block content.
HARM_CATEGORY_UNSPECIFIED
The harm category is unspecified.
HARM_CATEGORY_HATE_SPEECH
The harm category is hate speech.
HARM_CATEGORY_DANGEROUS_CONTENT
The harm category is dangerous content.
HARM_CATEGORY_HARASSMENT
The harm category is harassment.
HARM_CATEGORY_SEXUALLY_EXPLICIT
The harm category is sexually explicit content.
harmBlockThreshold
Probability thresholds levels used to block a response.
HARM_BLOCK_THRESHOLD_UNSPECIFIED
Unspecified harm block threshold.
BLOCK_LOW_AND_ABOVE
Block low threshold and higher (i.e. block more).
BLOCK_MEDIUM_AND_ABOVE
Block medium threshold and higher.
BLOCK_ONLY_HIGH
Block only high threshold (i.e. block less).
BLOCK_NONE
Block none.
OFF
Switches off safety if all categories are turned OFF.
harmBlockMethod
A probability threshold that blocks a response based on a combination of probability and severity.
HARM_BLOCK_METHOD_UNSPECIFIED
The harm block method is unspecified.
SEVERITY
The harm block method uses both probability and severity scores.
PROBABILITY
The harm block method uses the probability score.
generationConfig
Configuration settings used when generating the prompt.
temperature
Optional: float
The temperature is used for sampling during response generation, which occurs when topP
and topK
are applied. Temperature controls the degree of randomness in token selection.
Lower temperatures are good for prompts that require a less open-ended or creative response, while
higher temperatures can lead to more diverse or creative results. A temperature of 0
means that the highest probability tokens are always selected. In this case, responses for a given
prompt are mostly deterministic, but a small amount of variation is still possible.
If the model returns a response that's too generic, too short, or the model gives a fallback response, try increasing the temperature.
- Range for gemini-2.0-flash-lite: 0.0 - 2.0 (default: 1.0)
- Range for gemini-2.0-flash: 0.0 - 2.0 (default: 1.0)
For more information, see Content generation parameters .
topP
Optional: float
If specified, nucleus sampling is used.
Top-P changes how the model selects tokens for output. Tokens are selected from the most (see top-K) to least probable until the sum of their probabilities equals the top-P value. For example, if tokens A, B, and C have probabilities of 0.3, 0.2, and 0.1 and the top-P value is 0.5, then the model selects either A or B as the next token by using temperature and excludes C as a candidate.
Specify a lower value for less random responses and a higher value for more random responses.
- Range: 0.0 - 1.0
- Default for gemini-2.0-flash-lite: 0.95
- Default for gemini-2.0-flash: 0.95
candidateCount
Optional: int
The number of response variations to return. For each request, you're charged for the output tokens of all candidates, but are only charged once for the input tokens.
Specifying multiple candidates is a Preview feature that works with generateContent (streamGenerateContent is not supported). The following models are supported:
- gemini-2.0-flash-lite: 1-8, default: 1
- gemini-2.0-flash: 1-8, default: 1
maxOutputTokens
Optional: int
Maximum number of tokens that can be generated in the response. A token is approximately four characters. 100 tokens correspond to roughly 60-80 words.
Specify a lower value for shorter responses and a higher value for potentially longer responses.
For more information, see Content generation parameters .
stopSequences
Optional: List[string]
Specifies a list of strings that tells the model to stop generating text if one
of the strings is encountered in the response. If a string appears multiple
times in the response, then the response truncates where it's first encountered.
The strings are case-sensitive.
For example, if the following is the returned response when stopSequences isn't specified:
public static string reverse(string myString)
Then the returned response with stopSequences set to ["Str", "reverse"] is:
public static string
Maximum 5 items in the list.
For more information, see Content generation parameters .
presencePenalty
Optional: float
Positive penalties.
Positive values penalize tokens that already appear in the generated text, increasing the probability of generating more diverse content.
The maximum value for presencePenalty is up to, but not including, 2.0. Its minimum value is -2.0.
frequencyPenalty
Optional: float
Positive values penalize tokens that repeatedly appear in the generated text, decreasing the probability of repeating content.
The maximum value for frequencyPenalty is up to, but not including, 2.0. Its minimum value is -2.0.
responseMimeType
Optional: string (enum)
The output response MIME type of the generated candidate text.
The following MIME types are supported:
- application/json: JSON response in the candidates.
- text/plain (default): Plain text output.
- text/x.enum: For classification tasks, output an enum value as defined in the response schema.
Specify the appropriate response type to avoid unintended behaviors. For example, if you require a JSON-formatted response, specify application/json and not text/plain.
text/plain isn't supported for use with responseSchema.
responseSchema
Optional: schema
The schema that generated candidate text must follow. For more information, see Control generated output .
To use this parameter, you must specify a supported MIME type other than text/plain for the responseMimeType parameter.
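A minimal sketch of requesting structured JSON output with the Gen AI SDK for Python (the recipe schema is a hypothetical example):

```python
from google import genai
from google.genai.types import GenerateContentConfig

client = genai.Client()  # assumes Vertex AI environment variables are set

# Hypothetical schema: a list of recipe objects.
recipe_schema = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "minutes": {"type": "integer"},
        },
        "required": ["name"],
    },
}

response = client.models.generate_content(
    model="gemini-2.0-flash-001",
    contents="List three cookie recipes.",
    config=GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=recipe_schema,
    ),
)
print(response.text)  # JSON text conforming to the schema
```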
seed
Optional: int
When seed is fixed to a specific value, the model makes a best effort to provide the same response for repeated requests. Deterministic output isn't guaranteed. Also, changing the model or parameter settings, such as the temperature, can cause variations in the response even when you use the same seed value. By default, a random seed value is used.
responseLogprobs
Optional: boolean
If true, returns the log probabilities of the tokens that were chosen by the model at each step. By default, this parameter is set to false.
logprobs
Optional: int
Returns the log probabilities of the top candidate tokens at each generation step. The model's chosen token might not be the same as the top candidate token at each step. Specify the number of candidates to return by using an integer value in the range of 1-20.
You must enable responseLogprobs to use this parameter.
audioTimestamp
Optional: boolean
Available for the following models:
- Gemini 2.0 Flash-Lite
- Gemini 2.0 Flash
Enables timestamp understanding for audio-only files.
This is a preview feature.
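To tie the generationConfig fields together, here's a hedged sketch setting several of them in one request with the Gen AI SDK for Python:

```python
from google import genai
from google.genai.types import GenerateContentConfig

client = genai.Client()  # assumes Vertex AI environment variables are set

response = client.models.generate_content(
    model="gemini-2.0-flash-001",
    contents="Write a haiku about autumn.",
    config=GenerateContentConfig(
        temperature=0.7,
        top_p=0.95,
        max_output_tokens=256,
        stop_sequences=["\n\n\n"],
        seed=42,                 # best-effort reproducibility
        response_logprobs=True,  # log probabilities of chosen tokens
        logprobs=5,              # also return the top 5 candidates per step
    ),
)
print(response.text)
```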
Response body
{ "candidates" : [ { "content" : { "parts" : [ { "text" : string } ] }, "finishReason" : enum ( FinishReason ), "safetyRatings" : [ { "category" : enum ( HarmCategory ), "probability" : enum ( HarmProbability ), "blocked" : boolean } ], "citationMetadata" : { "citations" : [ { "startIndex" : integer , "endIndex" : integer , "uri" : string , "title" : string , "license" : string , "publicationDate" : { "year" : integer , "month" : integer , "day" : integer } } ] }, "avgLogprobs" : double , "logprobsResult" : { "topCandidates" : [ { "candidates" : [ { "token" : string , "logProbability" : float } ] } ], "chosenCandidates" : [ { "token" : string , "logProbability" : float } ] } } ], "usageMetadata" : { "promptTokenCount" : integer , "candidatesTokenCount" : integer , "totalTokenCount" : integer }, "modelVersion" : string }
modelVersion
The model and version used to generate the response. For example: gemini-2.0-flash-lite-001.
text
The generated text.
finishReason
The reason why the model stopped generating tokens. If empty, the model hasn't stopped generating tokens. The following values are supported:
- FINISH_REASON_STOP: Natural stop point of the model or provided stop sequence.
- FINISH_REASON_MAX_TOKENS: The maximum number of tokens as specified in the request was reached.
- FINISH_REASON_SAFETY: Token generation was stopped because the response was flagged for safety reasons. Note that Candidate.content is empty if content filters block the output.
- FINISH_REASON_RECITATION: The token generation was stopped because the response was flagged for unauthorized citations.
- FINISH_REASON_BLOCKLIST: Token generation was stopped because the response includes blocked terms.
- FINISH_REASON_PROHIBITED_CONTENT: Token generation was stopped because the response was flagged for prohibited content, such as child sexual abuse material (CSAM).
- FINISH_REASON_SPII: Token generation was stopped because the response was flagged for sensitive personally identifiable information (SPII).
- FINISH_REASON_MALFORMED_FUNCTION_CALL: Candidates were blocked because of a malformed and unparsable function call.
- FINISH_REASON_OTHER: All other reasons that stopped token generation.
- FINISH_REASON_UNSPECIFIED: The finish reason is unspecified.
category
The safety category of the rating. Acceptable values include the following:
- HARM_CATEGORY_SEXUALLY_EXPLICIT
- HARM_CATEGORY_HATE_SPEECH
- HARM_CATEGORY_HARASSMENT
- HARM_CATEGORY_DANGEROUS_CONTENT
probability
The harm probability level of the content. One of the following:
- HARM_PROBABILITY_UNSPECIFIED
- NEGLIGIBLE
- LOW
- MEDIUM
- HIGH
blocked
A boolean flag associated with a safety attribute that indicates if the model's input or output was blocked.
startIndex
An integer that specifies where a citation starts in the content. The startIndex is in bytes and calculated from the response encoded in UTF-8.
endIndex
An integer that specifies where a citation ends in the content. The endIndex is in bytes and calculated from the response encoded in UTF-8.
url
The URL of a citation source.
title
The title of a citation source.
license
The license associated with a citation.
publicationDate
The date a citation was published. Its valid formats are YYYY, YYYY-MM, and YYYY-MM-DD.
avgLogprobs
Average log probability of the candidate.
logprobsResult
Returns the top candidate tokens (topCandidates) and the actual chosen tokens (chosenCandidates) at each step.
token
Generative AI models break down text data into tokens for processing, which can be characters, words, or phrases.
logProbability
A log probability value that indicates the model's confidence for a particular token.
promptTokenCount
Number of tokens in the request.
candidatesTokenCount
Number of tokens in the response(s).
totalTokenCount
Number of tokens in the request and response(s).
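As a hedged sketch, here's how these response fields surface in the Gen AI SDK for Python (assuming response came from client.models.generate_content):

```python
# Inspect the first candidate and usage metadata of a response.
candidate = response.candidates[0]
print(candidate.finish_reason)            # e.g. FinishReason.STOP
print(candidate.content.parts[0].text)    # the generated text
if candidate.safety_ratings:
    for rating in candidate.safety_ratings:
        print(rating.category, rating.probability)
print(response.usage_metadata.prompt_token_count)
print(response.usage_metadata.candidates_token_count)
print(response.usage_metadata.total_token_count)
```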
Examples
Text generation
Generate a text response from a text input.
Gen AI SDK for Python
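The original sample code isn't reproduced here; the following is a minimal sketch assuming the google-genai package with Vertex AI environment variables set:

```python
from google import genai

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.0-flash-001",
    contents="How does AI work?",
)
print(response.text)
```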
Python (OpenAI)
You can call the Inference API by using the OpenAI library. For more information, see Call Vertex AI models by using the OpenAI library .
Using a multimodal prompt
Generate a text response from a multimodal input, such as text and an image.
Gen AI SDK for Python
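A minimal sketch under the same assumptions as above (the image URI points at a publicly readable Google sample object):

```python
from google import genai
from google.genai.types import Part

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.0-flash-001",
    contents=[
        "What is shown in this image?",
        Part.from_uri(
            file_uri="gs://cloud-samples-data/generative-ai/image/scones.jpg",
            mime_type="image/jpeg",
        ),
    ],
)
print(response.text)
```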
Python (OpenAI)
You can call the Inference API by using the OpenAI library. For more information, see Call Vertex AI models by using the OpenAI library .
Streaming text response
Generate a streaming model response from a text input.
Gen AI SDK for Python
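A minimal sketch under the same assumptions as above:

```python
from google import genai

client = genai.Client()

# streamGenerateContent surfaces as generate_content_stream in this SDK.
for chunk in client.models.generate_content_stream(
    model="gemini-2.0-flash-001",
    contents="Why is the sky blue?",
):
    print(chunk.text, end="")
```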
Python (OpenAI)
You can call the Inference API by using the OpenAI library. For more information, see Call Vertex AI models by using the OpenAI library .
Model versions
To use the auto-updated version, specify the model name without the trailing version number, for example gemini-2.0-flash instead of gemini-2.0-flash-001.
For more information, see Gemini model versions and lifecycle .
What's next
- Learn more about the Gemini API in Vertex AI.
- Learn more about Function calling.
- Learn more about Grounding responses for Gemini models.