Gemini 3 is our most intelligent model family to date, built on a foundation of state-of-the-art reasoning. It is designed to bring any idea to life by mastering agentic workflows, autonomous coding, and complex multimodal tasks.
This guide provides a consolidated, practical path to get started with Gemini 3 on Vertex AI, highlighting Gemini 3's key features and best practices.
Quickstart
Before you begin, you must authenticate to Vertex AI using an API key or application default credentials (ADC). See authentication methods for more information.
Install the Google Gen AI SDK
Gemini 3 API features require Gen AI SDK for Python version 1.51.0 or later.
```bash
pip install --upgrade google-genai
```
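To confirm that the installed version meets the minimum, you can print the SDK version. This is a quick sketch and assumes the package exposes a `__version__` attribute, as is conventional for Python packages:

```python
from google import genai

# Quick check that the installed google-genai version is 1.51.0 or later.
print(genai.__version__)
```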
Set environment variables to use the Gen AI SDK with Vertex AI
Replace the GOOGLE_CLOUD_PROJECT value with your Google Cloud project ID.
The Gemini 3 Pro Preview model gemini-3-pro-preview and the Gemini 3 Flash Preview model gemini-3-flash-preview are only available on global endpoints:
```bash
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=global
export GOOGLE_GENAI_USE_VERTEXAI=True
```
Make your first request
By default, Gemini 3 Pro and Gemini 3 Flash use dynamic thinking to reason through prompts. For faster, lower-latency responses when complex reasoning isn't required, you can constrain the model's thinking_level. Low thinking is ideal for high-throughput tasks where speed is paramount.
For fast, low-latency responses:
```python
from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-3-pro-preview",
    contents="How does AI work?",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(
            thinking_level=types.ThinkingLevel.LOW  # For fast and low latency response
        )
    ),
)

print(response.text)
```
Try complex reasoning tasks
Gemini 3 excels at advanced reasoning. For complex tasks like multi-step planning, verified code generation, or deep tool use, use high thinking levels. Use these configurations for tasks that previously required specialized reasoning models.
For slower, high-reasoning tasks:
```python
from google import genai
from google.genai import types

client = genai.Client()

prompt = """
You are tasked with implementing the classic Thread-Safe Double-Checked Locking (DCL) Singleton pattern in modern C++. This task is non-trivial and requires specialized concurrency knowledge to prevent memory reordering issues.

Write a complete, runnable C++ program named `dcl_singleton.cpp` that defines a class `Singleton` with a private constructor and a static `getInstance()` method.

Your solution MUST adhere to the following strict constraints:
1. The Singleton instance pointer (`static Singleton*`) must be wrapped in `std::atomic` to correctly manage memory visibility across threads.
2. The `getInstance()` method must use `std::memory_order_acquire` when reading the instance pointer in the outer check.
3. The instance creation and write-back must use `std::memory_order_release` when writing to the atomic pointer.
4. A standard `std::mutex` must be used only to protect the critical section (the actual instantiation).
5. The `main` function must demonstrate safe, concurrent access by launching at least three threads, each calling `Singleton::getInstance()`, and printing the address of the returned instance to prove all threads received the same object.
"""

response = client.models.generate_content(
    model="gemini-3-pro-preview",
    contents=prompt,
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(
            thinking_level=types.ThinkingLevel.HIGH  # Dynamic thinking for high reasoning tasks
        )
    ),
)

print(response.text)
```
Other thinking levels
Gemini 3 Flash introduces two new thinking levels, MINIMAL and MEDIUM, to give you even more control over how the model handles complex reasoning tasks.
MINIMAL offers a close-to-zero thinking budget for tasks optimized for throughput rather than reasoning. MEDIUM balances speed and reasoning: it allows some reasoning capability while still prioritizing low-latency operation.
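The following is a minimal sketch of requesting MEDIUM thinking on the Flash preview model. It assumes your installed SDK version exposes the MEDIUM (and MINIMAL) members on types.ThinkingLevel; the prompt is illustrative:

```python
from google import genai
from google.genai import types

client = genai.Client()

# Sketch: MEDIUM thinking on Gemini 3 Flash.
# Assumes ThinkingLevel.MEDIUM is available in your google-genai version.
response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="Summarize the trade-offs between B-trees and LSM-trees.",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(
            thinking_level=types.ThinkingLevel.MEDIUM  # Balance of speed and reasoning
        )
    ),
)

print(response.text)
```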
New API features
Gemini 3 introduces powerful API enhancements and new parameters designed to give developers granular control over performance (latency, cost), model behavior, and multimodal fidelity.
This table summarizes the core new features and parameters available, along with direct links to their detailed documentation:
| New Feature/API Change | Documentation |
|---|---|
| Model: gemini-3-pro-preview | Model card, Model Garden |
| Thinking level | Thinking |
| Media resolution | Image understanding, Video understanding, Audio understanding, Document understanding |
| Thought signature | Thought signatures |
| Temperature | API reference |
| Multimodal function responses | Function calling: Multimodal function responses |
| Streaming function calling | Function calling: Streaming function calling |
Thinking level
The thinking_level parameter lets you specify a thinking budget for the model's response generation. By selecting a thinking level, you can explicitly balance response quality and reasoning depth against latency and cost.

- MINIMAL (Gemini 3 Flash only): Constrains the model to use as few tokens as possible for thinking; best for low-complexity tasks that wouldn't benefit from extensive reasoning. MINIMAL is as close as possible to a zero thinking budget, but still requires thought signatures.
- LOW: Constrains the model to use fewer tokens for thinking; suitable for simpler tasks where extensive reasoning is not required. LOW is ideal for high-throughput tasks where speed is essential.
- MEDIUM (Gemini 3 Flash only): Offers a balanced approach for tasks of moderate complexity that benefit from reasoning but don't require deep, multi-step planning. It provides more reasoning capability than LOW while maintaining lower latency than HIGH.
- HIGH: Allows the model to use more tokens for thinking; suitable for complex prompts requiring deep reasoning, such as multi-step planning, verified code generation, or advanced function calling scenarios. This is the default level for Gemini 3 Pro and Gemini 3 Flash. Use this configuration for tasks you might previously have relied on specialized reasoning models for.
Gen AI SDK example
```python
from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-3-pro-preview",
    contents="Find the race condition in this multi-threaded C++ snippet: [code here]",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(
            thinking_level=types.ThinkingLevel.HIGH  # Default, dynamic thinking
        )
    ),
)

print(response.text)
```
OpenAI compatibility example
If you use the OpenAI compatibility layer, standard parameters are automatically mapped to Gemini 3 equivalents:

- reasoning_effort maps to thinking_level.
- reasoning_effort none maps to thinking_level minimal (Gemini 3 Flash only).
- reasoning_effort medium maps to thinking_level medium for Gemini 3 Flash, and to thinking_level high for Gemini 3 Pro.
```python
import openai
from google.auth import default
from google.auth.transport.requests import Request

credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(Request())  # Obtain an access token to use as the API key

client = openai.OpenAI(
    base_url=f"https://aiplatform.googleapis.com/v1/projects/{PROJECT_ID}/locations/global/endpoints/openapi",
    api_key=credentials.token,
)

prompt = """
Write a bash script that takes a matrix represented as a string with
format '[1,2],[3,4],[5,6]' and prints the transpose in the same format.
"""

response = client.chat.completions.create(
    model="gemini-3-pro-preview",
    reasoning_effort="medium",  # Maps to thinking_level high for Gemini 3 Pro.
    messages=[{"role": "user", "content": prompt}],
)

print(response.choices[0].message.content)
```
Media resolution
Gemini 3 introduces granular control over multimodal vision processing using the media_resolution parameter. Higher resolutions improve the model's ability to read fine text or identify small details, but increase token usage and latency. The media_resolution parameter determines the maximum number of tokens allocated per input image, PDF page, or video frame.

You can set the resolution to low, medium, or high globally (using generation_config) or for individual media parts. The ultra_high resolution can only be set for individual media parts. If unspecified, the model uses optimal defaults based on the media type.
Token counts
This table summarizes the approximate token counts for each media_resolution value and media type.

| Media Resolution | Image | Video | PDF |
|---|---|---|---|
| UNSPECIFIED (Default) | 1120 | 70 | 560 |
| LOW | 280 | 70 | 280 + Text |
| MEDIUM | 560 | 70 | 560 + Text |
| HIGH | 1120 | 280 | 1120 + Text |
| ULTRA_HIGH | 2240 | N/A | N/A |
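As a rough worked example, assuming a default video sampling rate of about one frame per second (the actual rate may vary by model and request): a 5-minute video has roughly 300 sampled frames, which costs about 300 × 70 ≈ 21,000 tokens at LOW resolution and about 300 × 280 ≈ 84,000 tokens at HIGH resolution.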
Recommended settings
| Media Resolution | Max Tokens | Usage Guidance |
|---|---|---|
| ultra_high | 2240 | Tasks requiring analysis of fine details in images, such as processing screen recording stills or high-resolution photos. |
| high | 1120 | Image analysis tasks to ensure maximum quality. |
| medium | 560 | |
| low | Image: 280, Video: 70 | Sufficient for most tasks. Note: for video, low is a maximum of 70 tokens per frame. |
Setting media_resolution per individual part
You can set media_resolution per individual media part:
```python
from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-3-pro-preview",
    contents=[
        types.Part(
            file_data=types.FileData(
                file_uri="gs://cloud-samples-data/generative-ai/image/a-man-and-a-dog.png",
                mime_type="image/jpeg",
            ),
            media_resolution=types.PartMediaResolution(
                level=types.PartMediaResolutionLevel.MEDIA_RESOLUTION_HIGH  # High resolution
            ),
        ),
        types.Part(
            file_data=types.FileData(
                file_uri="gs://cloud-samples-data/generative-ai/video/behind_the_scenes_pixel.mp4",
                mime_type="video/mp4",
            ),
            media_resolution=types.PartMediaResolution(
                level=types.PartMediaResolutionLevel.MEDIA_RESOLUTION_LOW  # Low resolution
            ),
        ),
        "When does the image appear in the video? What is the context?",
    ],
)

print(response.text)
```
Setting media_resolution globally
You can also set media_resolution globally (using GenerateContentConfig):
```python
from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-3-pro-preview",
    contents=[
        types.Part(
            file_data=types.FileData(
                file_uri="gs://cloud-samples-data/generative-ai/image/a-man-and-a-dog.png",
                mime_type="image/jpeg",
            ),
        ),
        "What is in the image?",
    ],
    config=types.GenerateContentConfig(
        media_resolution=types.MediaResolution.MEDIA_RESOLUTION_LOW,  # Global setting
    ),
)

print(response.text)
```
Thought signatures
Thought signatures are encrypted tokens that preserve the model's reasoning state during multi-turn conversations, specifically when using function calling.
When a thinking model decides to call an external tool, it pauses its internal reasoning process. The thought signature acts as a "save state," allowing the model to resume its chain of thought seamlessly once you provide the function's result.
For more information, see Thought signatures.
Why are thought signatures important?
Without thought signatures, the model "forgets" its specific reasoning steps during the tool execution phase. Passing the signature back ensures:
- Context continuity: The model preserves the reason for why the tool was called.
- Complex reasoning: Enables multi-step tasks where the output of one tool informs the reasoning for the next.
Where are thought signatures returned?
Gemini 3 Pro and Gemini 3 Flash enforce stricter validation and updated handling of thought signatures, which were originally introduced in Gemini 2.5. To ensure the model maintains context across multiple turns of a conversation, you must return the thought signatures in your subsequent requests.

- Model responses that contain a function call always return a thought signature, even at the MINIMAL thinking level.
- For parallel function calls, the first function call part returned by the model response carries the thought signature.
- For sequential function calls (multi-step), each function call carries a signature, and clients are expected to pass each signature back.
- Model responses without a function call return a thought signature inside the last part returned by the model.

The sketch after this list shows how to check which parts of a response carry a signature.
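This is a minimal sketch, assuming the google-genai Python SDK exposes the signature as part.thought_signature (check your installed version) and that `response` came from a request that used tools:

```python
# Sketch: inspect which response parts carry a thought signature.
for i, part in enumerate(response.candidates[0].content.parts):
    has_signature = getattr(part, "thought_signature", None) is not None
    kind = "function_call" if part.function_call else "text"
    print(f"part {i}: {kind}, thought_signature present: {has_signature}")
```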
How to handle thought signatures?
There are two primary ways to handle thought signatures: automatically, using the Gen AI SDKs or the OpenAI Chat Completions API, or manually, if you are interacting with the API directly.
Automated handling (recommended)
If you are using the Google Gen AI SDKs (Python, Node.js, Go, Java) or the OpenAI Chat Completions API, and you use the standard chat history features or append the full model response, thought_signatures are handled automatically. You don't need to make any changes to your code.
Manual function calling example
When using the Gen AI SDK, thought signatures are handled automatically as long as you append the full model response (which contains the signature) to the history of subsequent requests:
```python
from google import genai
from google.genai import types

client = genai.Client()

# 1. Define your tool
get_weather_declaration = types.FunctionDeclaration(
    name="get_weather",
    description="Gets the current weather temperature for a given location.",
    parameters={
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"],
    },
)

get_weather_tool = types.Tool(function_declarations=[get_weather_declaration])

# 2. Send a message that triggers the tool
prompt = "What's the weather like in London?"

response = client.models.generate_content(
    model="gemini-3-pro-preview",
    contents=prompt,
    config=types.GenerateContentConfig(
        tools=[get_weather_tool],
        thinking_config=types.ThinkingConfig(include_thoughts=True),
    ),
)

# 3. Handle the function call
function_call = response.function_calls[0]
location = function_call.args["location"]
print(f"Model wants to call: {function_call.name}")

# Execute your tool (e.g., call an API)
# (This is a mock response for the example)
print(f"Calling external tool for: {location}")
function_response_data = {
    "location": location,
    "temperature": "30C",
}

# 4. Send the tool's result back
# Append this turn's messages to history for a final response.
# The `content` object automatically attaches the required thought_signature behind the scenes.
history = [
    types.Content(role="user", parts=[types.Part(text=prompt)]),
    response.candidates[0].content,  # Signature preserved here
    types.Content(
        role="tool",
        parts=[
            types.Part.from_function_response(
                name=function_call.name,
                response=function_response_data,
            )
        ],
    ),
]

response_2 = client.models.generate_content(
    model="gemini-3-pro-preview",
    contents=history,
    config=types.GenerateContentConfig(
        tools=[get_weather_tool],
        thinking_config=types.ThinkingConfig(include_thoughts=True),
    ),
)

# 5. Get the final, natural-language answer
print(f"\nFinal model response: {response_2.text}")
```
Automatic function calling example
When using the Gen AI SDK with automatic function calling, thought signatures are handled automatically:
```python
from google import genai
from google.genai import types


def get_current_temperature(location: str) -> dict:
    """Gets the current temperature for a given location.

    Args:
        location: The city and state, for example San Francisco, CA

    Returns:
        A dictionary containing the temperature and unit.
    """
    # ... (implementation) ...
    return {"temperature": 25, "unit": "Celsius"}


client = genai.Client()

response = client.models.generate_content(
    model="gemini-3-pro-preview",
    contents="What's the temperature in Boston?",
    config=types.GenerateContentConfig(
        tools=[get_current_temperature],
    ),
)

print(response.text)
# The SDK handles the function call and thought signature, and returns the final text
```
OpenAI compatibility example
When using the OpenAI Chat Completions API, thought signatures are handled automatically as long as you append the full model response to the messages of subsequent requests:
```python
...

# Append user prompt and assistant response including thought signatures
messages.append(response1.choices[0].message)

# Execute the tool
tool_call_1 = response1.choices[0].message.tool_calls[0]
result_1 = get_current_temperature(**json.loads(tool_call_1.function.arguments))

# Append tool response to messages
messages.append(
    {
        "role": "tool",
        "tool_call_id": tool_call_1.id,
        "content": json.dumps(result_1),
    }
)

response2 = client.chat.completions.create(
    model="gemini-3-pro-preview",
    messages=messages,
    tools=tools,
    extra_body={
        "extra_body": {
            "google": {
                "thinking_config": {
                    "include_thoughts": True,
                },
            },
        },
    },
)

print(response2.choices[0].message.tool_calls)
```
See the full code example.
Manual handling
If you are interacting with the API directly or managing raw JSON payloads, you must correctly handle the thought_signature included in the model's turn. You must return this signature in the exact part where it was received when sending the conversation history back, as in the sketch below.
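This is a rough illustration only, written as a Python dict that mirrors the request body shape. Field names follow the REST API's camelCase convention (for example, thoughtSignature), the role used for the function response turn mirrors the SDK example above, and the signature value is a placeholder:

```python
# Sketch of a follow-up request body that echoes the model's function-call
# part back unchanged, with its thought signature intact.
request_body = {
    "contents": [
        {"role": "user", "parts": [{"text": "What's the weather like in London?"}]},
        {
            "role": "model",
            "parts": [
                {
                    "functionCall": {"name": "get_weather", "args": {"location": "London"}},
                    # Returned by the model; must be sent back in this exact part.
                    "thoughtSignature": "<signature-from-previous-response>",
                }
            ],
        },
        {
            "role": "tool",
            "parts": [
                {
                    "functionResponse": {
                        "name": "get_weather",
                        "response": {"temperature": "30C"},
                    }
                }
            ],
        },
    ]
}
```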
If proper signatures are not returned, Gemini 3 returns a 400 error: "<Function Call> in the <index of contents array> content block is missing a thought_signature".
Multimodal function responses
Multimodal function calling lets function responses contain multimodal objects (such as images), making better use of the model's function calling capabilities. Standard function calling only supports text-based function responses:
```python
from google import genai
from google.genai import types

client = genai.Client()

# This is a manual, two turn multimodal function calling workflow:

# 1. Define the function tool
get_image_declaration = types.FunctionDeclaration(
    name="get_image",
    description="Retrieves the image file reference for a specific order item.",
    parameters={
        "type": "object",
        "properties": {
            "item_name": {
                "type": "string",
                "description": "The name or description of the item ordered (e.g., 'green shirt').",
            }
        },
        "required": ["item_name"],
    },
)

tool_config = types.Tool(function_declarations=[get_image_declaration])

# 2. Send a message that triggers the tool
prompt = "Show me the green shirt I ordered last month."

response_1 = client.models.generate_content(
    model="gemini-3-pro-preview",
    contents=[prompt],
    config=types.GenerateContentConfig(
        tools=[tool_config],
    ),
)

# 3. Handle the function call
function_call = response_1.function_calls[0]
requested_item = function_call.args["item_name"]
print(f"Model wants to call: {function_call.name}")

# Execute your tool (e.g., call an API)
# (This is a mock response for the example)
print(f"Calling external tool for: {requested_item}")
function_response_data = {
    "image_ref": {"$ref": "dress.jpg"},
}
function_response_multimodal_data = types.FunctionResponsePart(
    file_data=types.FunctionResponseFileData(
        mime_type="image/png",
        display_name="dress.jpg",
        file_uri="gs://cloud-samples-data/generative-ai/image/dress.jpg",
    )
)

# 4. Send the tool's result back
# Append this turn's messages to history for a final response.
history = [
    types.Content(role="user", parts=[types.Part(text=prompt)]),
    response_1.candidates[0].content,
    types.Content(
        role="tool",
        parts=[
            types.Part.from_function_response(
                name=function_call.name,
                response=function_response_data,
                parts=[function_response_multimodal_data],
            )
        ],
    ),
]

response_2 = client.models.generate_content(
    model="gemini-3-pro-preview",
    contents=history,
    config=types.GenerateContentConfig(
        tools=[tool_config],
        thinking_config=types.ThinkingConfig(include_thoughts=True),
    ),
)

print(f"\nFinal model response: {response_2.text}")
```
Streaming function calling
You can stream partial function call arguments to improve the streaming experience for tool use. Enable this feature by explicitly setting stream_function_call_arguments to true:
```python
from google import genai
from google.genai import types

client = genai.Client()

get_weather_declaration = types.FunctionDeclaration(
    name="get_weather",
    description="Gets the current weather temperature for a given location.",
    parameters={
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"],
    },
)

get_weather_tool = types.Tool(function_declarations=[get_weather_declaration])

for chunk in client.models.generate_content_stream(
    model="gemini-3-pro-preview",
    contents="What's the weather in London and New York?",
    config=types.GenerateContentConfig(
        tools=[get_weather_tool],
        tool_config=types.ToolConfig(
            function_calling_config=types.FunctionCallingConfig(
                mode=types.FunctionCallingConfigMode.AUTO,
                stream_function_call_arguments=True,
            )
        ),
    ),
):
    # Guard against chunks that don't contain a function call part.
    function_call = chunk.function_calls[0] if chunk.function_calls else None
    if function_call and function_call.name:
        print(f"{function_call.name}")
        print(f"will_continue={function_call.will_continue}")
```
Model response example:
```json
{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "functionCall": {
              "name": "get_weather",
              "willContinue": true
            }
          }
        ]
      }
    }
  ]
}
```
Temperature
- Range for Gemini 3: 0.0 - 2.0 (default: 1.0)

For Gemini 3, it is strongly recommended to keep the temperature parameter at its default value of 1.0. While previous models often benefited from tuning temperature to control creativity versus determinism, Gemini 3's reasoning capabilities are optimized for the default setting. Changing the temperature (setting it to less than 1.0) may lead to unexpected behavior, such as looping or degraded performance, particularly on complex mathematical or reasoning tasks.
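If your framework requires the parameter to be set explicitly, a minimal sketch that keeps the recommended default looks like this (the prompt is illustrative):

```python
from google import genai
from google.genai import types

client = genai.Client()

# Keep temperature at the Gemini 3 default of 1.0; lowering it can cause
# looping or degraded performance on complex reasoning tasks.
response = client.models.generate_content(
    model="gemini-3-pro-preview",
    contents="Prove that the sum of two even integers is even.",
    config=types.GenerateContentConfig(temperature=1.0),
)

print(response.text)
```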
Supported features
Gemini 3 models also support the following features:
- System instructions
- Structured output
- Function calling
- Grounding with Google Search
- Code execution
- URL Context
- Thinking
- Context caching
- Count tokens
- Chat completions
- Batch prediction
- Provisioned Throughput
- Dynamic shared quota
Prompting best practices
Gemini 3 is a reasoning model, which changes how you should prompt.
- Precise instructions: Be concise in your input prompts. Gemini 3 responds best to direct, clear instructions. It may over-analyze verbose or overly complex prompt engineering techniques used for older models.
- Output verbosity: By default, Gemini 3 is less verbose and prefers providing direct, efficient answers. If your use case requires a more conversational or "chatty" persona, you must explicitly steer the model in the prompt (for example, "Explain this as a friendly, talkative assistant").
- Grounding: For grounding use cases, we recommend using the following developer instructions: "You are a strictly grounded assistant limited to the information provided in the User Context. In your answers, rely **only** on the facts that are directly mentioned in that context. You must **not** access or utilize your own knowledge or common sense to answer. Do not assume or infer from the provided facts; simply report them exactly as they appear. Your answer must be factual and fully truthful to the provided text, leaving absolutely no room for speculation or interpretation. Treat the provided context as the absolute limit of truth; any facts or details that are not directly mentioned in the context must be considered **completely untruthful** and **completely unsupported**. If the exact answer is not explicitly written in the context, you must state that the information is not available."
- Using the Google Search tool: When using the Google Search tool, Gemini 3 Flash can at times confuse the current date and time with events in 2024. This can result in the model formulating search queries for the wrong year. To ensure the model uses the correct time period, explicitly reinforce the current date in system instructions: "For time-sensitive user queries that require up-to-date information, you MUST follow the provided current time (date and year) when formulating search queries in tool calls. Remember it is 2025 this year."
- Knowledge cutoff: For certain queries, Gemini 3 Flash benefits from being explicitly told its knowledge cutoff. This is the case when the Google Search tool is disabled and the query explicitly requires the model to identify the cutoff date from parametric knowledge. Recommendation: add the following clause to system instructions: "Your knowledge cutoff date is January 2025." A sketch of passing these instructions appears after this list.
- Using media_resolution: Use the media_resolution parameter to control the maximum number of tokens the model uses to represent an image or video frames. Higher resolution enables the model to capture fine details in an image but may use more tokens per frame, while lower resolution optimizes cost and latency for images with less visual detail.
- Improving video analysis: Use a higher frames-per-second (FPS) sampling rate for videos that require granular temporal analysis, such as fast-action understanding or high-speed motion tracking.
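A minimal sketch of attaching the recommended date and knowledge-cutoff guidance as system instructions (the instruction strings come from the recommendations above; the request contents are illustrative):

```python
from google import genai
from google.genai import types

client = genai.Client()

# Sketch: pass the recommended guidance via system_instruction so the model
# formulates time-sensitive queries with the correct year and cutoff in mind.
system_instruction = (
    "For time-sensitive user queries that require up-to-date information, you MUST "
    "follow the provided current time (date and year) when formulating search queries "
    "in tool calls. Remember it is 2025 this year. "
    "Your knowledge cutoff date is January 2025."
)

response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="What were the biggest tech announcements this year?",
    config=types.GenerateContentConfig(system_instruction=system_instruction),
)

print(response.text)
```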
Migration considerations
Consider the following features and constraints when migrating:

- Thinking level: Gemini 3 models use the thinking_level parameter to control the amount of internal reasoning the model performs (low or high) and to balance response quality, reasoning complexity, latency, and cost.
- Temperature settings: If your existing code explicitly sets temperature (especially to low values for deterministic outputs), it is recommended to remove this parameter and use the Gemini 3 default of 1.0 to avoid potential looping issues or performance degradation on complex tasks.
- Thought signatures: For Gemini 3 models, if a thought signature is expected in a turn but not provided, the model returns an error instead of a warning.
- Media resolution and tokenization: Gemini 3 models use a variable sequence length for media tokenization instead of Pan and Scan, and have new default resolutions and token costs for images, PDFs, and video.
- Token counting for multimodal input: Token counts for multimodal inputs (images, video, audio) are an estimate based on the chosen media_resolution. As such, the result from the count_tokens API call may not match the final consumed tokens. The accurate usage for billing is only available after execution, in the response's usage_metadata (see the sketch after this list).
- Token consumption: Migrating to Gemini 3 defaults may increase token usage for images and PDFs but decrease token usage for video. If requests now exceed the context window due to higher default resolutions, it is recommended to explicitly reduce the media resolution.
- PDF and document understanding: The default OCR resolution for PDFs has changed. If you relied on specific behavior for dense document parsing, test the new media_resolution: "high" setting to ensure continued accuracy. For Gemini 3 models, PDF token counts in usage_metadata are reported under the IMAGE modality instead of DOCUMENT.
- Image segmentation: Image segmentation is not supported by Gemini 3 models. For workloads requiring built-in image segmentation, it is recommended to continue using Gemini 2.5 Flash with thinking turned off.
- Multimodal function responses: For Gemini 3 models, you can include image and PDF data in function responses.
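A minimal sketch comparing the pre-request estimate from count_tokens with the billed usage reported in usage_metadata after generation (the usage_metadata field names shown are the standard ones in the Gen AI SDK):

```python
from google import genai

client = genai.Client()

contents = "Explain how a hash map handles collisions."

# Pre-request estimate (may differ from final usage for multimodal inputs).
estimate = client.models.count_tokens(
    model="gemini-3-pro-preview",
    contents=contents,
)
print(f"Estimated prompt tokens: {estimate.total_tokens}")

# Actual billed usage is reported in usage_metadata after execution.
response = client.models.generate_content(
    model="gemini-3-pro-preview",
    contents=contents,
)
print(f"Prompt tokens: {response.usage_metadata.prompt_token_count}")
print(f"Output tokens: {response.usage_metadata.candidates_token_count}")
print(f"Total tokens: {response.usage_metadata.total_token_count}")
```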
FAQ
- What is the knowledge cutoff for Gemini 3 Pro? Gemini 3 has a knowledge cutoff of January 2025.
- In which region is gemini-3-pro-preview available on Google Cloud? Global.
- What are the context window limits? Gemini 3 models support a 1 million token input context window and up to 64K tokens of output.
- Does gemini-3-pro-preview support image output? No.
- Does gemini-3-pro-preview support the Gemini Live API? No.
What's next
- Learn more about Gemini 3 Pro.
- Try the Intro to Gemini 3 Pro notebook tutorial.
- Learn about Function calling.
- Learn about Thinking.

