
Thinking

Gemini 3 and Gemini 2.5 models can use an internal "thinking process" that significantly improves their reasoning and multi-step planning abilities, making them highly effective for complex tasks such as coding, advanced mathematics, and data analysis.

Thinking models offer the following configurations and options:

Thinking budget: You can configure how much "thinking" a model can do by setting a thinking budget. This configuration is particularly important if reducing latency or cost is a priority. Also, review the comparison of task difficulties to decide how much a model might need its thinking capability.
Thought summaries: You can enable thought summaries to be included with the generated response. These summaries are synthesized versions of the model's raw thoughts and offer insights into the model's internal reasoning process.
Thought signatures: The Firebase AI Logic SDKs automatically handle thought signatures for you, which ensures that the model has access to the thought context from previous turns, specifically when using function calling.

Make sure to review the best practices and prompting guidance for using thinking models.

Use a thinking model

Use a thinking model just like you'd use any other Gemini model (initialize your chosen Gemini API provider, create a GenerativeModel instance, etc.). These models can be used for text or code generation tasks, like generating structured output or analyzing multimodal input (like images, video, audio, or PDFs). You can even use thinking models when you're streaming the output.

Models that support this capability

Only Gemini 3 and Gemini 2.5 models support this capability.

gemini-3-pro-preview
gemini-3-flash-preview
gemini-3-pro-image-preview (aka "nano banana pro")
gemini-2.5-pro
gemini-2.5-flash
gemini-2.5-flash-lite

Best practices & prompting guidance for using thinking models

We recommend testing your prompt in Google AI Studio or Vertex AI Studio where you can view the full thinking process. You can identify any areas where the model may have gone astray so that you can refine your prompts to get more consistent and accurate responses.

Begin with a general prompt that describes the desired outcome, and observe the model's initial thoughts on how it determines its response. If the response isn't as expected, help the model generate a better response by using any of the following prompting techniques:

Provide step-by-step instructions
Provide several examples of input-output pairs
Provide guidance for how the output and responses should be phrased and formatted
Provide specific verification steps
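
The four techniques above can be combined in a single prompt. The following is a minimal sketch, not part of the Firebase AI Logic SDKs: `buildPrompt` and its option names (`task`, `steps`, `examples`, `format`, `checks`) are hypothetical, and the section wording it produces is only one possible phrasing.

```javascript
// Hypothetical helper (not an SDK function): assembles a prompt string from
// the four prompting techniques listed above. Empty sections are skipped.
function buildPrompt({ task, steps = [], examples = [], format = "", checks = [] }) {
  const sections = [task];

  // Technique 1: step-by-step instructions
  if (steps.length > 0) {
    const numbered = steps.map((step, i) => `${i + 1}. ${step}`).join("\n");
    sections.push("Follow these steps:\n" + numbered);
  }

  // Technique 2: several input-output example pairs
  if (examples.length > 0) {
    const pairs = examples
      .map((ex) => `Input: ${ex.input}\nOutput: ${ex.output}`)
      .join("\n\n");
    sections.push("Examples:\n" + pairs);
  }

  // Technique 3: guidance on phrasing and formatting
  if (format) {
    sections.push("Format the response as follows: " + format);
  }

  // Technique 4: specific verification steps
  if (checks.length > 0) {
    const bullets = checks.map((check) => `- ${check}`).join("\n");
    sections.push("Before answering, verify that:\n" + bullets);
  }

  return sections.join("\n\n");
}
```

The returned string can be passed to the model like any other text prompt; starting with only `task` and adding sections one at a time makes it easier to see which technique changed the model's thoughts.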

In addition to prompting, consider using these recommendations:

Set system instructions, which are like a "preamble" that you add before the model gets exposed to any further instructions from the prompt or end user. They let you steer the behavior of the model based on your specific needs and use cases.
Set a thinking budget to configure how much thinking the model can do. If you set a low budget, then the model won't "overthink" its response. If you set a high budget, then the model can think more if needed. Setting a thinking budget also reserves more of the total token output limit for the actual response.
Enable AI monitoring in the Firebase console to monitor the count of thinking tokens and the latency of your requests that have thinking enabled. And if you have thought summaries enabled, they will display in the console where you can inspect the model's detailed reasoning to help you debug and refine your prompts.

Control the thinking budget

To control how much thinking the model can do to generate its response, you can specify the number of thinking budget tokens that it's allowed to use.

You can manually set the thinking budget in situations where you might need more or fewer tokens than the default thinking budget. Find more detailed guidance about task complexity and suggested budgets later in this section. Here's some high-level guidance:

Set a low thinking budget if latency is important or for less complex tasks
Set a high thinking budget for more complex tasks

Set the thinking budget


Set the thinking budget in a GenerationConfig as part of creating the GenerativeModel instance. The configuration is maintained for the lifetime of the instance. If you want to use different thinking budgets for different requests, then create GenerativeModel instances configured with each budget.

Learn about supported thinking budget values later in this section.
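
Because the configuration is fixed for the lifetime of an instance, an app that needs several budgets might keep one instance per budget. The following is a minimal sketch of that idea: `createModelCache` and `modelForBudget` are hypothetical names, and the model factory is injected so the sketch runs without the SDK. In a Web app, the factory would presumably wrap the `getGenerativeModel` call shown in the Web snippet in this section.

```javascript
// Sketch: lazily create and reuse one model instance per thinking budget.
// `createModel` is an injected factory (an assumption of this sketch). In a
// Web app it might be:
//   (generationConfig) =>
//     getGenerativeModel(ai, { model: "GEMINI_MODEL_NAME", generationConfig })
function createModelCache(createModel) {
  const cache = new Map();
  return function modelForBudget(thinkingBudget) {
    if (!cache.has(thinkingBudget)) {
      // Same config shape as the Web snippet in this section
      const generationConfig = { thinkingConfig: { thinkingBudget } };
      cache.set(thinkingBudget, createModel(generationConfig));
    }
    return cache.get(thinkingBudget);
  };
}

// Usage with a stand-in factory that only records the config it received
const modelForBudget = createModelCache((generationConfig) => ({ generationConfig }));
const quickModel = modelForBudget(512);  // low budget for latency-sensitive requests
const deepModel = modelForBudget(8192);  // higher budget for complex requests
```

Requesting the same budget twice returns the same instance, so the cost of creating a model is paid once per distinct budget.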

Swift

Set the thinking budget in a GenerationConfig as part of creating a GenerativeModel instance.

    // ...

    // Set the thinking configuration
    // Use a thinking budget value appropriate for your model (example value shown here)
    let generationConfig = GenerationConfig(
      thinkingConfig: ThinkingConfig(thinkingBudget: 1024)
    )

    // Specify the config as part of creating the `GenerativeModel` instance
    let model = FirebaseAI.firebaseAI(backend: .googleAI()).generativeModel(
      modelName: "GEMINI_MODEL_NAME",
      generationConfig: generationConfig
    )

    // ...

Kotlin

Set the values of the parameters in a GenerationConfig as part of creating a GenerativeModel instance.

    // ...

    // Set the thinking configuration
    // Use a thinking budget value appropriate for your model (example value shown here)
    val generationConfig = generationConfig {
      thinkingConfig = thinkingConfig {
        thinkingBudget = 1024
      }
    }

    // Specify the config as part of creating the `GenerativeModel` instance
    val model = Firebase.ai(backend = GenerativeBackend.googleAI()).generativeModel(
      modelName = "GEMINI_MODEL_NAME",
      generationConfig,
    )

    // ...

Java

Set the values of the parameters in a GenerationConfig as part of creating a GenerativeModel instance.

    // ...

    // Set the thinking configuration
    // Use a thinking budget value appropriate for your model (example value shown here)
    ThinkingConfig thinkingConfig = new ThinkingConfig.Builder()
        .setThinkingBudget(1024)
        .build();

    GenerationConfig generationConfig = GenerationConfig.builder()
        .setThinkingConfig(thinkingConfig)
        .build();

    // Specify the config as part of creating the `GenerativeModel` instance
    GenerativeModelFutures model = GenerativeModelFutures.from(
        FirebaseAI.getInstance(GenerativeBackend.googleAI())
            .generativeModel(
                /* modelName */ "GEMINI_MODEL_NAME",
                /* generationConfig */ generationConfig
            )
    );

    // ...

Web

Set the values of the parameters in a GenerationConfig as part of creating a GenerativeModel instance.

    // ...

    const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

    // Set the thinking configuration
    // Use a thinking budget value appropriate for your model (example value shown here)
    const generationConfig = {
      thinkingConfig: {
        thinkingBudget: 1024
      }
    };

    // Specify the config as part of creating the `GenerativeModel` instance
    const model = getGenerativeModel(ai, { model: "GEMINI_MODEL_NAME", generationConfig });

    // ...

Dart

Set the values of the parameters in a GenerationConfig as part of creating a GenerativeModel instance.

    // ...

    // Set the thinking configuration
    // Use a thinking budget value appropriate for your model (example value shown here)
    final thinkingConfig = ThinkingConfig(thinkingBudget: 1024);

    final generationConfig = GenerationConfig(
      thinkingConfig: thinkingConfig
    );

    // Specify the config as part of creating the `GenerativeModel` instance
    final model = FirebaseAI.googleAI().generativeModel(
      model: 'GEMINI_MODEL_NAME',
      config: generationConfig,
    );

    // ...

Unity

Set the values of the parameters in a GenerationConfig as part of creating a GenerativeModel instance.

    // ...

    // Set the thinking configuration
    // Use a thinking budget value appropriate for your model (example value shown here)
    var thinkingConfig = new ThinkingConfig(thinkingBudget: 1024);

    var generationConfig = new GenerationConfig(
      thinkingConfig: thinkingConfig
    );

    // Specify the config as part of creating the `GenerativeModel` instance
    var model = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetGenerativeModel(
      modelName: "GEMINI_MODEL_NAME",
      generationConfig: generationConfig
    );

    // ...

Supported thinking budget values

The following table lists the thinking budget values that you can set for each model by configuring the model's thinkingBudget.

Model	Default value	Minimum thinking budget	Maximum thinking budget	Value to disable thinking	Value to enable dynamic thinking
Gemini 2.5 Pro	`8,192`	`128`	`32,768`	cannot be turned off	`-1`
Gemini 2.5 Flash	`8,192`	`1`	`24,576`	`0`	`-1`
Gemini 2.5 Flash‑Lite	`0` (thinking is disabled by default)	`512`	`24,576`	`0` (or don't configure thinking budget at all)	`-1`
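
The table can be turned into a client-side check that catches an out-of-range budget before a request is sent. The following is a minimal sketch: `THINKING_BUDGETS` and `checkThinkingBudget` are hypothetical names rather than SDK APIs, the numbers are copied from the table above, and the keys reuse the model names listed earlier on this page. Whether the service rejects or adjusts an invalid value is not stated here, so this check is only a local safeguard.

```javascript
// Limits copied from the table above (hypothetical structure, not an SDK type)
const THINKING_BUDGETS = new Map([
  ["gemini-2.5-pro", { defaultBudget: 8192, min: 128, max: 32768, canDisable: false }],
  ["gemini-2.5-flash", { defaultBudget: 8192, min: 1, max: 24576, canDisable: true }],
  ["gemini-2.5-flash-lite", { defaultBudget: 0, min: 512, max: 24576, canDisable: true }],
]);

// Returns { ok: true, mode } for a usable budget, or { ok: false, reason }.
function checkThinkingBudget(modelName, budget) {
  const limits = THINKING_BUDGETS.get(modelName);
  if (!limits) {
    return { ok: false, reason: `no budget data for ${modelName}` };
  }
  if (!Number.isInteger(budget)) {
    return { ok: false, reason: "budget must be an integer" };
  }
  if (budget === -1) {
    return { ok: true, mode: "dynamic" }; // -1 enables dynamic thinking
  }
  if (budget === 0) {
    return limits.canDisable
      ? { ok: true, mode: "disabled" } // 0 disables thinking where supported
      : { ok: false, reason: "thinking cannot be turned off for this model" };
  }
  if (budget < limits.min || budget > limits.max) {
    return { ok: false, reason: `budget must be between ${limits.min} and ${limits.max}` };
  }
  return { ok: true, mode: "fixed" };
}
```

For example, `checkThinkingBudget("gemini-2.5-flash-lite", 100)` fails because the table gives that model a minimum of 512, while the same value is within range for Gemini 2.5 Flash.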

Disable thinking

For some easier tasks, the thinking capability isn't necessary, and traditional inference is sufficient. Or if reducing latency is a priority, you may not want the model to take any more time than necessary to generate a response.

In these situations, you can disable (or turn off) thinking:

Gemini 2.5 Pro: thinking cannot be disabled
Gemini 2.5 Flash: set thinkingBudget to 0 tokens
Gemini 2.5 Flash‑Lite: thinking is disabled by default

Enable dynamic thinking

You can let the model decide when and how much it thinks (called dynamic thinking) by setting thinkingBudget to -1. The model can use as many tokens as it decides is appropriate, up to its maximum token value listed above.
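
The two special values, 0 to disable thinking and -1 for dynamic thinking, are easy to confuse when they appear as bare numbers. The following is a minimal sketch that names them: `DISABLE_THINKING`, `DYNAMIC_THINKING`, and `thinkingGenerationConfig` are hypothetical names, and the returned object uses the same shape as the `generationConfig` in the Web snippet earlier in this section.

```javascript
// Special thinkingBudget values described above
const DISABLE_THINKING = 0;  // not available for Gemini 2.5 Pro
const DYNAMIC_THINKING = -1; // the model decides when and how much to think

// Hypothetical helper: builds a generationConfig object for a named mode.
function thinkingGenerationConfig(mode) {
  switch (mode) {
    case "disabled":
      return { thinkingConfig: { thinkingBudget: DISABLE_THINKING } };
    case "dynamic":
      return { thinkingConfig: { thinkingBudget: DYNAMIC_THINKING } };
    default:
      throw new Error(`unknown thinking mode: ${mode}`);
  }
}
```

In a Web app, the result would be passed as `generationConfig` when creating the model instance, for example `thinkingGenerationConfig("dynamic")` for requests whose difficulty isn't known in advance.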

Task complexity

Easy tasks: thinking could be turned off
Straightforward requests where complex reasoning isn't required, such as fact retrieval or classification. Examples:
- "Where was DeepMind founded?"
- "Is this email asking for a meeting or just providing information?"
Medium tasks: default budget or some additional thinking budget needed
Common requests that benefit from a degree of step-by-step processing or deeper understanding. Examples:
- "Create an analogy between photosynthesis and growing up."
- "Compare and contrast electric cars and hybrid cars."
Hard tasks: maximum thinking budget may be needed
Truly complex challenges, such as solving complex math problems or coding tasks. These types of tasks require the model to engage its full reasoning and planning capabilities, often involving many internal steps before providing an answer. Examples:
- "Solve problem 1 in AIME 2025: Find the sum of all integer bases b > 9 for which 17_b is a divisor of 97_b."
- "Write Python code for a web application that visualizes real-time stock market data, including user authentication. Make it as efficient as possible."
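
The three tiers above can be expressed as a starting-point mapping from task difficulty to budget. The following is a minimal sketch under stated assumptions: `suggestThinkingBudget` and the `limits` objects are hypothetical, the numbers come from the supported-values table earlier in this section, and falling back to the minimum budget when a model can't disable thinking is a choice made for this sketch rather than documented guidance.

```javascript
// Limits for two models, copied from the supported-values table in this section
const GEMINI_2_5_FLASH = { defaultBudget: 8192, min: 1, max: 24576, canDisable: true };
const GEMINI_2_5_PRO = { defaultBudget: 8192, min: 128, max: 32768, canDisable: false };

// Hypothetical helper: picks a starting thinking budget for a difficulty tier.
function suggestThinkingBudget(difficulty, limits) {
  switch (difficulty) {
    case "easy":
      // Thinking could be turned off; if the model can't disable it,
      // this sketch falls back to the smallest allowed budget.
      return limits.canDisable ? 0 : limits.min;
    case "medium":
      // Default budget (raise it if responses need more reasoning)
      return limits.defaultBudget;
    case "hard":
      // Maximum thinking budget may be needed
      return limits.max;
    default:
      throw new Error(`unknown difficulty: ${difficulty}`);
  }
}
```

These values are only starting points; the token counts and latency of real requests are a better basis for tuning than a fixed tier.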

Include thought summaries in responses

Thought summaries are synthesized versions of the model's raw thoughts and offer insights into the model's internal reasoning process.

Here are some reasons to include thought summaries in responses:

You can display thought summaries in your app's UI or otherwise make them accessible to your users. Each thought summary is returned as a separate part in the response so that you have more control over how it's used in your app.
If you also enable AI monitoring in the Firebase console, then thought summaries display in the console where you can inspect the model's detailed reasoning to help you debug and refine your prompts.

Here are some key notes about thought summaries:

Thought summaries are not controlled by thinking budgets (budgets only apply to the model's raw thoughts). However, if thinking is disabled, then the model won't return a thought summary.
Thought summaries are considered part of the model's regular generated-text response and count as output tokens.

Enable thought summaries


You can enable thought summaries by setting includeThoughts to true in your model configuration. You can then access the summary by checking the thoughtSummary field from the response.

Here's an example demonstrating how to enable and retrieve thought summaries with the response: