Gemini 3.1 Flash-Lite

Preview

This product or feature is a Generative AI Preview offering, subject to the "Pre-GA Offerings Terms" of the Google Cloud Service Specific Terms . For this Generative AI Preview offering, Customers may elect to use it for production or commercial purposes, or disclose Generated Output to third-parties, and may process personal data as outlined in the Cloud Data Processing Addendum , subject to the obligations and restrictions described in the agreement under which you access Google Cloud.

Gemini 3.1 Flash-Lite is our most cost-efficient Gemini model, optimized for low latency use cases for high-volume, cost-sensitive LLM traffic. It provides a significant quality increase over Gemini 2.0 Flash-Lite and Gemini 2.5 Flash-Lite models, matching Gemini 2.5 Flash performance across key capability areas:

  • Improved response quality:Aims to match 2.5 Flash performance.
  • Improved instruction following:Targeted improvements to serve as a reliable migration path for complex chatbot and instruction-heavy workflows.
  • Improved audio input:Improved audio-input quality for tasks like Automated Speech Recognition (ASR).
  • Expanded thinking support:You can control how much reasoning the model performs by choosing from minimal, low, medium, or high thinking levels . This feature lets you balance response quality and speed for your specific use case.

Try in Vertex AI (Preview) Deploy example app

Note: To use the "Deploy example app" feature, you need a Google Cloud project with billing and Vertex AI API enabled.
Model ID
gemini-3.1-flash-lite-preview
Supported inputs & outputs
  • Inputs:
    Text , Code , Images , Audio , Video , PDF
  • Outputs:
    Text
Token limits
  • Maximum input tokens: 1,048,576
  • Maximum output tokens: 65,535 (default)
See Consumption options for more information.
Technical specifications
Images
  • Maximum images per prompt: 3,000
  • Maximum file size per file for inline data or direct uploads through the console: 7 MB
  • Maximum file size per file from Google Cloud Storage: 30 MB
  • Maximum number of output images per prompt: 10
  • Supported MIME types:
    image/png , image/jpeg , image/webp , image/heic , image/heif
Documents
  • Maximum number of files per prompt: 3,000
  • Maximum number of pages per file: 1,000
  • Maximum file size per file: 50 MB
  • Supported MIME types:
    application/pdf , text/plain
Video
  • Maximum video length (with audio): Approximately 45 minutes
  • Maximum video length (without audio): Approximately 1 hour
  • Maximum number of videos per prompt: 10
  • Supported MIME types:
    video/x-flv , video/quicktime , video/mpeg , video/mpegs , video/mpg , video/mp4 , video/webm , video/wmv , video/3gpp
Audio
  • Maximum audio length per prompt: Approximately 8.4 hours, or up to 1 million tokens
  • Maximum number of audio files per prompt: 1
  • Supported MIME types:
    audio/x-aac , audio/flac , audio/mp3 , audio/m4a , audio/mpeg , audio/mpga , audio/mp4 , audio/ogg , audio/pcm , audio/wav , audio/webm
Parameter defaults
  • Temperature: 0.0-2.0 (default 1.0)
  • topP: 0.0-1.0 (default 0.95)
  • topK: 64 (fixed)
  • candidateCount: 1–8 (default 1)
Supported regions

Model availability

  • Global
    • global
See Deployments and endpoints for more information.
Knowledge cutoff date
January 2025
Versions
  • gemini-3.1-flash-lite-preview
    • Launch stage: Public preview
    • Release date: March 3, 2026
Supported languages
Pricing
See Pricing .
Create a Mobile Website
View Site in Mobile | Classic
Share by: