Gemini 3.1 Flash-Lite

Preview

This product or feature is a Generative AI Preview offering, subject to the "Pre-GA Offerings Terms" of the Google Cloud Service Specific Terms . For this Generative AI Preview offering, Customers may elect to use it for production or commercial purposes, or disclose Generated Output to third-parties, and may process personal data as outlined in the Cloud Data Processing Addendum , subject to the obligations and restrictions described in the agreement under which you access Google Cloud.

Gemini 3.1 Flash-Lite is our most cost-efficient Gemini model, optimized for low latency use cases for high-volume, cost-sensitive LLM traffic. It provides a significant quality increase over Gemini 2.0 Flash-Lite and Gemini 2.5 Flash-Lite models, matching Gemini 2.5 Flash performance across key capability areas:

Improved response quality:Aims to match 2.5 Flash performance.
Improved instruction following:Targeted improvements to serve as a reliable migration path for complex chatbot and instruction-heavy workflows.
Improved audio input:Improved audio-input quality for tasks like Automated Speech Recognition (ASR).
Expanded thinking support:You can control how much reasoning the model performs by choosing from minimal, low, medium, or high thinking levels . This feature lets you balance response quality and speed for your specific use case.

Try in Vertex AI (Preview) Deploy example app

Note: To use the "Deploy example app" feature, you need a Google Cloud project with billing and Vertex AI API enabled.

Model ID

gemini-3.1-flash-lite-preview

Supported inputs & outputs

Inputs:
Text , Code , Images , Audio , Video , PDF
Outputs:
Text

Token limits

Maximum input tokens: 1,048,576
Maximum output tokens: 65,535 (default)

Capabilities

Supported

Not supported

Consumption options

Supported

Not supported

See Consumption options for more information.

Technical specifications

Images

Maximum images per prompt: 3,000
Maximum file size per file for inline data or direct uploads through the console: 7 MB
Maximum file size per file from Google Cloud Storage: 30 MB
Maximum number of output images per prompt: 10
Supported MIME types:
image/png , image/jpeg , image/webp , image/heic , image/heif

Documents

Maximum number of files per prompt: 3,000
Maximum number of pages per file: 1,000
Maximum file size per file: 50 MB
Supported MIME types:
application/pdf , text/plain

Video

Maximum video length (with audio): Approximately 45 minutes
Maximum video length (without audio): Approximately 1 hour
Maximum number of videos per prompt: 10
Supported MIME types:
video/x-flv , video/quicktime , video/mpeg , video/mpegs , video/mpg , video/mp4 , video/webm , video/wmv , video/3gpp

Audio

Maximum audio length per prompt: Approximately 8.4 hours, or up to 1 million tokens
Maximum number of audio files per prompt: 1
Supported MIME types:
audio/x-aac , audio/flac , audio/mp3 , audio/m4a , audio/mpeg , audio/mpga , audio/mp4 , audio/ogg , audio/pcm , audio/wav , audio/webm

Parameter defaults

Temperature: 0.0-2.0 (default 1.0)
topP: 0.0-1.0 (default 0.95)
topK: 64 (fixed)
candidateCount: 1–8 (default 1)

Supported regions

Model availability

Global

global

See Deployments and endpoints for more information.

Knowledge cutoff date

January 2025

Versions

gemini-3.1-flash-lite-preview

Launch stage: Public preview
Release date: March 3, 2026

Supported languages

See Supported languages .

Pricing

See Pricing .

Gemini 3.1 Flash-Lite Stay organized with collections Save and categorize content based on your preferences.

Gemini 3.1 Flash-Lite