Gemini 2.0 Flash-Lite

Caution: As of March 6, 2026, gemini-2.0-flash-001 and gemini-2.0-flash-lite-001 are available only to existing customers. This applies to both model serving and Provisioned Throughput. New projects should use gemini-2.5-flash, gemini-2.5-flash-lite, or a more recent release.

Gemini 2.0 Flash-Lite is our fastest Gemini 2.0 model, optimized for cost efficiency and low latency.


Model ID

gemini-2.0-flash-lite

Supported inputs & outputs

Inputs:
Text, Code, Images, Audio, Video
Outputs:
Text

Token limits

Maximum input tokens: 1,048,576
Maximum output tokens: 8,192 (default)
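
The two limits above can be checked before a request is sent. The following is a minimal sketch, not part of any SDK: the function name `check_token_budget` is hypothetical, and the token counts are assumed to come from a separate counting step.

```python
# Limits taken from the "Token limits" list above.
MAX_INPUT_TOKENS = 1_048_576
MAX_OUTPUT_TOKENS = 8_192


def check_token_budget(input_tokens: int,
                       output_tokens: int = MAX_OUTPUT_TOKENS) -> list[str]:
    """Return a list of problems; an empty list means both limits are met.

    Hypothetical helper: the caller supplies token counts obtained elsewhere.
    """
    problems = []
    if input_tokens > MAX_INPUT_TOKENS:
        over = input_tokens - MAX_INPUT_TOKENS
        problems.append(f"input exceeds the limit by {over} tokens")
    if output_tokens > MAX_OUTPUT_TOKENS:
        over = output_tokens - MAX_OUTPUT_TOKENS
        problems.append(f"requested output exceeds the limit by {over} tokens")
    return problems
```

Returning a list rather than raising lets a caller report both problems at once.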

Capabilities

Supported

Not supported

Consumption options

Supported

Not supported

See Consumption options for more information.

Input size limit

500 MB

Technical specifications

Images

Maximum images per prompt: 3,000
Maximum file size per file for inline data or direct uploads through the console: 7 MB
Maximum file size per file from Google Cloud Storage: 30 MB
Maximum tokens per minute (TPM):
- High/Medium/Default media resolution:
  - US/Asia: 6.7 M
  - EU: 2.6 M
- Low media resolution:
  - US/Asia: 2.6 M
  - EU: 2.6 M
Supported MIME types:
image/png, image/jpeg, image/webp, image/heic, image/heif
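
The image limits above lend themselves to a client-side check before upload. This is a sketch under stated assumptions: `ImagePart` and `validate_images` are hypothetical names, and 1 MB is taken to mean 1,048,576 bytes, which this page does not specify.

```python
from dataclasses import dataclass

# Values from the "Images" list above.
SUPPORTED_IMAGE_MIME_TYPES = {
    "image/png", "image/jpeg", "image/webp", "image/heic", "image/heif",
}
MAX_IMAGES_PER_PROMPT = 3_000
MB = 1024 * 1024             # assumption: the page does not define "MB"
MAX_INLINE_BYTES = 7 * MB    # inline data or direct console upload
MAX_GCS_BYTES = 30 * MB      # files read from Google Cloud Storage


@dataclass
class ImagePart:
    """Hypothetical description of one image in a prompt."""
    mime_type: str
    size_bytes: int
    from_cloud_storage: bool = False


def validate_images(images: list[ImagePart]) -> list[str]:
    """Return a list of limit violations; empty means the images pass."""
    errors = []
    if len(images) > MAX_IMAGES_PER_PROMPT:
        errors.append(f"{len(images)} images exceeds {MAX_IMAGES_PER_PROMPT}")
    for index, image in enumerate(images):
        if image.mime_type not in SUPPORTED_IMAGE_MIME_TYPES:
            errors.append(f"image {index}: unsupported type {image.mime_type}")
        limit = MAX_GCS_BYTES if image.from_cloud_storage else MAX_INLINE_BYTES
        if image.size_bytes > limit:
            errors.append(f"image {index}: {image.size_bytes} bytes exceeds {limit}")
    return errors
```

The same pattern would apply to the video and audio MIME type lists, with their own sets and limits.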

Documents

Maximum number of files per prompt: 3,000
Maximum number of pages per file: 1,000
Maximum file size per file for the API or Cloud Storage imports: 50 MB
Maximum file size per file for direct uploads through the console: 7 MB
Maximum tokens per minute (TPM) per project:
- US/Asia: 3.4 M
- EU: 3.4 M
Supported MIME types:

Video

Maximum video length (with audio): Approximately 45 minutes
Maximum video length (without audio): Approximately 1 hour
Maximum number of videos per prompt: 10
Maximum tokens per minute (TPM):
- High/Medium/Default media resolution:
  - US/Asia: 6.3 M
  - EU: 3.2 M
- Low media resolution:
  - US/Asia: 3.2 M
  - EU: 3.2 M
Supported MIME types:
video/x-flv, video/quicktime, video/mpeg, video/mpegs, video/mpg, video/mp4, video/webm, video/wmv, video/3gpp

Audio

Maximum audio length per prompt: Approximately 8.4 hours, or up to 1 million tokens
Maximum number of audio files per prompt: 1
Speech understanding for: Audio summarization, transcription, and translation
Maximum tokens per minute (TPM):
- US/Asia: 3.5 M
- EU: 3.5 M
Supported MIME types:
audio/x-aac, audio/flac, audio/mp3, audio/m4a, audio/mpeg, audio/mpga, audio/mp4, audio/ogg, audio/pcm, audio/wav, audio/webm
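
The audio length limit above pairs roughly 8.4 hours with roughly 1 million tokens, which implies an approximate token rate per second of audio. The sketch below only works through that arithmetic. The derived rate is an estimate from the rounded figures on this page, not a documented constant, and the function names are hypothetical.

```python
import math

# Figures from the "Audio" list above; both are approximate.
MAX_AUDIO_SECONDS = 8.4 * 3600      # about 8.4 hours
APPROX_MAX_AUDIO_TOKENS = 1_000_000

# Implied rate, derived only from the two rounded figures above.
TOKENS_PER_SECOND = APPROX_MAX_AUDIO_TOKENS / MAX_AUDIO_SECONDS


def estimate_audio_tokens(duration_seconds: float) -> int:
    """Rough token estimate for a clip, rounded up (an estimate only)."""
    return math.ceil(duration_seconds * TOKENS_PER_SECOND)


def fits_in_one_prompt(duration_seconds: float) -> bool:
    """True if the clip is within the approximate length limit."""
    return duration_seconds <= MAX_AUDIO_SECONDS
```

For exact figures, count tokens with the API rather than relying on this estimate.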

Parameter defaults

Temperature: 0.0-2.0 (default 1.0)
topP: 0.0-1.0 (default 0.95)
topK: 64 (fixed)
candidateCount: 1-8 (default 1)
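
The ranges and defaults above can be enforced when assembling request parameters. This is a minimal sketch: `build_generation_config` is a hypothetical helper, and the dictionary keys simply mirror the parameter names listed above. Because topK is fixed at 64, the helper does not expose it.

```python
def build_generation_config(temperature: float = 1.0,
                            top_p: float = 0.95,
                            candidate_count: int = 1) -> dict:
    """Validate sampling parameters against the ranges listed above.

    Hypothetical helper; defaults match the documented defaults.
    """
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")
    if not 0.0 <= top_p <= 1.0:
        raise ValueError("topP must be between 0.0 and 1.0")
    if not 1 <= candidate_count <= 8:
        raise ValueError("candidateCount must be between 1 and 8")
    # topK is fixed at 64 for this model, so it is intentionally omitted.
    return {
        "temperature": temperature,
        "topP": top_p,
        "candidateCount": candidate_count,
    }
```

Calling `build_generation_config()` with no arguments yields the documented defaults; out-of-range values raise `ValueError` instead of being silently clamped.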

Supported regions

Model availability

Global

global

United States

us-central1
us-east1
us-east4
us-east5
us-south1
us-west1
us-west4

Europe

europe-central2
europe-north1
europe-southwest1
europe-west1
europe-west4
europe-west8
europe-west9

ML processing

United States

Multi-region

Europe

Multi-region

See Deployments and endpoints for more information.
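
A request has to target one of the locations listed above. The sketch below builds a REST URL for a chosen location. The URL pattern is an assumption based on the usual Vertex AI `generateContent` form and should be confirmed against Deployments and endpoints. `generate_content_url` and the project ID used in the usage note are hypothetical.

```python
# Locations from the "Model availability" lists above.
SUPPORTED_LOCATIONS = {
    "global",
    "us-central1", "us-east1", "us-east4", "us-east5",
    "us-south1", "us-west1", "us-west4",
    "europe-central2", "europe-north1", "europe-southwest1",
    "europe-west1", "europe-west4", "europe-west8", "europe-west9",
}
MODEL_ID = "gemini-2.0-flash-lite"


def generate_content_url(project_id: str, location: str) -> str:
    """Build a generateContent URL (assumed pattern) for a listed location."""
    if location not in SUPPORTED_LOCATIONS:
        raise ValueError(f"{location} is not listed for {MODEL_ID}")
    # Assumption: the global location uses the host without a region prefix.
    if location == "global":
        host = "aiplatform.googleapis.com"
    else:
        host = f"{location}-aiplatform.googleapis.com"
    return (
        f"https://{host}/v1/projects/{project_id}/locations/{location}"
        f"/publishers/google/models/{MODEL_ID}:generateContent"
    )
```

For example, `generate_content_url("my-project", "europe-west4")` targets the europe-west4 regional host, while an unlisted location raises `ValueError` before any network call is made.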

Knowledge cutoff date

June 2024

Versions

gemini-2.0-flash-lite-001

Launch stage: GA
Release date: February 25, 2025
Discontinuation date: June 1, 2026
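
Because the version above has a fixed discontinuation date, a deployment check can flag when migration is getting close. This is a small sketch: the function names and the 90-day warning window are hypothetical choices, not a documented policy.

```python
from datetime import date

# Dates from the version details above.
RELEASE_DATE = date(2025, 2, 25)
DISCONTINUATION_DATE = date(2026, 6, 1)


def days_until_discontinuation(today: date) -> int:
    """Days remaining before the discontinuation date (negative once past)."""
    return (DISCONTINUATION_DATE - today).days


def should_plan_migration(today: date, warning_days: int = 90) -> bool:
    """True once the discontinuation date is within the warning window.

    The 90-day default is an illustrative assumption.
    """
    return days_until_discontinuation(today) <= warning_days
```

A check like this could run in CI or at service startup, passing `date.today()` as the argument.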

Security controls

Online prediction

Data residency
CMEK
VPC-SC
AXT

Batch prediction

Data residency
CMEK
VPC-SC
AXT

Tuning

Data residency
CMEK
VPC-SC
AXT

RAG Engine

Data residency
CMEK
VPC-SC
AXT

Context caching

Data residency
CMEK
VPC-SC
AXT

Grounding with Google Search and Grounding with Google Maps

Data residency
CMEK
VPC-SC
AXT

See Security controls for more information.

Supported languages

See Supported languages.

Pricing

See Pricing.
