Analyze video files using the Gemini API

You can ask a Gemini model to analyze video files that you provide either inline (base64-encoded) or via URL. When you use Firebase AI Logic, you can make this request directly from your app.
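To make the inline option concrete, here is a minimal TypeScript sketch of how raw video bytes could be base64-encoded and wrapped in a part object. The `inlineData` shape is an assumption modeled on the Web SDK's part objects, and `bytesToBase64` and `inlineVideoPart` are hypothetical helper names, so check the SDK reference for your version before relying on it.

```typescript
// Assumed shape of an inline (base64-encoded) media part.
type InlineDataPart = { inlineData: { data: string; mimeType: string } };

// Hypothetical helper: base64-encode raw bytes.
// btoa expects a "binary string" in which each character is one byte.
function bytesToBase64(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

// Hypothetical helper: wrap video bytes and their MIME type in an inline part.
function inlineVideoPart(bytes: Uint8Array, mimeType: string): InlineDataPart {
  return { inlineData: { data: bytesToBase64(bytes), mimeType: mimeType } };
}
```

With a `GenerativeModel` instance from the getting started guide, the part would typically be sent alongside a text prompt, for example `model.generateContent([prompt, inlineVideoPart(bytes, "video/mp4")])`. That call is shown here only as an illustration.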

With this capability, you can do things like:

Caption and answer questions about videos
Analyze specific segments of a video using timestamps
Transcribe video content by processing both the audio track and visual frames
Describe, segment, and extract information from videos, including both the audio track and visual frames
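As an illustration of timestamp-based analysis, the sketch below builds a text prompt that points the model at one segment of a video. It assumes timestamps are written as MM:SS in the prompt text, which is a convention to confirm against the Gemini API documentation. `formatTimestamp` and `segmentPrompt` are hypothetical helper names.

```typescript
// Hypothetical helper: format a second count as MM:SS.
// The MM:SS convention is an assumption; confirm it for your model.
function formatTimestamp(totalSeconds: number): string {
  const whole = Math.max(0, Math.floor(totalSeconds));
  const pad = (n: number): string => (n < 10 ? "0" : "") + n;
  return pad(Math.floor(whole / 60)) + ":" + pad(whole % 60);
}

// Hypothetical helper: build a prompt that asks about one segment of the video.
function segmentPrompt(startSeconds: number, endSeconds: number, question: string): string {
  if (endSeconds <= startSeconds) {
    throw new RangeError("endSeconds must be greater than startSeconds");
  }
  return (
    "Between " + formatTimestamp(startSeconds) +
    " and " + formatTimestamp(endSeconds) +
    " of the video: " + question
  );
}
```

For example, `segmentPrompt(5, 75, "What animals appear?")` yields `"Between 00:05 and 01:15 of the video: What animals appear?"`, which can be passed as the text prompt next to the video part.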

This guide is about generating text from video input, but you can also generate images from video input.

See other guides for additional options for working with video:

Generate structured output
Multi-turn chat
Generate images

Before you begin

If you haven't already, complete the getting started guide, which describes how to set up your Firebase project, connect your app to Firebase, add the SDK, initialize the backend service for your chosen Gemini API provider, and create a GenerativeModel instance.

For testing and iterating on your prompts, we recommend using Google AI Studio.

Need a sample video file?

You can use this publicly available file, which has a MIME type of video/mp4, to view or download: https://storage.googleapis.com/cloud-samples-data/video/animals.mp4
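As a rough sketch of the URL option, the snippet below pairs a text prompt with a part that references the sample file by URL. The `fileData` shape is an assumption modeled on the Web SDK's part objects, `buildVideoRequest` is a hypothetical helper, and whether a given URL is accepted may depend on your Gemini API provider.

```typescript
// Sample file and MIME type named in this guide.
const SAMPLE_VIDEO_URL =
  "https://storage.googleapis.com/cloud-samples-data/video/animals.mp4";
const SAMPLE_VIDEO_MIME_TYPE = "video/mp4";

// Assumed shape of a URL-based file part; verify against your SDK's reference.
type FileDataPart = { fileData: { fileUri: string; mimeType: string } };
type RequestPart = string | FileDataPart;

// Hypothetical helper: combine a text prompt and a video URL into one request.
function buildVideoRequest(prompt: string, fileUri: string, mimeType: string): RequestPart[] {
  if (mimeType.indexOf("video/") !== 0) {
    throw new Error("Expected a video MIME type, got: " + mimeType);
  }
  return [prompt, { fileData: { fileUri: fileUri, mimeType: mimeType } }];
}

const sampleRequest = buildVideoRequest(
  "What is in the video?",
  SAMPLE_VIDEO_URL,
  SAMPLE_VIDEO_MIME_TYPE
);
// With a GenerativeModel instance (assumed to exist as `model`), this array
// would be passed along the lines of: await model.generateContent(sampleRequest)
```

Keeping the request as a plain array makes it straightforward to swap the URL-based part for an inline (base64-encoded) part without changing the prompt.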

Analyze video files using the Gemini API

Before you begin

Models that support this capability

Generate text from video files (base64-encoded)

Swift

Kotlin

Java

Web

Dart

Unity

Stream the response

View example: Stream generated text from video files

Swift

Kotlin

Java

Web

Dart

Unity

Requirements and recommendations for input video files

Supported video MIME types

Limits per request

What else can you do?

Try out other capabilities

Learn how to control content generation

Learn more about the supported models
