All Gemini 1.0 and Gemini 1.5 models are now retired.
To avoid service disruption, update to a newer model (for example, gemini-2.5-flash-lite).

Analyze audio files using the Gemini API

You can ask a Gemini model to analyze audio files that you provide either inline (base64-encoded) or via URL. When you use Firebase AI Logic, you can make this request directly from your app.
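As a minimal sketch of the two input styles, the TypeScript below builds request parts assumed to mirror the Web SDK's inline-data part (`inlineData` with `mimeType` and `data`) and file-data part (`fileData` with `mimeType` and `fileUri`). The type names and helper functions are hypothetical, and the sketch assumes a runtime with a global `btoa` (modern browsers, Node.js 16+, Deno).

```typescript
// Hypothetical local types, assumed to mirror the shape of the Web SDK's
// inline-data and file-data parts.
interface InlineAudioPart {
  inlineData: { mimeType: string; data: string };
}

interface UrlAudioPart {
  fileData: { mimeType: string; fileUri: string };
}

// Base64-encode raw bytes: build a "binary string" (one char per byte,
// codes 0-255) and hand it to the global btoa (assumed available).
function bytesToBase64(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

// Inline: the audio bytes travel inside the request as base64 text.
function toInlineAudioPart(bytes: Uint8Array, mimeType: string): InlineAudioPart {
  return { inlineData: { mimeType, data: bytesToBase64(bytes) } };
}

// By URL: only a reference to the file travels inside the request.
function toUrlAudioPart(fileUri: string, mimeType: string): UrlAudioPart {
  return { fileData: { mimeType, fileUri } };
}
```

One consequence of the inline style: base64 turns every 3 bytes into 4 characters, so an inline file adds roughly a third more to the request than its size on disk.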

With this capability, you can do things like:

Describe, summarize, or answer questions about audio content
Transcribe audio content
Analyze specific segments of audio using timestamps
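Assuming, as in Google's Gemini audio examples, that a segment is identified by `MM:SS` timestamps written into the prompt text itself, a small helper can turn second offsets into such a prompt. Both function names are hypothetical; this is a sketch, not part of any SDK.

```typescript
// Zero-pad a number to at least two digits.
function pad2(n: number): string {
  return (n < 10 ? "0" : "") + n;
}

// Format a whole, non-negative number of seconds as MM:SS.
// Minutes are not wrapped into hours, so 3605 s becomes "60:05".
function toTimestamp(totalSeconds: number): string {
  if (Math.floor(totalSeconds) !== totalSeconds || totalSeconds < 0) {
    throw new RangeError("totalSeconds must be a non-negative integer");
  }
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return `${pad2(minutes)}:${pad2(seconds)}`;
}

// Build a prompt that asks the model about one segment of the audio.
function segmentPrompt(task: string, startSeconds: number, endSeconds: number): string {
  if (endSeconds <= startSeconds) {
    throw new RangeError("endSeconds must be after startSeconds");
  }
  return `${task} from ${toTimestamp(startSeconds)} to ${toTimestamp(endSeconds)}.`;
}
```

For example, `segmentPrompt("Provide a transcript of the speech", 150, 209)` produces "Provide a transcript of the speech from 02:30 to 03:29."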


See other guides for additional options for working with audio: generate structured output, multi-turn chat, and bidirectional streaming.

Before you begin


If you haven't already, complete the getting started guide, which describes how to set up your Firebase project, connect your app to Firebase, add the SDK, initialize the backend service for your chosen Gemini API provider, and create a GenerativeModel instance.

For testing and iterating on your prompts, we recommend using Google AI Studio.

Need a sample audio file?

You can use this publicly available file, which has a MIME type of audio/mp3: https://storage.googleapis.com/cloud-samples-data/generative-ai/audio/pixel.mp3
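When you send audio inline you supply its MIME type yourself. The hypothetical helper below infers it from a file name or URL. The extension table is an assumption based on the audio formats listed in Google's Gemini API documentation (WAV, MP3, AIFF, AAC, OGG Vorbis, FLAC); check the supported audio MIME types for your provider before relying on it.

```typescript
// Assumed extension-to-MIME-type table; verify against your provider's list.
const AUDIO_MIME_BY_EXTENSION: Record<string, string> = {
  wav: "audio/wav",
  mp3: "audio/mp3",
  aiff: "audio/aiff",
  aac: "audio/aac",
  ogg: "audio/ogg",
  flac: "audio/flac",
};

// Return the MIME type for a file name or URL, or undefined if the
// extension is missing or not in the table.
function audioMimeTypeFor(nameOrUrl: string): string | undefined {
  // Drop any query string or fragment before looking at the extension.
  const path = nameOrUrl.split(/[?#]/)[0];
  const dot = path.lastIndexOf(".");
  // No dot, or the last dot belongs to a directory name rather than the file.
  if (dot < 0 || dot < path.lastIndexOf("/")) return undefined;
  const ext = path.slice(dot + 1).toLowerCase();
  // Own-property check so keys like "constructor" don't hit Object.prototype.
  return Object.prototype.hasOwnProperty.call(AUDIO_MIME_BY_EXTENSION, ext)
    ? AUDIO_MIME_BY_EXTENSION[ext]
    : undefined;
}
```

For the sample file above, `audioMimeTypeFor("https://storage.googleapis.com/cloud-samples-data/generative-ai/audio/pixel.mp3")` returns "audio/mp3", matching the stated MIME type.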

Analyze audio files using the Gemini API

Before you begin

Generate text from audio files (base64-encoded)

Swift

Kotlin

Java

Web

Dart

Unity
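As a rough TypeScript sketch of this request: pair a text prompt with an inline audio part, pass both to `generateContent`, and read the response text. To stay self-contained, it declares a hypothetical `AudioCapableModel` interface covering only the slice of the Web SDK's `GenerativeModel` it uses; with the real SDK you would pass the model returned by `getGenerativeModel` instead.

```typescript
// Hypothetical minimal types: only the surface of a generative model this
// sketch relies on (generateContent taking an array of strings and parts).
type InlineDataPart = { inlineData: { mimeType: string; data: string } };
type PromptPart = string | InlineDataPart;

interface AudioCapableModel {
  generateContent(request: PromptPart[]): Promise<{ response: { text(): string } }>;
}

// Build the request: a text instruction plus the base64-encoded audio.
function buildAudioRequest(prompt: string, base64Audio: string, mimeType: string): PromptPart[] {
  return [prompt, { inlineData: { mimeType, data: base64Audio } }];
}

// Send the request and resolve with the model's text answer.
async function describeAudio(
  model: AudioCapableModel,
  prompt: string,
  base64Audio: string,
  mimeType: string,
): Promise<string> {
  const result = await model.generateContent(buildAudioRequest(prompt, base64Audio, mimeType));
  return result.response.text();
}
```

The prompt can be any of the tasks listed at the top of this page, for example "Summarize this audio" or "Transcribe this audio".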

Stream the response

View example: Stream generated text from audio files

Swift

Kotlin

Java

Web

Dart

Unity
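For streaming, the Web SDK exposes `generateContentStream`, whose result carries a `stream` that you iterate with `for await`. The sketch below uses a hypothetical minimal interface, `StreamingAudioModel`, in place of the real model. It forwards each chunk's text to a callback as it arrives and resolves with the full text when the stream ends.

```typescript
// Hypothetical minimal types mirroring only what this sketch uses.
type InlineDataPart = { inlineData: { mimeType: string; data: string } };
type PromptPart = string | InlineDataPart;

interface StreamingAudioModel {
  generateContentStream(
    request: PromptPart[],
  ): Promise<{ stream: AsyncIterable<{ text(): string }> }>;
}

// Stream a response about inline audio, handing each chunk's text to
// onChunk as it arrives, then resolve with the concatenated text.
async function streamAudioAnswer(
  model: StreamingAudioModel,
  prompt: string,
  base64Audio: string,
  mimeType: string,
  onChunk: (text: string) => void,
): Promise<string> {
  const result = await model.generateContentStream([
    prompt,
    { inlineData: { mimeType, data: base64Audio } },
  ]);
  let fullText = "";
  for await (const chunk of result.stream) {
    const chunkText = chunk.text();
    onChunk(chunkText);
    fullText += chunkText;
  }
  return fullText;
}
```

Accumulating `fullText` is optional; a UI that only needs to show text as it arrives can rely on `onChunk` alone.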

Requirements and recommendations for input audio files

Supported audio MIME types

Limits per request

What else can you do?

Try out other capabilities

Learn how to control content generation

Learn more about the supported models
