Analyze documents (like PDFs) using the Gemini API
You can ask a Gemini model to analyze document files
(like PDFs and plain-text files) that you provide
either inline (base64-encoded) or via URL. When you use Firebase AI Logic,
you can make this request directly from your app.
With this capability, you can do things like:
Analyze diagrams, charts, and tables inside documents
Extract information into structured output formats (see the sketch after this list)
Answer questions about visual and text contents in documents
Summarize documents
Transcribe document content (for example, into HTML), preserving layouts and
formatting, for use in downstream applications (such as in RAG pipelines)
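For instance, structured extraction usually means asking the model to return JSON. Here is a minimal Kotlin sketch of that use case, assuming the generationConfig builder and its responseMimeType option in the Firebase AI Logic Kotlin SDK; treat it as an illustration rather than a definitive recipe:

// A minimal sketch of structured extraction: ask the model to return JSON
// instead of prose. Assumes the Firebase AI Logic Kotlin SDK's
// `generationConfig` builder and its `responseMimeType` option.
val structuredModel = Firebase.ai(backend = GenerativeBackend.googleAI()).generativeModel(
    "gemini-2.5-flash",
    generationConfig = generationConfig {
        responseMimeType = "application/json" // Request JSON output
    }
)

// Describe the JSON shape you want in the prompt itself.
// Call from a coroutine scope; `generateContent` is a suspend function.
val response = structuredModel.generateContent(
    "Extract the title, authors, and publication year of this document as JSON."
)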
Click your Gemini API provider to view provider-specific content
and code on this page.
If you haven't already, complete the getting started guide, which describes how to
set up your Firebase project, connect your app to Firebase, add the SDK,
initialize the backend service for your chosen Gemini API provider, and
create a GenerativeModel instance.
For testing and iterating on your prompts, and even
getting a generated code snippet, we recommend using Google AI Studio.
Need a sample PDF file?
You can use this publicly available file with a MIME type of application/pdf:
https://storage.googleapis.com/cloud-samples-data/generative-ai/pdf/2403.05530.pdf
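If you want to fetch that sample programmatically, here's a minimal Kotlin sketch (the function name is our own); in a real Android app, run this off the main thread:

import java.net.URL

// Download the publicly available sample PDF into memory as raw bytes,
// ready to pass as inline data with the "application/pdf" MIME type.
fun fetchSamplePdf(): ByteArray =
    URL("https://storage.googleapis.com/cloud-samples-data/generative-ai/pdf/2403.05530.pdf")
        .readBytes()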
Generate text from PDF files (base64-encoded)
Before trying this sample, complete the Before you begin section of this guide
to set up your project and app. In that section, you'll also click a button for
your chosen Gemini API provider so that you see provider-specific content
on this page.
You can ask a Gemini model to generate text by prompting with text and PDFs,
providing each input file's mimeType and the file itself.
Find requirements and recommendations for input files later on this page.
Swift
You can call generateContent() to generate text from multimodal input of text and PDFs.
import FirebaseAI

// Initialize the Gemini Developer API backend service
let ai = FirebaseAI.firebaseAI(backend: .googleAI())

// Create a `GenerativeModel` instance with a model that supports your use case
let model = ai.generativeModel(modelName: "gemini-2.5-flash")

// Provide the PDF as `Data` with the appropriate MIME type
let pdf = try InlineDataPart(data: Data(contentsOf: pdfURL), mimeType: "application/pdf")

// Provide a text prompt to include with the PDF file
let prompt = "Summarize the important results in this report."

// To generate text output, call `generateContent` with the PDF file and text prompt
let response = try await model.generateContent(pdf, prompt)

// Print the generated text, handling the case where it might be nil
print(response.text ?? "No text in response.")
Kotlin
You can call generateContent() to generate text from multimodal input of text and PDFs.
For Kotlin, the methods in this SDK are suspend functions and need to be called
from a coroutine scope.
// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
val generativeModel = Firebase.ai(backend = GenerativeBackend.googleAI())
    .generativeModel("gemini-2.5-flash")

val contentResolver = applicationContext.contentResolver

// Provide the URI for the PDF file you want to send to the model
val inputStream = contentResolver.openInputStream(pdfUri)

// Check if the PDF file loaded successfully
if (inputStream != null) {
    inputStream.use { stream ->
        // Provide a prompt that includes the PDF file specified above and text
        val prompt = content {
            inlineData(
                bytes = stream.readBytes(),
                mimeType = "application/pdf" // Specify the appropriate PDF file MIME type
            )
            text("Summarize the important results in this report.")
        }

        // To generate text output, call `generateContent` with the prompt
        val response = generativeModel.generateContent(prompt)

        // Log the generated text, handling the case where it might be null
        Log.d(TAG, response.text ?: "")
    }
} else {
    Log.e(TAG, "Error getting input stream for file.")
    // Handle the error appropriately
}
Java
You can call generateContent() to generate text from multimodal input of text and PDFs.
// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
GenerativeModel ai = FirebaseAI.getInstance(GenerativeBackend.googleAI())
        .generativeModel("gemini-2.5-flash");

// Use the GenerativeModelFutures Java compatibility layer which offers
// support for ListenableFuture and Publisher APIs
GenerativeModelFutures model = GenerativeModelFutures.from(ai);

ContentResolver resolver = getApplicationContext().getContentResolver();

// Provide the URI for the PDF file you want to send to the model
try (InputStream stream = resolver.openInputStream(pdfUri)) {
    if (stream != null) {
        byte[] pdfBytes = stream.readAllBytes();

        // Provide a prompt that includes the PDF file specified above and text
        Content prompt = new Content.Builder()
                .addInlineData(pdfBytes, "application/pdf") // Specify the appropriate PDF file MIME type
                .addText("Summarize the important results in this report.")
                .build();

        // To generate text output, call `generateContent` with the prompt
        ListenableFuture<GenerateContentResponse> response = model.generateContent(prompt);
        Futures.addCallback(response, new FutureCallback<GenerateContentResponse>() {
            @Override
            public void onSuccess(GenerateContentResponse result) {
                String text = result.getText();
                Log.d(TAG, (text == null) ? "" : text);
            }

            @Override
            public void onFailure(Throwable t) {
                Log.e(TAG, "Failed to generate a response", t);
            }
        }, executor);
    } else {
        Log.e(TAG, "Error getting input stream for file.");
        // Handle the error appropriately
    }
} catch (IOException e) {
    Log.e(TAG, "Failed to read the pdf file", e);
}
Web
You can call generateContent() to generate text from multimodal input of text and PDFs.
import { initializeApp } from "firebase/app";
import { getAI, getGenerativeModel, GoogleAIBackend } from "firebase/ai";

// TODO(developer) Replace the following with your app's Firebase configuration
// See: https://firebase.google.com/docs/web/learn-more#config-object
const firebaseConfig = {
  // ...
};

// Initialize FirebaseApp
const firebaseApp = initializeApp(firebaseConfig);

// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `GenerativeModel` instance with a model that supports your use case
const model = getGenerativeModel(ai, { model: "gemini-2.5-flash" });

// Converts a File object to a Part object.
async function fileToGenerativePart(file) {
  const base64EncodedDataPromise = new Promise((resolve) => {
    const reader = new FileReader();
    // Strip the data URL prefix, keeping only the base64 payload
    reader.onloadend = () => resolve(reader.result.split(',')[1]);
    reader.readAsDataURL(file);
  });
  return {
    inlineData: { data: await base64EncodedDataPromise, mimeType: file.type },
  };
}

async function run() {
  // Provide a text prompt to include with the PDF file
  const prompt = "Summarize the important results in this report.";

  // Prepare PDF file for input
  const fileInputEl = document.querySelector("input[type=file]");
  const pdfPart = await fileToGenerativePart(fileInputEl.files[0]);

  // To generate text output, call `generateContent` with the text and PDF file
  const result = await model.generateContent([prompt, pdfPart]);

  // Log the generated text, handling the case where it might be undefined
  console.log(result.response.text() ?? "No text in response.");
}

run();
Dart
You can call generateContent() to generate text from multimodal input of text and PDFs.
import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';

// Initialize FirebaseApp
await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
final model = FirebaseAI.googleAI().generativeModel(model: 'gemini-2.5-flash');

// Provide a text prompt to include with the PDF file
final prompt = TextPart("Summarize the important results in this report.");

// Prepare the PDF file for input
final doc = await File('document0.pdf').readAsBytes();

// Provide the PDF file as `Data` with the appropriate PDF file MIME type
final docPart = InlineDataPart('application/pdf', doc);

// To generate text output, call `generateContent` with the text and PDF file
final response = await model.generateContent([
  Content.multi([prompt, docPart])
]);

// Print the generated text
print(response.text);
Unity
You can call GenerateContentAsync() to generate text from multimodal input of text and PDFs.
using Firebase;
using Firebase.AI;

// Initialize the Gemini Developer API backend service
var ai = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI());

// Create a `GenerativeModel` instance with a model that supports your use case
var model = ai.GetGenerativeModel(modelName: "gemini-2.5-flash");

// Provide a text prompt to include with the PDF file
var prompt = ModelContent.Text("Summarize the important results in this report.");

// Provide the PDF file as `data` with the appropriate PDF file MIME type
var doc = ModelContent.InlineData("application/pdf",
    System.IO.File.ReadAllBytes(System.IO.Path.Combine(
        UnityEngine.Application.streamingAssetsPath, "document0.pdf")));

// To generate text output, call `GenerateContentAsync` with the text and PDF file
var response = await model.GenerateContentAsync(new[] { prompt, doc });

// Print the generated text
UnityEngine.Debug.Log(response.Text ?? "No text in response.");
Learn how to choose a model appropriate for your use case and app.
Stream the response
Before trying this sample, complete the Before you begin section of this guide
to set up your project and app. In that section, you'll also click a button for
your chosen Gemini API provider so that you see provider-specific content
on this page.
Instead of waiting for the entire result from the model, you can use streaming
to handle partial results and achieve faster interactions.
To stream the response, call generateContentStream().
Swift
You can call generateContentStream() to stream generated text from multimodal input of text and PDFs.
import FirebaseAI

// Initialize the Gemini Developer API backend service
let ai = FirebaseAI.firebaseAI(backend: .googleAI())

// Create a `GenerativeModel` instance with a model that supports your use case
let model = ai.generativeModel(modelName: "gemini-2.5-flash")

// Provide the PDF as `Data` with the appropriate MIME type
let pdf = try InlineDataPart(data: Data(contentsOf: pdfURL), mimeType: "application/pdf")

// Provide a text prompt to include with the PDF file
let prompt = "Summarize the important results in this report."

// To stream generated text output, call `generateContentStream` with the PDF file and text prompt
let contentStream = try model.generateContentStream(pdf, prompt)

// Print the generated text, handling the case where it might be nil
for try await chunk in contentStream {
  if let text = chunk.text {
    print(text)
  }
}
Kotlin
You can call generateContentStream() to stream generated text from multimodal input of text and PDFs.
For Kotlin, the methods in this SDK are suspend functions and need to be called
from a coroutine scope.
// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
val generativeModel = Firebase.ai(backend = GenerativeBackend.googleAI())
    .generativeModel("gemini-2.5-flash")

val contentResolver = applicationContext.contentResolver

// Provide the URI for the PDF you want to send to the model
val inputStream = contentResolver.openInputStream(pdfUri)

// Check if the PDF file loaded successfully
if (inputStream != null) {
    inputStream.use { stream ->
        // Provide a prompt that includes the PDF file specified above and text
        val prompt = content {
            inlineData(
                bytes = stream.readBytes(),
                mimeType = "application/pdf" // Specify the appropriate PDF file MIME type
            )
            text("Summarize the important results in this report.")
        }

        // To stream generated text output, call `generateContentStream` with the prompt
        var fullResponse = ""
        generativeModel.generateContentStream(prompt).collect { chunk ->
            // Log the generated text, handling the case where it might be null
            val chunkText = chunk.text ?: ""
            Log.d(TAG, chunkText)
            fullResponse += chunkText
        }
    }
} else {
    Log.e(TAG, "Error getting input stream for file.")
    // Handle the error appropriately
}
Java
You can call generateContentStream() to stream generated text from multimodal input of text and PDFs.
For Java, the streaming methods in this SDK return a Publisher type from the Reactive Streams library.
// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
GenerativeModel ai = FirebaseAI.getInstance(GenerativeBackend.googleAI())
        .generativeModel("gemini-2.5-flash");

// Use the GenerativeModelFutures Java compatibility layer which offers
// support for ListenableFuture and Publisher APIs
GenerativeModelFutures model = GenerativeModelFutures.from(ai);

ContentResolver resolver = getApplicationContext().getContentResolver();

// Provide the URI for the PDF file you want to send to the model
try (InputStream stream = resolver.openInputStream(pdfUri)) {
    if (stream != null) {
        byte[] pdfBytes = stream.readAllBytes();

        // Provide a prompt that includes the PDF file specified above and text
        Content prompt = new Content.Builder()
                .addInlineData(pdfBytes, "application/pdf") // Specify the appropriate PDF file MIME type
                .addText("Summarize the important results in this report.")
                .build();

        // To stream generated text output, call `generateContentStream` with the prompt
        Publisher<GenerateContentResponse> streamingResponse =
                model.generateContentStream(prompt);

        StringBuilder fullResponse = new StringBuilder();
        streamingResponse.subscribe(new Subscriber<GenerateContentResponse>() {
            @Override
            public void onNext(GenerateContentResponse generateContentResponse) {
                String chunk = generateContentResponse.getText();
                String text = (chunk == null) ? "" : chunk;
                Log.d(TAG, text);
                fullResponse.append(text);
            }

            @Override
            public void onComplete() {
                Log.d(TAG, fullResponse.toString());
            }

            @Override
            public void onError(Throwable t) {
                Log.e(TAG, "Failed to generate a response", t);
            }

            @Override
            public void onSubscribe(Subscription s) {
                // Request items so the publisher starts emitting chunks
                s.request(Long.MAX_VALUE);
            }
        });
    } else {
        Log.e(TAG, "Error getting input stream for file.");
        // Handle the error appropriately
    }
} catch (IOException e) {
    Log.e(TAG, "Failed to read the pdf file", e);
}
Web
You can call generateContentStream() to stream generated text from multimodal input of text and PDFs.
import { initializeApp } from "firebase/app";
import { getAI, getGenerativeModel, GoogleAIBackend } from "firebase/ai";

// TODO(developer) Replace the following with your app's Firebase configuration
// See: https://firebase.google.com/docs/web/learn-more#config-object
const firebaseConfig = {
  // ...
};

// Initialize FirebaseApp
const firebaseApp = initializeApp(firebaseConfig);

// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `GenerativeModel` instance with a model that supports your use case
const model = getGenerativeModel(ai, { model: "gemini-2.5-flash" });

// Converts a File object to a Part object.
async function fileToGenerativePart(file) {
  const base64EncodedDataPromise = new Promise((resolve) => {
    const reader = new FileReader();
    // Strip the data URL prefix, keeping only the base64 payload
    reader.onloadend = () => resolve(reader.result.split(',')[1]);
    reader.readAsDataURL(file);
  });
  return {
    inlineData: { data: await base64EncodedDataPromise, mimeType: file.type },
  };
}

async function run() {
  // Provide a text prompt to include with the PDF file
  const prompt = "Summarize the important results in this report.";

  // Prepare PDF file for input
  const fileInputEl = document.querySelector("input[type=file]");
  const pdfPart = await fileToGenerativePart(fileInputEl.files[0]);

  // To stream generated text output, call `generateContentStream` with the text and PDF file
  const result = await model.generateContentStream([prompt, pdfPart]);

  // Log the generated text
  for await (const chunk of result.stream) {
    const chunkText = chunk.text();
    console.log(chunkText);
  }
}

run();
Dart
You can call generateContentStream() to stream generated text from multimodal input of text and PDFs.
import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';

// Initialize FirebaseApp
await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Gemini Developer API backend service
// Create a `GenerativeModel` instance with a model that supports your use case
final model = FirebaseAI.googleAI().generativeModel(model: 'gemini-2.5-flash');

// Provide a text prompt to include with the PDF file
final prompt = TextPart("Summarize the important results in this report.");

// Prepare the PDF file for input
final doc = await File('document0.pdf').readAsBytes();

// Provide the PDF file as `Data` with the appropriate PDF file MIME type
final docPart = InlineDataPart('application/pdf', doc);

// To stream generated text output, call `generateContentStream` with the text and PDF file
final response = model.generateContentStream([
  Content.multi([prompt, docPart])
]);

// Print the generated text
await for (final chunk in response) {
  print(chunk.text);
}
Unity
You can call GenerateContentStreamAsync() to stream generated text from multimodal input of text and PDFs.

using Firebase;
using Firebase.AI;

// Initialize the Gemini Developer API backend service
var ai = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI());

// Create a `GenerativeModel` instance with a model that supports your use case
var model = ai.GetGenerativeModel(modelName: "gemini-2.5-flash");

// Provide a text prompt to include with the PDF file
var prompt = ModelContent.Text("Summarize the important results in this report.");

// Provide the PDF file as `data` with the appropriate PDF file MIME type
var doc = ModelContent.InlineData("application/pdf",
    System.IO.File.ReadAllBytes(System.IO.Path.Combine(
        UnityEngine.Application.streamingAssetsPath, "document0.pdf")));

// To stream generated text output, call `GenerateContentStreamAsync` with the text and PDF file
var responseStream = model.GenerateContentStreamAsync(new[] { prompt, doc });

// Print the generated text
await foreach (var response in responseStream) {
  if (!string.IsNullOrWhiteSpace(response.Text)) {
    UnityEngine.Debug.Log(response.Text);
  }
}
Learn how to choose a model appropriate for your use case and app.
Requirements and recommendations for input documents
Note that a file provided as inline data is encoded to base64 in transit, which
increases the size of the request. You get an HTTP 413 error if a request is
too large.
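Base64 encoding emits 4 output bytes for every 3 input bytes, so inline data grows to roughly 4/3 of its raw size. A small Kotlin sketch (the helper name is our own) that estimates the encoded size before you attach a file inline:

import java.io.File

// Estimate the base64-encoded size of a payload: every 3 raw bytes become
// a 4-byte output group, rounded up to a whole group.
fun estimateBase64Size(rawBytes: Long): Long = ((rawBytes + 2) / 3) * 4

fun main() {
    val pdf = File("document0.pdf")
    println("Raw: ${pdf.length()} bytes; " +
        "~${estimateBase64Size(pdf.length())} bytes after base64 encoding")
}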
See "Supported input files and requirements" page to learn detailed information
about the following:
Gemini multimodal models support the following document MIME types:
PDF - application/pdf
Text - text/plain
Limits per request
PDFs are treated as images, so a single page of a PDF is treated as one
image. The number of pages allowed in a prompt is limited to the number of
images the Gemini multimodal models can support.
Maximum files per request: 3,000
Maximum pages per file: 1,000
Maximum size per file: 50 MB
What else can you do?
Learn how to count tokens before sending long prompts to the model (see the sketch after this list).
Set up Cloud Storage for Firebase so that you can include large files in your
multimodal requests and have a more managed solution for providing files in
prompts. Files can include images, PDFs, video, and audio.
Start thinking about preparing for production (see the production checklist).
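As an illustration of the token-count check in the first item above, here's a minimal Kotlin sketch, assuming the SDK's countTokens() method and a response with a totalTokens field:

// Count tokens before sending a long prompt. Call from a coroutine scope;
// `countTokens` is a suspend function in the Kotlin SDK.
val prompt = content {
    text("Summarize the important results in this report.")
}

val tokenResponse = generativeModel.countTokens(prompt)
Log.d(TAG, "Total tokens in prompt: ${tokenResponse.totalTokens}")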