You can use ML Kit to recognize text in images. ML Kit has both a
general-purpose API suitable for recognizing text in images, such as the
text of a street sign, and an API optimized for recognizing the text of
documents. The general-purpose API has both on-device and cloud-based models.
Document text recognition is available only as a cloud-based model. See the overview for a comparison of the cloud and on-device models.
Optional but recommended: If you use the on-device API, configure your
app to automatically download the ML model to the device after your app is
installed from the Play Store.
To do so, add the following declaration to your app's AndroidManifest.xml file:
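<application ...>
  ...
  <meta-data
      android:name="com.google.firebase.ml.vision.DEPENDENCIES"
      android:value="ocr" />
  <!-- To use multiple models: android:value="ocr,model2,model3" -->
</application>
Here, the value ocr requests the on-device text recognition model; to have multiple models downloaded at install time, list them separated by commas.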
If you do not enable install-time model downloads, the model will be
downloaded the first time you run the on-device detector. Requests you make
before the download has completed will produce no results.
If you want to use the Cloud-based model, and you have not already enabled
the Cloud-based APIs for your project, do so now:
If you have not already upgraded your project to a Blaze pricing plan, click Upgrade to do so. (You will be prompted to upgrade only if your project isn't on the Blaze plan.)
Only Blaze-level projects can use Cloud-based APIs.
If Cloud-based APIs aren't already enabled, click Enable Cloud-based APIs.
If you want to use only the on-device model, you can skip this step.
Now you are ready to start recognizing text in images.
Input image guidelines
For ML Kit to accurately recognize text, input images must contain
text that is represented by sufficient pixel data. Ideally, for Latin
text, each character should be at least 16x16 pixels. For Chinese,
Japanese, and Korean text (only supported by the cloud-based APIs), each
character should be 24x24 pixels. For all languages, there is generally no
accuracy benefit for characters to be larger than 24x24 pixels.
So, for example, a 640x480 image might work well to scan a business card
that occupies the full width of the image. To scan a document printed on
letter-sized paper, a 720x1280 pixel image might be required.
Poor image focus can hurt text recognition accuracy. If you aren't
getting acceptable results, try asking the user to recapture the image.
If you are recognizing text in a real-time application, you might also
want to consider the overall dimensions of the input images. Smaller
images can be processed faster, so to reduce latency, capture images at
lower resolutions (keeping in mind the above accuracy requirements) and
ensure that the text occupies as much of the image as possible. Also see Tips to improve real-time performance.
Recognize text in images
To recognize text in an image using either an on-device or cloud-based model,
run the text recognizer as described below.
1. Run the text recognizer
To recognize text in an image, create a FirebaseVisionImage object from either a Bitmap, media.Image, ByteBuffer, byte array, or a file on the device. Then, pass the FirebaseVisionImage object to the FirebaseVisionTextRecognizer's processImage method.
To create a FirebaseVisionImage object from a media.Image object, such as when capturing an image from a device's camera, pass the media.Image object and the image's rotation to FirebaseVisionImage.fromMediaImage().
If you use the CameraX library, the OnImageCapturedListener and ImageAnalysis.Analyzer classes calculate the rotation value for you, so you just need to convert the rotation to one of ML Kit's ROTATION_ constants before calling FirebaseVisionImage.fromMediaImage():
Java
private class YourAnalyzer implements ImageAnalysis.Analyzer {

    private int degreesToFirebaseRotation(int degrees) {
        switch (degrees) {
            case 0:
                return FirebaseVisionImageMetadata.ROTATION_0;
            case 90:
                return FirebaseVisionImageMetadata.ROTATION_90;
            case 180:
                return FirebaseVisionImageMetadata.ROTATION_180;
            case 270:
                return FirebaseVisionImageMetadata.ROTATION_270;
            default:
                throw new IllegalArgumentException(
                        "Rotation must be 0, 90, 180, or 270.");
        }
    }

    @Override
    public void analyze(ImageProxy imageProxy, int degrees) {
        if (imageProxy == null || imageProxy.getImage() == null) {
            return;
        }
        Image mediaImage = imageProxy.getImage();
        int rotation = degreesToFirebaseRotation(degrees);
        FirebaseVisionImage image =
                FirebaseVisionImage.fromMediaImage(mediaImage, rotation);
        // Pass image to an ML Kit Vision API
        // ...
    }
}
Kotlin
private class YourImageAnalyzer : ImageAnalysis.Analyzer {

    private fun degreesToFirebaseRotation(degrees: Int): Int = when (degrees) {
        0 -> FirebaseVisionImageMetadata.ROTATION_0
        90 -> FirebaseVisionImageMetadata.ROTATION_90
        180 -> FirebaseVisionImageMetadata.ROTATION_180
        270 -> FirebaseVisionImageMetadata.ROTATION_270
        else -> throw Exception("Rotation must be 0, 90, 180, or 270.")
    }

    override fun analyze(imageProxy: ImageProxy?, degrees: Int) {
        val mediaImage = imageProxy?.image
        val imageRotation = degreesToFirebaseRotation(degrees)
        if (mediaImage != null) {
            val image = FirebaseVisionImage.fromMediaImage(mediaImage, imageRotation)
            // Pass image to an ML Kit Vision API
            // ...
        }
    }
}
If you don't use a camera library that gives you the image's rotation, you can calculate it from the device's rotation and the orientation of the camera sensor in the device:
Java
private static final SparseIntArray ORIENTATIONS = new SparseIntArray();
static {
    ORIENTATIONS.append(Surface.ROTATION_0, 90);
    ORIENTATIONS.append(Surface.ROTATION_90, 0);
    ORIENTATIONS.append(Surface.ROTATION_180, 270);
    ORIENTATIONS.append(Surface.ROTATION_270, 180);
}

/**
 * Get the angle by which an image must be rotated given the device's current
 * orientation.
 */
@RequiresApi(api = Build.VERSION_CODES.LOLLIPOP)
private int getRotationCompensation(String cameraId, Activity activity, Context context)
        throws CameraAccessException {
    // Get the device's current rotation relative to its "native" orientation.
    // Then, from the ORIENTATIONS table, look up the angle the image must be
    // rotated to compensate for the device's rotation.
    int deviceRotation = activity.getWindowManager().getDefaultDisplay().getRotation();
    int rotationCompensation = ORIENTATIONS.get(deviceRotation);

    // On most devices, the sensor orientation is 90 degrees, but for some
    // devices it is 270 degrees. For devices with a sensor orientation of
    // 270, rotate the image an additional 180 ((270 + 270) % 360) degrees.
    CameraManager cameraManager = (CameraManager) context.getSystemService(CAMERA_SERVICE);
    int sensorOrientation = cameraManager
            .getCameraCharacteristics(cameraId)
            .get(CameraCharacteristics.SENSOR_ORIENTATION);
    rotationCompensation = (rotationCompensation + sensorOrientation + 270) % 360;

    // Return the corresponding FirebaseVisionImageMetadata rotation value.
    int result;
    switch (rotationCompensation) {
        case 0:
            result = FirebaseVisionImageMetadata.ROTATION_0;
            break;
        case 90:
            result = FirebaseVisionImageMetadata.ROTATION_90;
            break;
        case 180:
            result = FirebaseVisionImageMetadata.ROTATION_180;
            break;
        case 270:
            result = FirebaseVisionImageMetadata.ROTATION_270;
            break;
        default:
            result = FirebaseVisionImageMetadata.ROTATION_0;
            Log.e(TAG, "Bad rotation value: " + rotationCompensation);
    }
    return result;
}
Kotlin
private val ORIENTATIONS = SparseIntArray()

init {
    ORIENTATIONS.append(Surface.ROTATION_0, 90)
    ORIENTATIONS.append(Surface.ROTATION_90, 0)
    ORIENTATIONS.append(Surface.ROTATION_180, 270)
    ORIENTATIONS.append(Surface.ROTATION_270, 180)
}

/**
 * Get the angle by which an image must be rotated given the device's current
 * orientation.
 */
@RequiresApi(api = Build.VERSION_CODES.LOLLIPOP)
@Throws(CameraAccessException::class)
private fun getRotationCompensation(cameraId: String, activity: Activity, context: Context): Int {
    // Get the device's current rotation relative to its "native" orientation.
    // Then, from the ORIENTATIONS table, look up the angle the image must be
    // rotated to compensate for the device's rotation.
    val deviceRotation = activity.windowManager.defaultDisplay.rotation
    var rotationCompensation = ORIENTATIONS.get(deviceRotation)

    // On most devices, the sensor orientation is 90 degrees, but for some
    // devices it is 270 degrees. For devices with a sensor orientation of
    // 270, rotate the image an additional 180 ((270 + 270) % 360) degrees.
    val cameraManager = context.getSystemService(CAMERA_SERVICE) as CameraManager
    val sensorOrientation = cameraManager
            .getCameraCharacteristics(cameraId)
            .get(CameraCharacteristics.SENSOR_ORIENTATION)!!
    rotationCompensation = (rotationCompensation + sensorOrientation + 270) % 360

    // Return the corresponding FirebaseVisionImageMetadata rotation value.
    val result: Int
    when (rotationCompensation) {
        0 -> result = FirebaseVisionImageMetadata.ROTATION_0
        90 -> result = FirebaseVisionImageMetadata.ROTATION_90
        180 -> result = FirebaseVisionImageMetadata.ROTATION_180
        270 -> result = FirebaseVisionImageMetadata.ROTATION_270
        else -> {
            result = FirebaseVisionImageMetadata.ROTATION_0
            Log.e(TAG, "Bad rotation value: $rotationCompensation")
        }
    }
    return result
}
To create a FirebaseVisionImage object from a file URI, pass the app context and file URI to FirebaseVisionImage.fromFilePath(). This is useful when you use an ACTION_GET_CONTENT intent to prompt the user to select an image from their gallery app.
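For example, here is a minimal Kotlin sketch of an activity result handler that forwards the selected image to ML Kit (PICK_IMAGE_REQUEST is a hypothetical request code defined by your app, not part of the ML Kit API):
override fun onActivityResult(requestCode: Int, resultCode: Int, data: Intent?) {
    super.onActivityResult(requestCode, resultCode, data)
    if (requestCode == PICK_IMAGE_REQUEST && resultCode == Activity.RESULT_OK) {
        val uri = data?.data ?: return
        try {
            // fromFilePath throws an IOException if the URI can't be opened.
            val image = FirebaseVisionImage.fromFilePath(applicationContext, uri)
            // Pass image to an ML Kit Vision API
            // ...
        } catch (e: IOException) {
            e.printStackTrace()
        }
    }
}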
To create a FirebaseVisionImage object from a ByteBuffer or a byte array, first calculate the image rotation as described above for media.Image input. Then, create a FirebaseVisionImageMetadata object that contains the image's height, width, color encoding format, and rotation:
Java
FirebaseVisionImageMetadata metadata = new FirebaseVisionImageMetadata.Builder()
        .setWidth(480)   // 480x360 is typically sufficient for
        .setHeight(360)  // image recognition
        .setFormat(FirebaseVisionImageMetadata.IMAGE_FORMAT_NV21)
        .setRotation(rotation)
        .build();
Kotlin
val metadata = FirebaseVisionImageMetadata.Builder()
        .setWidth(480)   // 480x360 is typically sufficient for
        .setHeight(360)  // image recognition
        .setFormat(FirebaseVisionImageMetadata.IMAGE_FORMAT_NV21)
        .setRotation(rotation)
        .build()
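Then, use the buffer or array together with the metadata object to create the FirebaseVisionImage. A minimal Kotlin sketch, assuming your frame data is in a ByteBuffer named buffer:
// Create a FirebaseVisionImage from the buffer and its metadata.
val image = FirebaseVisionImage.fromByteBuffer(buffer, metadata)
// Or, for a byte array:
// val image = FirebaseVisionImage.fromByteArray(byteArray, metadata)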
Then, get an instance of FirebaseVisionTextRecognizer that uses the cloud-based model:
Java
FirebaseVisionTextRecognizer detector = FirebaseVision.getInstance()
        .getCloudTextRecognizer();
// Or, to change the default settings:
// FirebaseVisionTextRecognizer detector = FirebaseVision.getInstance()
//         .getCloudTextRecognizer(options);
// Or, to provide language hints to assist with language detection:
// See https://cloud.google.com/vision/docs/languages for supported languages
FirebaseVisionCloudTextRecognizerOptions options =
        new FirebaseVisionCloudTextRecognizerOptions.Builder()
                .setLanguageHints(Arrays.asList("en", "hi"))
                .build();
Kotlin
val detector = FirebaseVision.getInstance().cloudTextRecognizer
// Or, to change the default settings:
// val detector = FirebaseVision.getInstance().getCloudTextRecognizer(options)
// Or, to provide language hints to assist with language detection:
// See https://cloud.google.com/vision/docs/languages for supported languages
val options = FirebaseVisionCloudTextRecognizerOptions.Builder()
        .setLanguageHints(listOf("en", "hi"))
        .build()
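The snippets above use the cloud-based recognizer. If you want the on-device model instead, you can get an on-device recognizer the same way; a minimal Kotlin sketch:
// On-device text recognizer
val detector = FirebaseVision.getInstance().onDeviceTextRecognizer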
Finally, pass the image to the processImage method:
Java
Task<FirebaseVisionText> result = detector.processImage(image)
        .addOnSuccessListener(new OnSuccessListener<FirebaseVisionText>() {
            @Override
            public void onSuccess(FirebaseVisionText firebaseVisionText) {
                // Task completed successfully
                // ...
            }
        })
        .addOnFailureListener(new OnFailureListener() {
            @Override
            public void onFailure(@NonNull Exception e) {
                // Task failed with an exception
                // ...
            }
        });
Kotlin
val result = detector.processImage(image)
        .addOnSuccessListener { firebaseVisionText ->
            // Task completed successfully
            // ...
        }
        .addOnFailureListener { e ->
            // Task failed with an exception
            // ...
        }
2. Extract text from blocks of recognized text
If the text recognition operation succeeds, a FirebaseVisionText object will be passed to the success listener. A FirebaseVisionText object contains the full text recognized in the image and zero or more TextBlock objects.
Each TextBlock represents a rectangular block of text, which contains zero or more Line objects. Each Line object contains zero or more Element objects, which represent words and word-like entities (dates, numbers, and so on).
For each TextBlock, Line, and Element object, you can get the text recognized in the region and the bounding coordinates of the region.
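For example, here is a minimal Kotlin sketch that walks the hierarchy of a FirebaseVisionText result (named firebaseVisionText, as in the success listener above):
val resultText = firebaseVisionText.text
for (block in firebaseVisionText.textBlocks) {
    val blockText = block.text
    val blockFrame = block.boundingBox
    for (line in block.lines) {
        val lineText = line.text
        val lineFrame = line.boundingBox
        for (element in line.elements) {
            val elementText = element.text
            val elementFrame = element.boundingBox
        }
    }
}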
Tips to improve real-time performance
If you want to use the on-device model to recognize text in a real-time application, follow these guidelines to achieve the best framerates:
Throttle calls to the text recognizer. If a new video frame becomes available while the text recognizer is running, drop the frame (see the sketch after this list).
If you are using the output of the text recognizer to overlay graphics on
the input image, first get the result from ML Kit, then render the image
and overlay in a single step. By doing so, you render to the display surface
only once for each input frame.
If you use the Camera2 API, capture images in ImageFormat.YUV_420_888 format.
If you use the older Camera API, capture images in ImageFormat.NV21 format.
Consider capturing images at a lower resolution. However, also keep in mind
this API's image dimension requirements.
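As a sketch of the frame-throttling guideline above (the isProcessing flag and onNewFrame function are hypothetical names for your own code, not part of the ML Kit API; AtomicBoolean is java.util.concurrent.atomic.AtomicBoolean):
// Tracks whether a recognition task is currently in flight.
private val isProcessing = AtomicBoolean(false)

fun onNewFrame(image: FirebaseVisionImage, detector: FirebaseVisionTextRecognizer) {
    // Drop this frame if the recognizer is still busy with an earlier one.
    if (!isProcessing.compareAndSet(false, true)) return
    detector.processImage(image)
            .addOnSuccessListener { text ->
                // Render the camera frame and the overlay in a single step here.
            }
            .addOnCompleteListener {
                // Ready to accept the next frame.
                isProcessing.set(false)
            }
}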
Recognize text in images of documents
To recognize the text of a document, configure and run the cloud-based document text recognizer as described below.
The document text recognition API, described below, provides an interface that is intended to be more convenient for working with images of documents. However, if you prefer the interface provided by the FirebaseVisionTextRecognizer API, you can use it instead to scan documents by configuring the cloud text recognizer to use the dense text model.
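A minimal Kotlin sketch of that configuration, setting the model type on the cloud recognizer's options:
// Configure the cloud text recognizer to use the dense text model.
val options = FirebaseVisionCloudTextRecognizerOptions.Builder()
        .setModelType(FirebaseVisionCloudTextRecognizerOptions.DENSE_MODEL)
        .build()
val detector = FirebaseVision.getInstance().getCloudTextRecognizer(options)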
To use the document text recognition API:
1. Run the text recognizer
To recognize text in an image, create a FirebaseVisionImage object from either a Bitmap, media.Image, ByteBuffer, byte array, or a file on the device. Then, pass the FirebaseVisionImage object to the FirebaseVisionDocumentTextRecognizer's processImage method.
To create a FirebaseVisionImage object from a media.Image object, such as when capturing an image from a device's camera, pass the media.Image object and the image's rotation to FirebaseVisionImage.fromMediaImage().
If you use the CameraX library, the OnImageCapturedListener and ImageAnalysis.Analyzer classes calculate the rotation value for you, so you just need to convert the rotation to one of ML Kit's ROTATION_ constants before calling FirebaseVisionImage.fromMediaImage():
Java
private class YourAnalyzer implements ImageAnalysis.Analyzer {

    private int degreesToFirebaseRotation(int degrees) {
        switch (degrees) {
            case 0:
                return FirebaseVisionImageMetadata.ROTATION_0;
            case 90:
                return FirebaseVisionImageMetadata.ROTATION_90;
            case 180:
                return FirebaseVisionImageMetadata.ROTATION_180;
            case 270:
                return FirebaseVisionImageMetadata.ROTATION_270;
            default:
                throw new IllegalArgumentException(
                        "Rotation must be 0, 90, 180, or 270.");
        }
    }

    @Override
    public void analyze(ImageProxy imageProxy, int degrees) {
        if (imageProxy == null || imageProxy.getImage() == null) {
            return;
        }
        Image mediaImage = imageProxy.getImage();
        int rotation = degreesToFirebaseRotation(degrees);
        FirebaseVisionImage image =
                FirebaseVisionImage.fromMediaImage(mediaImage, rotation);
        // Pass image to an ML Kit Vision API
        // ...
    }
}
Kotlin
private class YourImageAnalyzer : ImageAnalysis.Analyzer {

    private fun degreesToFirebaseRotation(degrees: Int): Int = when (degrees) {
        0 -> FirebaseVisionImageMetadata.ROTATION_0
        90 -> FirebaseVisionImageMetadata.ROTATION_90
        180 -> FirebaseVisionImageMetadata.ROTATION_180
        270 -> FirebaseVisionImageMetadata.ROTATION_270
        else -> throw Exception("Rotation must be 0, 90, 180, or 270.")
    }

    override fun analyze(imageProxy: ImageProxy?, degrees: Int) {
        val mediaImage = imageProxy?.image
        val imageRotation = degreesToFirebaseRotation(degrees)
        if (mediaImage != null) {
            val image = FirebaseVisionImage.fromMediaImage(mediaImage, imageRotation)
            // Pass image to an ML Kit Vision API
            // ...
        }
    }
}
If you don't use a camera library that gives you the image's rotation, you can calculate it from the device's rotation and the orientation of the camera sensor in the device:
Java
private static final SparseIntArray ORIENTATIONS = new SparseIntArray();
static {
    ORIENTATIONS.append(Surface.ROTATION_0, 90);
    ORIENTATIONS.append(Surface.ROTATION_90, 0);
    ORIENTATIONS.append(Surface.ROTATION_180, 270);
    ORIENTATIONS.append(Surface.ROTATION_270, 180);
}

/**
 * Get the angle by which an image must be rotated given the device's current
 * orientation.
 */
@RequiresApi(api = Build.VERSION_CODES.LOLLIPOP)
private int getRotationCompensation(String cameraId, Activity activity, Context context)
        throws CameraAccessException {
    // Get the device's current rotation relative to its "native" orientation.
    // Then, from the ORIENTATIONS table, look up the angle the image must be
    // rotated to compensate for the device's rotation.
    int deviceRotation = activity.getWindowManager().getDefaultDisplay().getRotation();
    int rotationCompensation = ORIENTATIONS.get(deviceRotation);

    // On most devices, the sensor orientation is 90 degrees, but for some
    // devices it is 270 degrees. For devices with a sensor orientation of
    // 270, rotate the image an additional 180 ((270 + 270) % 360) degrees.
    CameraManager cameraManager = (CameraManager) context.getSystemService(CAMERA_SERVICE);
    int sensorOrientation = cameraManager
            .getCameraCharacteristics(cameraId)
            .get(CameraCharacteristics.SENSOR_ORIENTATION);
    rotationCompensation = (rotationCompensation + sensorOrientation + 270) % 360;

    // Return the corresponding FirebaseVisionImageMetadata rotation value.
    int result;
    switch (rotationCompensation) {
        case 0:
            result = FirebaseVisionImageMetadata.ROTATION_0;
            break;
        case 90:
            result = FirebaseVisionImageMetadata.ROTATION_90;
            break;
        case 180:
            result = FirebaseVisionImageMetadata.ROTATION_180;
            break;
        case 270:
            result = FirebaseVisionImageMetadata.ROTATION_270;
            break;
        default:
            result = FirebaseVisionImageMetadata.ROTATION_0;
            Log.e(TAG, "Bad rotation value: " + rotationCompensation);
    }
    return result;
}
Kotlin
private val ORIENTATIONS = SparseIntArray()

init {
    ORIENTATIONS.append(Surface.ROTATION_0, 90)
    ORIENTATIONS.append(Surface.ROTATION_90, 0)
    ORIENTATIONS.append(Surface.ROTATION_180, 270)
    ORIENTATIONS.append(Surface.ROTATION_270, 180)
}

/**
 * Get the angle by which an image must be rotated given the device's current
 * orientation.
 */
@RequiresApi(api = Build.VERSION_CODES.LOLLIPOP)
@Throws(CameraAccessException::class)
private fun getRotationCompensation(cameraId: String, activity: Activity, context: Context): Int {
    // Get the device's current rotation relative to its "native" orientation.
    // Then, from the ORIENTATIONS table, look up the angle the image must be
    // rotated to compensate for the device's rotation.
    val deviceRotation = activity.windowManager.defaultDisplay.rotation
    var rotationCompensation = ORIENTATIONS.get(deviceRotation)

    // On most devices, the sensor orientation is 90 degrees, but for some
    // devices it is 270 degrees. For devices with a sensor orientation of
    // 270, rotate the image an additional 180 ((270 + 270) % 360) degrees.
    val cameraManager = context.getSystemService(CAMERA_SERVICE) as CameraManager
    val sensorOrientation = cameraManager
            .getCameraCharacteristics(cameraId)
            .get(CameraCharacteristics.SENSOR_ORIENTATION)!!
    rotationCompensation = (rotationCompensation + sensorOrientation + 270) % 360

    // Return the corresponding FirebaseVisionImageMetadata rotation value.
    val result: Int
    when (rotationCompensation) {
        0 -> result = FirebaseVisionImageMetadata.ROTATION_0
        90 -> result = FirebaseVisionImageMetadata.ROTATION_90
        180 -> result = FirebaseVisionImageMetadata.ROTATION_180
        270 -> result = FirebaseVisionImageMetadata.ROTATION_270
        else -> {
            result = FirebaseVisionImageMetadata.ROTATION_0
            Log.e(TAG, "Bad rotation value: $rotationCompensation")
        }
    }
    return result
}
To create a FirebaseVisionImage object from a file URI, pass the app context and file URI to FirebaseVisionImage.fromFilePath(). This is useful when you use an ACTION_GET_CONTENT intent to prompt the user to select an image from their gallery app.
To create a FirebaseVisionImage object from a ByteBuffer or a byte array, first calculate the image rotation as described above for media.Image input. Then, create a FirebaseVisionImageMetadata object that contains the image's height, width, color encoding format, and rotation:
Java
FirebaseVisionImageMetadata metadata = new FirebaseVisionImageMetadata.Builder()
        .setWidth(480)   // 480x360 is typically sufficient for
        .setHeight(360)  // image recognition
        .setFormat(FirebaseVisionImageMetadata.IMAGE_FORMAT_NV21)
        .setRotation(rotation)
        .build();
Kotlin
val metadata = FirebaseVisionImageMetadata.Builder()
        .setWidth(480)   // 480x360 is typically sufficient for
        .setHeight(360)  // image recognition
        .setFormat(FirebaseVisionImageMetadata.IMAGE_FORMAT_NV21)
        .setRotation(rotation)
        .build()
Then, get an instance of FirebaseVisionDocumentTextRecognizer:
Java
// To provide language hints to assist with language detection:
// See https://cloud.google.com/vision/docs/languages for supported languages
FirebaseVisionCloudDocumentRecognizerOptions options =
        new FirebaseVisionCloudDocumentRecognizerOptions.Builder()
                .setLanguageHints(Arrays.asList("en", "hi"))
                .build();
FirebaseVisionDocumentTextRecognizer detector = FirebaseVision.getInstance()
        .getCloudDocumentTextRecognizer(options);
Kotlin
// To provide language hints to assist with language detection:
// See https://cloud.google.com/vision/docs/languages for supported languages
val options = FirebaseVisionCloudDocumentRecognizerOptions.Builder()
        .setLanguageHints(listOf("en", "hi"))
        .build()
val detector = FirebaseVision.getInstance().getCloudDocumentTextRecognizer(options)
Finally, pass the image to the processImage method:
Java
detector.processImage(myImage)
        .addOnSuccessListener(new OnSuccessListener<FirebaseVisionDocumentText>() {
            @Override
            public void onSuccess(FirebaseVisionDocumentText result) {
                // Task completed successfully
                // ...
            }
        })
        .addOnFailureListener(new OnFailureListener() {
            @Override
            public void onFailure(@NonNull Exception e) {
                // Task failed with an exception
                // ...
            }
        });
Kotlin
detector.processImage(myImage)
        .addOnSuccessListener { firebaseVisionDocumentText ->
            // Task completed successfully
            // ...
        }
        .addOnFailureListener { e ->
            // Task failed with an exception
            // ...
        }
2. Extract text from blocks of recognized text
If the text recognition operation succeeds, a FirebaseVisionDocumentText object will be passed to the success listener. A FirebaseVisionDocumentText object contains the full text recognized in the image and a hierarchy of objects that reflect the structure of the recognized document: Block, Paragraph, Word, and Symbol. For each of these objects, you can get the text recognized in the region and the bounding coordinates of the region.
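For example, here is a minimal Kotlin sketch that walks a FirebaseVisionDocumentText result (named firebaseVisionDocumentText, as in the success listener above):
val resultText = firebaseVisionDocumentText.text
for (block in firebaseVisionDocumentText.blocks) {
    val blockText = block.text
    val blockFrame = block.boundingBox
    for (paragraph in block.paragraphs) {
        val paragraphText = paragraph.text
        for (word in paragraph.words) {
            val wordText = word.text
            for (symbol in word.symbols) {
                val symbolText = symbol.text
            }
        }
    }
}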