Recognize text in images with ML Kit on iOS

You can use ML Kit to recognize text in images or video, such as the text of a street sign. The main characteristics of this feature are:

Text Recognition v2 API

  • Description: Recognize text in images or videos; supports Latin, Chinese, Devanagari, Japanese, and Korean scripts and a wide range of languages.
  • SDK names: GoogleMLKit/TextRecognition, GoogleMLKit/TextRecognitionChinese, GoogleMLKit/TextRecognitionDevanagari, GoogleMLKit/TextRecognitionJapanese, GoogleMLKit/TextRecognitionKorean
  • Implementation: Assets are statically linked to your app at build time.
  • App size impact: About 38 MB per script SDK.
  • Performance: Real-time on most devices for the Latin script SDK; slower for other scripts.

Try it out

  • Play around with the sample app to see an example usage of this API.
  • Try the code yourself with the codelab.

Before you begin

  1. Include the following ML Kit pods in your Podfile:
    # To recognize Latin script
    pod 'GoogleMLKit/TextRecognition', '8.0.0'
    # To recognize Chinese script
    pod 'GoogleMLKit/TextRecognitionChinese', '8.0.0'
    # To recognize Devanagari script
    pod 'GoogleMLKit/TextRecognitionDevanagari', '8.0.0'
    # To recognize Japanese script
    pod 'GoogleMLKit/TextRecognitionJapanese', '8.0.0'
    # To recognize Korean script
    pod 'GoogleMLKit/TextRecognitionKorean', '8.0.0'
  2. After you install or update your project's Pods, open your Xcode project using its .xcworkspace. ML Kit is supported in Xcode version 12.4 or greater.

1. Create an instance of TextRecognizer

Create an instance of TextRecognizer by calling +textRecognizer(options:), passing the options related to the SDK you declared as a dependency above:

Swift

// When using Latin script recognition SDK
let latinOptions = TextRecognizerOptions()
let latinTextRecognizer = TextRecognizer.textRecognizer(options: latinOptions)

// When using Chinese script recognition SDK
let chineseOptions = ChineseTextRecognizerOptions()
let chineseTextRecognizer = TextRecognizer.textRecognizer(options: chineseOptions)

// When using Devanagari script recognition SDK
let devanagariOptions = DevanagariTextRecognizerOptions()
let devanagariTextRecognizer = TextRecognizer.textRecognizer(options: devanagariOptions)

// When using Japanese script recognition SDK
let japaneseOptions = JapaneseTextRecognizerOptions()
let japaneseTextRecognizer = TextRecognizer.textRecognizer(options: japaneseOptions)

// When using Korean script recognition SDK
let koreanOptions = KoreanTextRecognizerOptions()
let koreanTextRecognizer = TextRecognizer.textRecognizer(options: koreanOptions)

Objective-C

// When using Latin script recognition SDK
MLKTextRecognizerOptions *latinOptions = [[MLKTextRecognizerOptions alloc] init];
MLKTextRecognizer *latinTextRecognizer = [MLKTextRecognizer textRecognizerWithOptions:latinOptions];

// When using Chinese script recognition SDK
MLKChineseTextRecognizerOptions *chineseOptions = [[MLKChineseTextRecognizerOptions alloc] init];
MLKTextRecognizer *chineseTextRecognizer = [MLKTextRecognizer textRecognizerWithOptions:chineseOptions];

// When using Devanagari script recognition SDK
MLKDevanagariTextRecognizerOptions *devanagariOptions = [[MLKDevanagariTextRecognizerOptions alloc] init];
MLKTextRecognizer *devanagariTextRecognizer = [MLKTextRecognizer textRecognizerWithOptions:devanagariOptions];

// When using Japanese script recognition SDK
MLKJapaneseTextRecognizerOptions *japaneseOptions = [[MLKJapaneseTextRecognizerOptions alloc] init];
MLKTextRecognizer *japaneseTextRecognizer = [MLKTextRecognizer textRecognizerWithOptions:japaneseOptions];

// When using Korean script recognition SDK
MLKKoreanTextRecognizerOptions *koreanOptions = [[MLKKoreanTextRecognizerOptions alloc] init];
MLKTextRecognizer *koreanTextRecognizer = [MLKTextRecognizer textRecognizerWithOptions:koreanOptions];

2. Prepare the input image

To recognize text in an image, pass the image as a UIImage or a CMSampleBufferRef to the TextRecognizer's process(_:completion:) method.

First, create a VisionImage object using a UIImage or a CMSampleBuffer.

If you use a UIImage, follow these steps:

  • Create a VisionImage object with the UIImage. Make sure to specify the correct .orientation.

    Swift

    let visionImage = VisionImage(image: image)
    visionImage.orientation = image.imageOrientation

    Objective-C

    MLKVisionImage *visionImage = [[MLKVisionImage alloc] initWithImage:image];
    visionImage.orientation = image.imageOrientation;

If you use a CMSampleBuffer, follow these steps:

  • Specify the orientation of the image data contained in the CMSampleBuffer.

    To get the image orientation:

    Swift

    func imageOrientation(
      deviceOrientation: UIDeviceOrientation,
      cameraPosition: AVCaptureDevice.Position
    ) -> UIImage.Orientation {
      switch deviceOrientation {
      case .portrait:
        return cameraPosition == .front ? .leftMirrored : .right
      case .landscapeLeft:
        return cameraPosition == .front ? .downMirrored : .up
      case .portraitUpsideDown:
        return cameraPosition == .front ? .rightMirrored : .left
      case .landscapeRight:
        return cameraPosition == .front ? .upMirrored : .down
      case .faceDown, .faceUp, .unknown:
        return .up
      }
    }

    Objective-C

    - (UIImageOrientation)
      imageOrientationFromDeviceOrientation:(UIDeviceOrientation)deviceOrientation
                             cameraPosition:(AVCaptureDevicePosition)cameraPosition {
      switch (deviceOrientation) {
        case UIDeviceOrientationPortrait:
          return cameraPosition == AVCaptureDevicePositionFront ? UIImageOrientationLeftMirrored
                                                                : UIImageOrientationRight;
        case UIDeviceOrientationLandscapeLeft:
          return cameraPosition == AVCaptureDevicePositionFront ? UIImageOrientationDownMirrored
                                                                : UIImageOrientationUp;
        case UIDeviceOrientationPortraitUpsideDown:
          return cameraPosition == AVCaptureDevicePositionFront ? UIImageOrientationRightMirrored
                                                                : UIImageOrientationLeft;
        case UIDeviceOrientationLandscapeRight:
          return cameraPosition == AVCaptureDevicePositionFront ? UIImageOrientationUpMirrored
                                                                : UIImageOrientationDown;
        case UIDeviceOrientationUnknown:
        case UIDeviceOrientationFaceUp:
        case UIDeviceOrientationFaceDown:
          return UIImageOrientationUp;
      }
    }
  • Create a VisionImage object using the CMSampleBuffer object and orientation:

    Swift

    let image = VisionImage(buffer: sampleBuffer)
    image.orientation = imageOrientation(
      deviceOrientation: UIDevice.current.orientation,
      cameraPosition: cameraPosition)

    Objective-C

      
    MLKVisionImage *image = [[MLKVisionImage alloc] initWithBuffer:sampleBuffer];
    image.orientation =
        [self imageOrientationFromDeviceOrientation:UIDevice.currentDevice.orientation
                                     cameraPosition:cameraPosition];

3. Process the image

Then, pass the image to the process(_:completion:) method:

Swift

textRecognizer.process(visionImage) { result, error in
  guard error == nil, let result = result else {
    // Error handling
    return
  }
  // Recognized text
}

Objective-C

[textRecognizer processImage:image
                  completion:^(MLKText *_Nullable result,
                               NSError *_Nullable error) {
  if (error != nil || result == nil) {
    // Error handling
    return;
  }
  // Recognized text
}];

4. Extract text from blocks of recognized text

If the text recognition operation succeeds, it returns a Text object. A Text object contains the full text recognized in the image and zero or more TextBlock objects.

Each TextBlock represents a rectangular block of text, which contains zero or more TextLine objects. Each TextLine object contains zero or more TextElement objects, which represent words and word-like entities such as dates and numbers.

For each TextBlock, TextLine, and TextElement object, you can get the text recognized in the region and the bounding coordinates of the region.

For example:

Swift

let resultText = result.text
for block in result.blocks {
    let blockText = block.text
    let blockLanguages = block.recognizedLanguages
    let blockCornerPoints = block.cornerPoints
    let blockFrame = block.frame
    for line in block.lines {
        let lineText = line.text
        let lineLanguages = line.recognizedLanguages
        let lineCornerPoints = line.cornerPoints
        let lineFrame = line.frame
        for element in line.elements {
            let elementText = element.text
            let elementCornerPoints = element.cornerPoints
            let elementFrame = element.frame
        }
    }
}

Objective-C

NSString *resultText = result.text;
for (MLKTextBlock *block in result.blocks) {
  NSString *blockText = block.text;
  NSArray<MLKTextRecognizedLanguage *> *blockLanguages = block.recognizedLanguages;
  NSArray<NSValue *> *blockCornerPoints = block.cornerPoints;
  CGRect blockFrame = block.frame;
  for (MLKTextLine *line in block.lines) {
    NSString *lineText = line.text;
    NSArray<MLKTextRecognizedLanguage *> *lineLanguages = line.recognizedLanguages;
    NSArray<NSValue *> *lineCornerPoints = line.cornerPoints;
    CGRect lineFrame = line.frame;
    for (MLKTextElement *element in line.elements) {
      NSString *elementText = element.text;
      NSArray<NSValue *> *elementCornerPoints = element.cornerPoints;
      CGRect elementFrame = element.frame;
    }
  }
}
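
For illustration, here is a hypothetical Swift helper (not part of the ML Kit API) that uses each recognized element's frame to draw a box over it. The overlayView parameter, and the assumption that it shares the image's coordinate space, are ours; a real app would first convert frames from image coordinates to view coordinates.

Swift

import MLKit
import UIKit

// Hypothetical helper: outlines each recognized element.
// Assumes `overlayView` shares the image's coordinate space; convert
// coordinates first if your preview is scaled or rotated.
func highlightElements(of result: Text, in overlayView: UIView) {
  for block in result.blocks {
    for line in block.lines {
      for element in line.elements {
        let box = UIView(frame: element.frame)
        box.backgroundColor = .clear
        box.layer.borderColor = UIColor.red.cgColor
        box.layer.borderWidth = 2
        overlayView.addSubview(box)
      }
    }
  }
}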

Input image guidelines

  • For ML Kit to accurately recognize text, input images must contain text that is represented by sufficient pixel data. Ideally, each character should be at least 16x16 pixels. There is generally no accuracy benefit for characters to be larger than 24x24 pixels.

    So, for example, a 640x480 image might work well to scan a business card that occupies the full width of the image. To scan a document printed on letter-sized paper, a 720x1280 pixel image might be required.

  • Poor image focus can affect text recognition accuracy. If you aren't getting acceptable results, try asking the user to recapture the image.

  • If you are recognizing text in a real-time application, you should consider the overall dimensions of the input images. Smaller images can be processed faster. To reduce latency, ensure that the text occupies as much of the image as possible, and capture images at lower resolutions (keeping in mind the accuracy requirements mentioned above); a downscaling sketch follows this list. For more information, see Tips to improve performance.
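
As a sketch of the resolution trade-off, the hypothetical helper below downscales a UIImage, preserving aspect ratio, so that its longest side stays within a chosen limit; the 1280-point default is our assumption based on the document-scanning example above, not an ML Kit requirement.

Swift

import UIKit

// Hypothetical helper: downscales an image so its longest side is at most
// `maxDimension`, preserving aspect ratio. Returns the original image if it
// is already small enough.
func downscaledImage(_ image: UIImage, maxDimension: CGFloat = 1280) -> UIImage {
  let longestSide = max(image.size.width, image.size.height)
  guard longestSide > maxDimension else { return image }
  let scale = maxDimension / longestSide
  let newSize = CGSize(width: image.size.width * scale,
                       height: image.size.height * scale)
  return UIGraphicsImageRenderer(size: newSize).image { _ in
    image.draw(in: CGRect(origin: .zero, size: newSize))
  }
}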

Tips to improve performance

  • For processing video frames, use the results(in:) synchronous API of the detector. Call this method from the AVCaptureVideoDataOutputSampleBufferDelegate's captureOutput(_:didOutput:from:) function to synchronously get results from the given video frame. Keep AVCaptureVideoDataOutput's alwaysDiscardsLateVideoFrames set to true to throttle calls to the detector; if a new video frame becomes available while the detector is running, it will be dropped. A sketch follows this list.
  • If you use the output of the detector to overlay graphics on the input image, first get the result from ML Kit, then render the image and overlay in a single step. By doing so, you render to the display surface only once for each processed input frame. See updatePreviewOverlayViewWithLastFrame in the ML Kit quickstart sample for an example.
  • Consider capturing images at a lower resolution. However, also keep in mind this API's image dimension requirements.
  • To avoid potential performance degradation, do not run multiple TextRecognizer instances with different script options concurrently.
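
The following is a minimal sketch of the first tip, assuming a recognizer created as in step 1, the imageOrientation(deviceOrientation:cameraPosition:) helper from step 2, and a back-facing camera; it is not the quickstart sample's exact code.

Swift

import AVFoundation
import MLKit
import UIKit

class FrameProcessor: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
  // Created once and reused, as in step 1.
  private let textRecognizer = TextRecognizer.textRecognizer(options: TextRecognizerOptions())

  func captureOutput(_ output: AVCaptureOutput,
                     didOutput sampleBuffer: CMSampleBuffer,
                     from connection: AVCaptureConnection) {
    let image = VisionImage(buffer: sampleBuffer)
    image.orientation = imageOrientation(
      deviceOrientation: UIDevice.current.orientation,
      cameraPosition: .back)  // assumption: back-facing camera
    do {
      // Synchronous call: with alwaysDiscardsLateVideoFrames set to true,
      // frames arriving while this runs are dropped, throttling the detector.
      let result = try textRecognizer.results(in: image)
      // Hand `result` to your rendering code, e.g. on the main queue.
      _ = result
    } catch {
      // Error handling
    }
  }
}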