Detect and track objects with ML Kit on iOS

You can use ML Kit to detect and track objects in successive video frames.

When you pass an image to ML Kit, it detects up to five objects in the image along with the position of each object in the image. When detecting objects in video streams, each object has a unique ID that you can use to track the object from frame to frame. You can also optionally enable coarse object classification, which labels objects with broad category descriptions.

Before you begin

  1. Include the following ML Kit pods in your Podfile:
    pod 'GoogleMLKit/ObjectDetection', '8.0.0'
  2. After you install or update your project's Pods, open your Xcode project using its .xcworkspace. ML Kit is supported in Xcode version 12.4 or greater.

1. Configure the object detector

To detect and track objects, first create an instance of ObjectDetector and optionally specify any detector settings you want to change from the default.

  1. Configure the object detector for your use case with an ObjectDetectorOptions object. You can change the following settings:

    Object detector settings

    Detection mode: .stream (default) | .singleImage

    In stream mode (default), the object detector runs with very low latency, but might produce incomplete results (such as unspecified bounding boxes or categories) on the first few invocations of the detector. Also, in stream mode, the detector assigns tracking IDs to objects, which you can use to track objects across frames. Use this mode when you want to track objects, or when low latency is important, such as when processing video streams in real time.

    In single image mode, the object detector returns the result after the object's bounding box is determined. If you also enable classification, it returns the result after the bounding box and category label are both available. As a consequence, detection latency is potentially higher. Also, in single image mode, tracking IDs are not assigned. Use this mode if latency isn't critical and you don't want to deal with partial results.

    Detect and track multiple objects: false (default) | true

    Whether to detect and track up to five objects or only the most prominent object (default).

    Classify objects: false (default) | true

    Whether or not to classify detected objects into coarse categories. When enabled, the object detector classifies objects into the following categories: fashion goods, food, home goods, places, and plants.

    The object detection and tracking API is optimized for these two core use cases:

    • Live detection and tracking of the most prominent object in the camera viewfinder.
    • The detection of multiple objects in a static image.

    To configure the API for these use cases:

Swift

  // Live detection and tracking
  let options = ObjectDetectorOptions()
  options.shouldEnableClassification = true

  // Multiple object detection in static images
  let options = ObjectDetectorOptions()
  options.detectorMode = .singleImage
  options.shouldEnableMultipleObjects = true
  options.shouldEnableClassification = true

Objective-C

  // Live detection and tracking
  MLKObjectDetectorOptions *options = [[MLKObjectDetectorOptions alloc] init];
  options.shouldEnableClassification = YES;

  // Multiple object detection in static images
  MLKObjectDetectorOptions *options = [[MLKObjectDetectorOptions alloc] init];
  options.detectorMode = MLKObjectDetectorModeSingleImage;
  options.shouldEnableMultipleObjects = YES;
  options.shouldEnableClassification = YES;
  2. Get an instance of ObjectDetector:

Swift

  let objectDetector = ObjectDetector.objectDetector()

  // Or, to change the default settings:
  let objectDetector = ObjectDetector.objectDetector(options: options)

Objective-C

  MLKObjectDetector *objectDetector = [MLKObjectDetector objectDetector];

  // Or, to change the default settings:
  MLKObjectDetector *objectDetector = [MLKObjectDetector objectDetectorWithOptions:options];

2. Prepare the input image

To detect and track objects, do the following for each image or frame of video. If you enabled stream mode, you must create VisionImage objects from CMSampleBuffer objects.

Create a VisionImage object using a UIImage or a CMSampleBuffer.

If you use a UIImage, follow these steps:

  • Create a VisionImage object with the UIImage. Make sure to specify the correct .orientation.

    Swift

    let visionImage = VisionImage(image: image)
    visionImage.orientation = image.imageOrientation

    Objective-C

     MLKVisionImage *visionImage = [[MLKVisionImage alloc] initWithImage:image];
     visionImage.orientation = image.imageOrientation;

If you use a CMSampleBuffer, follow these steps:

  • Specify the orientation of the image data contained in the CMSampleBuffer.

    To get the image orientation:

    Swift

     func imageOrientation(
       deviceOrientation: UIDeviceOrientation,
       cameraPosition: AVCaptureDevice.Position
     ) -> UIImage.Orientation {
       switch deviceOrientation {
       case .portrait:
         return cameraPosition == .front ? .leftMirrored : .right
       case .landscapeLeft:
         return cameraPosition == .front ? .downMirrored : .up
       case .portraitUpsideDown:
         return cameraPosition == .front ? .rightMirrored : .left
       case .landscapeRight:
         return cameraPosition == .front ? .upMirrored : .down
       case .faceDown, .faceUp, .unknown:
         return .up
       }
     }

    Objective-C

     - (UIImageOrientation)
       imageOrientationFromDeviceOrientation:(UIDeviceOrientation)deviceOrientation
                              cameraPosition:(AVCaptureDevicePosition)cameraPosition {
       switch (deviceOrientation) {
         case UIDeviceOrientationPortrait:
           return cameraPosition == AVCaptureDevicePositionFront ? UIImageOrientationLeftMirrored
                                                                 : UIImageOrientationRight;
         case UIDeviceOrientationLandscapeLeft:
           return cameraPosition == AVCaptureDevicePositionFront ? UIImageOrientationDownMirrored
                                                                 : UIImageOrientationUp;
         case UIDeviceOrientationPortraitUpsideDown:
           return cameraPosition == AVCaptureDevicePositionFront ? UIImageOrientationRightMirrored
                                                                 : UIImageOrientationLeft;
         case UIDeviceOrientationLandscapeRight:
           return cameraPosition == AVCaptureDevicePositionFront ? UIImageOrientationUpMirrored
                                                                 : UIImageOrientationDown;
         case UIDeviceOrientationUnknown:
         case UIDeviceOrientationFaceUp:
         case UIDeviceOrientationFaceDown:
           return UIImageOrientationUp;
       }
     }
  • Create a VisionImage object using the CMSampleBuffer object and orientation:

    Swift

     let image = VisionImage(buffer: sampleBuffer)
     image.orientation = imageOrientation(
       deviceOrientation: UIDevice.current.orientation,
       cameraPosition: cameraPosition)

    Objective-C

      
     MLKVisionImage *image = [[MLKVisionImage alloc] initWithBuffer:sampleBuffer];
     image.orientation =
         [self imageOrientationFromDeviceOrientation:UIDevice.currentDevice.orientation
                                      cameraPosition:cameraPosition];

3. Process the image

Pass the VisionImage to one of the object detector's image processing methods. You can either use the asynchronous process(image:) method or the synchronous results(in:) method.

To detect objects asynchronously:

Swift

  objectDetector.process(image) { objects, error in
    guard error == nil else {
      // Error.
      return
    }
    guard !objects.isEmpty else {
      // No objects detected.
      return
    }

    // Success. Get object info here.
    // ...
  }

Objective-C

  [objectDetector processImage:image
                    completion:^(NSArray *_Nullable objects,
                                 NSError *_Nullable error) {
    if (error != nil) {
      // Error.
      return;
    }
    if (objects.count == 0) {
      // No objects detected.
      return;
    }

    // Success. Get object info here.
  }];
 

To detect objects synchronously:

Swift

  var objects: [Object]
  do {
    objects = try objectDetector.results(in: image)
  } catch let error {
    print("Failed to detect object with error: \(error.localizedDescription).")
    return
  }
  guard !objects.isEmpty else {
    print("Object detector returned no results.")
    return
  }

  // Success. Get object info here.

Objective-C

  NSError *error;
  NSArray *objects = [objectDetector resultsInImage:image error:&error];
  if (error != nil) {
    return;
  }
  if (objects.count == 0) {
    // No objects detected.
    return;
  }

  // Success. Get object info here.
 

4. Get information about detected objects

If the call to the image processor succeeds, it either passes a list of Object instances to the completion handler or returns the list, depending on whether you called the asynchronous or synchronous method.

Each Object contains the following properties:

  • frame: A CGRect indicating the position of the object in the image.
  • trackingID: An integer that identifies the object across images, or nil in single image mode.
  • labels: An array of labels describing the object returned by the detector. The property is empty if the detector option shouldEnableClassification is set to false.

Swift

  // objects contains one item if multiple object detection wasn't enabled.
  for object in objects {
    let frame = object.frame
    let trackingID = object.trackingID

    // If classification was enabled:
    let description = object.labels.enumerated().map { (index, label) in
      "Label \(index): \(label.text), \(label.confidence)"
    }.joined(separator: "\n")
  }

Objective-C

  // The list of detected objects contains one item if multiple
  // object detection wasn't enabled.
  for (MLKObject *object in objects) {
    CGRect frame = object.frame;
    NSNumber *trackingID = object.trackingID;
    for (MLKObjectLabel *label in object.labels) {
      NSString *labelString =
          [NSString stringWithFormat:@"%@, %f, %lu",
                                     label.text,
                                     label.confidence,
                                     (unsigned long)label.index];
      // ...
    }
  }
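
The snippets above read trackingID but don't show using it across frames. In stream mode, you can key per-object app state on that ID to follow an object over time. The following is a minimal Swift sketch that assumes you keep a trackedObjects dictionary alive across frames; TrackedObject is an illustrative type, not part of the ML Kit API.

Swift

  // Hypothetical per-object state that your app maintains across frames.
  struct TrackedObject {
    var lastFrame: CGRect
    var framesSeen: Int
  }

  // Keep this dictionary alive across frames, for example as a property
  // of your view controller.
  var trackedObjects: [Int: TrackedObject] = [:]

  for object in objects {
    // trackingID is an NSNumber that is only populated in .stream mode.
    guard let id = object.trackingID?.intValue else { continue }
    if var existing = trackedObjects[id] {
      existing.lastFrame = object.frame
      existing.framesSeen += 1
      trackedObjects[id] = existing
    } else {
      trackedObjects[id] = TrackedObject(lastFrame: object.frame, framesSeen: 1)
    }
  }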

Improving usability and performance

For the best user experience, follow these guidelines in your app:

  • Successful object detection depends on the object's visual complexity. In order to be detected, objects with a small number of visual features might need to take up a larger part of the image. You should provide users with guidance on capturing input that works well with the kind of objects you want to detect.
  • When you use classification, if you want to detect objects that don't fall cleanly into the supported categories, implement special handling for unknown objects.
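
    For example, when classification is enabled but ML Kit can't place an object in one of the supported categories, it typically returns no label for that object. The following is a minimal Swift sketch of one way to handle such objects, assuming an objects array from step 3 (the 0.5 confidence threshold is an arbitrary illustrative value, not an ML Kit constant):

    Swift

     // Treat objects with no label, or only low-confidence labels, as "unknown".
     let confidenceThreshold: Float = 0.5

     for object in objects {
       let bestLabel = object.labels.max { $0.confidence < $1.confidence }
       if let label = bestLabel, label.confidence >= confidenceThreshold {
         // The object falls into one of the supported coarse categories.
         print("Detected \(label.text) at \(object.frame)")
       } else {
         // No usable category: apply your app's "unknown object" handling here.
         print("Unclassified object at \(object.frame)")
       }
     }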

Also, check out the Material Design Patterns for machine learning-powered features collection.

When you use streaming mode in a real-time application, follow these guidelines to achieve the best framerates:

  • Don't use multiple object detection in streaming mode, as most devices won't be able to produce adequate framerates.
  • Disable classification if you don't need it.
  • For processing video frames, use the results(in:) synchronous API of the detector. Call this method from the AVCaptureVideoDataOutputSampleBufferDelegate's captureOutput(_, didOutput:from:) function to synchronously get results from the given video frame (see the sketch after this list). Keep AVCaptureVideoDataOutput's alwaysDiscardsLateVideoFrames set to true to throttle calls to the detector. If a new video frame becomes available while the detector is running, it will be dropped.
  • If you use the output of the detector to overlay graphics on the input image, first get the result from ML Kit, then render the image and overlay in a single step. By doing so, you render to the display surface only once for each processed input frame. See the updatePreviewOverlayViewWithLastFrame function in the ML Kit quickstart sample for an example.
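
Putting these guidelines together, the following is a minimal Swift sketch of a capture delegate that processes frames synchronously. It assumes a detector in the default .stream mode (step 1), reuses the imageOrientation(deviceOrientation:cameraPosition:) helper from step 2, and uses an illustrative cameraPosition property; it is a sketch, not the quickstart sample's implementation.

Swift

  import AVFoundation
  import MLKit   // Module provided by the GoogleMLKit CocoaPods.
  import UIKit

  class FrameProcessor: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    // The default options use .stream mode, which assigns tracking IDs.
    private let objectDetector = ObjectDetector.objectDetector()
    // Illustrative: the camera that feeds your AVCaptureSession.
    private let cameraPosition: AVCaptureDevice.Position = .back

    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
      // Build the VisionImage as in step 2, reusing the
      // imageOrientation(deviceOrientation:cameraPosition:) helper.
      let visionImage = VisionImage(buffer: sampleBuffer)
      visionImage.orientation = imageOrientation(
        deviceOrientation: UIDevice.current.orientation,
        cameraPosition: cameraPosition)

      // Synchronous call on the capture queue. With
      // alwaysDiscardsLateVideoFrames set to true on the video output,
      // frames that arrive while this call runs are dropped, not queued.
      guard let objects = try? objectDetector.results(in: visionImage),
            !objects.isEmpty else {
        return
      }

      // Use the results, then render the frame and the overlay together
      // in a single pass.
      // ...
    }
  }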