Selfie segmentation with ML Kit on iOS

ML Kit provides an optimized SDK for selfie segmentation. The Selfie Segmenter assets are statically linked to your app at build time, which increases your app size by up to 24 MB. API latency varies from about 7 ms to about 12 ms depending on the input image size, as measured on an iPhone X.

Try it out

  • Play around with the sample app to see an example usage of this API.

Before you begin

  1. Include the following ML Kit libraries in your Podfile:

      pod 'GoogleMLKit/SegmentationSelfie', '8.0.0'
  2. After you install or update your project’s Pods, open your Xcode project using its .xcworkspace. ML Kit is supported in Xcode version 13.2.1 or higher.

1. Create an instance of Segmenter

To perform segmentation on a selfie image, first create an instance of Segmenter with SelfieSegmenterOptions and optionally specify the segmentation settings.

Segmenter options

Segmenter Mode

The Segmenter operates in two modes. Be sure you choose the one that matches your use case.

STREAM_MODE (default)

This mode is designed for streaming frames from video or camera. In this mode, the segmenter will leverage results from previous frames to return smoother segmentation results.

SINGLE_IMAGE_MODE

This mode is designed for single images that are not related. In this mode, the segmenter will process each image independently, with no smoothing over frames.
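For example, here is a minimal sketch of choosing the mode for a given input source. The isProcessingLiveVideo flag is illustrative and not part of the ML Kit API:

Swift

// Sketch: choose the segmenter mode that matches the input source.
// `isProcessingLiveVideo` is an illustrative flag, not an ML Kit API.
let options = SelfieSegmenterOptions()
if isProcessingLiveVideo {
  options.segmenterMode = .stream        // smooths results across consecutive frames
} else {
  options.segmenterMode = .singleImage   // processes each image independently
}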

Enable raw size mask

Asks the segmenter to return the raw size mask which matches the model output size.

The raw mask size (e.g. 256x256) is usually smaller than the input image size.

Without specifying this option, the segmenter will rescale the raw mask to match the input image size. Consider using this option if you want to apply customized rescaling logic or rescaling is not needed for your use case.
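If you enable the raw size mask and want to sample it at input-image coordinates yourself, the snippet below is a minimal nearest-neighbor sketch. The helper name rawMaskConfidence and the assumption that the mask buffer holds one Float32 confidence value per pixel are ours, not part of the ML Kit API:

Swift

// Sketch: nearest-neighbor lookup from an input-image pixel into a raw-size mask.
// Assumes mask.buffer is a one-channel Float32 CVPixelBuffer (hedged assumption).
func rawMaskConfidence(in mask: SegmentationMask,
                       atInputX x: Int, y: Int,
                       inputWidth: Int, inputHeight: Int) -> Float32 {
  let maskWidth = CVPixelBufferGetWidth(mask.buffer)
  let maskHeight = CVPixelBufferGetHeight(mask.buffer)
  // Map the input-image coordinate to the nearest raw-mask coordinate.
  let maskX = min(maskWidth - 1, x * maskWidth / inputWidth)
  let maskY = min(maskHeight - 1, y * maskHeight / inputHeight)

  CVPixelBufferLockBaseAddress(mask.buffer, .readOnly)
  defer { CVPixelBufferUnlockBaseAddress(mask.buffer, .readOnly) }
  let bytesPerRow = CVPixelBufferGetBytesPerRow(mask.buffer)
  let row = CVPixelBufferGetBaseAddress(mask.buffer)!
      .advanced(by: maskY * bytesPerRow)
      .bindMemory(to: Float32.self, capacity: maskWidth)
  return row[maskX]
}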

Specify the segmenter options:

Swift

let options = SelfieSegmenterOptions()
options.segmenterMode = .singleImage
options.shouldEnableRawSizeMask = true

Objective-C

MLKSelfieSegmenterOptions *options = [[MLKSelfieSegmenterOptions alloc] init];
options.segmenterMode = MLKSegmenterModeSingleImage;
options.shouldEnableRawSizeMask = YES;

Finally, get an instance of Segmenter, passing the options you specified:

Swift

let segmenter = Segmenter.segmenter(options: options)

Objective-C

MLKSegmenter *segmenter = [MLKSegmenter segmenterWithOptions:options];

2. Prepare the input image

To segment selfies, do the following for each image or frame of video. If you enabled stream mode, you must create VisionImage objects from CMSampleBuffers.

Create a VisionImage object using a UIImage or a CMSampleBuffer.

If you use a UIImage, follow these steps:

  • Create a VisionImage object with the UIImage. Make sure to specify the correct .orientation.

    Swift

    let visionImage = VisionImage(image: image)
    visionImage.orientation = image.imageOrientation

    Objective-C

     MLKVisionImage *visionImage = [[MLKVisionImage alloc] initWithImage:image];
     visionImage.orientation = image.imageOrientation;

If you use a CMSampleBuffer, follow these steps:

  • Specify the orientation of the image data contained in the CMSampleBuffer.

    To get the image orientation:

    Swift

    func imageOrientation(
      deviceOrientation: UIDeviceOrientation,
      cameraPosition: AVCaptureDevice.Position
    ) -> UIImage.Orientation {
      switch deviceOrientation {
      case .portrait:
        return cameraPosition == .front ? .leftMirrored : .right
      case .landscapeLeft:
        return cameraPosition == .front ? .downMirrored : .up
      case .portraitUpsideDown:
        return cameraPosition == .front ? .rightMirrored : .left
      case .landscapeRight:
        return cameraPosition == .front ? .upMirrored : .down
      case .faceDown, .faceUp, .unknown:
        return .up
      }
    }

    Objective-C

    - (UIImageOrientation)
      imageOrientationFromDeviceOrientation:(UIDeviceOrientation)deviceOrientation
                             cameraPosition:(AVCaptureDevicePosition)cameraPosition {
      switch (deviceOrientation) {
        case UIDeviceOrientationPortrait:
          return cameraPosition == AVCaptureDevicePositionFront ? UIImageOrientationLeftMirrored
                                                                : UIImageOrientationRight;
        case UIDeviceOrientationLandscapeLeft:
          return cameraPosition == AVCaptureDevicePositionFront ? UIImageOrientationDownMirrored
                                                                : UIImageOrientationUp;
        case UIDeviceOrientationPortraitUpsideDown:
          return cameraPosition == AVCaptureDevicePositionFront ? UIImageOrientationRightMirrored
                                                                : UIImageOrientationLeft;
        case UIDeviceOrientationLandscapeRight:
          return cameraPosition == AVCaptureDevicePositionFront ? UIImageOrientationUpMirrored
                                                                : UIImageOrientationDown;
        case UIDeviceOrientationUnknown:
        case UIDeviceOrientationFaceUp:
        case UIDeviceOrientationFaceDown:
          return UIImageOrientationUp;
      }
    }
  • Create a VisionImage object using the CMSampleBuffer object and orientation:

    Swift

    let image = VisionImage(buffer: sampleBuffer)
    image.orientation = imageOrientation(
      deviceOrientation: UIDevice.current.orientation,
      cameraPosition: cameraPosition)

    Objective-C

      
    MLKVisionImage *image = [[MLKVisionImage alloc] initWithBuffer:sampleBuffer];
    image.orientation =
      [self imageOrientationFromDeviceOrientation:UIDevice.currentDevice.orientation
                                   cameraPosition:cameraPosition];

3. Process the image

Pass the VisionImage object to one of the Segmenter's image processing methods. You can either use the asynchronous process(image:) method or the synchronous results(in:) method.

To perform segmentation on a selfie image synchronously:

Swift

let mask: SegmentationMask
do {
  mask = try segmenter.results(in: image)
} catch let error {
  print("Failed to perform segmentation with error: \(error.localizedDescription).")
  return
}
// Success. Get a segmentation mask here.

Objective-C

NSError *error;
MLKSegmentationMask *mask =
    [segmenter resultsInImage:image error:&error];
if (error != nil) {
  // Error.
  return;
}
// Success. Get a segmentation mask here.

To perform segmentation on a selfie image asynchronously:

Swift

segmenter.process(image) { mask, error in
  guard error == nil else {
    // Error.
    return
  }
  // Success. Get a segmentation mask here.
}

Objective-C

[segmenter processImage:image
             completion:^(MLKSegmentationMask * _Nullable mask,
                          NSError * _Nullable error) {
  if (error != nil) {
    // Error.
    return;
  }
  // Success. Get a segmentation mask here.
}];

4. Get the segmentation mask

You can get the segmentation result as follows:

Swift

let maskWidth = CVPixelBufferGetWidth(mask.buffer)
let maskHeight = CVPixelBufferGetHeight(mask.buffer)

CVPixelBufferLockBaseAddress(mask.buffer, CVPixelBufferLockFlags.readOnly)
let maskBytesPerRow = CVPixelBufferGetBytesPerRow(mask.buffer)
var maskAddress = CVPixelBufferGetBaseAddress(mask.buffer)!.bindMemory(
    to: Float32.self, capacity: maskBytesPerRow * maskHeight)

for _ in 0...(maskHeight - 1) {
  for col in 0...(maskWidth - 1) {
    // Gets the confidence of the pixel in the mask being in the foreground.
    let foregroundConfidence: Float32 = maskAddress[col]
  }
  maskAddress += maskBytesPerRow / MemoryLayout<Float32>.size
}

Objective-C

size_t width = CVPixelBufferGetWidth(mask.buffer);
size_t height = CVPixelBufferGetHeight(mask.buffer);

CVPixelBufferLockBaseAddress(mask.buffer, kCVPixelBufferLock_ReadOnly);
size_t maskBytesPerRow = CVPixelBufferGetBytesPerRow(mask.buffer);
float *maskAddress = (float *)CVPixelBufferGetBaseAddress(mask.buffer);

for (int row = 0; row < height; ++row) {
  for (int col = 0; col < width; ++col) {
    // Gets the confidence of the pixel in the mask being in the foreground.
    float foregroundConfidence = maskAddress[col];
  }
  maskAddress += maskBytesPerRow / sizeof(float);
}

For a full example of how to use the segmentation results, see the ML Kit quickstart sample.
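For illustration only, here is a hedged sketch of one simple way to consume the confidence values, by thresholding the mask into a binary foreground map. The 0.5 threshold and the isForeground array are illustrative choices, not part of the ML Kit API:

Swift

// Sketch: threshold the confidence mask into a binary foreground map.
// Assumes `mask` is the SegmentationMask returned by the segmenter.
let width = CVPixelBufferGetWidth(mask.buffer)
let height = CVPixelBufferGetHeight(mask.buffer)

CVPixelBufferLockBaseAddress(mask.buffer, .readOnly)
defer { CVPixelBufferUnlockBaseAddress(mask.buffer, .readOnly) }

let bytesPerRow = CVPixelBufferGetBytesPerRow(mask.buffer)
let baseAddress = CVPixelBufferGetBaseAddress(mask.buffer)!
var isForeground = [Bool](repeating: false, count: width * height)

for row in 0..<height {
  let rowPointer = baseAddress.advanced(by: row * bytesPerRow)
      .bindMemory(to: Float32.self, capacity: width)
  for col in 0..<width {
    // Treat a pixel as foreground if its confidence exceeds 0.5 (illustrative choice).
    isForeground[row * width + col] = rowPointer[col] > 0.5
  }
}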

Tips to improve performance

The quality of your results depends on the quality of the input image:

  • For ML Kit to get an accurate segmentation result, the image should be at least 256x256 pixels.
  • If you perform selfie segmentation in a real-time application, also consider the overall dimensions of the input images. Smaller images are processed faster, so to reduce latency, capture images at lower resolutions (see the sketch after this list), but keep the above resolution requirement in mind and ensure that the subject occupies as much of the image as possible.
  • Poor image focus can also impact accuracy. If you don't get acceptable results, ask the user to recapture the image.
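For example, here is a minimal sketch of capturing at a reduced resolution with AVCaptureSession; the .vga640x480 preset is an illustrative choice that still satisfies the 256x256 floor above:

Swift

// Sketch: lower the capture resolution to reduce per-frame latency.
// The preset choice is illustrative; pick any preset that keeps the
// image at least 256x256 and the subject reasonably large in frame.
let session = AVCaptureSession()
if session.canSetSessionPreset(.vga640x480) {
  session.sessionPreset = .vga640x480
}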

If you want to use segmentation in a real-time application, follow these guidelines to achieve the best frame rates:

  • Use the stream segmenter mode.
  • Consider capturing images at a lower resolution. However, also keep in mind this API's image dimension requirements.
  • For processing video frames, use the results(in:) synchronous API of the segmenter. Call this method from the AVCaptureVideoDataOutputSampleBufferDelegate's captureOutput(_, didOutput:from:) function to synchronously get results from the given video frame (see the sketch after this list). Keep AVCaptureVideoDataOutput's alwaysDiscardsLateVideoFrames set to true to throttle calls to the segmenter. If a new video frame becomes available while the segmenter is running, it will be dropped.
  • If you use the output of the segmenter to overlay graphics on the input image, first get the result from ML Kit, then render the image and overlay in a single step. By doing so, you render to the display surface only once for each processed input frame. See the previewOverlayView and CameraViewController classes in the ML Kit quickstart sample for an example.
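As a rough illustration of the delegate wiring described above, here is a hedged sketch. The CameraViewController class, the segmenter and cameraPosition properties, and the imageOrientation(deviceOrientation:cameraPosition:) helper (shown earlier) are assumed to exist in your app and are not part of the ML Kit API:

Swift

// Sketch: run the segmenter synchronously on each camera frame.
// Assumes `segmenter` was created in stream mode and `cameraPosition`
// tracks the active capture device; both are illustrative names.
extension CameraViewController: AVCaptureVideoDataOutputSampleBufferDelegate {
  func captureOutput(_ output: AVCaptureOutput,
                     didOutput sampleBuffer: CMSampleBuffer,
                     from connection: AVCaptureConnection) {
    let image = VisionImage(buffer: sampleBuffer)
    image.orientation = imageOrientation(
      deviceOrientation: UIDevice.current.orientation,
      cameraPosition: cameraPosition)

    guard let mask = try? segmenter.results(in: image) else { return }
    // Render the frame and the mask overlay together in a single pass.
    _ = mask
  }
}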