Detect poses with ML Kit on iOS

ML Kit provides two optimized SDKs for pose detection.

SDK Name       | PoseDetection                                                                  | PoseDetectionAccurate
Implementation | Assets for the base detector are statically linked to your app at build time. | Assets for the accurate detector are statically linked to your app at build time.
App size       | Up to 29.6 MB                                                                  | Up to 33.2 MB
Performance    | iPhone X: ~45 FPS                                                              | iPhone X: ~29 FPS

Try it out

  • Play around with the sample app to see an example usage of this API.

Before you begin

  1. Include the following ML Kit pods in your Podfile:

      # If you want to use the base implementation:
      pod 'GoogleMLKit/PoseDetection', '8.0.0'

      # If you want to use the accurate implementation:
      pod 'GoogleMLKit/PoseDetectionAccurate', '8.0.0'

  2. After you install or update your project's pods, open your Xcode project using its .xcworkspace file. ML Kit is supported in Xcode version 13.2.1 or higher.

1. Create an instance of PoseDetector

To detect a pose in an image, first create an instance of PoseDetector and optionally specify the detector settings.

PoseDetector options

Detection Mode

The PoseDetector operates in two detection modes. Be sure you choose the one that matches your use case.

stream (default)
The pose detector will first detect the most prominent person in the image and then run pose detection. In subsequent frames, the person-detection step will not be conducted unless the person becomes obscured or is no longer detected with high confidence. The pose detector will attempt to track the most-prominent person and return their pose in each inference. This reduces latency and smooths detection. Use this mode when you want to detect pose in a video stream.
singleImage
The pose detector will detect a person and then run pose detection. The person-detection step will run for every image, so latency will be higher, and there is no person-tracking. Use this mode when using pose detection on static images or where tracking is not desired.

Specify the pose detector options:

Swift

// Base pose detector with streaming, when depending on the PoseDetection SDK
let options = PoseDetectorOptions()
options.detectorMode = .stream

// Accurate pose detector on static images, when depending on the
// PoseDetectionAccurate SDK
let options = AccuratePoseDetectorOptions()
options.detectorMode = .singleImage

Objective-C

// Base pose detector with streaming, when depending on the PoseDetection SDK
MLKPoseDetectorOptions *options = [[MLKPoseDetectorOptions alloc] init];
options.detectorMode = MLKPoseDetectorModeStream;

// Accurate pose detector on static images, when depending on the
// PoseDetectionAccurate SDK
MLKAccuratePoseDetectorOptions *options = [[MLKAccuratePoseDetectorOptions alloc] init];
options.detectorMode = MLKPoseDetectorModeSingleImage;

Finally, get an instance of PoseDetector. Pass the options you specified:

Swift

let poseDetector = PoseDetector.poseDetector(options: options)

Objective-C

MLKPoseDetector *poseDetector = [MLKPoseDetector poseDetectorWithOptions:options];

2. Prepare the input image

To detect poses, do the following for each image or frame of video. If you enabled stream mode, you must create VisionImage objects from CMSampleBuffers.

Create a VisionImage object using a UIImage or a CMSampleBuffer.

If you use a UIImage, follow these steps:

  • Create a VisionImage object with the UIImage. Make sure to specify the correct .orientation.

    Swift

    let visionImage = VisionImage(image: image)
    visionImage.orientation = image.imageOrientation

    Objective-C

    MLKVisionImage *visionImage = [[MLKVisionImage alloc] initWithImage:image];
    visionImage.orientation = image.imageOrientation;

If you use a CMSampleBuffer, follow these steps:

  • Specify the orientation of the image data contained in the CMSampleBuffer.

    To get the image orientation:

    Swift

    func imageOrientation(
      deviceOrientation: UIDeviceOrientation,
      cameraPosition: AVCaptureDevice.Position
    ) -> UIImage.Orientation {
      switch deviceOrientation {
      case .portrait:
        return cameraPosition == .front ? .leftMirrored : .right
      case .landscapeLeft:
        return cameraPosition == .front ? .downMirrored : .up
      case .portraitUpsideDown:
        return cameraPosition == .front ? .rightMirrored : .left
      case .landscapeRight:
        return cameraPosition == .front ? .upMirrored : .down
      case .faceDown, .faceUp, .unknown:
        return .up
      }
    }

    Objective-C

    - (UIImageOrientation)
        imageOrientationFromDeviceOrientation:(UIDeviceOrientation)deviceOrientation
                               cameraPosition:(AVCaptureDevicePosition)cameraPosition {
      switch (deviceOrientation) {
        case UIDeviceOrientationPortrait:
          return cameraPosition == AVCaptureDevicePositionFront ? UIImageOrientationLeftMirrored
                                                                : UIImageOrientationRight;
        case UIDeviceOrientationLandscapeLeft:
          return cameraPosition == AVCaptureDevicePositionFront ? UIImageOrientationDownMirrored
                                                                : UIImageOrientationUp;
        case UIDeviceOrientationPortraitUpsideDown:
          return cameraPosition == AVCaptureDevicePositionFront ? UIImageOrientationRightMirrored
                                                                : UIImageOrientationLeft;
        case UIDeviceOrientationLandscapeRight:
          return cameraPosition == AVCaptureDevicePositionFront ? UIImageOrientationUpMirrored
                                                                : UIImageOrientationDown;
        case UIDeviceOrientationUnknown:
        case UIDeviceOrientationFaceUp:
        case UIDeviceOrientationFaceDown:
          return UIImageOrientationUp;
      }
    }
    
  • Create a VisionImage object using the CMSampleBuffer object and orientation:

    Swift

    let image = VisionImage(buffer: sampleBuffer)
    image.orientation = imageOrientation(
      deviceOrientation: UIDevice.current.orientation,
      cameraPosition: cameraPosition)

    Objective-C

      
    MLKVisionImage *image = [[MLKVisionImage alloc] initWithBuffer:sampleBuffer];
    image.orientation =
        [self imageOrientationFromDeviceOrientation:UIDevice.currentDevice.orientation
                                     cameraPosition:cameraPosition];

3. Process the image

Pass the VisionImage to one of the pose detector's image processing methods. You can either use the asynchronous process(image:) method or the synchronous results(in:) method.

To detect poses synchronously:

Swift

let results: [Pose]
do {
  results = try poseDetector.results(in: image)
} catch let error {
  print("Failed to detect pose with error: \(error.localizedDescription).")
  return
}
guard !results.isEmpty else {
  print("Pose detector returned no results.")
  return
}

// Success. Get pose landmarks here.

Objective-C

NSError *error;
NSArray *poses = [poseDetector resultsInImage:image error:&error];
if (error != nil) {
  // Error.
  return;
}
if (poses.count == 0) {
  // No pose detected.
  return;
}

// Success. Get pose landmarks here.

To detect poses asynchronously:

Swift

poseDetector.process(image) { detectedPoses, error in
  guard error == nil else {
    // Error.
    return
  }
  guard let detectedPoses = detectedPoses, !detectedPoses.isEmpty else {
    // No pose detected.
    return
  }
  // Success. Get pose landmarks here.
}

Objective-C

[poseDetector processImage:image
                completion:^(NSArray * _Nullable poses, NSError * _Nullable error) {
  if (error != nil) {
    // Error.
    return;
  }
  if (poses.count == 0) {
    // No pose detected.
    return;
  }
  // Success. Get pose landmarks here.
}];

4. Get information about the detected pose

If a person is detected in the image, the pose detection API either passes an array of Pose objects to the completion handler or returns the array, depending on whether you called the asynchronous or synchronous method.

If the person was not completely inside the image, the model assigns the missing landmarks coordinates outside the frame and gives them low inFrameLikelihood values.

If no person was detected, the array is empty.

Swift

for pose in detectedPoses {
  let leftAnkleLandmark = pose.landmark(ofType: .leftAnkle)
  if leftAnkleLandmark.inFrameLikelihood > 0.5 {
    let position = leftAnkleLandmark.position
  }
}

Objective-C

for (MLKPose *pose in detectedPoses) {
  MLKPoseLandmark *leftAnkleLandmark = [pose landmarkOfType:MLKPoseLandmarkTypeLeftAnkle];
  if (leftAnkleLandmark.inFrameLikelihood > 0.5) {
    MLKVision3DPoint *position = leftAnkleLandmark.position;
  }
}

Tips to improve performance

The quality of your results depends on the quality of the input image:

  • For ML Kit to accurately detect pose, the person in the image should be represented by sufficient pixel data; for best performance, the subject should be at least 256x256 pixels.
  • If you detect pose in a real-time application, you might also want to consider the overall dimensions of the input images. Smaller images can be processed faster, so to reduce latency, capture images at lower resolutions (see the sketch after this list), but keep in mind the above resolution requirements and ensure that the subject occupies as much of the image as possible.
  • Poor image focus can also impact accuracy. If you don't get acceptable results, ask the user to recapture the image.
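
For example, when you capture frames with AVFoundation you can lower the capture session's preset so the detector receives smaller frames. This is only a minimal sketch; the .vga640x480 preset is an assumption used for illustration, so choose whatever resolution still keeps the subject at roughly 256x256 pixels or larger:

Swift

import AVFoundation

// Sketch: request a smaller capture resolution to reduce per-frame latency.
// The preset below is an assumption; pick one that still keeps the subject
// large enough in the frame for accurate detection.
let captureSession = AVCaptureSession()
if captureSession.canSetSessionPreset(.vga640x480) {
  captureSession.sessionPreset = .vga640x480
}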

If you want to use pose detection in a real-time application, follow these guidelines to achieve the best framerates:

  • Use the base PoseDetection SDK and stream detection mode.
  • Consider capturing images at a lower resolution. However, also keep in mind this API's image dimension requirements.
  • For processing video frames, use the results(in:) synchronous API of the detector. Call this method from the AVCaptureVideoDataOutputSampleBufferDelegate's captureOutput(_:didOutput:from:) function to synchronously get results from the given video frame; a minimal sketch of this pattern follows this list. Keep AVCaptureVideoDataOutput's alwaysDiscardsLateVideoFrames set to true to throttle calls to the detector. If a new video frame becomes available while the detector is running, it will be dropped.
  • If you use the output of the detector to overlay graphics on the input image, first get the result from ML Kit, then render the image and overlay in a single step. By doing so, you render to the display surface only once for each processed input frame. See the previewOverlayView and MLKDetectionOverlayView classes in the showcase sample app for an example.
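
As a rough illustration of the delegate pattern described above, the following sketch assumes you have already created poseDetector in stream mode (step 1) and attached an AVCaptureVideoDataOutput whose sample buffer delegate is this hypothetical CameraViewController; the orientation handling is abbreviated and should use the helper from step 2:

Swift

import AVFoundation
import UIKit
import MLKit

class CameraViewController: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
  // Created earlier with PoseDetectorOptions in .stream mode (see step 1).
  let poseDetector: PoseDetector

  init(poseDetector: PoseDetector) {
    self.poseDetector = poseDetector
    super.init()
  }

  func captureOutput(_ output: AVCaptureOutput,
                     didOutput sampleBuffer: CMSampleBuffer,
                     from connection: AVCaptureConnection) {
    // Wrap the frame and set its orientation (abbreviated; see step 2).
    let image = VisionImage(buffer: sampleBuffer)
    image.orientation = .up

    // Synchronous call on the delegate queue: while it runs, late frames are
    // dropped as long as alwaysDiscardsLateVideoFrames stays true.
    guard let poses = try? poseDetector.results(in: image), !poses.isEmpty else {
      return
    }

    // Success: use the poses, then render the camera frame and its overlay
    // together in a single pass for this frame.
  }
}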

Next steps
