Use ARCore as input for Machine Learning models

You can use the camera feed that ARCore captures in a machine learning pipeline to create an intelligent augmented reality experience. The ARCore ML Kit sample demonstrates how to use ML Kit and the Google Cloud Vision API to identify real-world objects. The sample uses a machine learning model to classify objects in the camera's view and attaches a label to the object in the virtual scene.

The ARCore ML Kit sample is written in Kotlin. It is available as the ml_kotlin sample app in the ARCore SDK GitHub repository.

Use ARCore's CPU image

ARCore captures at least two sets of image streams by default:

  • A CPU image stream used for feature recognition and image processing. By default, the CPU image has a resolution of VGA (640x480). ARCore can be configured to use an additional higher-resolution image stream, if required.
  • A GPU texture stream, which contains a high-resolution texture, usually at a resolution of 1080p. This is typically used as a user-facing camera preview and is stored in the OpenGL texture specified by Session.setCameraTextureName() (see the sketch after this list).
  • Any additional streams specified by SharedCamera.setAppSurfaces().
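
A minimal sketch of wiring up the GPU texture stream, assuming cameraTextureId is a valid GL_TEXTURE_EXTERNAL_OES texture name already created by the app:

Kotlin

// Tell ARCore which OpenGL texture to fill with the camera image each frame.
// `cameraTextureId` is an assumed app-created texture name, not an ARCore API.
session.setCameraTextureName(cameraTextureId)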

CPU image size considerations

No additional cost is incurred if the default VGA-sized CPU stream is used, because ARCore uses this stream for world comprehension. Requesting a stream with a different resolution may be expensive, as an additional stream will need to be captured. Keep in mind that a higher resolution may quickly become expensive for your model: doubling the width and height of the image quadruples the number of pixels in the image (for example, going from 640x480 to 1280x960 raises the pixel count from about 307,000 to about 1.2 million).

If your model can still perform well on a lower-resolution image, it may be advantageous to downscale the image.
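
As a sketch, assuming the YUV CPU image has already been converted to an ARGB Bitmap (as the ML Kit sample does before inference), downscaling could look like this:

Kotlin

import android.graphics.Bitmap

// Halve each dimension before inference, leaving a quarter of the pixels for
// the model to process. The factor of 2 is illustrative; tune it to the lowest
// resolution at which your model still performs well.
fun downscaleByHalf(bitmap: Bitmap): Bitmap =
    Bitmap.createScaledBitmap(bitmap, bitmap.width / 2, bitmap.height / 2, /* filter= */ true)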

Configure an additional high resolution CPU image stream

The performance of your ML model may depend on the resolution of the image used as input. The resolution of these streams can be adjusted by changing the current CameraConfig using Session.setCameraConfig(), selecting a valid configuration from Session.getSupportedCameraConfigs().

Java

CameraConfigFilter cameraConfigFilter =
    new CameraConfigFilter(session)
        // World-facing cameras only.
        .setFacingDirection(CameraConfig.FacingDirection.BACK);
List<CameraConfig> supportedCameraConfigs = session.getSupportedCameraConfigs(cameraConfigFilter);
// Select an acceptable configuration from supportedCameraConfigs.
CameraConfig cameraConfig = selectCameraConfig(supportedCameraConfigs);
session.setCameraConfig(cameraConfig);

Kotlin

val cameraConfigFilter =
    CameraConfigFilter(session)
        // World-facing cameras only.
        .setFacingDirection(CameraConfig.FacingDirection.BACK)
val supportedCameraConfigs = session.getSupportedCameraConfigs(cameraConfigFilter)
// Select an acceptable configuration from supportedCameraConfigs.
val cameraConfig = selectCameraConfig(supportedCameraConfigs)
session.setCameraConfig(cameraConfig)
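
selectCameraConfig in the snippets above is a placeholder for app-specific selection logic, not an ARCore API. One possible sketch, which picks the supported configuration with the largest CPU image:

Kotlin

// Pick the configuration whose CPU image has the most pixels.
// getSupportedCameraConfigs always returns at least one configuration.
fun selectCameraConfig(configs: List<CameraConfig>): CameraConfig =
    configs.maxByOrNull { it.imageSize.width * it.imageSize.height } ?: configs.first()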

Retrieve the CPU image

Retrieve the CPU image using Frame.acquireCameraImage(). These images should be disposed of as soon as they're no longer needed.

Java

Image cameraImage = null;
try {
  cameraImage = frame.acquireCameraImage();
  // Process `cameraImage` using your ML inference model.
} catch (NotYetAvailableException e) {
  // NotYetAvailableException is an exception that can be expected when the camera is not ready
  // yet. The image may become available on a next frame.
} catch (RuntimeException e) {
  // A different exception occurred, e.g. DeadlineExceededException, ResourceExhaustedException.
  // Handle this error appropriately.
  handleAcquireCameraImageFailure(e);
} finally {
  if (cameraImage != null) {
    cameraImage.close();
  }
}

Kotlin

// NotYetAvailableException is an exception that can be expected when the camera is not ready yet.
// Map it to `null` instead, but continue to propagate other errors.
fun Frame.tryAcquireCameraImage() =
  try {
    acquireCameraImage()
  } catch (e: NotYetAvailableException) {
    null
  } catch (e: RuntimeException) {
    // A different exception occurred, e.g. DeadlineExceededException, ResourceExhaustedException.
    // Handle this error appropriately.
    handleAcquireCameraImageFailure(e)
  }

// The `use` block ensures the camera image is disposed of after use.
frame.tryAcquireCameraImage()?.use { image ->
  // Process `image` using your ML inference model.
}

Process the CPU image

You can use various machine learning libraries, such as ML Kit, to process the CPU image.
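
As an example, here is a minimal sketch using ML Kit's on-device object detector, along the lines of the ARCore ML Kit sample. The rotation value of 0 and the blocking Tasks.await call are simplifying assumptions (run this off the UI thread); see the sample for a complete pipeline:

Kotlin

import com.google.android.gms.tasks.Tasks
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.objects.ObjectDetection
import com.google.mlkit.vision.objects.defaults.ObjectDetectorOptions

val objectDetector = ObjectDetection.getClient(
    ObjectDetectorOptions.Builder()
        .setDetectorMode(ObjectDetectorOptions.SINGLE_IMAGE_MODE)
        .enableClassification()
        .build())

frame.tryAcquireCameraImage()?.use { image ->
  // `fromMediaImage` needs the image rotation in degrees; 0 assumes the device
  // is in its natural orientation.
  val inputImage = InputImage.fromMediaImage(image, /* rotationDegrees= */ 0)
  // Block until inference completes so the image isn't closed while ML Kit is
  // still reading it. Must not be called on the main thread.
  val detectedObjects = Tasks.await(objectDetector.process(inputImage))
  for (obj in detectedObjects) {
    val box = obj.boundingBox // Bounding box in image-pixel coordinates.
    // Use the box center for hit testing, as described below.
  }
}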

Display results in your AR scene

Image recognition models often output detected objects by indicating a center point or a bounding polygon representing the detected object.

Using the center point or center of the bounding box that is output from the model, it's possible to attach an anchor to the detected object. Use Frame.hitTest() to estimate the pose of an object in the virtual scene.

Convert IMAGE_PIXELS coordinates to VIEW coordinates:

Java

// Suppose `mlResult` contains an (x, y) of a given point on the CPU image.
float[] cpuCoordinates = new float[] {mlResult.getX(), mlResult.getY()};
float[] viewCoordinates = new float[2];
frame.transformCoordinates2d(
    Coordinates2d.IMAGE_PIXELS, cpuCoordinates, Coordinates2d.VIEW, viewCoordinates);
// `viewCoordinates` now contains coordinates suitable for hit testing.

Kotlin

// Suppose `mlResult` contains an (x, y) of a given point on the CPU image.
val cpuCoordinates = floatArrayOf(mlResult.x, mlResult.y)
val viewCoordinates = FloatArray(2)
frame.transformCoordinates2d(
    Coordinates2d.IMAGE_PIXELS, cpuCoordinates, Coordinates2d.VIEW, viewCoordinates)
// `viewCoordinates` now contains coordinates suitable for hit testing.

Use these VIEW coordinates to conduct a hit test and create an anchor from the result:

Java

List<HitResult> hits = frame.hitTest(viewCoordinates[0], viewCoordinates[1]);
HitResult depthPointResult = null;
for (HitResult hit : hits) {
  if (hit.getTrackable() instanceof DepthPoint) {
    depthPointResult = hit;
    break;
  }
}
if (depthPointResult != null) {
  Anchor anchor = depthPointResult.getTrackable().createAnchor(depthPointResult.getHitPose());
  // This anchor will be attached to the scene with stable tracking.
  // It can be used as a position for a virtual object, with a rotation perpendicular to the
  // estimated surface normal.
}

Kotlin

val hits = frame.hitTest(viewCoordinates[0], viewCoordinates[1])
val depthPointResult = hits.firstOrNull { it.trackable is DepthPoint }
if (depthPointResult != null) {
  val anchor = depthPointResult.trackable.createAnchor(depthPointResult.hitPose)
  // This anchor will be attached to the scene with stable tracking.
  // It can be used as a position for a virtual object, with a rotation perpendicular to the
  // estimated surface normal.
}

Performance considerations

Follow these recommendations to save processing power and consume less energy:

  • Do not run your ML model on every incoming frame. Consider running object detection at a low framerate instead (see the throttling sketch after this list).
  • Consider an online ML inference model, such as the Cloud Vision API, to reduce on-device computational complexity.
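
A minimal throttling sketch (the 500 ms interval and the helper names are illustrative, not part of the ARCore API):

Kotlin

import android.os.SystemClock

private var lastInferenceTimeMs = 0L

fun maybeRunInference(frame: Frame) {
  val nowMs = SystemClock.uptimeMillis()
  // Skip this frame if the last inference ran less than 500 ms ago.
  if (nowMs - lastInferenceTimeMs < 500L) return
  lastInferenceTimeMs = nowMs
  frame.tryAcquireCameraImage()?.use { image ->
    // Run your ML model on `image`.
  }
}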
