MediaPipeTasksVision Framework Reference

  
  
  BaseOptions

Holds the base options used to create any type of task. Its fields carry important information such as the acceleration configuration and the TFLite model source.

Declaration

Swift

  class BaseOptions : NSObject, NSCopying

  
  
  ResultCategory

ResultCategory is a utility class that contains a label, its display name, a float score, and the index of the label in the corresponding label file. It is typically used as the result of classification tasks.

Declaration

Swift

  class ResultCategory : NSObject

  
  
  Classifications

Represents the list of classifications for a given classifier head. Typically used as a result for classification tasks.

Declaration

Swift

  class Classifications : NSObject

  
  
  ClassificationResult

Represents the classification results of a model. Typically used as a result for classification tasks.

Declaration

Swift

  class ClassificationResult : NSObject

  
  
  ClassifierOptions

Classifier options shared across MediaPipe iOS classification tasks.

Declaration

Swift

  class ClassifierOptions : NSObject, NSCopying

  
  
  Connection

The value class representing a landmark connection.

Declaration

Swift

  class Connection : NSObject

  
  
  NormalizedKeypoint

Normalized keypoint represents a point in 2D space with x, y coordinates. x and y are normalized to [0.0, 1.0] by the image width and height respectively.

Declaration

Swift

  class NormalizedKeypoint : NSObject
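
The normalization described for NormalizedKeypoint can be inverted to recover pixel coordinates. The sketch below is only an illustration: `KeypointSketch` and `pixelPosition(of:imageWidth:imageHeight:)` are hypothetical names and do not belong to the framework.

```swift
// Hypothetical stand-in for a normalized keypoint; not the framework class.
struct KeypointSketch {
    let x: Float  // normalized by the image width, expected in [0.0, 1.0]
    let y: Float  // normalized by the image height, expected in [0.0, 1.0]
}

// Maps a normalized keypoint back to pixel coordinates.
func pixelPosition(of point: KeypointSketch,
                   imageWidth: Int,
                   imageHeight: Int) -> (x: Float, y: Float) {
    return (point.x * Float(imageWidth), point.y * Float(imageHeight))
}

let center = pixelPosition(of: KeypointSketch(x: 0.5, y: 0.25),
                           imageWidth: 640,
                           imageHeight: 480)
print(center.x, center.y)  // 320.0 120.0
```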

  
  
  Detection

Represents one detected object in the results of ObjectDetector.

Declaration

Swift

  class Detection : NSObject

  
  
  Embedding

Represents the embedding for a given embedder head. Typically used in embedding tasks.

Exactly one of ‘floatEmbedding’ and ‘quantizedEmbedding’ contains data, depending on whether the embedder was configured to perform scalar quantization.

Declaration

Swift

  class Embedding : NSObject
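
A common use of embeddings is comparing two of them by cosine similarity. The following is a minimal sketch under stated assumptions: `EmbeddingValuesSketch` is a hypothetical stand-in that models the "exactly one of the two" rule as an enum, and the `Int8` storage for quantized values is assumed for illustration, not taken from the framework.

```swift
// Hypothetical stand-in: exactly one representation holds data.
enum EmbeddingValuesSketch {
    case float([Float])     // embedder ran without scalar quantization
    case quantized([Int8])  // embedder ran with scalar quantization (assumed storage type)
}

// Widens either representation to floats so both can share one code path.
func floats(from values: EmbeddingValuesSketch) -> [Float] {
    switch values {
    case .float(let v): return v
    case .quantized(let q): return q.map { Float($0) }
    }
}

// Returns nil for mismatched lengths, empty input, or zero-norm vectors.
func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float? {
    guard a.count == b.count, !a.isEmpty else { return nil }
    var dot: Float = 0
    var normA: Float = 0
    var normB: Float = 0
    for i in 0..<a.count {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    guard normA > 0, normB > 0 else { return nil }
    return dot / (normA.squareRoot() * normB.squareRoot())
}

let same = cosineSimilarity(floats(from: .float([3, 4])),
                            floats(from: .quantized([3, 4])))
let orthogonal = cosineSimilarity([1, 0], [0, 1])
print(same ?? -2, orthogonal ?? -2)
```

Cosine similarity ignores vector magnitude, which is why the sketch can compare widened quantized values against float values without rescaling.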

  
  
  EmbeddingResult

Represents the embedding results of a model. Typically used as a result for embedding tasks.

Declaration

Swift

  class EmbeddingResult : NSObject

  
  
  FaceDetector

Class that performs face detection on images.

The API expects a TFLite model with mandatory TFLite Model Metadata.

The API supports models with one image input tensor and one or more output tensors. To be more specific, here are the requirements:

Input tensor (kTfLiteUInt8/kTfLiteFloat32)

image input of size [batch x height x width x channels].
batch inference is not supported (batch is required to be 1).
only RGB inputs are supported (channels is required to be 3).
if type is kTfLiteFloat32, NormalizationOptions are required to be attached to the metadata for input normalization.

Output tensors must be the 4 outputs of a DetectionPostProcess op, i.e.: (kTfLiteFloat32)

(kTfLiteUInt8/kTfLiteFloat32) locations tensor of size [num_results x 4], the inner array representing bounding boxes in the form [top, left, right, bottom].
BoundingBoxProperties are required to be attached to the metadata and must specify type=BOUNDARIES and coordinate_type=RATIO.
(kTfLiteFloat32) classes tensor of size [num_results], each value representing the integer index of a class.
(kTfLiteFloat32) scores tensor of size [num_results], each value representing the score of the detected face.
optional score calibration can be attached using ScoreCalibrationOptions and an AssociatedFile with type TENSOR_AXIS_SCORE_CALIBRATION. See metadata_schema.fbs [1] for more details.
(kTfLiteFloat32) integer num_results as a tensor of size [1]
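
Because the locations tensor uses coordinate_type=RATIO, each box edge is a fraction of the image size. The sketch below shows one way to turn such edges into a pixel box. `PixelBoxSketch` and `pixelBox(...)` are hypothetical helpers, and the edges are passed by name so the example does not depend on the element order inside the tensor.

```swift
// Hypothetical pixel-space box; not a framework type.
struct PixelBoxSketch: Equatable {
    let x: Int
    let y: Int
    let width: Int
    let height: Int
}

// Converts RATIO edges (fractions of the image size) into a pixel box.
func pixelBox(top: Float, left: Float, right: Float, bottom: Float,
              imageWidth: Int, imageHeight: Int) -> PixelBoxSketch {
    // Clamp each ratio to [0, 1] so the box stays inside the image.
    func clamp(_ value: Float) -> Float { return min(max(value, 0), 1) }
    let x0 = clamp(left) * Float(imageWidth)
    let x1 = clamp(right) * Float(imageWidth)
    let y0 = clamp(top) * Float(imageHeight)
    let y1 = clamp(bottom) * Float(imageHeight)
    return PixelBoxSketch(x: Int(x0.rounded()),
                          y: Int(y0.rounded()),
                          width: Int(max(0, x1 - x0).rounded()),
                          height: Int(max(0, y1 - y0).rounded()))
}

let box = pixelBox(top: 0.25, left: 0.5, right: 0.75, bottom: 0.5,
                   imageWidth: 400, imageHeight: 200)
print(box.x, box.y, box.width, box.height)  // 200 50 100 50
```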

Declaration

Swift

  class FaceDetector : NSObject

  
  
  FaceDetectorOptions

Options for setting up a FaceDetector.

Declaration

Swift

  class FaceDetectorOptions : TaskOptions, NSCopying

  
  
  FaceDetectorResult

Represents the detection results generated by FaceDetector.

Declaration

Swift

  class FaceDetectorResult : TaskResult

  
  
  FaceLandmarker

Class that performs face landmark detection on images.

The API expects a TFLite model with mandatory TFLite Model Metadata.

Declaration

Swift

  class FaceLandmarker : NSObject

  
  
  FaceLandmarkerOptions

Options for setting up a FaceLandmarker.

Declaration

Swift

  class FaceLandmarkerOptions : TaskOptions, NSCopying

  
  
  TransformMatrix

A matrix that can be used for transformations.

Declaration

Swift

  class TransformMatrix : NSObject

  
  
  FaceLandmarkerResult

Represents the detection results generated by FaceLandmarker.

Declaration

Swift

  class FaceLandmarkerResult : TaskResult

  
  
  FaceStylizer

Class that performs face stylization on images.

Declaration

Swift

  class FaceStylizer : NSObject

  
  
  FaceStylizerOptions

Options for setting up a FaceStylizer.

Declaration

Swift

  class FaceStylizerOptions : TaskOptions, NSCopying

  
  
  FaceStylizerResult

Represents the stylized image generated by FaceStylizer.

Declaration

Swift

  class FaceStylizerResult : TaskResult

  
  
  GestureRecognizer

Performs gesture recognition on images.

This API expects a pre-trained TFLite hand gesture recognizer model or a custom one created using MediaPipe Solutions Model Maker. See https://developers.google.com/mediapipe/solutions/model_maker.

Declaration

Swift

  class GestureRecognizer : NSObject

  
  
  GestureRecognizerOptions

Options for setting up a GestureRecognizer.

Declaration

Swift

  class GestureRecognizerOptions : TaskOptions, NSCopying

  
  
  GestureRecognizerResult

Represents the gesture recognition results generated by GestureRecognizer.

Declaration

Swift

  class GestureRecognizerResult : TaskResult

  
  
  HandLandmarker

Performs hand landmarks detection on images.

This API expects a pre-trained hand landmarks model asset bundle.

Declaration

Swift

  class HandLandmarker : NSObject

  
  
  HandLandmarkerOptions

Options for setting up a HandLandmarker.

Declaration

Swift

  class HandLandmarkerOptions : TaskOptions, NSCopying

  
  
  HandLandmarkerResult

Represents the hand landmarker results generated by HandLandmarker.

Declaration

Swift

  class HandLandmarkerResult : TaskResult

  
  
  MPImage

An image used for on-device machine learning with the MediaPipe Tasks library.

Declaration

Swift

  class MPImage : NSObject

  
  
  ImageClassifier

Performs classification on images.

The API expects a TFLite model with optional, but strongly recommended, TFLite Model Metadata.

The API supports models with one image input tensor and one or more output tensors. To be more specific, here are the requirements.

Input tensor (kTfLiteUInt8/kTfLiteFloat32)

image input of size [batch x height x width x channels].
batch inference is not supported (batch is required to be 1).
only RGB inputs are supported (channels is required to be 3).
if type is kTfLiteFloat32, NormalizationOptions are required to be attached to the metadata for input normalization.

At least one output tensor (kTfLiteUInt8/kTfLiteFloat32) with:

N classes and either 2 or 4 dimensions, i.e. [1 x N] or [1 x 1 x 1 x N]
optional (but recommended) label map(s) as AssociatedFiles with type TENSOR_AXIS_LABELS, containing one label per line. The first such AssociatedFile (if any) is used to fill the class_name field of the results. The display_name field is filled from the AssociatedFile (if any) whose locale matches the display_names_locale field of the ImageClassifierOptions used at creation time (“en” by default, i.e. English). If none of these are available, only the index field of the results will be filled.
optional score calibration can be attached using ScoreCalibrationOptions and an AssociatedFile with type TENSOR_AXIS_SCORE_CALIBRATION. See metadata_schema.fbs [1] for more details.
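
The relationship between a [1 x N] score tensor and a one-label-per-line label map can be sketched as below. This is an illustration only: `CategorySketch`, `parseLabels(_:)`, and `topCategories(scores:labels:k:)` are hypothetical helpers, and the framework performs its own post-processing internally.

```swift
// Hypothetical result entry; not the framework's category class.
struct CategorySketch {
    let index: Int
    let score: Float
    let label: String?  // nil when no label map covers the index
}

// Splits a label map into one label per line, keeping empty lines so that
// line numbers continue to match class indices.
func parseLabels(_ fileContents: String) -> [String] {
    return fileContents
        .split(omittingEmptySubsequences: false, whereSeparator: { $0.isNewline })
        .map { String($0) }
}

// Returns the k highest-scoring classes, attaching labels where available.
func topCategories(scores: [Float], labels: [String], k: Int) -> [CategorySketch] {
    let ranked = scores.enumerated().sorted { $0.element > $1.element }
    return ranked.prefix(max(0, k)).map { entry in
        CategorySketch(index: entry.offset,
                       score: entry.element,
                       label: entry.offset < labels.count ? labels[entry.offset] : nil)
    }
}

let labels = parseLabels("cat\ndog\nbird")
let top = topCategories(scores: [0.1, 0.7, 0.2], labels: labels, k: 2)
for category in top {
    print(category.index, category.label ?? "(index only)", category.score)
}
```

When no label map is supplied, every entry keeps only its index, which mirrors the fallback described above.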

Declaration

Swift

  class ImageClassifier : NSObject

  
  
  ImageClassifierOptions

Options for setting up an ImageClassifier.

Declaration

Swift

  class ImageClassifierOptions : TaskOptions, NSCopying

  
  
  ImageClassifierResult

Represents the classification results generated by ImageClassifier.

Declaration

Swift

  class ImageClassifierResult : TaskResult

  
  
  ImageEmbedder

Performs embedding extraction on images.

The API expects a TFLite model with optional, but strongly recommended, TFLite Model Metadata.

The API supports models with one image input tensor and one or more output tensors. To be more specific, here are the requirements.

Input image tensor (kTfLiteUInt8/kTfLiteFloat32)

image input of size [batch x height x width x channels].
batch inference is not supported (batch is required to be 1).
only RGB inputs are supported (channels is required to be 3).
if type is kTfLiteFloat32, NormalizationOptions are required to be attached to the metadata for input normalization.

At least one output tensor (kTfLiteUInt8/kTfLiteFloat32) with shape [1 x N] where N is the number of dimensions in the produced embeddings.
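
For float32 inputs, the NormalizationOptions in the model metadata supply per-channel mean and std values, and each input value is normalized as (value - mean) / std. The sketch below applies that formula to interleaved RGB bytes. `normalizeRGB(_:mean:std:)` is a hypothetical helper, and the 127.5 values are example parameters, not values read from any particular model.

```swift
// Normalizes interleaved RGB bytes with per-channel mean and std.
func normalizeRGB(_ pixels: [UInt8], mean: [Float], std: [Float]) -> [Float] {
    precondition(mean.count == 3 && std.count == 3, "expected one mean/std per RGB channel")
    precondition(pixels.count % 3 == 0, "expected interleaved RGB data")
    return pixels.enumerated().map { entry -> Float in
        let channel = entry.offset % 3  // 0 = R, 1 = G, 2 = B
        return (Float(entry.element) - mean[channel]) / std[channel]
    }
}

// Example parameters that map the byte range [0, 255] onto [-1, 1].
let normalized = normalizeRGB([0, 255, 51],
                              mean: [127.5, 127.5, 127.5],
                              std: [127.5, 127.5, 127.5])
print(normalized)
```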

Declaration

Swift

  class ImageEmbedder : NSObject

  
  
  ImageEmbedderOptions

Options for setting up an ImageEmbedder.

Declaration

Swift

  class ImageEmbedderOptions : TaskOptions, NSCopying

  
  
  ImageEmbedderResult

Represents the embedding results generated by ImageEmbedder.

Declaration

Swift

  class ImageEmbedderResult : TaskResult

  
  
  ImageSegmenter

Class that performs segmentation on images.

The API expects a TFLite model with mandatory TFLite Model Metadata.

Declaration

Swift

  class ImageSegmenter : NSObject

  
  
  ImageSegmenterOptions

Options for setting up an ImageSegmenter.

Declaration

Swift

  class ImageSegmenterOptions : TaskOptions, NSCopying

  
  
  ImageSegmenterResult

Represents the segmentation results generated by ImageSegmenter.

Declaration

Swift

  class ImageSegmenterResult : TaskResult

  
  
  InteractiveSegmenter

Class that performs interactive segmentation on images.

Users can represent their interaction through RegionOfInterest, which hints InteractiveSegmenter to focus segmentation on the given region of interest.

The API expects a TFLite model with mandatory TFLite Model Metadata.

Input tensor: (kTfLiteUInt8/kTfLiteFloat32)

image input of size [batch x height x width x channels].
batch inference is not supported (batch is required to be 1).
RGB and greyscale inputs are supported (channels is required to be 1 or 3).
if type is kTfLiteFloat32, NormalizationOptions are required to be attached to the metadata for input normalization.

Output tensors: (kTfLiteUInt8/kTfLiteFloat32)

list of segmented masks.
if output_type is CATEGORY_MASK, uint8 Image, Image vector of size 1.
if output_type is CONFIDENCE_MASK, float32 Image list of size channels.
batch is always 1.
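
To illustrate how the two output types relate, the sketch below derives a single uint8 category mask from a list of float32 confidence masks by picking, per pixel, the channel with the highest confidence. This per-pixel argmax is an assumption made for illustration, not a statement of how the task computes its category mask, and `categoryMask(fromConfidenceMasks:)` is a hypothetical helper.

```swift
// Builds a category mask by taking the per-pixel argmax over confidence masks.
// Each inner array holds the confidences of one channel, one value per pixel.
func categoryMask(fromConfidenceMasks masks: [[Float]]) -> [UInt8] {
    // UInt8 can only index up to 256 channels.
    guard let pixelCount = masks.first?.count, masks.count <= 256 else { return [] }
    precondition(masks.allSatisfy { $0.count == pixelCount }, "masks must share one size")
    var result = [UInt8](repeating: 0, count: pixelCount)
    for pixel in 0..<pixelCount {
        var best = 0
        for channel in 1..<masks.count where masks[channel][pixel] > masks[best][pixel] {
            best = channel
        }
        result[pixel] = UInt8(best)
    }
    return result
}

let background: [Float] = [0.9, 0.2, 0.4]
let foreground: [Float] = [0.1, 0.8, 0.6]
let categories = categoryMask(fromConfidenceMasks: [background, foreground])
print(categories)  // [0, 1, 1]
```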

An example of such a model can be found at: https://tfhub.dev/tensorflow/lite-model/deeplabv3/1/metadata/2

Declaration

Swift

  class InteractiveSegmenter : NSObject

  
  
  InteractiveSegmenterOptions

Options for setting up an InteractiveSegmenter.

Declaration

Swift

  class InteractiveSegmenterOptions : TaskOptions, NSCopying

  
  
  InteractiveSegmenterResult

Represents the segmentation results generated by InteractiveSegmenter.

Declaration

Swift

  class InteractiveSegmenterResult : TaskResult

  
  
  Landmark

Landmark represents a point in 3D space with x, y, z coordinates. The landmark coordinates are in meters. z represents the landmark depth, and the smaller the value the closer the world landmark is to the camera.

Declaration

Swift

  class Landmark : NSObject

  
  
  NormalizedLandmark

Normalized Landmark represents a point in 3D space with x, y, z coordinates. x and y are normalized to [0.0, 1.0] by the image width and height respectively. z represents the landmark depth, and the smaller the value the closer the landmark is to the camera. The magnitude of z uses roughly the same scale as x.

Declaration

Swift

  class NormalizedLandmark : NSObject
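
The NormalizedLandmark description implies two simple operations: scaling back to pixel space (with z scaled by the image width, since its magnitude uses roughly the same scale as x) and finding the landmark nearest the camera (smallest z). The sketch below shows both. `NormalizedLandmarkSketch` and the two functions are hypothetical and not part of the framework.

```swift
// Hypothetical stand-in for a normalized landmark; not the framework class.
struct NormalizedLandmarkSketch {
    let x: Float
    let y: Float
    let z: Float
}

// Scales x and z by the image width and y by the image height.
func pixelSpace(_ landmark: NormalizedLandmarkSketch,
                imageWidth: Int,
                imageHeight: Int) -> (x: Float, y: Float, z: Float) {
    let width = Float(imageWidth)
    return (landmark.x * width, landmark.y * Float(imageHeight), landmark.z * width)
}

// The smaller the z value, the closer the landmark is to the camera.
func closestToCamera(_ landmarks: [NormalizedLandmarkSketch]) -> NormalizedLandmarkSketch? {
    return landmarks.min(by: { $0.z < $1.z })
}

let near = NormalizedLandmarkSketch(x: 0.5, y: 0.5, z: -0.25)
let far = NormalizedLandmarkSketch(x: 0.1, y: 0.9, z: 0.1)
let scaled = pixelSpace(near, imageWidth: 200, imageHeight: 100)
let closest = closestToCamera([far, near])
print(scaled.x, scaled.y, scaled.z)  // 100.0 50.0 -50.0
```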

  
  
  Mask

The wrapper class for MediaPipe segmentation masks.

Masks are stored as UInt8 * or float * objects. Every mask has an underlying type, which can be accessed using dataType. You can access the mask as any other type using the appropriate properties. For example, if the underlying type is uInt8, in addition to accessing the mask using uint8Data, you can access float32Data to get the 32-bit float data (with values ranging from 0.0 to 1.0). The first time you access the data as a type different from the underlying type, an expensive type conversion is performed. Subsequent accesses return a pointer to the memory location of the same converted array. Because type conversions can be expensive, limit accesses to data of types different from the underlying type.

Masks returned from MediaPipe Tasks are owned by the underlying C++ Task. If you need to extend the lifetime of these objects, you can invoke the copy() method.

Declaration

Swift

  class Mask : NSObject, NSCopying
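
The convert-once-then-reuse behavior described for Mask can be sketched with a small cache. `MaskSketch` is a hypothetical class written only to illustrate the pattern. It uses Swift arrays instead of raw pointers, and dividing by 255 to reach the 0.0 to 1.0 range is an assumption about the conversion.

```swift
// Hypothetical mask whose underlying type is uInt8; not the framework class.
final class MaskSketch {
    private let uint8Storage: [UInt8]
    private var float32Cache: [Float]?
    // Counts how many times the expensive conversion actually ran.
    private(set) var conversionCount = 0

    init(uint8Data: [UInt8]) {
        self.uint8Storage = uint8Data
    }

    // Accessing the underlying type never converts.
    var uint8Data: [UInt8] {
        return uint8Storage
    }

    // The first access converts and caches; later accesses reuse the cache.
    var float32Data: [Float] {
        if let cached = float32Cache {
            return cached
        }
        conversionCount += 1
        let converted = uint8Storage.map { Float($0) / 255.0 }  // assumed scaling
        float32Cache = converted
        return converted
    }
}

let mask = MaskSketch(uint8Data: [0, 255])
let firstAccess = mask.float32Data
let secondAccess = mask.float32Data
print(firstAccess, mask.conversionCount)  // [0.0, 1.0] 1
```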

  
  
  ObjectDetector

Class that performs object detection on images.

The API expects a TFLite model with mandatory TFLite Model Metadata.

The API supports models with one image input tensor and one or more output tensors. To be more specific, here are the requirements:

Input tensor (kTfLiteUInt8/kTfLiteFloat32)

image input of size [batch x height x width x channels].
batch inference is not supported (batch is required to be 1).
only RGB inputs are supported (channels is required to be 3).
if type is kTfLiteFloat32, NormalizationOptions are required to be attached to the metadata for input normalization.

Output tensors must be the 4 outputs of a DetectionPostProcess op, i.e.: (kTfLiteFloat32)

(kTfLiteUInt8/kTfLiteFloat32) locations tensor of size [num_results x 4], the inner array representing bounding boxes in the form [top, left, right, bottom].
BoundingBoxProperties are required to be attached to the metadata and must specify type=BOUNDARIES and coordinate_type=RATIO.
(kTfLiteFloat32) classes tensor of size [num_results], each value representing the integer index of a class.
optional (but recommended) label map(s) can be attached as AssociatedFiles with type TENSOR_VALUE_LABELS, containing one label per line. The first such AssociatedFile (if any) is used to fill the class_name field of the results. The display_name field is filled from the AssociatedFile (if any) whose locale matches the display_names_locale field of the ObjectDetectorOptions used at creation time (“en” by default, i.e. English). If none of these are available, only the index field of the results will be filled.
(kTfLiteFloat32) scores tensor of size [num_results], each value representing the score of the detected object.
optional score calibration can be attached using ScoreCalibrationOptions and an AssociatedFile with type TENSOR_AXIS_SCORE_CALIBRATION. See metadata_schema.fbs [1] for more details.
(kTfLiteFloat32) integer num_results as a tensor of size [1]
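
The label-filling rules above (first label file for class_name, locale-matched file for display_name, index only otherwise) can be sketched as follows. `LabelFileSketch`, `DetectedCategorySketch`, and `resolveCategory(index:labelFiles:displayNamesLocale:)` are hypothetical names for illustration and do not reflect the framework's internals.

```swift
// Hypothetical model of one attached label map.
struct LabelFileSketch {
    let locale: String?   // nil when the file declares no locale
    let labels: [String]  // one label per line, indexed by class
}

// Hypothetical result entry mirroring the index/class_name/display_name fields.
struct DetectedCategorySketch: Equatable {
    let index: Int
    let className: String?
    let displayName: String?
}

func resolveCategory(index: Int,
                     labelFiles: [LabelFileSketch],
                     displayNamesLocale: String = "en") -> DetectedCategorySketch {
    // Looks up the label at `index`, or nil if the file is missing or too short.
    func label(in file: LabelFileSketch?) -> String? {
        guard let file = file, file.labels.indices.contains(index) else { return nil }
        return file.labels[index]
    }
    let className = label(in: labelFiles.first)
    let localized = labelFiles.first(where: { $0.locale == displayNamesLocale })
    return DetectedCategorySketch(index: index,
                                  className: className,
                                  displayName: label(in: localized))
}

let files = [
    LabelFileSketch(locale: nil, labels: ["person", "dog"]),
    LabelFileSketch(locale: "en", labels: ["Person", "Dog"]),
]
let resolved = resolveCategory(index: 1, labelFiles: files)
let indexOnly = resolveCategory(index: 1, labelFiles: [])
print(resolved.className ?? "-", resolved.displayName ?? "-")  // dog Dog
```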

Declaration

Swift

  class ObjectDetector : NSObject

  
  
  ObjectDetectorOptions

Options for setting up an ObjectDetector.

Declaration

Swift

  class ObjectDetectorOptions : TaskOptions, NSCopying

  
  
  ObjectDetectorResult

Represents the detection results generated by ObjectDetector.

Declaration

Swift

  class ObjectDetectorResult : TaskResult

  
  
  PoseLandmarker

Performs pose landmarks detection on images.

This API expects a pre-trained pose landmarks model asset bundle.

Declaration

Swift

  class PoseLandmarker : NSObject

  
  
  PoseLandmarkerOptions

Options for setting up a PoseLandmarker.

Declaration

Swift

  class PoseLandmarkerOptions : TaskOptions, NSCopying

  
  
  PoseLandmarkerResult

Represents the pose landmarks detection results generated by PoseLandmarker.

Declaration

Swift

  class PoseLandmarkerResult : TaskResult

  
  
  RegionOfInterest

The Region-Of-Interest (ROI) to interact with in an interactive segmentation inference.

An instance can contain either a single normalized point pointing to the object that the user wants to segment, or an array of normalized keypoints that make up scribbles over that object.

Declaration

Swift

  class RegionOfInterest : NSObject
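
The "either a single point or an array of keypoints" shape of RegionOfInterest maps naturally onto an enum. The sketch below is a hypothetical model, not the framework class: `RegionOfInterestSketch`, `NormalizedPointSketch`, and the `isValid` rule (non-empty, all coordinates within [0.0, 1.0]) are assumptions for illustration.

```swift
// Hypothetical normalized 2D point; not the framework class.
struct NormalizedPointSketch: Equatable {
    let x: Float
    let y: Float
}

// Hypothetical model: a region is either one keypoint or a scribble.
enum RegionOfInterestSketch {
    case keypoint(NormalizedPointSketch)
    case scribbles([NormalizedPointSketch])

    // Flattens both cases into a list of points.
    var points: [NormalizedPointSketch] {
        switch self {
        case .keypoint(let point): return [point]
        case .scribbles(let scribblePoints): return scribblePoints
        }
    }

    // Assumed rule: at least one point, all coordinates normalized.
    var isValid: Bool {
        let unit: ClosedRange<Float> = 0...1
        return !points.isEmpty
            && points.allSatisfy { unit.contains($0.x) && unit.contains($0.y) }
    }
}

let tap = RegionOfInterestSketch.keypoint(NormalizedPointSketch(x: 0.4, y: 0.6))
let stroke = RegionOfInterestSketch.scribbles([
    NormalizedPointSketch(x: 0.1, y: 0.1),
    NormalizedPointSketch(x: 0.2, y: 0.15),
])
let outOfRange = RegionOfInterestSketch.keypoint(NormalizedPointSketch(x: 1.2, y: 0.5))
print(tap.isValid, stroke.points.count, outOfRange.isValid)  // true 2 false
```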

  
  
  TaskOptions

MediaPipe Tasks options base class. Any MediaPipe task-specific options class should extend this class.

Declaration

Swift

  class TaskOptions : NSObject, NSCopying

  
  
  TaskResult

MediaPipe Tasks result base class. Any MediaPipe task result class should extend this class.

Declaration

Swift

  class TaskResult : NSObject, NSCopying
