# Prepare image training data for object detection
This page describes how to prepare image training data for use in a
Vertex AI dataset to train an image object detection model.
The following sections include information about data requirements, the input/output schema file, and the format of the data import files ([JSON Lines](https://jsonlines.org/) and CSV) that are defined by the schema.
### Object detection

### Data requirements

#### General image requirements
| Requirement | Details |
|---|---|
| Supported file types | JPEG, PNG, GIF, BMP, ICO |
| Types of images | AutoML models are optimized for photographs of objects in the real world. |
| Training image file size | 30 MB maximum. |
| Prediction image file size | 1.5 MB maximum. |
| Image size (pixels) | 1024 pixels by 1024 pixels suggested maximum. For images much larger than 1024 pixels by 1024 pixels, some image quality may be lost during Vertex AI's image normalization process. |
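A local pre-flight check against these requirements can catch bad files before import. The following is a minimal sketch; the extension set and the `check_training_image` helper are assumptions for illustration, not part of any Vertex AI tooling:

```python
from pathlib import Path

# File extensions assumed to map to the supported types above.
SUPPORTED = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".ico"}
MAX_TRAINING_BYTES = 30 * 1024 * 1024  # 30 MB training image limit

def check_training_image(path: str) -> list[str]:
    """Return a list of problems that would block this file from import."""
    p = Path(path)
    problems = []
    if p.suffix.lower() not in SUPPORTED:
        problems.append(f"unsupported file type: {p.suffix}")
    if p.stat().st_size > MAX_TRAINING_BYTES:
        problems.append("file exceeds the 30 MB training limit")
    return problems
```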
#### Labels and bounding box requirements

The following requirements apply to datasets used to train AutoML models.

| Requirement | Details |
|---|---|
| Label instances for training | 10 annotations (instances) minimum. |
| Annotation requirements | For each label you must have at least 10 images, each with at least one annotation (bounding box and the label). However, for model training purposes it's recommended that you use about 1,000 annotations per label. In general, the more images per label you have, the better your model will perform. |
| Label ratio (most common label to least common label) | The model works best when there are at most 100x more images for the most common label than for the least common label. For model performance, it is recommended that you remove very low-frequency labels. |
| Bounding box edge length | At least 0.01 * the length of a side of the image. For example, a 1000 * 900 pixel image requires bounding boxes of at least 10 * 9 pixels. Minimum bounding box size: 8 pixels by 8 pixels. (See the sketch following this table.) |
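To make the edge-length rule concrete, here is a minimal Python sketch that checks a pixel-space bounding box against both minimums. The `box_meets_minimums` helper is hypothetical, not part of any Vertex AI SDK:

```python
def box_meets_minimums(box_w: int, box_h: int, img_w: int, img_h: int) -> bool:
    """Check a pixel-space bounding box against the documented minimums.

    Each box edge must be at least 0.01 * the corresponding image side,
    and the absolute minimum box size is 8 x 8 pixels.
    """
    min_w = max(0.01 * img_w, 8)
    min_h = max(0.01 * img_h, 8)
    return box_w >= min_w and box_h >= min_h

# A 1000 x 900 image requires boxes of at least 10 x 9 pixels.
assert box_meets_minimums(10, 9, 1000, 900)
assert not box_meets_minimums(5, 9, 1000, 900)
```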
The following requirements apply to datasets used to train AutoML or custom-trained models.

| Requirement | Value |
|---|---|
| Bounding boxes per distinct image | 500 maximum |
| Bounding boxes returned from a prediction request | 100 (default), 500 maximum |
#### Training data and dataset requirements

The following requirements apply to datasets used to train AutoML models.
**Training image characteristics**
The training data should be as close as possible to the data on
which predictions are to be made.
For example, if your use case involves
blurry and low-resolution images (such as from a security camera),
your training data should be composed of blurry, low-resolution images.
In general, you should also consider providing multiple angles,
resolutions, and backgrounds for your training images.
Vertex AI models can't generally predict labels that
humans can't assign. So, if a human can't be trained to assign labels by
looking at the image for
1-2 seconds, the model likely can't be trained to do it either.
**Internal image preprocessing**

After images are imported, Vertex AI performs preprocessing on the data. The preprocessed images are the actual data used to train the model.

Image preprocessing (resizing) occurs when the image's smaller edge is greater than 1024 pixels. In that case, the smaller side is scaled down to 1024 pixels, and the larger side and any specified bounding boxes are scaled down by the same factor. Any scaled-down annotations (bounding boxes and labels) are removed if they end up smaller than 8 pixels by 8 pixels. Images whose smaller side is less than or equal to 1024 pixels are not resized.
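The scaling rule can be summarized in a short sketch. This is an illustration of the documented behavior, not Vertex AI's actual implementation; the `resize_for_training` name and the rounding choice are assumptions:

```python
def resize_for_training(img_w: int, img_h: int, boxes: list[tuple]) -> tuple:
    """Mimic the documented preprocessing: if the image's smaller side exceeds
    1024 px, scale both sides and all pixel-space boxes by 1024 / smaller_side,
    then drop any box smaller than 8 x 8 px after scaling. Each box is a
    pixel-space (x_min, y_min, x_max, y_max) tuple."""
    smaller = min(img_w, img_h)
    if smaller <= 1024:
        return img_w, img_h, boxes  # no resizing performed
    scale = 1024 / smaller
    scaled = [tuple(v * scale for v in box) for box in boxes]
    kept = [b for b in scaled if (b[2] - b[0]) >= 8 and (b[3] - b[1]) >= 8]
    return round(img_w * scale), round(img_h * scale), kept
```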
The following requirements apply to datasets used to train AutoML or custom-trained models.

| Requirement | Value |
|---|---|
| Images in each dataset | 150,000 maximum |
| Total annotated bounding boxes in each dataset | 1,000,000 maximum |
| Number of labels in each dataset | 1 minimum, 1,000 maximum |
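As a pre-flight check, you can tally an import file against these limits. The following is a minimal sketch over the JSON Lines import format described below; the `dataset_stats` helper is hypothetical, not part of the Vertex AI SDK:

```python
import json
from collections import Counter

def dataset_stats(jsonl_path: str):
    """Tally a JSON Lines import file against the documented dataset limits:
    150,000 images, 1,000,000 bounding boxes, 1 to 1,000 labels."""
    images = boxes = 0
    labels = Counter()
    with open(jsonl_path) as f:
        for line in f:
            record = json.loads(line)
            images += 1
            for ann in record.get("boundingBoxAnnotations", []):
                boxes += 1
                labels[ann["displayName"]] += 1
    assert images <= 150_000, "too many images"
    assert boxes <= 1_000_000, "too many bounding boxes"
    assert 1 <= len(labels) <= 1_000, "label count out of range"
    return images, boxes, labels
```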
### YAML schema file

Use the following publicly accessible schema file to import image object detection annotations (bounding boxes and labels). This schema file dictates the format of the data input files. The file's structure follows the [OpenAPI schema](https://github.com/OAI/OpenAPI-Specification/blob/master/versions/3.0.2.md#schema).

[`gs://google-cloud-aiplatform/schema/dataset/ioformat/image_bounding_box_io_format_1.0.0.yaml`](https://storage.cloud.google.com/google-cloud-aiplatform/schema/dataset/ioformat/image_bounding_box_io_format_1.0.0.yaml)

#### **Full schema file**
```
title: ImageBoundingBox
description: >
  Import and export format for importing/exporting images together with bounding
  box annotations. Can be used in Dataset.import_schema_uri field.
type: object
required:
- imageGcsUri
properties:
  imageGcsUri:
    type: string
    description: >
      A Cloud Storage URI pointing to an image. Up to 30MB in size.
      Supported file mime types: `image/jpeg`, `image/gif`, `image/png`,
      `image/webp`, `image/bmp`, `image/tiff`, `image/vnd.microsoft.icon`.
  boundingBoxAnnotations:
    type: array
    description: Multiple bounding box Annotations on the image.
    items:
      type: object
      description: >
        Bounding box annotation. `xMin`, `xMax`, `yMin`, and `yMax` are relative
        to the image size, and the point 0,0 is in the top left of the image.
      properties:
        displayName:
          type: string
          description: >
            It will be imported as/exported from AnnotationSpec's display name,
            i.e. the name of the label/class.
        xMin:
          description: The leftmost coordinate of the bounding box.
          type: number
          format: double
        xMax:
          description: The rightmost coordinate of the bounding box.
          type: number
          format: double
        yMin:
          description: The topmost coordinate of the bounding box.
          type: number
          format: double
        yMax:
          description: The bottommost coordinate of the bounding box.
          type: number
          format: double
        annotationResourceLabels:
          description: Resource labels on the Annotation.
          type: object
          additionalProperties:
            type: string
  dataItemResourceLabels:
    description: Resource labels on the DataItem.
    type: object
    additionalProperties:
      type: string
```
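Because the schema follows OpenAPI/JSON Schema conventions, you can sanity-check a record against it locally before import. The following is a minimal sketch, assuming you have downloaded the schema file above and have `pyyaml` and `jsonschema` installed; note that the schema declares the coordinates as `number`, so this sketch uses numeric values:

```python
import yaml
import jsonschema

# Assumes the schema file above has been downloaded locally.
with open("image_bounding_box_io_format_1.0.0.yaml") as f:
    schema = yaml.safe_load(f)

record = {
    "imageGcsUri": "gs://my-bucket/image1.jpeg",
    "boundingBoxAnnotations": [
        {"displayName": "Tomato", "xMin": 0.3, "yMin": 0.3, "xMax": 0.7, "yMax": 0.6}
    ],
}

# Raises jsonschema.ValidationError if the record doesn't match the schema.
jsonschema.validate(instance=record, schema=schema)
```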
### Input files

### JSON Lines

JSON on each line:

```
{
  "imageGcsUri": "gs://bucket/filename.ext",
  "boundingBoxAnnotations": [
    {
      "displayName": "OBJECT1_LABEL",
      "xMin": "X_MIN",
      "yMin": "Y_MIN",
      "xMax": "X_MAX",
      "yMax": "Y_MAX",
      "annotationResourceLabels": {
        "aiplatform.googleapis.com/annotation_set_name": "displayName",
        "env": "prod"
      }
    },
    {
      "displayName": "OBJECT2_LABEL",
      "xMin": "X_MIN",
      "yMin": "Y_MIN",
      "xMax": "X_MAX",
      "yMax": "Y_MAX"
    }
  ],
  "dataItemResourceLabels": {
    "aiplatform.googleapis.com/ml_use": "test/train/validation"
  }
}
```

**Field notes**:

- `imageGcsUri` - The only required field.
- `annotationResourceLabels` - Can contain any number of key-value string pairs. The only system-reserved key-value pair is the following:
  - "aiplatform.googleapis.com/annotation_set_name" : "*value*"

  Where *value* is one of the display names of the existing annotation sets in the dataset.
- `dataItemResourceLabels` - Can contain any number of key-value string pairs. The only system-reserved key-value pair is the following, which specifies the machine learning use set of the data item:
  - "aiplatform.googleapis.com/ml_use" : "*training/test/validation*"

#### Example JSON Lines - `object_detection.jsonl`:

```
{"imageGcsUri": "gs://bucket/filename1.jpeg", "boundingBoxAnnotations": [{"displayName": "Tomato", "xMin": "0.3", "yMin": "0.3", "xMax": "0.7", "yMax": "0.6"}], "dataItemResourceLabels": {"aiplatform.googleapis.com/ml_use": "test"}}
{"imageGcsUri": "gs://bucket/filename2.gif", "boundingBoxAnnotations": [{"displayName": "Tomato", "xMin": "0.8", "yMin": "0.2", "xMax": "1.0", "yMax": "0.4"},{"displayName": "Salad", "xMin": "0.0", "yMin": "0.0", "xMax": "1.0", "yMax": "1.0"}], "dataItemResourceLabels": {"aiplatform.googleapis.com/ml_use": "training"}}
{"imageGcsUri": "gs://bucket/filename3.png", "boundingBoxAnnotations": [{"displayName": "Baked goods", "xMin": "0.5", "yMin": "0.7", "xMax": "0.8", "yMax": "0.8"}], "dataItemResourceLabels": {"aiplatform.googleapis.com/ml_use": "training"}}
{"imageGcsUri": "gs://bucket/filename4.tiff", "boundingBoxAnnotations": [{"displayName": "Salad", "xMin": "0.1", "yMin": "0.2", "xMax": "0.8", "yMax": "0.9"}], "dataItemResourceLabels": {"aiplatform.googleapis.com/ml_use": "validation"}}
...
```

### CSV

CSV format:

```
[ML_USE],GCS_FILE_PATH,[LABEL],[BOUNDING_BOX]*
```

**List of columns**

- `ML_USE` (Optional). For data split purposes when training a model. Use TRAINING, TEST, or VALIDATION. For more information about manual data splitting, see [About data splits for AutoML models](/vertex-ai/docs/general/ml-use).
- `GCS_FILE_PATH`. This field contains the Cloud Storage URI for the image. Cloud Storage URIs are case-sensitive.
- `LABEL`. Labels must start with a letter and only contain letters, numbers, and underscores.
- `BOUNDING_BOX`. A bounding box for an object in the image. Specifying a bounding box involves more than one column.

  **A.** `X_MIN`,`Y_MIN`
  **B.** `X_MAX`,`Y_MIN`
  **C.** `X_MAX`,`Y_MAX`
  **D.** `X_MIN`,`Y_MAX`

  Each vertex is specified by x, y coordinate values. Coordinates are normalized float values [0,1]; 0.0 is X_MIN or Y_MIN, 1.0 is X_MAX or Y_MAX.

  For example, a bounding box for the entire image is expressed as (0.0,0.0,,,1.0,1.0,,), or (0.0,0.0,1.0,0.0,1.0,1.0,0.0,1.0).

  The bounding box for an object can be specified in one of two ways (a conversion sketch follows the example CSV below):

  1. Two vertices (two sets of x,y coordinates) that are diagonally opposite points of the rectangle:
     **A.** `X_MIN`,`Y_MIN`
     **C.** `X_MAX`,`Y_MAX`
     as shown in this example:
     **A,,C,**
     `X_MIN`,`Y_MIN`,,,`X_MAX`,`Y_MAX`,,
  2. All four vertices specified, as shown in:
     `X_MIN`,`Y_MIN`,`X_MAX`,`Y_MIN`,`X_MAX`,`Y_MAX`,`X_MIN`,`Y_MAX`

  If the four specified vertices don't form a rectangle parallel to image edges, Vertex AI specifies vertices that do form such a rectangle.

#### Example CSV - `object_detection.csv`:

```
test,gs://bucket/filename1.jpeg,Tomato,0.3,0.3,,,0.7,0.6,,
training,gs://bucket/filename2.gif,Tomato,0.8,0.2,,,1.0,0.4,,
gs://bucket/filename2.gif
gs://bucket/filename3.png,Baked goods,0.5,0.7,0.8,0.7,0.8,0.8,0.5,0.8
validation,gs://bucket/filename4.tiff,Salad,0.1,0.2,,,0.8,0.9,,
...
```
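To produce rows in the two-vertex form from pixel-space boxes, a small normalization helper is enough. This sketch is illustrative; the `to_csv_row` helper is hypothetical and the `%.4g` formatting is an arbitrary choice:

```python
def to_csv_row(ml_use: str, gcs_uri: str, label: str,
               box: tuple, img_w: int, img_h: int) -> str:
    """Build a two-vertex CSV row (A,,C, form):
    ML_USE,GCS_FILE_PATH,LABEL,X_MIN,Y_MIN,,,X_MAX,Y_MAX,,
    `box` is a pixel-space (x_min, y_min, x_max, y_max) tuple."""
    x_min, y_min, x_max, y_max = box
    # Normalize pixel coordinates into the required [0,1] range.
    n = ["%.4g" % c for c in
         (x_min / img_w, y_min / img_h, x_max / img_w, y_max / img_h)]
    return f"{ml_use},{gcs_uri},{label},{n[0]},{n[1]},,,{n[2]},{n[3]},,"

print(to_csv_row("training", "gs://bucket/filename1.jpeg", "Tomato",
                 (300, 270, 700, 540), 1000, 900))
# training,gs://bucket/filename1.jpeg,Tomato,0.3,0.3,,,0.7,0.6,,
```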