The Imagen API lets you edit images in seconds, using text prompts, masks, and existing images to guide the edits.
View Imagen for Editing and Customization model card
Supported model versions
Imagen API supports the following models:
- imagen-3.0-capability-001
For more information about the features that the model supports, see Imagen models.
HTTP request
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagen-3.0-capability-001:predict \
  -d '{
    "instances": [
      {
        "referenceImages": [
          {
            "referenceType": "REFERENCE_TYPE_RAW",
            "referenceId": 1,
            "referenceImage": {
              "bytesBase64Encoded": string
            }
          },
          {
            "referenceType": "REFERENCE_TYPE_MASK",
            "referenceId": 2,
            "referenceImage": {
              "bytesBase64Encoded": string
            },
            "maskImageConfig": {
              "maskMode": "MASK_MODE_USER_PROVIDED"
            }
          }
        ],
        "prompt": string
      }
    ],
    "parameters": {
      "addWatermark": boolean,
      "baseSteps": integer,
      "editMode": string,
      "guidanceScale": integer,
      "includeRaiReason": boolean,
      "includeSafetyAttributes": boolean,
      "language": string,
      "negativePrompt": string,
      "outputOptions": {
        "mimeType": string,
        "compressionQuality": integer
      },
      "personGeneration": string,
      "safetySetting": string,
      "sampleCount": integer,
      "seed": integer,
      "storageUri": string
    }
  }'
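As a rough illustration of the request shape above, the following Python sketch assembles the same body for an edit with a user-provided mask. The function name `build_mask_edit_request` and its arguments are hypothetical and not part of the Imagen API; only the JSON field names come from the schema above.

```python
import json


def build_mask_edit_request(base_image_b64, mask_image_b64, prompt,
                            edit_mode, sample_count=4):
    """Assemble a predict request body for editing with a user-provided mask.

    Hypothetical helper: only the dictionary keys mirror the documented schema.
    """
    return {
        "instances": [
            {
                "prompt": prompt,
                "referenceImages": [
                    {
                        # The base image to edit.
                        "referenceType": "REFERENCE_TYPE_RAW",
                        "referenceId": 1,
                        "referenceImage": {"bytesBase64Encoded": base_image_b64},
                    },
                    {
                        # The mask that marks where to edit the base image.
                        "referenceType": "REFERENCE_TYPE_MASK",
                        "referenceId": 2,
                        "referenceImage": {"bytesBase64Encoded": mask_image_b64},
                        "maskImageConfig": {"maskMode": "MASK_MODE_USER_PROVIDED"},
                    },
                ],
            }
        ],
        "parameters": {"editMode": edit_mode, "sampleCount": sample_count},
    }


def to_json(body):
    """Serialize the request body, for example to write it to request.json."""
    return json.dumps(body, indent=2)
```

Optional parameters are left out of the sketch so that the service defaults described in the Parameters section apply.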
Instances
| Instances | |
|---|---|
| prompt | Optional. The text prompt for the image. If a |
| referenceImages | List of referenceImages objects. Required. For mask editing, exactly two reference images must be specified, one with |
referenceImages object
The referenceImages object describes the image assets for Imagen to edit.
referenceType
string
Required. The type of reference image. One of the following:
- REFERENCE_TYPE_RAW: The base image to edit.
- REFERENCE_TYPE_MASK: The mask image, whose non-zero values indicate where to edit the base image.
referenceId
integer
Required. A unique identifier for the reference image. Not used for masked editing.
referenceImage.bytesBase64Encoded
string
Required. Base64-encoded image bytes. Accepts PNG, JPEG, GIF, and BMP files. The maximum size is 20MB after transcoding to PNG. If you provide a mask image, it must be the same dimensions as the base image.
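The following sketch shows one way to check the dimension requirement before encoding, assuming both inputs are already PNG files. It reads the width and height from the PNG IHDR header with the standard library only. The helper names are hypothetical, and the 20MB limit isn't checked here because it applies after transcoding.

```python
import base64
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data: bytes) -> tuple[int, int]:
    """Return (width, height) read from the IHDR chunk of a PNG file."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # Width and height are two big-endian 32-bit integers after the chunk type.
    return struct.unpack(">II", data[16:24])


def encode_pair(base_png: bytes, mask_png: bytes) -> tuple[str, str]:
    """Base64-encode a base image and its mask after checking that sizes match."""
    if png_dimensions(base_png) != png_dimensions(mask_png):
        raise ValueError("mask must have the same dimensions as the base image")
    return (
        base64.b64encode(base_png).decode("ascii"),
        base64.b64encode(mask_png).decode("ascii"),
    )
```

The two returned strings are the values to place in `referenceImage.bytesBase64Encoded` for the base and mask references.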
maskImageConfig.maskMode
string
Required when referenceType is REFERENCE_TYPE_MASK. Must be one of the following:
- MASK_MODE_USER_PROVIDED: Use the mask from referenceImage.bytesBase64Encoded.
- MASK_MODE_BACKGROUND: Use an auto-generated mask from background segmentation.
- MASK_MODE_FOREGROUND: Use an auto-generated mask from foreground segmentation.
- MASK_MODE_SEMANTIC: Use an auto-generated mask from semantic segmentation with the given mask class.
maskImageConfig.dilation
float
Optional. Range: [0, 1]. The percentage of the image width to dilate (grow) the mask by. This can help compensate for imprecise masks. For best results with the following editMode settings, we recommend the listed dilation values:
- EDIT_MODE_INPAINT_INSERTION: 0.01
- EDIT_MODE_INPAINT_REMOVAL: 0.01
- EDIT_MODE_BGSWAP: 0.0
- EDIT_MODE_OUTPAINT: 0.01-0.03
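The recommended values above can be kept in a small lookup table. This is a minimal sketch: the names `RECOMMENDED_DILATION` and `recommended_dilation` are hypothetical, and returning the low end of the outpainting range is an arbitrary choice for illustration.

```python
# Recommended dilation per edit mode, stored as (low, high) ranges.
RECOMMENDED_DILATION = {
    "EDIT_MODE_INPAINT_INSERTION": (0.01, 0.01),
    "EDIT_MODE_INPAINT_REMOVAL": (0.01, 0.01),
    "EDIT_MODE_BGSWAP": (0.0, 0.0),
    "EDIT_MODE_OUTPAINT": (0.01, 0.03),
}


def recommended_dilation(edit_mode: str) -> float:
    """Return the low end of the recommended dilation range for an edit mode."""
    if edit_mode not in RECOMMENDED_DILATION:
        raise ValueError(f"unknown edit mode: {edit_mode}")
    low, _high = RECOMMENDED_DILATION[edit_mode]
    return low
```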
maskImageConfig.maskClasses
list[integer]
Optional. Mask classes for MASK_MODE_SEMANTIC mode.
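A mask reference that relies on semantic segmentation might be built as in the sketch below. The helper name is hypothetical. Omitting `referenceImage` for an auto-generated mask is an assumption drawn from the mask mode descriptions, which mention mask bytes only for MASK_MODE_USER_PROVIDED.

```python
def semantic_mask_reference(class_ids, reference_id=2, dilation=None):
    """Build a REFERENCE_TYPE_MASK entry that asks for a semantic mask.

    Assumption: no referenceImage bytes are sent for auto-generated masks.
    """
    if not class_ids:
        raise ValueError("at least one mask class ID is required")
    config = {
        "maskMode": "MASK_MODE_SEMANTIC",
        "maskClasses": [int(class_id) for class_id in class_ids],
    }
    if dilation is not None:
        # The documented range for dilation is [0, 1].
        if not 0.0 <= dilation <= 1.0:
            raise ValueError("dilation must be between 0 and 1")
        config["dilation"] = dilation
    return {
        "referenceType": "REFERENCE_TYPE_MASK",
        "referenceId": reference_id,
        "maskImageConfig": config,
    }
```

For example, `semantic_mask_reference([7, 8], dilation=0.01)` requests a mask over the cat and dog classes listed in the Class IDs table.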
Parameters
addWatermark
boolean
Optional. Add an invisible watermark to the generated images. The default value is true.
baseSteps
integer
Optional. The number of sampling steps. A higher value gives better image quality, while a lower value gives better latency. Defaults to 75.
For smaller mask areas or for removal or insert modes, use 16-35 steps to reduce latency while returning a similar level of quality.
editMode
string
Required for mask editing. An enum with one of the following values:
- EDIT_MODE_INPAINT_REMOVAL: Remove objects and fill in the image background in the mask area.
- EDIT_MODE_INPAINT_INSERTION: Add objects from a given prompt.
- EDIT_MODE_BGSWAP: Add background content in the mask area, while preserving the object content in the unmasked area. Useful for product editing.
- EDIT_MODE_OUTPAINT: Extends the image into the mask area. Unlike EDIT_MODE_BGSWAP, this will generate object completion for partial objects at the image boundary.
guidanceScale
integer
Optional. Controls how much the model adheres to the text prompt. Larger values increase alignment between the output and the prompt, but might compromise image quality.
Accepted range: 0-500
Default: 60 for insert mode, 75 for remove, bgswap, and outpaint.
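The defaults and range above can be mirrored on the client, for instance to log the effective value before sending a request. This is only a sketch, and `resolve_guidance_scale` is a hypothetical helper. The service applies its own defaults when the field is omitted.

```python
def resolve_guidance_scale(edit_mode, guidance_scale=None):
    """Return the guidance scale to use, falling back to the documented default."""
    if guidance_scale is None:
        # 60 for insert mode, 75 for remove, bgswap, and outpaint.
        return 60 if edit_mode == "EDIT_MODE_INPAINT_INSERTION" else 75
    if not 0 <= guidance_scale <= 500:
        raise ValueError("guidanceScale must be in the range 0-500")
    return guidance_scale
```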
includeRaiReason
boolean
Optional. Whether to include a safety reason for filtered images in the response. The default value is false.
includeSafetyAttributes
boolean
Optional. Whether to report the safety scores of each image in the response. The default value is false.
language
string
Optional. The language code that corresponds to your text prompt language. The following values are supported:
- "auto": Automatic detection. If Imagen detects a supported language, the prompt and an optional negative prompt are translated to English. If the language detected isn't supported, Imagen uses the input text verbatim, which might result in an unexpected output. No error code is returned.
- "en": English (if omitted, the default value)
- "zh" or "zh-CN": Chinese (simplified)
- "zh-TW": Chinese (traditional)
- "hi": Hindi
- "ja": Japanese
- "ko": Korean
- "pt": Portuguese
- "es": Spanish
language is supported only by imagen-3.0-capability-001.
negativePrompt
string
Optional. A description of what to discourage in the generated images.
outputOptions
Optional. Describes the output image format in an outputOptions object.
personGeneration
string
Optional. Allow generation of people by the model. The following values are supported:
- "dont_allow": Disallow the inclusion of people or faces in images.
- "allow_adult": Allow generation of adults only.
- "allow_all": Allow generation of people of all ages.
For both mask-based and mask-free editing, personGeneration defaults to allow_adult.
sampleCount
integer
Optional. The number of images to generate. The default value is 4.
seed
Uint32
Optional. The random seed for image generation. This isn't available when addWatermark is set to true.
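Because addWatermark defaults to true, a request that sets seed also needs to turn the watermark off explicitly. The sketch below checks that constraint on a parameters dictionary before it is sent. The function name is hypothetical, and the 32-bit bound is inferred from the Uint32 type.

```python
def validate_seed(parameters: dict) -> dict:
    """Check the seed and addWatermark combination in a parameters dictionary."""
    seed = parameters.get("seed")
    if seed is None:
        return parameters
    # Uint32 range, inferred from the documented type of the field.
    if not 0 <= seed <= 2**32 - 1:
        raise ValueError("seed must fit in an unsigned 32-bit integer")
    # addWatermark defaults to true when it is omitted.
    if parameters.get("addWatermark", True):
        raise ValueError("seed isn't available when addWatermark is true")
    return parameters
```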
safetySetting
string
Optional. Adds a filter level to safety filtering. The following values are supported:
- "block_low_and_above": Strongest filtering level, most strict blocking. Deprecated value: "block_most".
- "block_medium_and_above": Block some problematic prompts and responses. Deprecated value: "block_some".
- "block_only_high": Reduces the number of requests blocked due to safety filters. May increase objectionable content generated by Imagen. Deprecated value: "block_few".
- "block_none": Block very few problematic prompts and responses. Access to this feature is restricted. Previous field value: "block_fewest".
The default value is "block_medium_and_above".
safetySetting is supported only by imagen-3.0-capability-001.
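Client code that still carries the older names can translate them before building a request. The following sketch maps the deprecated and previous values listed above to their current equivalents. The helper and constant names are hypothetical.

```python
# Deprecated or previous value -> current value, as listed above.
_RENAMED_SAFETY_SETTINGS = {
    "block_most": "block_low_and_above",
    "block_some": "block_medium_and_above",
    "block_few": "block_only_high",
    "block_fewest": "block_none",
}
_SUPPORTED_SAFETY_SETTINGS = set(_RENAMED_SAFETY_SETTINGS.values())


def normalize_safety_setting(value=None):
    """Return a supported safetySetting value, translating older names."""
    if value is None:
        return "block_medium_and_above"  # documented default
    value = _RENAMED_SAFETY_SETTINGS.get(value, value)
    if value not in _SUPPORTED_SAFETY_SETTINGS:
        raise ValueError(f"unsupported safetySetting: {value}")
    return value
```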
storageUri
string
Optional. The Cloud Storage URI to store the generated images.
Output options object
The outputOptions object describes the image output.
outputOptions.mimeType
string
Optional. The image format that the output should be saved as. The following values are supported:
- "image/png": Save as a PNG image
- "image/jpeg": Save as a JPEG image
The default value is "image/png".
outputOptions.compressionQuality
integer
Optional. The level of compression if the output type is "image/jpeg". Accepted values are 0 through 100. The default value is 75.
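A small builder can keep the two fields consistent. In this sketch, rejecting compressionQuality for PNG output is a client-side design choice rather than documented service behavior, and the function name is hypothetical.

```python
def build_output_options(mime_type="image/png", compression_quality=None):
    """Build an outputOptions object with basic validation."""
    if mime_type not in ("image/png", "image/jpeg"):
        raise ValueError(f"unsupported mimeType: {mime_type}")
    options = {"mimeType": mime_type}
    if compression_quality is not None:
        # Compression quality only applies to JPEG output.
        if mime_type != "image/jpeg":
            raise ValueError("compressionQuality only applies to image/jpeg")
        if not 0 <= compression_quality <= 100:
            raise ValueError("compressionQuality must be 0 through 100")
        options["compressionQuality"] = compression_quality
    return options
```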
Sample request
REST
Before using any of the request data, make the following replacements:
- REGION: The region that your project is located in. For more information about supported regions, see Generative AI on Vertex AI locations.
- PROJECT_ID: Your Google Cloud project ID.
- TEXT_PROMPT: Optional. A text prompt to guide the images that the model generates. For best results, use a description of the masked area and avoid single-word prompts. For example, use "a cute corgi" instead of "corgi".
- B64_BASE_IMAGE: A base64-encoded image of the image being edited that is 10MB or less in size. For more information about base64-encoding, see Base64 encode and decode files.
- B64_MASK_IMAGE: A base64-encoded black and white mask image that is 10MB or less in size.
- MASK_DILATION: Optional. A float value between 0 and 1, inclusive, that represents the percentage of the image width to grow the mask by. Using dilation helps compensate for imprecise masks. We recommend a value of 0.01.
- EDIT_STEPS: Optional. An integer that represents the number of sampling steps. A higher value offers better image quality, a lower value offers better latency. We recommend that you try 35 steps to start. If the quality doesn't meet your requirements, then we recommend increasing the value towards an upper limit of 75.
- SAMPLE_COUNT: Optional. An integer that describes the number of images to generate. The accepted range of values is 1-4. The default value is 4.
HTTP method and URL:
POST https://REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/REGION/publishers/google/models/imagen-3.0-capability-001:predict
Request JSON body:
{
  "instances": [
    {
      "prompt": "TEXT_PROMPT",
      "referenceImages": [
        {
          "referenceType": "REFERENCE_TYPE_RAW",
          "referenceId": 1,
          "referenceImage": {
            "bytesBase64Encoded": "B64_BASE_IMAGE"
          }
        },
        {
          "referenceType": "REFERENCE_TYPE_MASK",
          "referenceImage": {
            "bytesBase64Encoded": "B64_MASK_IMAGE"
          },
          "maskImageConfig": {
            "maskMode": "MASK_MODE_USER_PROVIDED",
            "dilation": MASK_DILATION
          }
        }
      ]
    }
  ],
  "parameters": {
    "editConfig": {
      "baseSteps": EDIT_STEPS
    },
    "editMode": "EDIT_MODE_INPAINT_INSERTION",
    "sampleCount": SAMPLE_COUNT
  }
}
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/REGION/publishers/google/models/imagen-3.0-capability-001:predict"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/REGION/publishers/google/models/imagen-3.0-capability-001:predict" | Select-Object -Expand Content
The following sample response is for a request with "sampleCount": 2. The response returns two prediction objects, with the generated image bytes base64-encoded.
{
  "predictions": [
    {
      "bytesBase64Encoded": "BASE64_IMG_BYTES",
      "mimeType": "image/png"
    },
    {
      "mimeType": "image/png",
      "bytesBase64Encoded": "BASE64_IMG_BYTES"
    }
  ]
}
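As an alternative to curl or PowerShell, the request and response handling could be sketched in Python with only the standard library. The function names, the output file naming, and the choice to skip predictions without image bytes are assumptions for illustration. The URL pattern, headers, and response fields follow the samples above. `send_request` needs the gcloud CLI and network access.

```python
import base64
import json
import subprocess
import urllib.request
from pathlib import Path

MODEL = "imagen-3.0-capability-001"
_EXTENSIONS = {"image/png": ".png", "image/jpeg": ".jpg"}


def build_http_request(region, project_id, body, access_token):
    """Create (but don't send) the POST request for the predict endpoint."""
    url = (
        f"https://{region}-aiplatform.googleapis.com/v1/projects/{project_id}"
        f"/locations/{region}/publishers/google/models/{MODEL}:predict"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )


def send_request(region, project_id, body):
    """Send the request with a token from the gcloud CLI and parse the reply."""
    token = subprocess.run(
        ["gcloud", "auth", "print-access-token"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    request = build_http_request(region, project_id, body, token)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def save_predictions(response, out_dir):
    """Decode each prediction's base64 image bytes and write them to out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, prediction in enumerate(response.get("predictions", [])):
        if "bytesBase64Encoded" not in prediction:
            continue  # assumption: some entries might carry no image bytes
        extension = _EXTENSIONS.get(prediction.get("mimeType"), ".bin")
        path = out / f"edited_{index}{extension}"
        path.write_bytes(base64.b64decode(prediction["bytesBase64Encoded"]))
        paths.append(path)
    return paths
```

A typical flow would be `save_predictions(send_request(region, project_id, body), "out")`, where `body` is the request JSON shown earlier loaded as a dictionary.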
Class IDs
Use the following object class IDs to automatically create an image mask based on specific objects.
| Class ID | Object |
|---|---|
| 0 | backpack |
| 1 | umbrella |
| 2 | bag |
| 3 | tie |
| 4 | suitcase |
| 5 | case |
| 6 | bird |
| 7 | cat |
| 8 | dog |
| 9 | horse |
| 10 | sheep |
| 11 | cow |
| 12 | elephant |
| 13 | bear |
| 14 | zebra |
| 15 | giraffe |
| 16 | animal (other) |
| 17 | microwave |
| 18 | radiator |
| 19 | oven |
| 20 | toaster |
| 21 | storage tank |
| 22 | conveyor belt |
| 23 | sink |
| 24 | refrigerator |
| 25 | washer dryer |
| 26 | fan |
| 27 | dishwasher |
| 28 | toilet |
| 29 | bathtub |
| 30 | shower |
| 31 | tunnel |
| 32 | bridge |
| 33 | pier wharf |
| 34 | tent |
| 35 | building |
| 36 | ceiling |
| 37 | laptop |
| 38 | keyboard |
| 39 | mouse |
| 40 | remote |
| 41 | cell phone |
| 42 | television |
| 43 | floor |
| 44 | stage |
| 45 | banana |
| 46 | apple |
| 47 | sandwich |
| 48 | orange |
| 49 | broccoli |
| 50 | carrot |
| 51 | hot dog |
| 52 | pizza |
| 53 | donut |
| 54 | cake |
| 55 | fruit (other) |
| 56 | food (other) |
| 57 | chair (other) |
| 58 | armchair |
| 59 | swivel chair |
| 60 | stool |
| 61 | seat |
| 62 | couch |
| 63 | trash can |
| 64 | potted plant |
| 65 | nightstand |
| 66 | bed |
| 67 | table |
| 68 | pool table |
| 69 | barrel |
| 70 | desk |
| 71 | ottoman |
| 72 | wardrobe |
| 73 | crib |
| 74 | basket |
| 75 | chest of drawers |
| 76 | bookshelf |
| 77 | counter (other) |
| 78 | bathroom counter |
| 79 | kitchen island |
| 80 | door |
| 81 | light (other) |
| 82 | lamp |
| 83 | sconce |
| 84 | chandelier |
| 85 | mirror |
| 86 | whiteboard |
| 87 | shelf |
| 88 | stairs |
| 89 | escalator |
| 90 | cabinet |
| 91 | fireplace |
| 92 | stove |
| 93 | arcade machine |
| 94 | gravel |
| 95 | platform |
| 96 | playingfield |
| 97 | railroad |
| 98 | road |
| 99 | snow |
| 100 | sidewalk pavement |
| 101 | runway |
| 102 | terrain |
| 103 | book |
| 104 | box |
| 105 | clock |
| 106 | vase |
| 107 | scissors |
| 108 | plaything (other) |
| 109 | teddy bear |
| 110 | hair dryer |
| 111 | toothbrush |
| 112 | painting |
| 113 | poster |
| 114 | bulletin board |
| 115 | bottle |
| 116 | cup |
| 117 | wine glass |
| 118 | knife |
| 119 | fork |
| 120 | spoon |
| 121 | bowl |
| 122 | tray |
| 123 | range hood |
| 124 | plate |
| 125 | person |
| 126 | rider (other) |
| 127 | bicyclist |
| 128 | motorcyclist |
| 129 | paper |
| 130 | streetlight |
| 131 | road barrier |
| 132 | mailbox |
| 133 | cctv camera |
| 134 | junction box |
| 135 | traffic sign |
| 136 | traffic light |
| 137 | fire hydrant |
| 138 | parking meter |
| 139 | bench |
| 140 | bike rack |
| 141 | billboard |
| 142 | sky |
| 143 | pole |
| 144 | fence |
| 145 | railing banister |
| 146 | guard rail |
| 147 | mountain hill |
| 148 | rock |
| 149 | frisbee |
| 150 | skis |
| 151 | snowboard |
| 152 | sports ball |
| 153 | kite |
| 154 | baseball bat |
| 155 | baseball glove |
| 156 | skateboard |
| 157 | surfboard |
| 158 | tennis racket |
| 159 | net |
| 160 | base |
| 161 | sculpture |
| 162 | column |
| 163 | fountain |
| 164 | awning |
| 165 | apparel |
| 166 | banner |
| 167 | flag |
| 168 | blanket |
| 169 | curtain (other) |
| 170 | shower curtain |
| 171 | pillow |
| 172 | towel |
| 173 | rug floormat |
| 174 | vegetation |
| 175 | bicycle |
| 176 | car |
| 177 | autorickshaw |
| 178 | motorcycle |
| 179 | airplane |
| 180 | bus |
| 181 | train |
| 182 | truck |
| 183 | trailer |
| 184 | boat ship |
| 185 | slow wheeled object |
| 186 | river lake |
| 187 | sea |
| 188 | water (other) |
| 189 | swimming pool |
| 190 | waterfall |
| 191 | wall |
| 192 | window |
| 193 | window blind |
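To avoid hard-coding numbers, the table can be wrapped in a name lookup that produces a maskClasses list. The sketch below copies only a handful of rows from the table for brevity. `CLASS_IDS` and `mask_classes` are hypothetical names.

```python
# A small excerpt of the class ID table above (object name -> class ID).
CLASS_IDS = {
    "cat": 7,
    "dog": 8,
    "person": 125,
    "sky": 142,
    "car": 176,
}


def mask_classes(*names):
    """Translate object names into a sorted, de-duplicated maskClasses list."""
    unknown = [name for name in names if name not in CLASS_IDS]
    if unknown:
        raise KeyError(f"no class ID known for: {', '.join(unknown)}")
    return sorted({CLASS_IDS[name] for name in names})
```

The result can be used as the value of `maskImageConfig.maskClasses` when maskMode is MASK_MODE_SEMANTIC.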
What's next
- For more information, see Imagen on Vertex AI.

