Edit images

The Imagen API lets you edit images by using text prompts, masks, and existing images to guide the edits.

For details about the model, see the Imagen for Editing and Customization model card.

Supported model versions

The Imagen API supports the following model:

imagen-3.0-capability-001

For more information about the features that the model supports, see Imagen models.

HTTP request

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagen-3.0-capability-001:predict \
  -d '{
  "instances": [
    {
      "referenceImages": [
        {
          "referenceType": "REFERENCE_TYPE_RAW",
          "referenceId": 1,
          "referenceImage": {
            "bytesBase64Encoded": string
          }
        },
        {
          "referenceType": "REFERENCE_TYPE_MASK",
          "referenceId": 2,
          "referenceImage": {
            "bytesBase64Encoded": string
          },
          "maskImageConfig": {
            "maskMode": "MASK_MODE_USER_PROVIDED"
          }
        }
      ],
      "prompt": string
    }
  ],
  "parameters": {
    "addWatermark": boolean,
    "baseSteps": integer,
    "editMode": string,
    "guidanceScale": integer,
    "includeRaiReason": boolean,
    "includeSafetyAttributes": boolean,
    "language": string,
    "negativePrompt": string,
    "outputOptions": {
      "mimeType": string,
      "compressionQuality": integer
    },
    "personGeneration": string,
    "safetySetting": string,
    "sampleCount": integer,
    "seed": integer,
    "storageUri": string
  }
}'

Instances

prompt

string

Optional. The text prompt for the image. If a prompt isn't specified, the model fills in content from the image context.

referenceImages

List of ReferenceImage objects.

Required. For mask editing, exactly two reference images must be specified: one with REFERENCE_TYPE_RAW and one with REFERENCE_TYPE_MASK.
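As a minimal client-side sketch of the mask-editing rule above, the hypothetical helper check_mask_editing_references checks that a referenceImages list holds exactly two entries, one REFERENCE_TYPE_RAW and one REFERENCE_TYPE_MASK. The function name and the use of ValueError are assumptions for illustration; the service performs its own validation.

```python
from typing import Any


def check_mask_editing_references(reference_images: list[dict[str, Any]]) -> None:
    """Raise ValueError unless the list has exactly one RAW and one MASK entry.

    Hypothetical client-side check; it does not replace server-side validation.
    """
    if len(reference_images) != 2:
        raise ValueError(
            f"mask editing needs exactly 2 reference images, got {len(reference_images)}"
        )
    # Sorting makes the check independent of the order of the two entries.
    types = sorted(ref.get("referenceType", "") for ref in reference_images)
    if types != ["REFERENCE_TYPE_MASK", "REFERENCE_TYPE_RAW"]:
        raise ValueError(f"expected one RAW and one MASK reference image, got {types}")


refs = [
    {"referenceType": "REFERENCE_TYPE_RAW", "referenceId": 1},
    {"referenceType": "REFERENCE_TYPE_MASK", "referenceId": 2},
]
check_mask_editing_references(refs)  # passes silently
```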

`referenceImages` object

The referenceImages object describes the image assets for Imagen to edit.

Parameters

referenceType

string

Required. The type of reference image. One of the following:

REFERENCE_TYPE_RAW: The base image to edit.
REFERENCE_TYPE_MASK: The mask image, whose non-zero values indicate where to edit the base image.

referenceId

integer

Required. A unique identifier for the reference image. Not used for masked editing.

referenceImage.bytesBase64Encoded

string

Required. Base64-encoded image bytes. Accepts PNG, JPEG, GIF, and BMP files. The maximum size is 20 MB after transcoding to PNG. If you provide a mask image, it must have the same dimensions as the base image.
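The following sketch shows one way to produce the bytesBase64Encoded string and to check, before sending, that a mask has the same dimensions as the base image. It uses only the Python standard library and, as a simplifying assumption, reads dimensions from PNG files only (from the IHDR chunk); JPEG, GIF, and BMP inputs would need a different parser or an imaging library. The function names are hypothetical.

```python
import base64
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def to_base64(image_bytes: bytes) -> str:
    """Return the string to place in referenceImage.bytesBase64Encoded."""
    return base64.b64encode(image_bytes).decode("ascii")


def png_dimensions(png_bytes: bytes) -> tuple[int, int]:
    """Read (width, height) from a PNG's IHDR chunk."""
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type "IHDR",
    # then the IHDR data, which starts with big-endian width and height.
    if not png_bytes.startswith(PNG_SIGNATURE) or png_bytes[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    width, height = struct.unpack(">II", png_bytes[16:24])
    return width, height


def check_same_size(base_png: bytes, mask_png: bytes) -> None:
    """Raise ValueError if the mask and base image dimensions differ."""
    base_size, mask_size = png_dimensions(base_png), png_dimensions(mask_png)
    if base_size != mask_size:
        raise ValueError(f"mask is {mask_size}, base image is {base_size}")
```

To use it, read each file in binary mode ("rb"), call check_same_size on the two byte strings, and pass each one to to_base64.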

maskImageConfig.maskMode

string

Required when referenceType is REFERENCE_TYPE_MASK. Must be one of the following:

MASK_MODE_USER_PROVIDED: Use the mask from referenceImage.bytesBase64Encoded.
MASK_MODE_BACKGROUND: Use an auto-generated mask from background segmentation.
MASK_MODE_FOREGROUND: Use an auto-generated mask from foreground segmentation.
MASK_MODE_SEMANTIC: Use an auto-generated mask from semantic segmentation with the classes given in maskImageConfig.maskClasses.

maskImageConfig.dilation

float

Optional. Range: [0, 1]. The percentage of the image width by which to dilate (grow) the mask. Dilation can help compensate for imprecise masks. For best results, we recommend the following values for each editMode:

EDIT_MODE_INPAINT_INSERTION: 0.01
EDIT_MODE_INPAINT_REMOVAL: 0.01
EDIT_MODE_BGSWAP: 0.0
EDIT_MODE_OUTPAINT: 0.01-0.03
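The recommendations above can be kept as a small lookup table. This sketch uses hypothetical names and assumes, as the [0, 1] range suggests, that dilation is a fraction of the image width, so multiplying by the width gives an approximate growth in pixels.

```python
# Recommended maskImageConfig.dilation per editMode, copied from the list above.
# Each entry is (low, high); single recommended values are stored as (value, value).
RECOMMENDED_DILATION: dict[str, tuple[float, float]] = {
    "EDIT_MODE_INPAINT_INSERTION": (0.01, 0.01),
    "EDIT_MODE_INPAINT_REMOVAL": (0.01, 0.01),
    "EDIT_MODE_BGSWAP": (0.0, 0.0),
    "EDIT_MODE_OUTPAINT": (0.01, 0.03),
}


def suggest_dilation(edit_mode: str) -> float:
    """Return the low end of the recommended dilation for an edit mode."""
    low, _high = RECOMMENDED_DILATION[edit_mode]
    return low


def dilation_in_pixels(dilation: float, image_width: int) -> float:
    """Approximate mask growth in pixels, treating dilation as a fraction of width."""
    if not 0.0 <= dilation <= 1.0:
        raise ValueError(f"dilation must be in [0, 1], got {dilation}")
    return dilation * image_width
```

For example, with a 1024-pixel-wide image, a dilation of 0.01 corresponds to roughly 10 pixels.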

maskImageConfig.maskClasses

list[integer]

Optional. Mask classes for MASK_MODE_SEMANTIC mode.
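The following sketch assembles a REFERENCE_TYPE_MASK entry from the fields described in this section. The helper name mask_reference is hypothetical. Whether the auto-generated mask modes need a referenceImage at all isn't stated here; this sketch assumes they don't and omits it. The default reference_id of 2 mirrors the HTTP request schema.

```python
import base64
from typing import Any, Optional

AUTO_MASK_MODES = {"MASK_MODE_BACKGROUND", "MASK_MODE_FOREGROUND", "MASK_MODE_SEMANTIC"}


def mask_reference(
    mask_mode: str,
    mask_png: Optional[bytes] = None,
    dilation: Optional[float] = None,
    mask_classes: Optional[list[int]] = None,
    reference_id: int = 2,
) -> dict[str, Any]:
    """Build a REFERENCE_TYPE_MASK entry for referenceImages (hypothetical helper)."""
    ref: dict[str, Any] = {"referenceType": "REFERENCE_TYPE_MASK", "referenceId": reference_id}
    config: dict[str, Any] = {"maskMode": mask_mode}
    if mask_mode == "MASK_MODE_USER_PROVIDED":
        # A user-provided mask must carry its own image bytes.
        if mask_png is None:
            raise ValueError("MASK_MODE_USER_PROVIDED needs mask image bytes")
        ref["referenceImage"] = {
            "bytesBase64Encoded": base64.b64encode(mask_png).decode("ascii")
        }
    elif mask_mode not in AUTO_MASK_MODES:
        raise ValueError(f"unknown maskMode: {mask_mode}")
    if mask_classes is not None:
        # maskClasses is documented only for MASK_MODE_SEMANTIC.
        if mask_mode != "MASK_MODE_SEMANTIC":
            raise ValueError("maskClasses applies only to MASK_MODE_SEMANTIC")
        config["maskClasses"] = list(mask_classes)
    if dilation is not None:
        config["dilation"] = dilation
    ref["maskImageConfig"] = config
    return ref
```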

Parameters

addWatermark

boolean

Optional. Add an invisible watermark to the generated images.

The default value is true.

baseSteps

integer

Optional. The number of sampling steps. Higher values improve image quality; lower values reduce latency. The default value is 75.

For smaller mask areas or for removal or insert modes, use 16 - 35 steps to reduce latency while returning a similar level of quality.

editMode

string

Required for mask editing.

An enum with one of the following values:

EDIT_MODE_INPAINT_REMOVAL: Remove objects and fill in the image background in the mask area.
EDIT_MODE_INPAINT_INSERTION: Add objects described by the prompt.
EDIT_MODE_BGSWAP: Add background content in the mask area while preserving the object content in the unmasked area. Useful for product editing.
EDIT_MODE_OUTPAINT: Extend the image into the mask area. Unlike EDIT_MODE_BGSWAP, this mode completes partial objects at the image boundary.

guidanceScale

integer

Optional. Controls how closely the model adheres to the text prompt. Higher values improve alignment between the output and the prompt, but might compromise image quality.

Accepted range: 0 - 500

Default: 60 for EDIT_MODE_INPAINT_INSERTION; 75 for EDIT_MODE_INPAINT_REMOVAL, EDIT_MODE_BGSWAP, and EDIT_MODE_OUTPAINT.
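The editMode values and their default guidanceScale can be captured in a small enum. This is a sketch with hypothetical names; it encodes only the defaults and the 0-500 range documented above.

```python
from enum import Enum
from typing import Optional


class EditMode(str, Enum):
    """The editMode values listed above."""

    INPAINT_REMOVAL = "EDIT_MODE_INPAINT_REMOVAL"
    INPAINT_INSERTION = "EDIT_MODE_INPAINT_INSERTION"
    BGSWAP = "EDIT_MODE_BGSWAP"
    OUTPAINT = "EDIT_MODE_OUTPAINT"


def guidance_scale(mode: EditMode, override: Optional[int] = None) -> int:
    """Return the documented default guidanceScale, or a validated override."""
    if override is not None:
        if not 0 <= override <= 500:
            raise ValueError(f"guidanceScale must be in 0-500, got {override}")
        return override
    # Documented defaults: 60 for insertion; 75 for removal, bgswap, and outpaint.
    return 60 if mode is EditMode.INPAINT_INSERTION else 75


params = {"editMode": EditMode.BGSWAP.value, "guidanceScale": guidance_scale(EditMode.BGSWAP)}
```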

includeRaiReason

boolean

Optional. Whether to include a safety reason for filtered images in the response. The default value is false.

includeSafetyAttributes

boolean

Optional. Whether to report the safety scores of each image in the response. The default value is false.

language

string

Optional. The language code that corresponds to your text prompt language. The following values are supported:

"auto": Automatic detection. If Imagen detects a supported language, the prompt and an optional negative prompt are translated to English. If the detected language isn't supported, Imagen uses the input text verbatim, which might result in unexpected output. No error code is returned.
"en": English (the default value if omitted)
"zh" or "zh-CN": Chinese (simplified)
"zh-TW": Chinese (traditional)
"hi": Hindi
"ja": Japanese
"ko": Korean
"pt": Portuguese
"es": Spanish

language is supported only by imagen-3.0-capability-001.

negativePrompt

string

Optional. A description of what to discourage in the generated images.

outputOptions

Optional. Describes the output image format in an outputOptions object.

personGeneration

string

Optional. Allow generation of people by the model. The following values are supported:

"dont_allow": Disallow the inclusion of people or faces in images.
"allow_adult": Allow generation of adults only.
"allow_all": Allow generation of people of all ages.

For both mask-based and mask-free editing, personGeneration defaults to allow_adult.

sampleCount

integer

Optional. The number of images to generate. The default value is 4.

seed

Uint32

Optional. The random seed for image generation. This isn't available when addWatermark is set to true.
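Because seed isn't available while addWatermark is true, a client can catch that combination before sending a request. The following sketch uses a hypothetical helper; the unsigned 32-bit range check follows the Uint32 type listed above.

```python
from typing import Any, Optional


def sampling_parameters(seed: Optional[int] = None, add_watermark: bool = True) -> dict[str, Any]:
    """Build the addWatermark/seed part of parameters (hypothetical helper).

    seed isn't available while addWatermark is true, so this helper rejects
    that combination on the client side.
    """
    params: dict[str, Any] = {"addWatermark": add_watermark}
    if seed is not None:
        if add_watermark:
            raise ValueError("seed requires addWatermark to be false")
        if not 0 <= seed <= 2**32 - 1:
            raise ValueError("seed must fit in an unsigned 32-bit integer")
        params["seed"] = seed
    return params
```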

safetySetting

string

Optional. Adds a filter level to safety filtering. The following values are supported:

"block_low_and_above": Strongest filtering level, most strict blocking. Deprecated value: "block_most".
"block_medium_and_above": Block some problematic prompts and responses. Deprecated value: "block_some".
"block_only_high": Reduces the number of requests blocked due to safety filters. Might increase objectionable content generated by Imagen. Deprecated value: "block_few".
"block_none": Block very few problematic prompts and responses. Access to this feature is restricted. Previous field value: "block_fewest".

The default value is "block_medium_and_above".

safetySetting is supported only by imagen-3.0-capability-001.
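If existing code still sends the deprecated safetySetting names, a small mapping can translate them to the current values. This sketch uses hypothetical names and covers only the values listed above; it can't tell whether your project has access to "block_none".

```python
# Deprecated safetySetting values mapped to their current names, per the list above.
DEPRECATED_SAFETY_SETTINGS = {
    "block_most": "block_low_and_above",
    "block_some": "block_medium_and_above",
    "block_few": "block_only_high",
    "block_fewest": "block_none",
}
CURRENT_SAFETY_SETTINGS = set(DEPRECATED_SAFETY_SETTINGS.values())


def normalize_safety_setting(value: str = "block_medium_and_above") -> str:
    """Translate a deprecated safetySetting value to its current name."""
    value = DEPRECATED_SAFETY_SETTINGS.get(value, value)
    if value not in CURRENT_SAFETY_SETTINGS:
        raise ValueError(f"unsupported safetySetting: {value}")
    return value
```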

storageUri

string

Optional. The Cloud Storage URI to store the generated images.

Output options object

The outputOptions object describes the image output.

Parameters

outputOptions.mimeType

string

Optional. The image format that the output should be saved as. The following values are supported:

"image/png": Save as a PNG image
"image/jpeg": Save as a JPEG image

The default value is "image/png".

outputOptions.compressionQuality

integer

Optional. The level of compression if the output type is "image/jpeg". Accepted values are 0 through 100. The default value is 75.
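The following sketch builds an outputOptions object and enforces the two constraints described above: compressionQuality applies only to "image/jpeg", and it must be between 0 and 100. The helper name is hypothetical; omitting compressionQuality leaves the service default of 75 in effect.

```python
from typing import Any, Optional


def output_options(
    mime_type: str = "image/png", compression_quality: Optional[int] = None
) -> dict[str, Any]:
    """Build an outputOptions object (hypothetical helper)."""
    if mime_type not in ("image/png", "image/jpeg"):
        raise ValueError(f"unsupported mimeType: {mime_type}")
    options: dict[str, Any] = {"mimeType": mime_type}
    if compression_quality is not None:
        if mime_type != "image/jpeg":
            raise ValueError("compressionQuality applies only to image/jpeg")
        if not 0 <= compression_quality <= 100:
            raise ValueError("compressionQuality must be between 0 and 100")
        options["compressionQuality"] = compression_quality
    return options
```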

Sample request

REST

Before using any of the request data, make the following replacements:

REGION: The region that your project is located in. For more information about supported regions, see Generative AI on Vertex AI locations.
PROJECT_ID: Your Google Cloud project ID.
TEXT_PROMPT: Optional. A text prompt to guide the images that the model generates. For best results, describe the masked area and avoid single-word prompts. For example, use "a cute corgi" instead of "corgi".
B64_BASE_IMAGE: The base64-encoded base image to edit. The image must be 10 MB or less. For more information about base64 encoding, see Base64 encode and decode files.
B64_MASK_IMAGE: A base64-encoded black-and-white mask image. The image must be 10 MB or less.
MASK_DILATION: Optional. A float value between 0 and 1, inclusive, that represents the percentage of the image width to grow the mask by. Dilation helps compensate for imprecise masks. We recommend a value of 0.01.
EDIT_STEPS: Optional. An integer that represents the number of sampling steps. Higher values offer better image quality; lower values offer better latency.

We recommend that you start with 35 steps. If the quality doesn't meet your requirements, increase the value toward an upper limit of 75.
SAMPLE_COUNT: Optional. An integer that describes the number of images to generate. The accepted range of values is 1-4. The default value is 4.
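The replacements above can also be filled in from Python. The following sketch, built around a hypothetical build_inpaint_insertion_request function, produces a dictionary with the same shape as the sample request body, including the editConfig nesting of baseSteps that the sample uses, and serializes it so you can save it as request.json. It uses placeholder bytes instead of real images.

```python
import base64
import json
from typing import Any, Optional


def build_inpaint_insertion_request(
    base_image: bytes,
    mask_image: bytes,
    prompt: Optional[str] = None,
    mask_dilation: Optional[float] = None,
    edit_steps: Optional[int] = None,
    sample_count: Optional[int] = None,
) -> dict[str, Any]:
    """Fill in the sample request body for EDIT_MODE_INPAINT_INSERTION."""

    def b64(data: bytes) -> str:
        return base64.b64encode(data).decode("ascii")

    mask_config: dict[str, Any] = {"maskMode": "MASK_MODE_USER_PROVIDED"}
    if mask_dilation is not None:
        if not 0.0 <= mask_dilation <= 1.0:
            raise ValueError("MASK_DILATION must be between 0 and 1")
        mask_config["dilation"] = mask_dilation

    instance: dict[str, Any] = {
        "referenceImages": [
            {
                "referenceType": "REFERENCE_TYPE_RAW",
                "referenceId": 1,
                "referenceImage": {"bytesBase64Encoded": b64(base_image)},
            },
            {
                "referenceType": "REFERENCE_TYPE_MASK",
                "referenceImage": {"bytesBase64Encoded": b64(mask_image)},
                "maskImageConfig": mask_config,
            },
        ]
    }
    if prompt:  # TEXT_PROMPT is optional
        instance["prompt"] = prompt

    parameters: dict[str, Any] = {"editMode": "EDIT_MODE_INPAINT_INSERTION"}
    if edit_steps is not None:
        parameters["editConfig"] = {"baseSteps": edit_steps}
    if sample_count is not None:
        if not 1 <= sample_count <= 4:
            raise ValueError("SAMPLE_COUNT must be between 1 and 4")
        parameters["sampleCount"] = sample_count

    return {"instances": [instance], "parameters": parameters}


body = build_inpaint_insertion_request(
    b"base-bytes", b"mask-bytes", prompt="a cute corgi",
    mask_dilation=0.01, edit_steps=35, sample_count=2,
)
request_json = json.dumps(body, indent=2)  # contents for request.json
```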

HTTP method and URL:

POST https://REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/REGION/publishers/google/models/imagen-3.0-capability-001:predict

Request JSON body:

{
  "instances": [
    {
      "prompt": "TEXT_PROMPT",
      "referenceImages": [
        {
          "referenceType": "REFERENCE_TYPE_RAW",
          "referenceId": 1,
          "referenceImage": {
            "bytesBase64Encoded": "B64_BASE_IMAGE"
          }
        },
        {
          "referenceType": "REFERENCE_TYPE_MASK",
          "referenceImage": {
            "bytesBase64Encoded": "B64_MASK_IMAGE"
          },
          "maskImageConfig": {
            "maskMode": "MASK_MODE_USER_PROVIDED",
            "dilation": MASK_DILATION
          }
        }
      ]
    }
  ],
  "parameters": {
    "editConfig": {
      "baseSteps": EDIT_STEPS
    },
    "editMode": "EDIT_MODE_INPAINT_INSERTION",
    "sampleCount": SAMPLE_COUNT
  }
}

To send your request, choose one of these options:

curl

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you into the gcloud CLI. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/REGION/publishers/google/models/imagen-3.0-capability-001:predict"

PowerShell

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/REGION/publishers/google/models/imagen-3.0-capability-001:predict" | Select-Object -Expand Content
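If you prefer Python, the following sketch sends the same request as the curl and PowerShell commands by using only the standard library. It shells out to gcloud auth print-access-token for the bearer token, so it makes the same gcloud login assumption as the notes above. The function names are hypothetical, and this isn't the Vertex AI SDK.

```python
import json
import subprocess
import urllib.request
from typing import Any


def predict_url(region: str, project_id: str) -> str:
    """Build the :predict endpoint URL used in the commands above."""
    return (
        f"https://{region}-aiplatform.googleapis.com/v1/projects/{project_id}"
        f"/locations/{region}/publishers/google/models/imagen-3.0-capability-001:predict"
    )


def send_edit_request(region: str, project_id: str, body: dict[str, Any]) -> dict[str, Any]:
    """POST the request body and return the parsed JSON response."""
    # Same token source as the curl example: the active gcloud account.
    token = subprocess.run(
        ["gcloud", "auth", "print-access-token"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    request = urllib.request.Request(
        predict_url(region, project_id),
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, send_edit_request("us-central1", "my-project", body) posts a body like the one in request.json, where the region and project ID are placeholders for your own values.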

The following sample response is for a request with "sampleCount": 2. The response returns two prediction objects, with the generated image bytes base64-encoded.

{
  "predictions": [
    {
      "bytesBase64Encoded": "BASE64_IMG_BYTES",
      "mimeType": "image/png"
    },
    {
      "mimeType": "image/png",
      "bytesBase64Encoded": "BASE64_IMG_BYTES"
    }
  ]
}
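To turn a response like this into image files, decode each prediction's bytesBase64Encoded field. The following sketch uses hypothetical names and a file-naming scheme chosen for illustration. As an assumption, it skips any prediction that has no image bytes, because the shape of filtered predictions isn't shown here.

```python
import base64
from typing import Any

EXTENSIONS = {"image/png": ".png", "image/jpeg": ".jpg"}


def decode_predictions(response: dict[str, Any]) -> list[tuple[str, bytes]]:
    """Return (file name, image bytes) for each prediction in a :predict response."""
    images = []
    for index, prediction in enumerate(response.get("predictions", [])):
        if "bytesBase64Encoded" not in prediction:
            continue  # assumption: entries without image bytes are skipped
        ext = EXTENSIONS.get(prediction.get("mimeType", "image/png"), ".bin")
        data = base64.b64decode(prediction["bytesBase64Encoded"])
        images.append((f"edited_{index}{ext}", data))
    return images
```

To save the results, open each returned file name in binary write mode ("wb") and write its bytes.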

Class IDs

Use the following object class IDs in maskImageConfig.maskClasses, with maskMode set to MASK_MODE_SEMANTIC, to automatically create an image mask based on specific objects.

Class ID (`class_id`)	Object
0	backpack
1	umbrella
2	bag
3	tie
4	suitcase
5	case
6	bird
7	cat
8	dog
9	horse
10	sheep
11	cow
12	elephant
13	bear
14	zebra
15	giraffe
16	animal (other)
17	microwave
18	radiator
19	oven
20	toaster
21	storage tank
22	conveyor belt
23	sink
24	refrigerator
25	washer dryer
26	fan
27	dishwasher
28	toilet
29	bathtub
30	shower
31	tunnel
32	bridge
33	pier wharf
34	tent
35	building
36	ceiling
37	laptop
38	keyboard
39	mouse
40	remote
41	cell phone
42	television
43	floor
44	stage
45	banana
46	apple
47	sandwich
48	orange
49	broccoli
50	carrot
51	hot dog
52	pizza
53	donut
54	cake
55	fruit (other)
56	food (other)
57	chair (other)
58	armchair
59	swivel chair
60	stool
61	seat
62	couch
63	trash can
64	potted plant
65	nightstand
66	bed
67	table
68	pool table
69	barrel
70	desk
71	ottoman
72	wardrobe
73	crib
74	basket
75	chest of drawers
76	bookshelf
77	counter (other)
78	bathroom counter
79	kitchen island
80	door
81	light (other)
82	lamp
83	sconce
84	chandelier
85	mirror
86	whiteboard
87	shelf
88	stairs
89	escalator
90	cabinet
91	fireplace
92	stove
93	arcade machine
94	gravel
95	platform
96	playingfield
97	railroad
98	road
99	snow
100	sidewalk pavement
101	runway
102	terrain
103	book
104	box
105	clock
106	vase
107	scissors
108	plaything (other)
109	teddy bear
110	hair dryer
111	toothbrush
112	painting
113	poster
114	bulletin board
115	bottle
116	cup
117	wine glass
118	knife
119	fork
120	spoon
121	bowl
122	tray
123	range hood
124	plate
125	person
126	rider (other)
127	bicyclist
128	motorcyclist
129	paper
130	streetlight
131	road barrier
132	mailbox
133	cctv camera
134	junction box
135	traffic sign
136	traffic light
137	fire hydrant
138	parking meter
139	bench
140	bike rack
141	billboard
142	sky
143	pole
144	fence
145	railing banister
146	guard rail
147	mountain hill
148	rock
149	frisbee
150	skis
151	snowboard
152	sports ball
153	kite
154	baseball bat
155	baseball glove
156	skateboard
157	surfboard
158	tennis racket
159	net
160	base
161	sculpture
162	column
163	fountain
164	awning
165	apparel
166	banner
167	flag
168	blanket
169	curtain (other)
170	shower curtain
171	pillow
172	towel
173	rug floormat
174	vegetation
175	bicycle
176	car
177	autorickshaw
178	motorcycle
179	airplane
180	bus
181	train
182	truck
183	trailer
184	boat ship
185	slow wheeled object
186	river lake
187	sea
188	water (other)
189	swimming pool
190	waterfall
191	wall
192	window
193	window blind
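The following sketch looks up class IDs by object name and builds a semantic maskImageConfig. Only a few entries from the table are copied into the hypothetical CLASS_IDS dictionary; extend it with the rows you need. The default dilation of 0.01 is an illustrative choice taken from the recommendations for insertion and removal, not a documented default.

```python
from typing import Any

# A few entries copied from the class ID table above; extend as needed.
CLASS_IDS: dict[str, int] = {
    "cat": 7,
    "dog": 8,
    "person": 125,
    "sky": 142,
    "car": 176,
}


def semantic_mask_config(*objects: str, dilation: float = 0.01) -> dict[str, Any]:
    """Build a maskImageConfig that asks for a semantic mask of the named objects."""
    unknown = [name for name in objects if name not in CLASS_IDS]
    if unknown or not objects:
        raise ValueError(f"no class ID listed for: {unknown or 'no objects given'}")
    return {
        "maskMode": "MASK_MODE_SEMANTIC",
        # Deduplicate and sort so the same objects always give the same list.
        "maskClasses": sorted({CLASS_IDS[name] for name in objects}),
        "dilation": dilation,
    }
```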

What's next

For more information, see Imagen on Vertex AI.
