Classes for working with vision models.
Classes
ControlImageConfig

ControlImageConfig(
    control_type: typing.Literal[
        "CONTROL_TYPE_DEFAULT",
        "CONTROL_TYPE_SCRIBBLE",
        "CONTROL_TYPE_FACE_MESH",
        "CONTROL_TYPE_CANNY",
    ],
    enable_control_image_computation: typing.Optional[bool] = False,
)

Control image config.
ControlReferenceImage

ControlReferenceImage(
    reference_id,
    image: typing.Optional[typing.Union[bytes, vertexai.vision_models.Image, str]] = None,
    control_type: typing.Optional[typing.Literal["default", "scribble", "face_mesh", "canny"]] = None,
    enable_control_image_computation: typing.Optional[bool] = False,
)

Control reference image. This encapsulates the control reference image type.
EntityLabel

EntityLabel(label: typing.Optional[str] = None, score: typing.Optional[float] = None)

Entity label holding a text label and any associated confidence score.
GeneratedImage

GeneratedImage(
    image_bytes: typing.Optional[bytes],
    generation_parameters: typing.Dict[str, typing.Any],
    gcs_uri: typing.Optional[str] = None,
)

Generated image.
GeneratedMask

GeneratedMask(
    image_bytes: typing.Optional[bytes],
    gcs_uri: typing.Optional[str] = None,
    labels: typing.Optional[typing.List[vertexai.preview.vision_models.EntityLabel]] = None,
)

Generated image mask.
Image

Image(image_bytes: typing.Optional[bytes] = None, gcs_uri: typing.Optional[str] = None)

An image, backed by raw bytes or a Cloud Storage URI.
ImageCaptioningModel

ImageCaptioningModel(model_id: str, endpoint_name: typing.Optional[str] = None)

Generates captions from an image.

Examples::

    model = ImageCaptioningModel.from_pretrained("imagetext@001")
    image = Image.load_from_file("image.png")
    captions = model.get_captions(
        image=image,
        # Optional:
        number_of_results=1,
        language="en",
    )
ImageGenerationModel

ImageGenerationModel(model_id: str, endpoint_name: typing.Optional[str] = None)

Generates images from a text prompt.

Examples::

    model = ImageGenerationModel.from_pretrained("imagegeneration@002")
    response = model.generate_images(
        prompt="Astronaut riding a horse",
        # Optional:
        number_of_images=1,
        seed=0,
    )
    response[0].show()
    response[0].save("image1.png")
ImageGenerationResponse

ImageGenerationResponse(images: typing.List[GeneratedImage])

Image generation response.
ImageQnAModel

ImageQnAModel(model_id: str, endpoint_name: typing.Optional[str] = None)

Answers questions about an image.

Examples::

    model = ImageQnAModel.from_pretrained("imagetext@001")
    image = Image.load_from_file("image.png")
    answers = model.ask_question(
        image=image,
        question="What color is the car in this image?",
        # Optional:
        number_of_results=1,
    )
ImageSegmentationModel

ImageSegmentationModel(model_id: str, endpoint_name: typing.Optional[str] = None)

Segments an image.
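A minimal sketch of how segmentation might be invoked, in the style of the other examples here. The model ID ("image-segmentation-001"), the `segment_image` method name, and its parameters are assumptions and may differ by SDK version; calling it requires an initialized Vertex AI project and credentials.

```python
from vertexai.preview.vision_models import Image, ImageSegmentationModel

# Hypothetical model ID and method signature; verify against your SDK version.
model = ImageSegmentationModel.from_pretrained("image-segmentation-001")
image = Image.load_from_file("image.png")

response = model.segment_image(
    base_image=image,
    prompt="the car",  # optional text prompt guiding the segmentation
)

# Each GeneratedMask carries mask bytes plus optional EntityLabel entries.
for mask in response.masks:
    for label in mask.labels or []:
        print(label.label, label.score)
```

The response type is the ImageSegmentationResponse documented below, so the masks iterate as GeneratedMask objects.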
ImageSegmentationResponse

ImageSegmentationResponse(
    _prediction_response: typing.Any,
    masks: typing.List[vertexai.preview.vision_models.GeneratedMask],
)

Image segmentation response.
ImageTextModel

ImageTextModel(model_id: str, endpoint_name: typing.Optional[str] = None)

Generates text from images.

Examples::

    model = ImageTextModel.from_pretrained("imagetext@001")
    image = Image.load_from_file("image.png")

    captions = model.get_captions(
        image=image,
        # Optional:
        number_of_results=1,
        language="en",
    )

    answers = model.ask_question(
        image=image,
        question="What color is the car in this image?",
        # Optional:
        number_of_results=1,
    )
MaskImageConfig

MaskImageConfig(
    mask_mode: typing.Literal[
        "MASK_MODE_DEFAULT",
        "MASK_MODE_USER_PROVIDED",
        "MASK_MODE_BACKGROUND",
        "MASK_MODE_FOREGROUND",
        "MASK_MODE_SEMANTIC",
    ],
    segmentation_classes: typing.Optional[typing.List[int]] = None,
    dilation: typing.Optional[float] = None,
)

Mask image config.
MaskReferenceImage

MaskReferenceImage(
    reference_id,
    image: typing.Optional[typing.Union[bytes, vertexai.vision_models.Image, str]] = None,
    mask_mode: typing.Optional[typing.Literal["default", "user_provided", "background", "foreground", "semantic"]] = None,
    dilation: typing.Optional[float] = None,
    segmentation_classes: typing.Optional[typing.List[int]] = None,
)

Mask reference image. This encapsulates the mask reference image type.
MultiModalEmbeddingModel

MultiModalEmbeddingModel(model_id: str, endpoint_name: typing.Optional[str] = None)

Generates embedding vectors from images and videos.

Examples::

    model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding@001")
    image = Image.load_from_file("image.png")
    video = Video.load_from_file("video.mp4")

    embeddings = model.get_embeddings(
        image=image,
        video=video,
        contextual_text="Hello world",
    )
    image_embedding = embeddings.image_embedding
    video_embeddings = embeddings.video_embeddings
    text_embedding = embeddings.text_embedding
MultiModalEmbeddingResponse

MultiModalEmbeddingResponse(
    _prediction_response: typing.Any,
    image_embedding: typing.Optional[typing.List[float]] = None,
    video_embeddings: typing.Optional[typing.List[vertexai.vision_models.VideoEmbedding]] = None,
    text_embedding: typing.Optional[typing.List[float]] = None,
)

The multimodal embedding response.
RawReferenceImage

RawReferenceImage(
    reference_id,
    image: typing.Optional[typing.Union[bytes, vertexai.vision_models.Image, str]] = None,
)

Raw reference image. This encapsulates the raw reference image type.
ReferenceImage

ReferenceImage(
    reference_id,
    image: typing.Optional[typing.Union[bytes, vertexai.vision_models.Image, str]] = None,
)

Reference image. This is a new base API object for Imagen 3.0 capabilities.
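To illustrate how the reference-image classes fit together, here is a hedged sketch of an Imagen 3.0 capability edit that pairs a RawReferenceImage (the base image) with a MaskReferenceImage (an automatically computed background mask). The model ID, the `edit_mode` value, and the `edit_image` parameters are assumptions; check the SDK reference for the exact signature, and note the call needs an initialized Vertex AI project.

```python
from vertexai.preview.vision_models import (
    Image,
    ImageGenerationModel,
    MaskReferenceImage,
    RawReferenceImage,
)

# Hypothetical Imagen 3.0 capability model ID.
model = ImageGenerationModel.from_pretrained("imagen-3.0-capability-001")
base = Image.load_from_file("image.png")

# reference_id links each reference image to the request.
raw_ref = RawReferenceImage(reference_id=1, image=base)
mask_ref = MaskReferenceImage(reference_id=2, mask_mode="background", dilation=0.01)

response = model.edit_image(
    prompt="a sunny beach in the background",
    edit_mode="inpainting-insert",  # assumed mode string
    reference_images=[raw_ref, mask_ref],
)
response[0].save("edited.png")
```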
Scribble

Scribble(image_bytes: typing.Optional[bytes], gcs_uri: typing.Optional[str] = None)

Input scribble for image segmentation.
StyleImageConfig

StyleImageConfig(style_description: str)

Style image config.
StyleReferenceImage

StyleReferenceImage(
    reference_id,
    image: typing.Optional[typing.Union[bytes, vertexai.vision_models.Image, str]] = None,
    style_description: typing.Optional[str] = None,
)

Style reference image. This encapsulates the style reference image type.
SubjectImageConfig

SubjectImageConfig(
    subject_description: str,
    subject_type: typing.Literal[
        "SUBJECT_TYPE_DEFAULT",
        "SUBJECT_TYPE_PERSON",
        "SUBJECT_TYPE_ANIMAL",
        "SUBJECT_TYPE_PRODUCT",
    ],
)

Subject image config.
SubjectReferenceImage

SubjectReferenceImage(
    reference_id,
    image: typing.Optional[typing.Union[bytes, vertexai.vision_models.Image, str]] = None,
    subject_description: typing.Optional[str] = None,
    subject_type: typing.Optional[typing.Literal["default", "person", "animal", "product"]] = None,
)

Subject reference image. This encapsulates the subject reference image type.
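A hedged sketch of subject-driven generation with SubjectReferenceImage. The model ID, the `edit_image` parameters, and the convention that "[1]" in the prompt refers to `reference_id=1` are assumptions to verify against your SDK version; the call requires Vertex AI credentials.

```python
from vertexai.preview.vision_models import (
    Image,
    ImageGenerationModel,
    SubjectReferenceImage,
)

# Hypothetical Imagen 3.0 capability model ID.
model = ImageGenerationModel.from_pretrained("imagen-3.0-capability-001")
subject = Image.load_from_file("my_dog.png")

subject_ref = SubjectReferenceImage(
    reference_id=1,
    image=subject,
    subject_description="my dog Rex",
    subject_type="animal",
)

response = model.edit_image(
    prompt="A portrait of my dog Rex [1] wearing a red scarf",
    reference_images=[subject_ref],
)
response[0].save("rex.png")
```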
Video

Video(video_bytes: typing.Optional[bytes] = None, gcs_uri: typing.Optional[str] = None)

A video, backed by raw bytes or a Cloud Storage URI.
VideoEmbedding

VideoEmbedding(start_offset_sec: int, end_offset_sec: int, embedding: typing.List[float])

Embeddings generated from video with offset times.
VideoSegmentConfig

VideoSegmentConfig(start_offset_sec: int = 0, end_offset_sec: int = 120, interval_sec: int = 16)

The specific video segments (in seconds) the embeddings are generated for.
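A sketch of how this config might be passed to MultiModalEmbeddingModel to embed only the first 60 seconds of a video at 10-second intervals. It assumes `get_embeddings` accepts a `video_segment_config` keyword, as in recent SDK versions, and needs an initialized Vertex AI project to run.

```python
from vertexai.vision_models import (
    MultiModalEmbeddingModel,
    Video,
    VideoSegmentConfig,
)

model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding@001")
video = Video.load_from_file("video.mp4")

embeddings = model.get_embeddings(
    video=video,
    # One embedding per 10-second interval over the first minute.
    video_segment_config=VideoSegmentConfig(
        start_offset_sec=0,
        end_offset_sec=60,
        interval_sec=10,
    ),
)

# Each VideoEmbedding records the segment it covers.
for segment in embeddings.video_embeddings:
    print(segment.start_offset_sec, segment.end_offset_sec, len(segment.embedding))
```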
WatermarkVerificationModel

WatermarkVerificationModel(model_id: str, endpoint_name: typing.Optional[str] = None)

Verifies whether an image has a watermark.
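A hedged usage sketch. The model ID ("imageverification@001") and the `verify_image` method name are assumptions; the result field comes from the WatermarkVerificationResponse documented below, and running this requires Vertex AI credentials.

```python
from vertexai.preview.vision_models import Image, WatermarkVerificationModel

# Hypothetical model ID; verify against your SDK version.
model = WatermarkVerificationModel.from_pretrained("imageverification@001")
image = Image.load_from_file("image.png")

response = model.verify_image(image)
# watermark_verification_result is a string verdict.
print(response.watermark_verification_result)
```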
WatermarkVerificationResponse

WatermarkVerificationResponse(
    _prediction_response: typing.Any,
    watermark_verification_result: typing.Optional[str] = None,
)

Watermark verification response.