Module vision_models (1.88.0)

Classes for working with vision models.

Classes

ControlImageConfig

  ControlImageConfig(
      control_type: typing.Literal["CONTROL_TYPE_DEFAULT", "CONTROL_TYPE_SCRIBBLE", "CONTROL_TYPE_FACE_MESH", "CONTROL_TYPE_CANNY"],
      enable_control_image_computation: typing.Optional[bool] = False,
  )

Control image config.

ControlReferenceImage

  ControlReferenceImage(
      reference_id,
      image: typing.Optional[typing.Union[bytes, vertexai.vision_models.Image, str]] = None,
      control_type: typing.Optional[typing.Literal["default", "scribble", "face_mesh", "canny"]] = None,
      enable_control_image_computation: typing.Optional[bool] = False,
  )

Control reference image.

This encapsulates the control reference image type.
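
Like the other reference image types below, a ControlReferenceImage is passed to an Imagen 3.0 capability request. A minimal hedged sketch (the model name and the edit_image/reference_images parameters are assumptions based on the preview API and may differ)::

control_ref = ControlReferenceImage(
    reference_id=1,
    image=Image.load_from_file("scribble.png"),
    control_type="scribble",
)
model = ImageGenerationModel.from_pretrained("imagen-3.0-capability-001")  # assumed model name
response = model.edit_image(
    prompt="A watercolor landscape following the scribble [1]",
    reference_images=[control_ref],
)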

EntityLabel

  EntityLabel(label: typing.Optional[str] = None, score: typing.Optional[float] = None)

Entity label holding a text label and any associated confidence score.

GeneratedImage

  GeneratedImage(
      image_bytes: typing.Optional[bytes],
      generation_parameters: typing.Dict[str, typing.Any],
      gcs_uri: typing.Optional[str] = None,
  )

Generated image.
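
Instances are produced by ImageGenerationModel.generate_images. Continuing from the ImageGenerationModel example below (the output filename is illustrative)::

generated = response[0]                 # a GeneratedImage
generated.save("image1.png")
print(generated.generation_parameters)  # parameters the image was generated with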

GeneratedMask

  GeneratedMask(
      image_bytes: typing.Optional[bytes],
      gcs_uri: typing.Optional[str] = None,
      labels: typing.Optional[typing.List[vertexai.preview.vision_models.EntityLabel]] = None,
  )

Generated image mask.

Image

  Image(image_bytes: typing.Optional[bytes] = None, gcs_uri: typing.Optional[str] = None)

Image.
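
Examples of constructing an Image (load_from_file appears in the model examples below; the bucket name is hypothetical)::

image = Image.load_from_file("image.png")                  # from a local file
image = Image(image_bytes=open("image.png", "rb").read())  # from raw bytes
image = Image(gcs_uri="gs://my-bucket/image.png")          # from Cloud Storage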

ImageCaptioningModel

  ImageCaptioningModel(model_id: str, endpoint_name: typing.Optional[str] = None)

Generates captions from an image.

Examples::

model = ImageCaptioningModel.from_pretrained("imagetext@001")
image = Image.load_from_file("image.png")
captions = model.get_captions(
    image=image,
    # Optional:
    number_of_results=1,
    language="en",
)

ImageGenerationModel

  ImageGenerationModel(model_id: str, endpoint_name: typing.Optional[str] = None)

Generates images from a text prompt.

Examples::

model = ImageGenerationModel.from_pretrained("imagegeneration@002")
response = model.generate_images(
    prompt="Astronaut riding a horse",
    # Optional:
    number_of_images=1,
    seed=0,
)
response[0].show()
response[0].save("image1.png")

ImageGenerationResponse

  ImageGenerationResponse(images: typing.List[GeneratedImage])

Image generation response.

ImageQnAModel

  ImageQnAModel(model_id: str, endpoint_name: typing.Optional[str] = None)

Answers questions about an image.

Examples::

model = ImageQnAModel.from_pretrained("imagetext@001")
image = Image.load_from_file("image.png")
answers = model.ask_question(
    image=image,
    question="What color is the car in this image?",
    # Optional:
    number_of_results=1,
)

ImageSegmentationModel

  ImageSegmentationModel(model_id: str, endpoint_name: typing.Optional[str] = None)

Segments an image.
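
Example (a hedged sketch: the model name and the segment_image parameters are assumptions based on the preview API and may differ)::

model = ImageSegmentationModel.from_pretrained("image-segmentation-001")  # assumed model name
image = Image.load_from_file("image.png")
response = model.segment_image(
    base_image=image,
    prompt="the car",  # assumed: prompt-based segmentation
)
for mask in response.masks:
    for label in mask.labels or []:
        print(label.label, label.score)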

ImageSegmentationResponse

  ImageSegmentationResponse(
      _prediction_response: typing.Any,
      masks: typing.List[vertexai.preview.vision_models.GeneratedMask],
  )

Image segmentation response.

ImageTextModel

  ImageTextModel(model_id: str, endpoint_name: typing.Optional[str] = None)

Generates text from images.

Examples::

model = ImageTextModel.from_pretrained("imagetext@001")
image = Image.load_from_file("image.png")

captions = model.get_captions(
    image=image,
    # Optional:
    number_of_results=1,
    language="en",
)

answers = model.ask_question(
    image=image,
    question="What color is the car in this image?",
    # Optional:
    number_of_results=1,
)

MaskImageConfig

  MaskImageConfig(
      mask_mode: typing.Literal["MASK_MODE_DEFAULT", "MASK_MODE_USER_PROVIDED", "MASK_MODE_BACKGROUND", "MASK_MODE_FOREGROUND", "MASK_MODE_SEMANTIC"],
      segmentation_classes: typing.Optional[typing.List[int]] = None,
      dilation: typing.Optional[float] = None,
  )

Mask image config.

MaskReferenceImage

  MaskReferenceImage(
      reference_id,
      image: typing.Optional[typing.Union[bytes, vertexai.vision_models.Image, str]] = None,
      mask_mode: typing.Optional[typing.Literal["default", "user_provided", "background", "foreground", "semantic"]] = None,
      dilation: typing.Optional[float] = None,
      segmentation_classes: typing.Optional[typing.List[int]] = None,
  )

Mask reference image. This encapsulates the mask reference image type.
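
Example of editing with a raw and a mask reference image together (a hedged sketch; the model name and the edit_image arguments are assumptions based on the preview API)::

raw_ref = RawReferenceImage(
    reference_id=1,
    image=Image.load_from_file("photo.png"),
)
mask_ref = MaskReferenceImage(
    reference_id=2,
    mask_mode="background",  # let the service compute a background mask
    dilation=0.01,
)
model = ImageGenerationModel.from_pretrained("imagen-3.0-capability-001")  # assumed model name
response = model.edit_image(
    prompt="A sunny beach behind the subject [1]",
    reference_images=[raw_ref, mask_ref],
)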

MultiModalEmbeddingModel

  MultiModalEmbeddingModel(model_id: str, endpoint_name: typing.Optional[str] = None)

Generates embedding vectors from images and videos.

Examples::

model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding@001")
image = Image.load_from_file("image.png")
video = Video.load_from_file("video.mp4")

embeddings = model.get_embeddings(
    image=image,
    video=video,
    contextual_text="Hello world",
)
image_embedding = embeddings.image_embedding
video_embeddings = embeddings.video_embeddings
text_embedding = embeddings.text_embedding

MultiModalEmbeddingResponse

  MultiModalEmbeddingResponse(
      _prediction_response: typing.Any,
      image_embedding: typing.Optional[typing.List[float]] = None,
      video_embeddings: typing.Optional[typing.List[vertexai.vision_models.VideoEmbedding]] = None,
      text_embedding: typing.Optional[typing.List[float]] = None,
  )

The multimodal embedding response.

RawReferenceImage

  RawReferenceImage(
      reference_id,
      image: typing.Optional[typing.Union[bytes, vertexai.vision_models.Image, str]] = None,
  )

Raw reference image.

This encapsulates the raw reference image type.

ReferenceImage

  ReferenceImage(
      reference_id,
      image: typing.Optional[typing.Union[bytes, vertexai.vision_models.Image, str]] = None,
  )

Reference image.

This is a new base API object for Imagen 3.0 Capabilities.

Scribble

  Scribble(image_bytes: typing.Optional[bytes], gcs_uri: typing.Optional[str] = None)

Input scribble for image segmentation.

StyleImageConfig

  StyleImageConfig(style_description: str)

Style image config.

StyleReferenceImage

  StyleReferenceImage(
      reference_id,
      image: typing.Optional[typing.Union[bytes, vertexai.vision_models.Image, str]] = None,
      style_description: typing.Optional[str] = None,
  )

Style reference image. This encapsulates the style reference image type.
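
Example (construction follows the signature above; the model name and passing the reference via edit_image/reference_images are assumptions based on the preview API)::

style_ref = StyleReferenceImage(
    reference_id=1,
    image=Image.load_from_file("style.png"),
    style_description="flat vector illustration",
)
model = ImageGenerationModel.from_pretrained("imagen-3.0-capability-001")  # assumed model name
response = model.edit_image(
    prompt="A city skyline in the style of [1]",
    reference_images=[style_ref],
)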

SubjectImageConfig

  SubjectImageConfig(
      subject_description: str,
      subject_type: typing.Literal["SUBJECT_TYPE_DEFAULT", "SUBJECT_TYPE_PERSON", "SUBJECT_TYPE_ANIMAL", "SUBJECT_TYPE_PRODUCT"],
  )

Subject image config.

SubjectReferenceImage

  SubjectReferenceImage(
      reference_id,
      image: typing.Optional[typing.Union[bytes, vertexai.vision_models.Image, str]] = None,
      subject_description: typing.Optional[str] = None,
      subject_type: typing.Optional[typing.Literal["default", "person", "animal", "product"]] = None,
  )

Subject reference image.

This encapsulates the subject reference image type.
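
Example (construction follows the signature above; the model name and the edit_image arguments are assumptions based on the preview API)::

subject_ref = SubjectReferenceImage(
    reference_id=1,
    image=Image.load_from_file("dog.png"),
    subject_description="a golden retriever",
    subject_type="animal",
)
model = ImageGenerationModel.from_pretrained("imagen-3.0-capability-001")  # assumed model name
response = model.edit_image(
    prompt="The dog [1] wearing a red scarf",
    reference_images=[subject_ref],
)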

Video

  Video(video_bytes: typing.Optional[bytes] = None, gcs_uri: typing.Optional[str] = None)

Video.

VideoEmbedding

  VideoEmbedding(start_offset_sec: int, end_offset_sec: int, embedding: typing.List[float])

Embedding generated from a video segment, with its start and end offset times.

VideoSegmentConfig

  VideoSegmentConfig(start_offset_sec: int = 0, end_offset_sec: int = 120, interval_sec: int = 16)

The specific video segments (in seconds) the embeddings are generated for.
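
Example (a sketch; that get_embeddings accepts a video_segment_config argument is an assumption based on the preview API)::

model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding@001")
video = Video.load_from_file("video.mp4")
embeddings = model.get_embeddings(
    video=video,
    video_segment_config=VideoSegmentConfig(
        start_offset_sec=0,
        end_offset_sec=60,
        interval_sec=10,
    ),
)
for segment in embeddings.video_embeddings:
    print(segment.start_offset_sec, segment.end_offset_sec, len(segment.embedding))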

WatermarkVerificationModel

  WatermarkVerificationModel(model_id: str, endpoint_name: typing.Optional[str] = None)

Verifies if an image has a watermark.
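
Example (a hedged sketch; the model name and the verify_image method are assumptions based on the preview API)::

model = WatermarkVerificationModel.from_pretrained("imageverification@001")  # assumed model name
image = Image.load_from_file("image.png")
response = model.verify_image(image)
print(response.watermark_verification_result)  # e.g. "ACCEPT" or "REJECT"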

WatermarkVerificationResponse

  WatermarkVerificationResponse(_prediction_response: typing.Any, watermark_verification_result: typing.Optional[str] = None)

The watermark verification response.
