Bounding box detection

In this experimental launch, we are providing developers with a powerful tool for object detection and localization within images and video. By accurately identifying and delineating objects with bounding boxes, developers can unlock a wide range of applications and enhance the intelligence of their projects.

Key Benefits:

  • Simple:Integrate object detection capabilities into your applications with ease, regardless of your computer vision expertise.
  • Customizable:Produce bounding boxes based on custom instructions (e.g. "I want to see bounding boxes of all the green objects in this image"), without having to train a custom model.

Technical Details:

  • Input:Your prompt and associated images or video frames.
  • Output:Bounding boxes in the [y_min, x_min, y_max, x_max] format. The top left corner is the origin. The x and y axis go horizontally and vertically, respectively. Coordinate values are normalized to 0-1000 for every image.
  • Visualization:AI Studio users will see bounding boxes plotted within the UI. Vertex AI users should visualize their bounding boxes through custom visualization code.

Python

Install

pip install --upgrade google-genai

To learn more, see the SDK reference documentation .

Set environment variables to use the Gen AI SDK with Vertex AI:

 # Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values 
 # with appropriate values for your project. 
 export 
  
 GOOGLE_CLOUD_PROJECT 
 = 
 GOOGLE_CLOUD_PROJECT 
 export 
  
 GOOGLE_CLOUD_LOCATION 
 = 
 global 
 export 
  
 GOOGLE_GENAI_USE_VERTEXAI 
 = 
True
  import 
  
 requests 
 from 
  
 google 
  
 import 
 genai 
 from 
  
 google.genai.types 
  
 import 
 ( 
 GenerateContentConfig 
 , 
 HarmBlockThreshold 
 , 
 HarmCategory 
 , 
 HttpOptions 
 , 
 Part 
 , 
 SafetySetting 
 , 
 ) 
 from 
  
 PIL 
  
 import 
 Image 
 , 
 ImageColor 
 , 
 ImageDraw 
 from 
  
 pydantic 
  
 import 
 BaseModel 
 # Helper class to represent a bounding box 
 class 
  
 BoundingBox 
 ( 
 BaseModel 
 ): 
  
 """ 
 Represents a bounding box with its 2D coordinates and associated label. 
 Attributes: 
 box_2d (list[int]): A list of integers representing the 2D coordinates of the bounding box, 
 typically in the format [y_min, x_min, y_max, x_max]. 
 label (str): A string representing the label or class associated with the object within the bounding box. 
 """ 
 box_2d 
 : 
 list 
 [ 
 int 
 ] 
 label 
 : 
 str 
 # Helper function to plot bounding boxes on an image 
 def 
  
 plot_bounding_boxes 
 ( 
 image_uri 
 : 
 str 
 , 
 bounding_boxes 
 : 
 list 
 [ 
 BoundingBox 
 ]) 
 - 
> None 
 : 
  
 """ 
 Plots bounding boxes on an image with labels, using PIL and normalized coordinates. 
 Args: 
 image_uri: The URI of the image file. 
 bounding_boxes: A list of BoundingBox objects. Each box's coordinates are in 
 normalized [y_min, x_min, y_max, x_max] format. 
 """ 
 with 
 Image 
 . 
 open 
 ( 
 requests 
 . 
 get 
 ( 
 image_uri 
 , 
 stream 
 = 
 True 
 , 
 timeout 
 = 
 10 
 ) 
 . 
 raw 
 ) 
 as 
 im 
 : 
 width 
 , 
 height 
 = 
 im 
 . 
 size 
 draw 
 = 
 ImageDraw 
 . 
 Draw 
 ( 
 im 
 ) 
 colors 
 = 
 list 
 ( 
 ImageColor 
 . 
 colormap 
 . 
 keys 
 ()) 
 for 
 i 
 , 
 bbox 
 in 
 enumerate 
 ( 
 bounding_boxes 
 ): 
 # Scale normalized coordinates to image dimensions 
 abs_y_min 
 = 
 int 
 ( 
 bbox 
 . 
 box_2d 
 [ 
 0 
 ] 
 / 
 1000 
 * 
 height 
 ) 
 abs_x_min 
 = 
 int 
 ( 
 bbox 
 . 
 box_2d 
 [ 
 1 
 ] 
 / 
 1000 
 * 
 width 
 ) 
 abs_y_max 
 = 
 int 
 ( 
 bbox 
 . 
 box_2d 
 [ 
 2 
 ] 
 / 
 1000 
 * 
 height 
 ) 
 abs_x_max 
 = 
 int 
 ( 
 bbox 
 . 
 box_2d 
 [ 
 3 
 ] 
 / 
 1000 
 * 
 width 
 ) 
 color 
 = 
 colors 
 [ 
 i 
 % 
 len 
 ( 
 colors 
 )] 
 # Draw the rectangle using the correct (x, y) pairs 
 draw 
 . 
 rectangle 
 ( 
 (( 
 abs_x_min 
 , 
 abs_y_min 
 ), 
 ( 
 abs_x_max 
 , 
 abs_y_max 
 )), 
 outline 
 = 
 color 
 , 
 width 
 = 
 4 
 , 
 ) 
 if 
 bbox 
 . 
 label 
 : 
 # Position the text at the top-left corner of the box 
 draw 
 . 
 text 
 (( 
 abs_x_min 
 + 
 8 
 , 
 abs_y_min 
 + 
 6 
 ), 
 bbox 
 . 
 label 
 , 
 fill 
 = 
 color 
 ) 
 im 
 . 
 show 
 () 
 client 
 = 
 genai 
 . 
 Client 
 ( 
 http_options 
 = 
 HttpOptions 
 ( 
 api_version 
 = 
 "v1" 
 )) 
 config 
 = 
 GenerateContentConfig 
 ( 
 system_instruction 
 = 
 """ 
 Return bounding boxes as an array with labels. 
 Never return masks. Limit to 25 objects. 
 If an object is present multiple times, give each object a unique label 
 according to its distinct characteristics (colors, size, position, etc..). 
 """ 
 , 
 temperature 
 = 
 0.5 
 , 
 safety_settings 
 = 
 [ 
 SafetySetting 
 ( 
 category 
 = 
 HarmCategory 
 . 
 HARM_CATEGORY_DANGEROUS_CONTENT 
 , 
 threshold 
 = 
 HarmBlockThreshold 
 . 
 BLOCK_ONLY_HIGH 
 , 
 ), 
 ], 
 response_mime_type 
 = 
 "application/json" 
 , 
 response_schema 
 = 
 list 
 [ 
 BoundingBox 
 ], 
 ) 
 image_uri 
 = 
 "https://storage.googleapis.com/generativeai-downloads/images/socks.jpg" 
 response 
 = 
 client 
 . 
 models 
 . 
 generate_content 
 ( 
 model 
 = 
 "gemini-2.5-flash" 
 , 
 contents 
 = 
 [ 
 Part 
 . 
 from_uri 
 ( 
 file_uri 
 = 
 image_uri 
 , 
 mime_type 
 = 
 "image/jpeg" 
 , 
 ), 
 "Output the positions of the socks with a face. Label according to position in the image." 
 , 
 ], 
 config 
 = 
 config 
 , 
 ) 
 print 
 ( 
 response 
 . 
 text 
 ) 
 plot_bounding_boxes 
 ( 
 image_uri 
 , 
 response 
 . 
 parsed 
 ) 
 # Example response: 
 # [ 
 #     {"box_2d": [6, 246, 386, 526], "label": "top-left light blue sock with cat face"}, 
 #     {"box_2d": [234, 649, 650, 863], "label": "top-right light blue sock with cat face"}, 
 # ] 
 
Create a Mobile Website
View Site in Mobile | Classic
Share by: