Track objects

Object tracking follows objects detected in an input video and reports where each object appears over time. To make an object tracking request, call the annotate method and specify OBJECT_TRACKING in the features field.

For entities and spatial locations that are detected in a video or in video segments, an object tracking request annotates the video with the appropriate labels. For example, a video of vehicles crossing a traffic signal might produce labels such as "car", "truck", "bike", "tires", "lights", "window", and so on. Each label can include a series of bounding boxes, where each bounding box has an associated time segment containing a time offset that indicates the duration offset from the beginning of the video. The annotation also contains additional entity information, including an entity ID that you can use to find more information about that entity in the Google Knowledge Graph Search API.
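As a rough sketch, a single object tracking annotation in the JSON response has the following shape (the values below are illustrative placeholders, not output from a real request):

{
  "entity": {
    "entityId": "/m/0k4j",
    "description": "car",
    "languageCode": "en-US"
  },
  "confidence": 0.9,
  "segment": {
    "startTimeOffset": "1.250s",
    "endTimeOffset": "4.500s"
  },
  "frames": [
    {
      "normalizedBoundingBox": {
        "left": 0.12,
        "top": 0.34,
        "right": 0.56,
        "bottom": 0.78
      },
      "timeOffset": "1.250s"
    }
  ]
}

The bounding box coordinates are normalized: each value is a fraction of the frame width or height, between 0 and 1.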

Object tracking vs. label detection

Object tracking differs from label detection. Label detection provides labels without bounding boxes, whereas object tracking provides the labels of the individual objects present in a given video, along with the bounding box of each object instance at every time step.

Each tracked object is assigned its own instance of the ObjectTrackingAnnotation message, so multiple instances of the same object type produce multiple annotations, and all occurrences of a given object track are kept in a single instance. For example, if a red car and a blue car each appear for 5 seconds in a video, the tracking request returns two instances of ObjectTrackingAnnotation: the first contains the locations of one of the two cars (for example, the red car), while the second contains the locations of the other car.

Request object tracking for a video on Cloud Storage

The following samples demonstrate object tracking on a file located in Cloud Storage.

REST

Send the process request

The following shows how to send a POST request to the annotate method. The example uses the access token for a service account set up for the project using the Google Cloud CLI. For instructions on installing the Google Cloud CLI, setting up a project with a service account, and obtaining an access token, see the Video Intelligence quickstart .

Before using any of the request data, make the following replacements:

  • STORAGE_URI : the Cloud Storage URI of the video that you want to annotate, including the file name (must start with gs://).
    For example:
    "inputUri": "gs://cloud-videointelligence-demo/assistant.mp4",
  • PROJECT_NUMBER : The numeric identifier for your Google Cloud project

HTTP method and URL:

POST https://videointelligence.googleapis.com/v1/videos:annotate

Request JSON body:

{
  "inputUri": "STORAGE_URI",
  "features": ["OBJECT_TRACKING"]
}

To send your request, you can use a command-line tool such as curl.
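For example, the following curl sketch assumes that you saved the request body above to a file named request.json and that gcloud auth print-access-token can return a token for your service account (both file name and token setup are assumptions):

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://videointelligence.googleapis.com/v1/videos:annotate"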

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/operations/OPERATION_ID"
}

If the request is successful, the Video Intelligence API returns the name of your operation. The above shows an example of such a response, where PROJECT_NUMBER is the number of your project and OPERATION_ID is the ID of the long-running operation created for the request.

Get the results

To get the results of your request, send a GET request, using the operation name returned from the call to videos:annotate, as shown in the following example.

Before using any of the request data, make the following replacements:

  • OPERATION_NAME : the name of the operation as returned by the Video Intelligence API. The operation name has the format projects/ PROJECT_NUMBER /locations/ LOCATION_ID /operations/ OPERATION_ID
  • PROJECT_NUMBER : The numeric identifier for your Google Cloud project

HTTP method and URL:

GET https://videointelligence.googleapis.com/v1/OPERATION_NAME

To send your request, you can use a command-line tool such as curl.
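A minimal curl sketch, again assuming gcloud auth print-access-token can return a token for your service account:

curl -X GET \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     "https://videointelligence.googleapis.com/v1/OPERATION_NAME"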

You should receive a JSON response similar to the following:
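The exact annotations depend on your video. The following abbreviated response shows the shape of a completed operation; field values are illustrative placeholders, and most of the annotation list is omitted:

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.videointelligence.v1.AnnotateVideoProgress"
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.videointelligence.v1.AnnotateVideoResponse",
    "annotationResults": [
      {
        "inputUri": "/cloud-videointelligence-demo/assistant.mp4",
        "objectAnnotations": [
          {
            "entity": {
              "entityId": "/m/01g317",
              "description": "person",
              "languageCode": "en-US"
            },
            "confidence": 0.91,
            "frames": [
              {
                "normalizedBoundingBox": {
                  "left": 0.36,
                  "top": 0.26,
                  "right": 0.46,
                  "bottom": 0.62
                },
                "timeOffset": "0.250s"
              }
            ],
            "segment": {
              "startTimeOffset": "0.250s",
              "endTimeOffset": "9.750s"
            }
          }
        ]
      }
    ]
  }
}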

Download annotation results

Copy the annotation results from the source location to your destination bucket (see Copy files and objects):

gcloud storage cp gcs_uri gs://my-bucket

Note: If you provide an output Cloud Storage URI in the request, the annotation results are stored at that URI.

Go

import (
	"context"
	"fmt"
	"io"

	video "cloud.google.com/go/videointelligence/apiv1"
	videopb "cloud.google.com/go/videointelligence/apiv1/videointelligencepb"
	"github.com/golang/protobuf/ptypes"
)

// objectTrackingGCS analyzes a video and extracts entities with their bounding boxes.
func objectTrackingGCS(w io.Writer, gcsURI string) error {
	// gcsURI := "gs://cloud-samples-data/video/cat.mp4"

	ctx := context.Background()

	// Creates a client.
	client, err := video.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("video.NewClient: %w", err)
	}
	defer client.Close()

	op, err := client.AnnotateVideo(ctx, &videopb.AnnotateVideoRequest{
		InputUri: gcsURI,
		Features: []videopb.Feature{
			videopb.Feature_OBJECT_TRACKING,
		},
	})
	if err != nil {
		return fmt.Errorf("AnnotateVideo: %w", err)
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		return fmt.Errorf("Wait: %w", err)
	}

	// Only one video was processed, so get the first result.
	result := resp.GetAnnotationResults()[0]

	for _, annotation := range result.ObjectAnnotations {
		fmt.Fprintf(w, "Description: %q\n", annotation.Entity.GetDescription())
		if len(annotation.Entity.EntityId) > 0 {
			fmt.Fprintf(w, "\tEntity ID: %q\n", annotation.Entity.GetEntityId())
		}

		segment := annotation.GetSegment()
		start, _ := ptypes.Duration(segment.GetStartTimeOffset())
		end, _ := ptypes.Duration(segment.GetEndTimeOffset())
		fmt.Fprintf(w, "\tSegment: %v to %v\n", start, end)

		fmt.Fprintf(w, "\tConfidence: %f\n", annotation.GetConfidence())

		// Here we print only the bounding box of the first frame in this segment.
		frame := annotation.GetFrames()[0]
		seconds := float32(frame.GetTimeOffset().GetSeconds())
		nanos := float32(frame.GetTimeOffset().GetNanos())
		fmt.Fprintf(w, "\tTime offset of the first frame: %fs\n", seconds+nanos/1e9)

		box := frame.GetNormalizedBoundingBox()
		fmt.Fprintf(w, "\tBounding box position:\n")
		fmt.Fprintf(w, "\t\tleft  : %f\n", box.GetLeft())
		fmt.Fprintf(w, "\t\ttop   : %f\n", box.GetTop())
		fmt.Fprintf(w, "\t\tright : %f\n", box.GetRight())
		fmt.Fprintf(w, "\t\tbottom: %f\n", box.GetBottom())
	}

	return nil
}

Java

/**
 * Track objects in a video.
 *
 * @param gcsUri the path to the video file to analyze.
 */
public static VideoAnnotationResults trackObjectsGcs(String gcsUri) throws Exception {
  try (VideoIntelligenceServiceClient client = VideoIntelligenceServiceClient.create()) {
    // Create the request
    AnnotateVideoRequest request =
        AnnotateVideoRequest.newBuilder()
            .setInputUri(gcsUri)
            .addFeatures(Feature.OBJECT_TRACKING)
            .setLocationId("us-east1")
            .build();

    // asynchronously perform object tracking on videos
    OperationFuture<AnnotateVideoResponse, AnnotateVideoProgress> future =
        client.annotateVideoAsync(request);

    System.out.println("Waiting for operation to complete...");
    // The first result is retrieved because a single video was processed.
    AnnotateVideoResponse response = future.get(450, TimeUnit.SECONDS);
    VideoAnnotationResults results = response.getAnnotationResults(0);

    // Get only the first annotation for demo purposes.
    ObjectTrackingAnnotation annotation = results.getObjectAnnotations(0);
    System.out.println("Confidence: " + annotation.getConfidence());

    if (annotation.hasEntity()) {
      Entity entity = annotation.getEntity();
      System.out.println("Entity description: " + entity.getDescription());
      System.out.println("Entity id:: " + entity.getEntityId());
    }

    if (annotation.hasSegment()) {
      VideoSegment videoSegment = annotation.getSegment();
      Duration startTimeOffset = videoSegment.getStartTimeOffset();
      Duration endTimeOffset = videoSegment.getEndTimeOffset();
      // Display the segment time in seconds, 1e9 converts nanos to seconds
      System.out.println(
          String.format(
              "Segment: %.2fs to %.2fs",
              startTimeOffset.getSeconds() + startTimeOffset.getNanos() / 1e9,
              endTimeOffset.getSeconds() + endTimeOffset.getNanos() / 1e9));
    }

    // Here we print only the bounding box of the first frame in this segment.
    ObjectTrackingFrame frame = annotation.getFrames(0);
    // Display the offset time in seconds, 1e9 converts nanos to seconds
    Duration timeOffset = frame.getTimeOffset();
    System.out.println(
        String.format(
            "Time offset of the first frame: %.2fs",
            timeOffset.getSeconds() + timeOffset.getNanos() / 1e9));

    // Display the bounding box of the detected object
    NormalizedBoundingBox normalizedBoundingBox = frame.getNormalizedBoundingBox();
    System.out.println("Bounding box position:");
    System.out.println("\tleft: " + normalizedBoundingBox.getLeft());
    System.out.println("\ttop: " + normalizedBoundingBox.getTop());
    System.out.println("\tright: " + normalizedBoundingBox.getRight());
    System.out.println("\tbottom: " + normalizedBoundingBox.getBottom());
    return results;
  }
}

Node.js

To authenticate to Video Intelligence, set up Application Default Credentials. For more information, see Set up authentication for a local development environment .

// Imports the Google Cloud Video Intelligence library
const Video = require('@google-cloud/video-intelligence');

// Creates a client
const video = new Video.VideoIntelligenceServiceClient();

/**
 * TODO(developer): Uncomment the following line before running the sample.
 */
// const gcsUri = 'GCS URI of the video to analyze, e.g. gs://my-bucket/my-video.mp4';

const request = {
  inputUri: gcsUri,
  features: ['OBJECT_TRACKING'],
  //recommended to use us-east1 for the best latency due to different types of processors used in this region and others
  locationId: 'us-east1',
};
// Detects objects in a video
const [operation] = await video.annotateVideo(request);
const results = await operation.promise();
console.log('Waiting for operation to complete...');
//Gets annotations for video
const annotations = results[0].annotationResults[0];
const objects = annotations.objectAnnotations;
objects.forEach(object => {
  console.log(`Entity description: ${object.entity.description}`);
  console.log(`Entity id: ${object.entity.entityId}`);
  const time = object.segment;
  console.log(
    `Segment: ${time.startTimeOffset.seconds || 0}` +
      `.${(time.startTimeOffset.nanos / 1e6).toFixed(0)}s to ${
        time.endTimeOffset.seconds || 0
      }.` +
      `${(time.endTimeOffset.nanos / 1e6).toFixed(0)}s`
  );
  console.log(`Confidence: ${object.confidence}`);
  const frame = object.frames[0];
  const box = frame.normalizedBoundingBox;
  const timeOffset = frame.timeOffset;
  console.log(
    `Time offset for the first frame: ${timeOffset.seconds || 0}` +
      `.${(timeOffset.nanos / 1e6).toFixed(0)}s`
  );
  console.log('Bounding box position:');
  console.log(` left   : ${box.left}`);
  console.log(` top    : ${box.top}`);
  console.log(` right  : ${box.right}`);
  console.log(` bottom : ${box.bottom}`);
});

Python

  """Object tracking in a video stored on GCS.""" 
 from 
  
 google.cloud 
  
 import 
 videointelligence 
 video_client 
 = 
 videointelligence 
 . 
  VideoIntelligenceServiceClient 
 
 () 
 features 
 = 
 [ 
 videointelligence 
 . 
  Feature 
 
 . 
 OBJECT_TRACKING 
 ] 
 operation 
 = 
 video_client 
 . 
  annotate_video 
 
 ( 
 request 
 = 
 { 
 "features" 
 : 
 features 
 , 
 "input_uri" 
 : 
 gcs_uri 
 } 
 ) 
 print 
 ( 
 " 
 \n 
 Processing video for object annotations." 
 ) 
 result 
 = 
 operation 
 . 
 result 
 ( 
 timeout 
 = 
 500 
 ) 
 print 
 ( 
 " 
 \n 
 Finished processing. 
 \n 
 " 
 ) 
 # The first result is retrieved because a single video was processed. 
 object_annotations 
 = 
 result 
 . 
 annotation_results 
 [ 
 0 
 ] 
 . 
 object_annotations 
 for 
 object_annotation 
 in 
 object_annotations 
 : 
 print 
 ( 
 "Entity description: 
 {} 
 " 
 . 
 format 
 ( 
 object_annotation 
 . 
 entity 
 . 
 description 
 )) 
 if 
 object_annotation 
 . 
 entity 
 . 
 entity_id 
 : 
 print 
 ( 
 "Entity id: 
 {} 
 " 
 . 
 format 
 ( 
 object_annotation 
 . 
 entity 
 . 
 entity_id 
 )) 
 print 
 ( 
 "Segment: 
 {} 
 s to 
 {} 
 s" 
 . 
 format 
 ( 
 object_annotation 
 . 
 segment 
 . 
 start_time_offset 
 . 
 seconds 
 + 
 object_annotation 
 . 
 segment 
 . 
 start_time_offset 
 . 
 microseconds 
 / 
 1e6 
 , 
 object_annotation 
 . 
 segment 
 . 
 end_time_offset 
 . 
 seconds 
 + 
 object_annotation 
 . 
 segment 
 . 
 end_time_offset 
 . 
 microseconds 
 / 
 1e6 
 , 
 ) 
 ) 
 print 
 ( 
 "Confidence: 
 {} 
 " 
 . 
 format 
 ( 
 object_annotation 
 . 
 confidence 
 )) 
 # Here we print only the bounding box of the first frame in the segment 
 frame 
 = 
 object_annotation 
 . 
 frames 
 [ 
 0 
 ] 
 box 
 = 
 frame 
 . 
 normalized_bounding_box 
 print 
 ( 
 "Time offset of the first frame: 
 {} 
 s" 
 . 
 format 
 ( 
 frame 
 . 
 time_offset 
 . 
 seconds 
 + 
 frame 
 . 
 time_offset 
 . 
 microseconds 
 / 
 1e6 
 ) 
 ) 
 print 
 ( 
 "Bounding box position:" 
 ) 
 print 
 ( 
 " 
 \t 
 left  : 
 {} 
 " 
 . 
 format 
 ( 
 box 
 . 
 left 
 )) 
 print 
 ( 
 " 
 \t 
 top   : 
 {} 
 " 
 . 
 format 
 ( 
 box 
 . 
 top 
 )) 
 print 
 ( 
 " 
 \t 
 right : 
 {} 
 " 
 . 
 format 
 ( 
 box 
 . 
 right 
 )) 
 print 
 ( 
 " 
 \t 
 bottom: 
 {} 
 " 
 . 
 format 
 ( 
 box 
 . 
 bottom 
 )) 
 print 
 ( 
 " 
 \n 
 " 
 ) 
 

Additional languages

C#: Please follow the C# setup instructions on the client libraries page and then visit the Video Intelligence reference documentation for .NET.

PHP: Please follow the PHP setup instructions on the client libraries page and then visit the Video Intelligence reference documentation for PHP.

Ruby: Please follow the Ruby setup instructions on the client libraries page and then visit the Video Intelligence reference documentation for Ruby.

Request object tracking for video from a local file

The following samples demonstrate object tracking on a file stored locally.

REST

Send the process request

To perform annotation on a local video file, base64-encode the contents of the video file. Include the base64-encoded contents in the inputContent field of the request. For information on how to base64-encode the contents of a video file, see Base64 Encoding .
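For example, on Linux you could encode the file with GNU coreutils base64 (the -w 0 flag disables line wrapping; the file names here are placeholders):

base64 -w 0 your-video.mp4 > video_base64.txt

You can then paste the contents of video_base64.txt into the inputContent field of the request body.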

The following shows how to send a POST request to the videos:annotate method. The example uses the access token for a service account set up for the project using the Google Cloud CLI. For instructions on installing the Google Cloud CLI, setting up a project with a service account, and obtaining an access token, see the Video Intelligence quickstart .

Before using any of the request data, make the following replacements:

  • BASE64_ENCODED_CONTENT : the base64-encoded bytes of your local video file.
    For example: "UklGRg41AwBBVkkgTElTVAwBAABoZHJsYXZpaDgAAAA1ggAAxPMBAAAAAAAQCAA..."
  • PROJECT_NUMBER : The numeric identifier for your Google Cloud project

HTTP method and URL:

POST https://videointelligence.googleapis.com/v1/videos:annotate

Request JSON body:

{
  "inputContent": "BASE64_ENCODED_CONTENT",
  "features": ["OBJECT_TRACKING"]
}

To send your request, you can use a command-line tool such as curl, following the same pattern as the Cloud Storage example above.

You should receive a JSON response similar to the following:
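{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/operations/OPERATION_ID"
}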

If the request is successful, the Video Intelligence API returns the name of your operation. The above shows an example of such a response, where PROJECT_NUMBER is the number of your project and OPERATION_ID is the ID of the long-running operation created for the request.

Get the results

To get the results of your request, send a GET request, using the operation name returned from the call to videos:annotate, as shown in the following example.

Before using any of the request data, make the following replacements:

  • OPERATION_NAME : the name of the operation as returned by the Video Intelligence API. The operation name has the format projects/ PROJECT_NUMBER /locations/ LOCATION_ID /operations/ OPERATION_ID
  • PROJECT_NUMBER : The numeric identifier for your Google Cloud project

HTTP method and URL:

GET https://videointelligence.googleapis.com/v1/OPERATION_NAME

To send your request, you can use a command-line tool such as curl, following the same pattern as the Cloud Storage example above.

You should receive a JSON response with the same structure as in the Cloud Storage example above; when the operation is complete, the response field contains the object tracking annotations.

Go

import (
	"context"
	"fmt"
	"io"
	"os"

	video "cloud.google.com/go/videointelligence/apiv1"
	videopb "cloud.google.com/go/videointelligence/apiv1/videointelligencepb"
	"github.com/golang/protobuf/ptypes"
)

// objectTracking analyzes a video and extracts entities with their bounding boxes.
func objectTracking(w io.Writer, filename string) error {
	// filename := "../testdata/cat.mp4"

	ctx := context.Background()

	// Creates a client.
	client, err := video.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("video.NewClient: %w", err)
	}
	defer client.Close()

	fileBytes, err := os.ReadFile(filename)
	if err != nil {
		return err
	}

	op, err := client.AnnotateVideo(ctx, &videopb.AnnotateVideoRequest{
		InputContent: fileBytes,
		Features: []videopb.Feature{
			videopb.Feature_OBJECT_TRACKING,
		},
	})
	if err != nil {
		return fmt.Errorf("AnnotateVideo: %w", err)
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		return fmt.Errorf("Wait: %w", err)
	}

	// Only one video was processed, so get the first result.
	result := resp.GetAnnotationResults()[0]

	for _, annotation := range result.ObjectAnnotations {
		fmt.Fprintf(w, "Description: %q\n", annotation.Entity.GetDescription())
		if len(annotation.Entity.EntityId) > 0 {
			fmt.Fprintf(w, "\tEntity ID: %q\n", annotation.Entity.GetEntityId())
		}

		segment := annotation.GetSegment()
		start, _ := ptypes.Duration(segment.GetStartTimeOffset())
		end, _ := ptypes.Duration(segment.GetEndTimeOffset())
		fmt.Fprintf(w, "\tSegment: %v to %v\n", start, end)

		fmt.Fprintf(w, "\tConfidence: %f\n", annotation.GetConfidence())

		// Here we print only the bounding box of the first frame in this segment.
		frame := annotation.GetFrames()[0]
		seconds := float32(frame.GetTimeOffset().GetSeconds())
		nanos := float32(frame.GetTimeOffset().GetNanos())
		fmt.Fprintf(w, "\tTime offset of the first frame: %fs\n", seconds+nanos/1e9)

		box := frame.GetNormalizedBoundingBox()
		fmt.Fprintf(w, "\tBounding box position:\n")
		fmt.Fprintf(w, "\t\tleft  : %f\n", box.GetLeft())
		fmt.Fprintf(w, "\t\ttop   : %f\n", box.GetTop())
		fmt.Fprintf(w, "\t\tright : %f\n", box.GetRight())
		fmt.Fprintf(w, "\t\tbottom: %f\n", box.GetBottom())
	}

	return nil
}

Java

/**
 * Track objects in a video.
 *
 * @param filePath the path to the video file to analyze.
 */
public static VideoAnnotationResults trackObjects(String filePath) throws Exception {
  try (VideoIntelligenceServiceClient client = VideoIntelligenceServiceClient.create()) {
    // Read file
    Path path = Paths.get(filePath);
    byte[] data = Files.readAllBytes(path);

    // Create the request
    AnnotateVideoRequest request =
        AnnotateVideoRequest.newBuilder()
            .setInputContent(ByteString.copyFrom(data))
            .addFeatures(Feature.OBJECT_TRACKING)
            .setLocationId("us-east1")
            .build();

    // asynchronously perform object tracking on videos
    OperationFuture<AnnotateVideoResponse, AnnotateVideoProgress> future =
        client.annotateVideoAsync(request);

    System.out.println("Waiting for operation to complete...");
    // The first result is retrieved because a single video was processed.
    AnnotateVideoResponse response = future.get(450, TimeUnit.SECONDS);
    VideoAnnotationResults results = response.getAnnotationResults(0);

    // Get only the first annotation for demo purposes.
    ObjectTrackingAnnotation annotation = results.getObjectAnnotations(0);
    System.out.println("Confidence: " + annotation.getConfidence());

    if (annotation.hasEntity()) {
      Entity entity = annotation.getEntity();
      System.out.println("Entity description: " + entity.getDescription());
      System.out.println("Entity id:: " + entity.getEntityId());
    }

    if (annotation.hasSegment()) {
      VideoSegment videoSegment = annotation.getSegment();
      Duration startTimeOffset = videoSegment.getStartTimeOffset();
      Duration endTimeOffset = videoSegment.getEndTimeOffset();
      // Display the segment time in seconds, 1e9 converts nanos to seconds
      System.out.println(
          String.format(
              "Segment: %.2fs to %.2fs",
              startTimeOffset.getSeconds() + startTimeOffset.getNanos() / 1e9,
              endTimeOffset.getSeconds() + endTimeOffset.getNanos() / 1e9));
    }

    // Here we print only the bounding box of the first frame in this segment.
    ObjectTrackingFrame frame = annotation.getFrames(0);
    // Display the offset time in seconds, 1e9 converts nanos to seconds
    Duration timeOffset = frame.getTimeOffset();
    System.out.println(
        String.format(
            "Time offset of the first frame: %.2fs",
            timeOffset.getSeconds() + timeOffset.getNanos() / 1e9));

    // Display the bounding box of the detected object
    NormalizedBoundingBox normalizedBoundingBox = frame.getNormalizedBoundingBox();
    System.out.println("Bounding box position:");
    System.out.println("\tleft: " + normalizedBoundingBox.getLeft());
    System.out.println("\ttop: " + normalizedBoundingBox.getTop());
    System.out.println("\tright: " + normalizedBoundingBox.getRight());
    System.out.println("\tbottom: " + normalizedBoundingBox.getBottom());
    return results;
  }
}

Node.js

To authenticate to Video Intelligence, set up Application Default Credentials. For more information, see Set up authentication for a local development environment .

// Imports the Google Cloud Video Intelligence library
const Video = require('@google-cloud/video-intelligence');
const fs = require('fs');
const util = require('util');

// Creates a client
const video = new Video.VideoIntelligenceServiceClient();

/**
 * TODO(developer): Uncomment the following line before running the sample.
 */
// const path = 'Local file to analyze, e.g. ./my-file.mp4';

// Reads a local video file and converts it to base64
const file = await util.promisify(fs.readFile)(path);
const inputContent = file.toString('base64');

const request = {
  inputContent: inputContent,
  features: ['OBJECT_TRACKING'],
  //recommended to use us-east1 for the best latency due to different types of processors used in this region and others
  locationId: 'us-east1',
};
// Detects objects in a video
const [operation] = await video.annotateVideo(request);
const results = await operation.promise();
console.log('Waiting for operation to complete...');
//Gets annotations for video
const annotations = results[0].annotationResults[0];
const objects = annotations.objectAnnotations;
objects.forEach(object => {
  console.log(`Entity description: ${object.entity.description}`);
  console.log(`Entity id: ${object.entity.entityId}`);
  const time = object.segment;
  console.log(
    `Segment: ${time.startTimeOffset.seconds || 0}` +
      `.${(time.startTimeOffset.nanos / 1e6).toFixed(0)}s to ${
        time.endTimeOffset.seconds || 0
      }.` +
      `${(time.endTimeOffset.nanos / 1e6).toFixed(0)}s`
  );
  console.log(`Confidence: ${object.confidence}`);
  const frame = object.frames[0];
  const box = frame.normalizedBoundingBox;
  const timeOffset = frame.timeOffset;
  console.log(
    `Time offset for the first frame: ${timeOffset.seconds || 0}` +
      `.${(timeOffset.nanos / 1e6).toFixed(0)}s`
  );
  console.log('Bounding box position:');
  console.log(` left   : ${box.left}`);
  console.log(` top    : ${box.top}`);
  console.log(` right  : ${box.right}`);
  console.log(` bottom : ${box.bottom}`);
});

Python

  """Object tracking in a local video.""" 
 from 
  
 google.cloud 
  
 import 
 videointelligence 
 video_client 
 = 
 videointelligence 
 . 
  VideoIntelligenceServiceClient 
 
 () 
 features 
 = 
 [ 
 videointelligence 
 . 
  Feature 
 
 . 
 OBJECT_TRACKING 
 ] 
 with 
 io 
 . 
 open 
 ( 
 path 
 , 
 "rb" 
 ) 
 as 
 file 
 : 
 input_content 
 = 
 file 
 . 
 read 
 () 
 operation 
 = 
 video_client 
 . 
  annotate_video 
 
 ( 
 request 
 = 
 { 
 "features" 
 : 
 features 
 , 
 "input_content" 
 : 
 input_content 
 } 
 ) 
 print 
 ( 
 " 
 \n 
 Processing video for object annotations." 
 ) 
 result 
 = 
 operation 
 . 
 result 
 ( 
 timeout 
 = 
 500 
 ) 
 print 
 ( 
 " 
 \n 
 Finished processing. 
 \n 
 " 
 ) 
 # The first result is retrieved because a single video was processed. 
 object_annotations 
 = 
 result 
 . 
 annotation_results 
 [ 
 0 
 ] 
 . 
 object_annotations 
 # Get only the first annotation for demo purposes. 
 object_annotation 
 = 
 object_annotations 
 [ 
 0 
 ] 
 print 
 ( 
 "Entity description: 
 {} 
 " 
 . 
 format 
 ( 
 object_annotation 
 . 
 entity 
 . 
 description 
 )) 
 if 
 object_annotation 
 . 
 entity 
 . 
 entity_id 
 : 
 print 
 ( 
 "Entity id: 
 {} 
 " 
 . 
 format 
 ( 
 object_annotation 
 . 
 entity 
 . 
 entity_id 
 )) 
 print 
 ( 
 "Segment: 
 {} 
 s to 
 {} 
 s" 
 . 
 format 
 ( 
 object_annotation 
 . 
 segment 
 . 
 start_time_offset 
 . 
 seconds 
 + 
 object_annotation 
 . 
 segment 
 . 
 start_time_offset 
 . 
 microseconds 
 / 
 1e6 
 , 
 object_annotation 
 . 
 segment 
 . 
 end_time_offset 
 . 
 seconds 
 + 
 object_annotation 
 . 
 segment 
 . 
 end_time_offset 
 . 
 microseconds 
 / 
 1e6 
 , 
 ) 
 ) 
 print 
 ( 
 "Confidence: 
 {} 
 " 
 . 
 format 
 ( 
 object_annotation 
 . 
 confidence 
 )) 
 # Here we print only the bounding box of the first frame in this segment 
 frame 
 = 
 object_annotation 
 . 
 frames 
 [ 
 0 
 ] 
 box 
 = 
 frame 
 . 
 normalized_bounding_box 
 print 
 ( 
 "Time offset of the first frame: 
 {} 
 s" 
 . 
 format 
 ( 
 frame 
 . 
 time_offset 
 . 
 seconds 
 + 
 frame 
 . 
 time_offset 
 . 
 microseconds 
 / 
 1e6 
 ) 
 ) 
 print 
 ( 
 "Bounding box position:" 
 ) 
 print 
 ( 
 " 
 \t 
 left  : 
 {} 
 " 
 . 
 format 
 ( 
 box 
 . 
 left 
 )) 
 print 
 ( 
 " 
 \t 
 top   : 
 {} 
 " 
 . 
 format 
 ( 
 box 
 . 
 top 
 )) 
 print 
 ( 
 " 
 \t 
 right : 
 {} 
 " 
 . 
 format 
 ( 
 box 
 . 
 right 
 )) 
 print 
 ( 
 " 
 \t 
 bottom: 
 {} 
 " 
 . 
 format 
 ( 
 box 
 . 
 bottom 
 )) 
 print 
 ( 
 " 
 \n 
 " 
 ) 
 

Additional languages

C#: Please follow the C# setup instructions on the client libraries page and then visit the Video Intelligence reference documentation for .NET.

PHP: Please follow the PHP setup instructions on the client libraries page and then visit the Video Intelligence reference documentation for PHP.

Ruby: Please follow the Ruby setup instructions on the client libraries page and then visit the Video Intelligence reference documentation for Ruby.
