The Vision API can run offline (asynchronous) detection and annotation services on a large batch of image files, using any Vision feature type. For example, you can specify one or more Vision API features (such as `TEXT_DETECTION`, `LABEL_DETECTION`, and `LANDMARK_DETECTION`) for a single batch of images. Output from an offline batch request is written to a JSON file created in the specified Cloud Storage bucket.
- Online (synchronous) requests - An online annotation request (`images:annotate` or `files:annotate`) immediately returns inline annotations to the caller. Online annotation requests limit the number of files you can annotate in a single request: an `images:annotate` request can specify only a small number of images (<= 16) to be annotated, and a `files:annotate` request can specify only a single file, with a small number of pages (<= 5) in that file to be annotated. A minimal sketch of an online request follows this list.
- Offline (asynchronous) requests - An offline annotation request (`images:asyncBatchAnnotate` or `files:asyncBatchAnnotate`) starts a long-running operation (LRO) and does not immediately return a response to the caller. When the LRO completes, annotations are stored as files in a Cloud Storage bucket you specify. An `images:asyncBatchAnnotate` request allows you to specify up to 2,000 images per request; a `files:asyncBatchAnnotate` request allows you to specify larger batches of files, with more pages (<= 2000) per file than online requests allow.
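As a concrete illustration of the online style, a minimal `images:annotate` call using the `google-cloud-vision` Python client library might look like the following sketch; the bucket path is a placeholder, and error handling is omitted:

```python
from google.cloud import vision_v1

# Online (synchronous) request: annotations come back inline in the response.
client = vision_v1.ImageAnnotatorClient()
response = client.batch_annotate_images(
    requests=[
        vision_v1.AnnotateImageRequest(
            # Placeholder image path; replace with your own Cloud Storage URI.
            image=vision_v1.Image(
                source=vision_v1.ImageSource(image_uri="gs://my-bucket/image1.png")
            ),
            features=[
                vision_v1.Feature(type_=vision_v1.Feature.Type.LABEL_DETECTION)
            ],
        )
    ]
)

# The annotations are available immediately, inline in the response.
for result in response.responses:
    for label in result.label_annotations:
        print(label.description, label.score)
```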
Limitations
The Vision API accepts up to 2,000 image files per offline batch request. A larger batch of image files returns an error.
Currently supported feature types
| Feature type | Description |
|---|---|
| `CROP_HINTS` | Determine suggested vertices for a crop region on an image. |
| `DOCUMENT_TEXT_DETECTION` | Perform OCR on dense text images, such as documents (PDF/TIFF), and images with handwriting. `TEXT_DETECTION` can be used for sparse text images. Takes precedence when both `DOCUMENT_TEXT_DETECTION` and `TEXT_DETECTION` are present. |
| `FACE_DETECTION` | Detect faces within the image. |
| `IMAGE_PROPERTIES` | Compute a set of image properties, such as the image's dominant colors. |
| `LABEL_DETECTION` | Add labels based on image content. |
| `LANDMARK_DETECTION` | Detect geographic landmarks within the image. |
| `LOGO_DETECTION` | Detect company logos within the image. |
| `OBJECT_LOCALIZATION` | Detect and extract multiple objects in an image. |
| `SAFE_SEARCH_DETECTION` | Run SafeSearch to detect potentially unsafe or undesirable content. |
| `TEXT_DETECTION` | Perform Optical Character Recognition (OCR) on text within the image. Text detection is optimized for areas of sparse text within a larger image. If the image is a document (PDF/TIFF), has dense text, or contains handwriting, use `DOCUMENT_TEXT_DETECTION` instead. |
| `WEB_DETECTION` | Detect topical entities such as news, events, or celebrities within the image, and find similar images on the web using the power of Google Image Search. |
Sample code
Use the following code samples to run offline annotation services on a batch of image files in Cloud Storage.
Java
Before trying this sample, follow the Java setup instructions in the Vision quickstart using client libraries. For more information, see the Vision API Java reference documentation.
To authenticate to Vision, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Node.js
Before trying this sample, follow the Node.js setup instructions in the Vision quickstart using client libraries. For more information, see the Vision Node.js API reference documentation.
To authenticate to Vision, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Python
Before trying this sample, follow the Python setup instructions in the Vision quickstart using client libraries. For more information, see the Vision Python API reference documentation.
To authenticate to Vision, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
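The official Python sample is available from the quickstart linked above. As a rough sketch of the same flow (the helper name, input URIs, and output bucket below are illustrative, not part of the official sample), an offline batch request with the `google-cloud-vision` client library might look like this:

```python
from google.cloud import vision_v1

def sample_async_batch_annotate_images(input_uris, output_uri):
    """Start an offline (asynchronous) batch annotation request and wait
    for the long-running operation to finish. Results are written as
    JSON files under output_uri in Cloud Storage."""
    client = vision_v1.ImageAnnotatorClient()

    # Request LABEL_DETECTION and TEXT_DETECTION for every input image.
    features = [
        vision_v1.Feature(type_=vision_v1.Feature.Type.LABEL_DETECTION),
        vision_v1.Feature(type_=vision_v1.Feature.Type.TEXT_DETECTION),
    ]
    requests = [
        vision_v1.AnnotateImageRequest(
            image=vision_v1.Image(source=vision_v1.ImageSource(image_uri=uri)),
            features=features,
        )
        for uri in input_uris
    ]

    # batch_size controls how many responses are grouped into each
    # output JSON file (here: 2 responses per file).
    output_config = vision_v1.OutputConfig(
        gcs_destination=vision_v1.GcsDestination(uri=output_uri),
        batch_size=2,
    )

    # Starts the LRO; no annotations are returned inline.
    operation = client.async_batch_annotate_images(
        requests=requests, output_config=output_config
    )

    # Block until the LRO completes, then report the output location.
    response = operation.result(timeout=300)
    print("Output written to:", response.output_config.gcs_destination.uri)

# Placeholder URIs; replace with your own images and bucket.
sample_async_batch_annotate_images(
    ["gs://my-bucket/image1.png", "gs://my-bucket/image2.jpg",
     "gs://my-bucket/image3.jpg"],
    "gs://my-bucket/offline_batch_output/",
)
```

With `batch_size=2` and three input images, the service writes output files such as `output-1-to-2.json` and `output-3-to-3.json`, as in the sample responses shown below.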
Response
A successful request returns response JSON files in the Cloud Storage bucket you indicated in the code sample. The number of responses per JSON file is dictated by `batch_size` in the code sample.
The returned response is similar to regular Vision API feature responses, depending on which features you request for an image.
The following responses show `LABEL_DETECTION` and `TEXT_DETECTION` annotations for `image1.png`, `IMAGE_PROPERTIES` annotations for `image2.jpg`, and `OBJECT_LOCALIZATION` annotations for `image3.jpg`. Each response also contains a `context` field showing the file's URI.
`offline_batch_output/output-1-to-2.json`

```json
{
  "responses": [
    {
      "labelAnnotations": [
        { "mid": "/m/07s6nbt", "description": "Text", "score": 0.93413997, "topicality": 0.93413997 },
        { "mid": "/m/0dwx7", "description": "Logo", "score": 0.8733531, "topicality": 0.8733531 },
        ...
        { "mid": "/m/03bxgrp", "description": "Company", "score": 0.5682425, "topicality": 0.5682425 }
      ],
      "textAnnotations": [
        {
          "locale": "en",
          "description": "Google\n",
          "boundingPoly": {
            "vertices": [
              { "x": 72, "y": 40 },
              { "x": 613, "y": 40 },
              { "x": 613, "y": 233 },
              { "x": 72, "y": 233 }
            ]
          }
        },
        ...
      ], "blockType": "TEXT" } ] } ], "text": "Google\n" },
      "context": {
        "uri": "gs://cloud-samples-data/vision/document_understanding/image1.png"
      }
    },
    {
      "imagePropertiesAnnotation": {
        "dominantColors": {
          "colors": [
            { "color": { "red": 229, "green": 230, "blue": 238 }, "score": 0.2744754, "pixelFraction": 0.075339235 },
            ...
            { "color": { "red": 86, "green": 87, "blue": 95 }, "score": 0.025770646, "pixelFraction": 0.13109145 }
          ]
        }
      },
      "cropHintsAnnotation": {
        "cropHints": [
          {
            "boundingPoly": {
              "vertices": [ {}, { "x": 1599 }, { "x": 1599, "y": 1199 }, { "y": 1199 } ]
            },
            "confidence": 0.79999995,
            "importanceFraction": 1
          }
        ]
      },
      "context": {
        "uri": "gs://cloud-samples-data/vision/document_understanding/image2.jpg"
      }
    }
  ]
}
```
`offline_batch_output/output-3-to-3.json`

```json
{
  "responses": [
    {
      "context": {
        "uri": "gs://cloud-samples-data/vision/document_understanding/image3.jpg"
      },
      "localizedObjectAnnotations": [
        {
          "mid": "/m/0bt9lr",
          "name": "Dog",
          "score": 0.9669734,
          "boundingPoly": {
            "normalizedVertices": [
              { "x": 0.6035543, "y": 0.1357359 },
              { "x": 0.98546547, "y": 0.1357359 },
              { "x": 0.98546547, "y": 0.98426414 },
              { "x": 0.6035543, "y": 0.98426414 }
            ]
          }
        },
        ...
        {
          "mid": "/m/0jbk",
          "name": "Animal",
          "score": 0.58003056,
          "boundingPoly": {
            "normalizedVertices": [
              { "x": 0.014534635, "y": 0.1357359 },
              { "x": 0.37197515, "y": 0.1357359 },
              { "x": 0.37197515, "y": 0.98426414 },
              { "x": 0.014534635, "y": 0.98426414 }
            ]
          }
        }
      ]
    }
  ]
}
```

