File formats
The Vision API supports the following image types:
- JPEG
- PNG8
- PNG24
- GIF
- Animated GIF (first frame only)
- BMP
- WEBP
- RAW
- ICO
- TIFF
Note that some of these image formats are "lossy" (for example, JPEG). Reducing file sizes for such lossy formats may result in a degradation of image quality, and hence, Vision API accuracy.
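As a rough illustration, a client could screen files before sending them. The sketch below is not part of the Vision API: the extension-to-format mapping and the `classify_image` helper are assumptions made for this example, and only JPEG is flagged as lossy because it is the only format the text names as such.

```python
from pathlib import Path

# Assumed mapping from common file extensions to the formats listed above.
# RAW is omitted: camera RAW files use many vendor-specific extensions.
SUPPORTED_EXTENSIONS = {
    ".jpg": "JPEG",
    ".jpeg": "JPEG",
    ".png": "PNG",
    ".gif": "GIF",
    ".bmp": "BMP",
    ".webp": "WEBP",
    ".ico": "ICO",
    ".tif": "TIFF",
    ".tiff": "TIFF",
}

# Only JPEG is named as lossy in the text above.
LOSSY_FORMATS = {"JPEG"}


def classify_image(path):
    """Return (format_name, is_lossy), or None if the extension is unrecognized."""
    fmt = SUPPORTED_EXTENSIONS.get(Path(path).suffix.lower())
    if fmt is None:
        return None
    return fmt, fmt in LOSSY_FORMATS
```

A caller might use the `is_lossy` flag to avoid re-compressing a JPEG more aggressively than necessary before upload.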
Image sizing
To enable accurate image detection within the Vision API, images should generally be a minimum of 640 x 480 pixels (about 300k pixels). Full details for different types of Vision API Feature requests are shown below:
Vision API Feature | Recommended Size * | Notes |
---|---|---|
FACE_DETECTION | 1600 x 1200 | Distance between eyes is most important |
LANDMARK_DETECTION | 640 x 480 | |
LOGO_DETECTION | 640 x 480 | |
LABEL_DETECTION | 640 x 480 | |
TEXT_DETECTION and DOCUMENT_TEXT_DETECTION | 1024 x 768 | OCR requires more resolution to detect characters |
SAFE_SEARCH_DETECTION | 640 x 480 | |
These recommended sizes differ based on the feature being detected. For example, FACE_DETECTION requests generally require larger images because the features being detected (faces) are smaller than the image itself. LABEL_DETECTION requests, on the other hand, generally evaluate an entire image.
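The recommended sizes can be captured in a small lookup table. This is a minimal sketch: `RECOMMENDED_SIZE` and `meets_recommended_size` are hypothetical names, and comparing the long side against the long side (so that portrait images are treated like landscape ones) is an assumption of this example rather than documented API behavior.

```python
# Recommended sizes in pixels (width, height), copied from the table above.
RECOMMENDED_SIZE = {
    "FACE_DETECTION": (1600, 1200),
    "LANDMARK_DETECTION": (640, 480),
    "LOGO_DETECTION": (640, 480),
    "LABEL_DETECTION": (640, 480),
    "TEXT_DETECTION": (1024, 768),
    "DOCUMENT_TEXT_DETECTION": (1024, 768),
    "SAFE_SEARCH_DETECTION": (640, 480),
}


def meets_recommended_size(feature, width, height):
    """Return True if the image is at least the recommended size for the feature.

    Assumption: orientation is ignored, so the longer image side is compared
    with the longer recommended side, and likewise for the shorter sides.
    """
    rec_w, rec_h = RECOMMENDED_SIZE[feature]
    long_side, short_side = max(width, height), min(width, height)
    return long_side >= max(rec_w, rec_h) and short_side >= min(rec_w, rec_h)
```

For instance, an 800 x 600 image passes this check for LABEL_DETECTION but falls short of the FACE_DETECTION recommendation.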
In practice, a standard size of 640 x 480 pixels works well in most cases; larger sizes may gain little in accuracy while greatly reducing throughput. When at all possible, pre-process your images to reduce them to these minimum sizes.
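One way to pre-process is to compute target dimensions that preserve the aspect ratio while keeping the image at or above the minimum. The sketch below only does the arithmetic; `downscale_dimensions` is a hypothetical helper, and the choice to never upscale small images is an assumption of this example.

```python
def downscale_dimensions(width, height, target=(640, 480)):
    """Return (new_width, new_height) for an aspect-preserving downscale.

    The scale factor is the smallest one that keeps the width at or above
    target[0] or the height at or above target[1], whichever bound is hit
    first while shrinking. Images already at or below the target are
    returned unchanged (never upscaled).
    """
    target_w, target_h = target
    scale = max(target_w / width, target_h / height)
    scale = min(scale, 1.0)  # assumption: do not enlarge small images
    return round(width * scale), round(height * scale)


if __name__ == "__main__":
    for size in [(4000, 3000), (1920, 1080), (320, 240)]:
        print(size, "->", downscale_dimensions(*size))
```

The resulting pair could then be passed to an image library's resize call, for example Pillow's `Image.resize`. For FACE_DETECTION or OCR requests, pass the larger recommended size as `target` instead of the default.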
File size
Image files sent to the Vision API should not exceed 20MB; files exceeding 20MB generate an error, and the Vision API does not resize them. Reducing your file size can significantly improve throughput; however, be careful not to reduce image quality in the process. Note that the Vision API also imposes a 10MB limit on JSON request size; larger files should be hosted on Cloud Storage or on the web rather than passed as base64-encoded content in the JSON itself.
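Because base64 encoding expands data by roughly one third, an image well under 10MB on disk can still exceed the JSON request limit. The following sketch shows one way a client might choose how to deliver an image. The function names, the 1 KB allowance for the rest of the request body, and the interpretation of "MB" as 2^20 bytes are all assumptions of this example.

```python
# Limits from the text above; treating 1MB as 2**20 bytes is an assumption.
MAX_IMAGE_BYTES = 20 * 1024 * 1024
MAX_JSON_BYTES = 10 * 1024 * 1024

# Assumed allowance for the non-image parts of the JSON request body.
JSON_OVERHEAD_BYTES = 1024


def base64_length(num_bytes):
    """Length in characters of the padded base64 encoding of num_bytes bytes."""
    return 4 * ((num_bytes + 2) // 3)


def delivery_method(num_bytes):
    """Suggest how to send an image of the given size in bytes.

    Returns "too_large" if the file exceeds the 20MB image limit,
    "host_remotely" if its base64 form would not fit in a 10MB JSON
    request, and "inline_base64" otherwise.
    """
    if num_bytes > MAX_IMAGE_BYTES:
        return "too_large"
    if base64_length(num_bytes) + JSON_OVERHEAD_BYTES > MAX_JSON_BYTES:
        return "host_remotely"
    return "inline_base64"
```

Under these assumptions, a 9MB file is within the image limit but grows to about 12MB once base64-encoded, so it would need to be hosted on Cloud Storage or on the web.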