Vision API currently allows you to use the following features:
- Optical character recognition (OCR) for an image; text recognition and conversion to machine-coded text. Identifies and extracts UTF-8 text in an image.
- Images: Optimized for sparse areas of text within a larger image.
- Response: Returns a list of detected words with their text and bounding boxes (textAnnotations), as well as the structural hierarchy of the OCR-detected text (fullTextAnnotation).
- Hierarchy of extracted text structure:
- TextAnnotation -> Page -> Block -> Paragraph -> Word -> Symbol.
- Each structural component from Page on may further have its own properties, such as detected languages, breaks, etc.
- Languages supported: Works with currently supported, mapped, and experimental languages.
- Feature enum value: TEXT_DETECTION.
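The hierarchy above can be walked with plain nested loops. The following is a minimal sketch, assuming the response has already been parsed from its JSON form into a Python dict; the `sample` payload and the helper name `iter_words` are illustrative and not part of the API.

```python
from typing import Iterator

# Hypothetical, hand-written payload shaped like a fullTextAnnotation.
sample = {
    "pages": [{
        "blocks": [{
            "paragraphs": [{
                "words": [
                    {"symbols": [{"text": c} for c in "Hello"]},
                    {"symbols": [{"text": c} for c in "world"]},
                ]
            }]
        }]
    }]
}


def iter_words(full_text_annotation: dict) -> Iterator[str]:
    """Yield each word, walking Page -> Block -> Paragraph -> Word -> Symbol."""
    for page in full_text_annotation.get("pages", []):
        for block in page.get("blocks", []):
            for paragraph in block.get("paragraphs", []):
                for word in paragraph.get("words", []):
                    # A word is the concatenation of its symbols' text.
                    yield "".join(s.get("text", "") for s in word.get("symbols", []))


print(list(iter_words(sample)))
```

Missing levels are treated as empty lists, so a response without OCR text simply yields nothing.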
- Optical character recognition (OCR) for a file (PDF/TIFF) or dense text image; dense text recognition and conversion to machine-coded text.
- Files: Optimized for document files (PDF/TIFF).
- Images: Optimized for dense areas of text in an image (images that are documents), and images that contain handwriting.
- Response: Returns the structural hierarchy for the OCR detected text (fullTextAnnotation).
- Hierarchy of extracted text structure:
- TextAnnotation -> Page -> Block -> Paragraph -> Word -> Symbol.
- Each structural component from Page on may further have its own properties, such as detected languages, breaks, etc.
- Languages supported: Works with currently supported, mapped, and experimental languages.
- Feature enum value: DOCUMENT_TEXT_DETECTION.
- Takes precedence when both DOCUMENT_TEXT_DETECTION and TEXT_DETECTION are requested.
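As a rough illustration of how the feature enum values are used, the sketch below builds a JSON request body for an image annotation call. The helper name `build_annotate_request` and the `gs://` URI are hypothetical. Dropping TEXT_DETECTION on the client side is a design choice made here only to make the precedence rule visible; the service applies that rule itself.

```python
import json
from typing import Sequence


def build_annotate_request(image_uri: str, feature_types: Sequence[str]) -> dict:
    """Build a request body with one image and the requested feature types."""
    # Remove duplicates while keeping the caller's order.
    types = list(dict.fromkeys(feature_types))
    # DOCUMENT_TEXT_DETECTION takes precedence, so TEXT_DETECTION adds nothing.
    if "DOCUMENT_TEXT_DETECTION" in types and "TEXT_DETECTION" in types:
        types.remove("TEXT_DETECTION")
    return {
        "requests": [{
            "image": {"source": {"imageUri": image_uri}},
            "features": [{"type": t} for t in types],
        }]
    }


# Hypothetical bucket and object name, used only for illustration.
body = build_annotate_request(
    "gs://example-bucket/scan.png",
    ["TEXT_DETECTION", "DOCUMENT_TEXT_DETECTION"],
)
print(json.dumps(body, indent=2))
```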
- Provides the name of the landmark, a confidence score, and a bounding box in the image for the landmark.
- Gives geographic coordinates (latitude and longitude) for the detected landmark.
- Provides a textual description of the entity identified, a confidence score, and a bounding polygon for the logo in the file.
- Provides generalized labels for an image.
- For each label returns a textual description, confidence score, and topicality rating.
- Returns dominant colors in an image.
- Each color is represented in the RGBA color space, has a confidence score, and reports the fraction of pixels the color occupies, in the range [0, 1].
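To make the color fields concrete, here is a small sketch that ranks dominant colors by pixel fraction and formats them as hex strings. It assumes the image properties annotation has been parsed from JSON into a dict; the `sample_props` values are invented and `top_colors` is an illustrative helper name.

```python
from typing import List, Tuple

# Hypothetical payload; the numbers are made up for illustration.
sample_props = {
    "dominantColors": {
        "colors": [
            {"color": {"red": 10, "green": 20, "blue": 30}, "score": 0.5, "pixelFraction": 0.1},
            # Zero-valued channels can be absent in JSON, so "blue" is omitted here.
            {"color": {"red": 255, "green": 128}, "score": 0.3, "pixelFraction": 0.4},
        ]
    }
}


def top_colors(image_properties: dict, n: int = 3) -> List[Tuple[str, float]]:
    """Return up to n (hex color, pixel fraction) pairs, largest fraction first."""
    colors = image_properties.get("dominantColors", {}).get("colors", [])
    ranked = sorted(colors, key=lambda c: c.get("pixelFraction", 0.0), reverse=True)
    result = []
    for entry in ranked[:n]:
        rgb = entry.get("color", {})
        r, g, b = (int(round(rgb.get(k, 0))) for k in ("red", "green", "blue"))
        result.append((f"#{r:02x}{g:02x}{b:02x}", entry.get("pixelFraction", 0.0)))
    return result


print(top_colors(sample_props))
```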
- Provides general label and bounding box annotations for multiple objects recognized in a single image.
- For each object detected the following elements are returned: a textual description, a confidence score, and normalized vertices [0,1] for the bounding polygon around the object.
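Because the vertices are normalized to [0, 1], they need to be scaled by the image size before drawing. A minimal sketch, assuming the vertices arrive as a list of dicts with optional `x` and `y` keys and that the caller knows the image width and height in pixels; `to_pixels` is an illustrative helper name.

```python
from typing import Dict, List, Tuple


def to_pixels(normalized_vertices: List[Dict[str, float]],
              width: int, height: int) -> List[Tuple[int, int]]:
    """Scale normalized [0, 1] vertices to integer pixel coordinates."""
    return [
        # A missing coordinate is treated as 0, since zero values can be omitted in JSON.
        (round(v.get("x", 0.0) * width), round(v.get("y", 0.0) * height))
        for v in normalized_vertices
    ]


# Hypothetical bounding polygon for one detected object.
polygon = [
    {"x": 0.25, "y": 0.5},
    {"x": 0.75, "y": 0.5},
    {"x": 0.75, "y": 1.0},
    {"x": 0.25, "y": 1.0},
]
print(to_pixels(polygon, width=800, height=600))
```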
- Provides a bounding polygon for the cropped image, a confidence score, and an importance fraction of this salient region with respect to the original image for each request.
- You can provide up to 16 image ratio values (width:height) for a single image.
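The ratio values are sent as floats (width divided by height). The sketch below converts `width:height` strings and builds a request body; the helper names and the `gs://` URI are hypothetical, and raising an error above 16 ratios is a choice made in this example rather than documented service behavior.

```python
from typing import Sequence

MAX_ASPECT_RATIOS = 16  # limit stated above


def parse_ratio(ratio: str) -> float:
    """Convert a 'width:height' string such as '4:3' to a float."""
    width, height = ratio.split(":")
    return float(width) / float(height)


def build_crop_hints_request(image_uri: str, ratios: Sequence[str]) -> dict:
    """Build a request body asking for crop hints at the given aspect ratios."""
    if len(ratios) > MAX_ASPECT_RATIOS:
        raise ValueError(f"at most {MAX_ASPECT_RATIOS} aspect ratios are allowed")
    return {
        "requests": [{
            "image": {"source": {"imageUri": image_uri}},
            "features": [{"type": "CROP_HINTS"}],
            "imageContext": {
                "cropHintsParams": {"aspectRatios": [parse_ratio(r) for r in ratios]}
            },
        }]
    }


# Hypothetical image location, used only for illustration.
crop_body = build_crop_hints_request("gs://example-bucket/photo.jpg", ["1:1", "4:3"])
print(crop_body["requests"][0]["imageContext"])
```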
- Provides a series of Web content related to an image.
- Returns the following information:
- Web entities: Inferred entities (labels/descriptions) from similar images on the Web.
- Full matching images: A list of URLs for fully matching images of any size on the Internet.
- Partial matching images: A list of URLs for images that share key-point features, such as a cropped version of the original image.
- Pages with matching images: A list of Webpages (identified by page URL, page title, matching image URL) with an image that satisfies the conditions described above.
- Visually similar images: A list of URLs for images that share some features with the original image.
- Best guess label: A best guess as to the topic of the requested image inferred from similar images on the Internet.
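The categories listed above can be condensed into a short summary once the response is parsed. This is a sketch only, assuming the web detection result is a dict parsed from JSON; the `sample_web` data, its URLs, and the helper `summarize_web_detection` are invented for illustration.

```python
# Hypothetical payload; entities, scores, and URLs are made up.
sample_web = {
    "bestGuessLabels": [{"label": "golden gate bridge"}],
    "webEntities": [
        {"description": "Bridge", "score": 0.6},
        {"description": "Golden Gate Bridge", "score": 0.9},
        {"score": 0.2},  # entities may come without a description
    ],
    "fullMatchingImages": [{"url": "https://example.com/full.jpg"}],
    "pagesWithMatchingImages": [
        {"url": "https://example.com/page", "pageTitle": "Example page"}
    ],
}


def summarize_web_detection(web: dict, top_n: int = 3) -> dict:
    """Pick the best guess label, the highest-scoring entities, and matching URLs."""
    labels = [g.get("label", "") for g in web.get("bestGuessLabels", [])]
    entities = sorted(web.get("webEntities", []),
                      key=lambda e: e.get("score", 0.0), reverse=True)
    return {
        "best_guess": labels[0] if labels else None,
        "top_entities": [e["description"] for e in entities if "description" in e][:top_n],
        "full_matches": [i["url"] for i in web.get("fullMatchingImages", []) if "url" in i],
        "pages": [p["url"] for p in web.get("pagesWithMatchingImages", []) if "url" in p],
    }


print(summarize_web_detection(sample_web))
```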
- Provides likelihood ratings for the following explicit content categories: adult, spoof, medical, violence, and racy.
- Likelihood ratings are expressed as 6 different values: UNKNOWN, VERY_UNLIKELY, UNLIKELY, POSSIBLE, LIKELY, or VERY_LIKELY.
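Since the six likelihood values are ordered, they can be compared against a threshold. The following sketch assumes the explicit content annotation is a dict mapping each category to a likelihood name; the `Likelihood` class, the `flagged_categories` helper, and the sample ratings are illustrative, and the threshold is an application choice rather than a recommendation.

```python
from enum import IntEnum
from typing import List


class Likelihood(IntEnum):
    """The six likelihood values, ordered from least to most likely."""
    UNKNOWN = 0
    VERY_UNLIKELY = 1
    UNLIKELY = 2
    POSSIBLE = 3
    LIKELY = 4
    VERY_LIKELY = 5


CATEGORIES = ("adult", "spoof", "medical", "violence", "racy")


def flagged_categories(safe_search: dict,
                       threshold: Likelihood = Likelihood.LIKELY) -> List[str]:
    """Return the categories rated at or above the threshold."""
    return [
        name for name in CATEGORIES
        # A missing category is treated as UNKNOWN.
        if Likelihood[safe_search.get(name, "UNKNOWN")] >= threshold
    ]


# Hypothetical ratings for one image.
sample_ratings = {
    "adult": "VERY_UNLIKELY",
    "spoof": "POSSIBLE",
    "medical": "UNLIKELY",
    "violence": "LIKELY",
    "racy": "VERY_LIKELY",
}
print(flagged_categories(sample_ratings))
```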
- Locates faces with bounding polygons, and identifies specific facial "landmarks" such as eyes, ears, nose, mouth, etc. along with their corresponding confidence values.
- Returns likelihood ratings for emotion (joy, sorrow, anger, surprise) and general image properties (underexposed, blurred, headwear present).
- Likelihood ratings are expressed as 6 different values: UNKNOWN, VERY_UNLIKELY, UNLIKELY, POSSIBLE, LIKELY, or VERY_LIKELY.
- Specific individual Facial Recognition is not supported.
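Facial landmarks are easiest to work with when indexed by type. The sketch below assumes a single face annotation parsed from JSON, with each landmark carrying a `type` and a `position`; the `sample_face` coordinates and both helper names are invented for illustration.

```python
import math
from typing import Dict, Tuple

# Hypothetical face annotation; coordinates are made up.
sample_face = {
    "landmarks": [
        {"type": "LEFT_EYE", "position": {"x": 100.0, "y": 120.0, "z": 0.0}},
        {"type": "RIGHT_EYE", "position": {"x": 160.0, "y": 200.0, "z": 1.5}},
        {"type": "NOSE_TIP", "position": {"x": 130.0, "y": 170.0, "z": -20.0}},
    ]
}


def landmark_positions(face: dict) -> Dict[str, Tuple[float, float, float]]:
    """Map each landmark type to its (x, y, z) position."""
    positions = {}
    for landmark in face.get("landmarks", []):
        pos = landmark.get("position", {})
        positions[landmark["type"]] = (
            pos.get("x", 0.0), pos.get("y", 0.0), pos.get("z", 0.0)
        )
    return positions


def eye_distance(face: dict) -> float:
    """Return the 2D distance in pixels between the two eye landmarks."""
    p = landmark_positions(face)
    left, right = p["LEFT_EYE"], p["RIGHT_EYE"]
    return math.hypot(right[0] - left[0], right[1] - left[1])


print(eye_distance(sample_face))
```

This measures geometry within one image only; it does not identify who the person is.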