This page describes the supported features and limitations for Document AI Warehouse.
Key features
Feature
Description
Supports
Controls who has access to which resource in Document AI Warehouse, and what
level of access they have.
A document schema defines the structure for a document type (for example,
Invoice or Pay Stub) in Document AI Warehouse, where admins can specify properties
of different data types (Text | Numeric | Date | Enumeration).
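As a rough illustration of what a schema with typed properties implies, the sketch below models the four data types named above in plain Python. It does not use the Document AI Warehouse client library; `PropertyType`, `PropertyDefinition`, `DocumentSchema`, and `validate` are hypothetical names chosen for this example only.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum


class PropertyType(Enum):
    """The four data types mentioned above (hypothetical local model)."""
    TEXT = "text"
    NUMERIC = "numeric"
    DATE = "date"
    ENUMERATION = "enumeration"


@dataclass
class PropertyDefinition:
    name: str
    type: PropertyType
    allowed_values: tuple = ()  # only consulted for ENUMERATION properties


@dataclass
class DocumentSchema:
    display_name: str
    properties: list = field(default_factory=list)

    def validate(self, values: dict) -> list:
        """Return a list of problems; an empty list means the values fit."""
        problems = []
        known = {p.name: p for p in self.properties}
        for name, value in values.items():
            prop = known.get(name)
            if prop is None:
                problems.append(f"unknown property: {name}")
            elif not _matches(prop, value):
                problems.append(f"{name}: expected {prop.type.value}")
        return problems


def _matches(prop: PropertyDefinition, value) -> bool:
    if prop.type is PropertyType.TEXT:
        return isinstance(value, str)
    if prop.type is PropertyType.NUMERIC:
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if prop.type is PropertyType.DATE:
        return isinstance(value, date)
    return value in prop.allowed_values


# Example schema for the "Invoice" document type.
invoice_schema = DocumentSchema(
    "Invoice",
    [
        PropertyDefinition("vendor", PropertyType.TEXT),
        PropertyDefinition("total", PropertyType.NUMERIC),
        PropertyDefinition("due", PropertyType.DATE),
        PropertyDefinition("status", PropertyType.ENUMERATION, ("paid", "open")),
    ],
)
```

Calling `invoice_schema.validate(...)` with a dictionary of property values returns the mismatches, which is one way to picture how a schema constrains the documents of its type.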
Provides operations to create, fetch, update, and delete documents.
Document AI Warehouse uses documents as the data model for organizing real-world files
(for example, PDF or .txt) and their associated properties.
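The create, fetch, update, and delete operations can be pictured with a small in-memory stand-in. This is only a sketch of the data model (content plus properties), not the service API; the class `InMemoryDocumentStore` and its method names are assumptions made for this example.

```python
import uuid


class InMemoryDocumentStore:
    """Hypothetical stand-in that keeps documents in a dictionary."""

    def __init__(self):
        self._docs = {}

    def create(self, content: bytes, properties: dict = None) -> str:
        # Generate an opaque identifier and store content with its properties.
        doc_id = uuid.uuid4().hex
        self._docs[doc_id] = {
            "content": content,
            "properties": dict(properties or {}),
        }
        return doc_id

    def fetch(self, doc_id: str) -> dict:
        return self._docs[doc_id]

    def update(self, doc_id: str, properties: dict) -> None:
        # Merge new property values into the existing ones.
        self._docs[doc_id]["properties"].update(properties)

    def delete(self, doc_id: str) -> None:
        del self._docs[doc_id]

    def __contains__(self, doc_id: str) -> bool:
        return doc_id in self._docs


store = InMemoryDocumentStore()
doc_id = store.create(b"%PDF-1.7 ...", {"Vendor": "Acme"})
store.update(doc_id, {"Total": 120})
```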
A folder serves as a container to group and label documents. Users can attach
a document to multiple folders and a folder can contain multiple documents.
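The many-to-many relationship between folders and documents can be sketched with two lookup tables. The `FolderIndex` class and the identifiers below are hypothetical and only illustrate the relationship described above, not how the service stores it.

```python
from collections import defaultdict


class FolderIndex:
    """Hypothetical many-to-many index between documents and folders."""

    def __init__(self):
        self._docs_by_folder = defaultdict(set)
        self._folders_by_doc = defaultdict(set)

    def attach(self, document_id: str, folder_id: str) -> None:
        # Record the link in both directions so either side can be queried.
        self._docs_by_folder[folder_id].add(document_id)
        self._folders_by_doc[document_id].add(folder_id)

    def detach(self, document_id: str, folder_id: str) -> None:
        self._docs_by_folder[folder_id].discard(document_id)
        self._folders_by_doc[document_id].discard(folder_id)

    def documents_in(self, folder_id: str) -> list:
        return sorted(self._docs_by_folder.get(folder_id, ()))

    def folders_of(self, document_id: str) -> list:
        return sorted(self._folders_by_doc.get(document_id, ()))


index = FolderIndex()
index.attach("invoice-001", "2023")
index.attach("invoice-001", "vendor-acme")  # same document, second folder
index.attach("paystub-007", "2023")         # same folder, second document
```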
- Full-text search (Text search)
- Identifies natural-language documents that satisfy a query and optionally sorts them by relevance to the query. Customers specify the query as a string in the search request.
- Property filtering (Customer metadata filtering)
- Mark a property as filterable if you want to use it to include or exclude a subset of documents in a search. For example, you might make a property that represents a "Vendor" filterable because your users want to search for invoices from a specific vendor.
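To make the two search modes concrete, here is a minimal sketch that combines a string query with an exact-match property filter. The ranking is a naive term count chosen for illustration; it is an assumption of this example and not the relevance model the service uses. `Doc` and `search` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    doc_id: str
    text: str
    properties: dict = field(default_factory=dict)


def search(docs, query: str, filters: dict = None) -> list:
    """Return matching document IDs, most relevant first."""
    terms = query.lower().split()
    filters = filters or {}
    scored = []
    for doc in docs:
        # Property filtering: skip documents whose properties do not match.
        if any(doc.properties.get(k) != v for k, v in filters.items()):
            continue
        # Full-text search: score by how often the query terms occur.
        words = doc.text.lower().split()
        score = sum(words.count(term) for term in terms)
        if score > 0:
            scored.append((score, doc.doc_id))
    # Highest score first; ties are broken by document ID for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


docs = [
    Doc("a", "invoice for office chairs and office desks", {"Vendor": "Acme"}),
    Doc("b", "invoice for office supplies", {"Vendor": "Globex"}),
    Doc("c", "pay stub for march", {"Vendor": "Acme"}),
]
```

For instance, `search(docs, "office")` ranks document `a` ahead of `b`, while adding `{"Vendor": "Globex"}` as a filter narrows the result to `b` alone.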
Document AI Warehouse provides a feature called "Custom Synonyms" that enables
customers to provide their own synonyms for their specific domains.
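One way to picture domain-specific synonyms is as query expansion: each query term is supplemented with the synonyms registered for it. The sketch below is an assumption about how such a mapping could be applied, not a description of the service's internals; `expand_query` and `domain_synonyms` are hypothetical.

```python
def expand_query(query: str, synonyms: dict) -> list:
    """Return the query terms plus their synonyms, in order, without repeats."""
    expanded = []
    for term in query.lower().split():
        for candidate in [term, *synonyms.get(term, [])]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


# Example synonym set for an accounting domain.
domain_synonyms = {
    "invoice": ["bill", "statement"],
    "vendor": ["supplier"],
}
```

With this mapping, a query for "vendor invoice" would also consider "supplier", "bill", and "statement".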
Files supported
Full details for supported formats and MIME types.
Format | raw_document_file_type / content_category used | Notes |
---|---|---|
Joint Photographic Experts Group (jpeg/jpg) | CONTENT_CATEGORY_IMAGE | |
Tag Image File Format (tif/tiff) | RAW_DOCUMENT_FILE_TYPE_TIFF | Files should be uploaded manually as TIFF files. |
Microsoft Word (doc/docx) | RAW_DOCUMENT_FILE_TYPE_DOCX | Files should be uploaded manually as docx files. |
Microsoft Excel files (xls/xlsx) | RAW_DOCUMENT_FILE_TYPE_XLSX | |
Microsoft PowerPoint files (ppt/pptx) | RAW_DOCUMENT_FILE_TYPE_PPTX | |
Portable Document Format (pdf) | RAW_DOCUMENT_FILE_TYPE_PDF | |
Plain text (txt) | RAW_DOCUMENT_FILE_TYPE_TEXT | |
Portable Network Graphics (png) | CONTENT_CATEGORY_IMAGE | |
Bitmap (bmp) | CONTENT_CATEGORY_IMAGE | |
Graphics Interchange Format (gif) | CONTENT_CATEGORY_IMAGE | |
Hypertext (html) | RAW_DOCUMENT_FILE_TYPE_TEXT | |
XML (xml) | RAW_DOCUMENT_FILE_TYPE_TEXT | |
Rich Text Format (rtf) | RAW_DOCUMENT_FILE_TYPE_UNSPECIFIED | |
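When preparing uploads, it can help to look up the value listed in the table from a file name. The sketch below simply mirrors the table as a dictionary keyed by file extension; the helper `lookup_file_type` is hypothetical, and how these values are passed to the service is not covered here.

```python
from pathlib import Path

# Values copied from the table above, keyed by lowercase file extension.
FILE_TYPE_BY_EXTENSION = {
    "jpeg": "CONTENT_CATEGORY_IMAGE",
    "jpg": "CONTENT_CATEGORY_IMAGE",
    "tif": "RAW_DOCUMENT_FILE_TYPE_TIFF",
    "tiff": "RAW_DOCUMENT_FILE_TYPE_TIFF",
    "doc": "RAW_DOCUMENT_FILE_TYPE_DOCX",
    "docx": "RAW_DOCUMENT_FILE_TYPE_DOCX",
    "xls": "RAW_DOCUMENT_FILE_TYPE_XLSX",
    "xlsx": "RAW_DOCUMENT_FILE_TYPE_XLSX",
    "ppt": "RAW_DOCUMENT_FILE_TYPE_PPTX",
    "pptx": "RAW_DOCUMENT_FILE_TYPE_PPTX",
    "pdf": "RAW_DOCUMENT_FILE_TYPE_PDF",
    "txt": "RAW_DOCUMENT_FILE_TYPE_TEXT",
    "png": "CONTENT_CATEGORY_IMAGE",
    "bmp": "CONTENT_CATEGORY_IMAGE",
    "gif": "CONTENT_CATEGORY_IMAGE",
    "html": "RAW_DOCUMENT_FILE_TYPE_TEXT",
    "xml": "RAW_DOCUMENT_FILE_TYPE_TEXT",
    "rtf": "RAW_DOCUMENT_FILE_TYPE_UNSPECIFIED",
}


def lookup_file_type(filename: str):
    """Return the table value for a file name, or None if it is not listed."""
    extension = Path(filename).suffix.lower().lstrip(".")
    return FILE_TYPE_BY_EXTENSION.get(extension)
```

Returning `None` for unlisted extensions leaves the decision about unsupported formats to the caller.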
Provisioning
Feature | Stable | Regular | Rapid |
---|---|---|---|
Working with documents
Feature | Stable | Regular | Rapid |
---|---|---|---|
API client libraries
Client libraries for Document AI Warehouse help you write custom code that integrates with Google Cloud. All services are accessible through the client libraries.
Library | Stable | Regular | Rapid |
---|---|---|---|
Java | | | |
Python | | | |