The Process-with-Document AI pipeline allows users to process existing documents with a Document AI processor and update the document properties with the newly extracted entities.
Prerequisites
Before you begin, you need the following:
- A Document AI processor ready under the same Google Cloud project.
  - If you don't have a processor, follow the steps to create one. You can choose to create any type as long as the processor type matches the document type.
- Dedicated Cloud Storage folders for storing exported documents and processed documents.
  - Make sure the folders are empty before you start the pipeline.
- A schema with mappings between Document AI entities and Document AI Warehouse properties.
  - The newly extracted entities might not be correctly converted to Document AI Warehouse entities without such a mapping.
  - To add mappings to the schema, follow set schemas with mapping.
Run the pipeline
REST
curl --location --request POST \
  'https://contentwarehouse.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION:runPipeline' \
  --header 'Content-Type: application/json' \
  --header "Authorization: Bearer ${AUTH_TOKEN}" \
  --data '{
    "name": "projects/PROJECT_NUMBER/locations/LOCATION",
    "process_with_doc_ai_pipeline": {
      "documents": [
        "projects/PROJECT_NUMBER/locations/LOCATION/documents/DOCUMENT"
      ],
      "export_folder_path": "gs://EXPORT_FOLDER",
      "processor_info": {
        "processor_name": "projects/PROJECT_NUMBER/locations/LOCATION/processors/PROCESSOR"
      },
      "processor_results_folder_path": "gs://PROCESS_FOLDER"
    },
    "request_metadata": {
      "user_info": {
        "id": "user:USER_EMAIL_ADDRESS"
      }
    }
  }'
The documents list contains the resource names of the documents to be processed. The Cloud Storage folder path export_folder_path is used to store the exported documents before they are sent to the processor. For more information about the request body fields, refer to the API documentation.
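When the same request is issued for different documents or processors, it can help to assemble the body from its parts instead of editing the JSON by hand. The following is a minimal sketch under stated assumptions: the function name build_request_body and its argument order are hypothetical, and the values are assumed to contain no characters that need JSON escaping.

```shell
# Sketch: print a runPipeline request body built from its parts.
# build_request_body and its argument order are illustrative assumptions,
# not part of the API. Values are inserted verbatim (no JSON escaping).
# The body runs in a subshell so the helper variables stay local.
build_request_body() (
  project="$1"; location="$2"; document="$3"
  export_folder="$4"; processor="$5"; results_folder="$6"; user_email="$7"
  parent="projects/${project}/locations/${location}"
  cat <<EOF
{
  "name": "${parent}",
  "process_with_doc_ai_pipeline": {
    "documents": ["${parent}/documents/${document}"],
    "export_folder_path": "gs://${export_folder}",
    "processor_info": {
      "processor_name": "${parent}/processors/${processor}"
    },
    "processor_results_folder_path": "gs://${results_folder}"
  },
  "request_metadata": {
    "user_info": {"id": "user:${user_email}"}
  }
}
EOF
)

# Possible usage with the curl command from this section:
#   curl ... --data "$(build_request_body PROJECT_NUMBER LOCATION DOCUMENT \
#     EXPORT_FOLDER PROCESSOR PROCESS_FOLDER USER_EMAIL_ADDRESS)"
```

The sketch handles a single document; to process several, the documents array would need one resource name per document.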
This command returns a resource name for a long-running operation. With this resource name, you can track the progress of the pipeline by following the next step.
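To reuse the operation resource name in a script, it has to be pulled out of the JSON response. The sketch below assumes the response is a standard long-running operation object whose top-level "name" field appears before any other "name" key; the helper operation_name_from_response is a hypothetical name. A JSON-aware tool would be more robust than this sed pattern.

```shell
# Sketch: read an operation JSON response on stdin and print the value of
# the first "name" field found. operation_name_from_response is a
# hypothetical helper; it assumes the operation name is the first "name"
# key in the response.
operation_name_from_response() {
  sed -n 's/.*"name"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Possible usage:
#   OPERATION_NAME="$(curl ... | operation_name_from_response)"
```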
Get long-running operation result
REST
curl --location --request GET \
  'https://contentwarehouse.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/operations/OPERATION' \
  --header "Authorization: Bearer ${AUTH_TOKEN}"
Next steps
Go to the Document AI Warehouse UI or call the document:get API to check whether the documents were updated successfully.
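The API check can be scripted in the same style as the requests above. This is a sketch only: it assumes the document:get method is exposed as a POST request on the document resource name followed by :get, with request_metadata in the body as in the runPipeline call. Confirm the exact URL and body against the API documentation before relying on it. The function names are hypothetical.

```shell
# Sketch: build the assumed document:get URL.
# $1 project number, $2 location, $3 document ID.
# The URL shape is an assumption; verify it against the API documentation.
document_get_url() {
  printf 'https://contentwarehouse.googleapis.com/v1/projects/%s/locations/%s/documents/%s:get\n' \
    "$1" "$2" "$3"
}

# Sketch: fetch a document so its updated properties can be inspected.
# $4 is the user email used in request_metadata; AUTH_TOKEN must be set.
get_document() {
  curl --silent --show-error --fail --location --request POST \
    "$(document_get_url "$1" "$2" "$3")" \
    --header 'Content-Type: application/json' \
    --header "Authorization: Bearer ${AUTH_TOKEN}" \
    --data "{\"request_metadata\": {\"user_info\": {\"id\": \"user:$4\"}}}"
}

# Possible usage:
#   get_document PROJECT_NUMBER LOCATION DOCUMENT USER_EMAIL_ADDRESS
```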

