Process-with-Document-AI pipeline

The Process-with-Document-AI pipeline processes existing documents with a Document AI processor and updates the document properties with the newly extracted entities.

Prerequisites

Before you begin, you need the following:

- A Document AI processor in the same Google Cloud project.
  - If you don't have a processor, follow the steps to create one. You can create any processor type, as long as it matches the document type.
- Dedicated Cloud Storage folders for storing exported documents and processed documents.
  - Make sure the folders are empty before you start the pipeline.
- A schema with mappings between Document AI entities and Document AI Warehouse properties.
  - Without such a mapping, the newly extracted entities might not be converted correctly to Document AI Warehouse properties.
  - To add mappings to the schema, follow "Set schemas with mapping".

Run the pipeline

REST

curl --location --request POST 'https://contentwarehouse.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION:runPipeline' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer ${AUTH_TOKEN}" \
--data '{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION",
  "process_with_doc_ai_pipeline": {
    "documents": [
      "projects/PROJECT_NUMBER/locations/LOCATION/documents/DOCUMENT"
    ],
    "export_folder_path": "gs://EXPORT_FOLDER",
    "processor_info": {
      "processor_name": "projects/PROJECT_NUMBER/locations/LOCATION/processors/PROCESSOR"
    },
    "processor_results_folder_path": "gs://PROCESS_FOLDER"
  },
  "request_metadata": {
    "user_info": {
      "id": "user:USER_EMAIL_ADDRESS"
    }
  }
}'

The documents field lists the resource names of the documents to process. The export_folder_path field is the Cloud Storage folder where the exported documents are stored before they are sent to the processor, and processor_results_folder_path is the Cloud Storage folder for the processed documents. For more information about the request body fields, refer to the API documentation.
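The request body can also be assembled from shell variables instead of being edited by hand. The following is a minimal sketch, not part of the official sample: the function name build_pipeline_request and all argument values are hypothetical, and the commented example assumes the gcloud CLI is installed so that `gcloud auth print-access-token` can supply AUTH_TOKEN.

```shell
#!/bin/sh
# Sketch: print a runPipeline request body built from positional arguments.
# Arguments (all illustrative placeholders):
#   $1 PROJECT_NUMBER  $2 LOCATION  $3 DOCUMENT  $4 EXPORT_FOLDER
#   $5 PROCESSOR       $6 PROCESS_FOLDER         $7 USER_EMAIL
build_pipeline_request() {
  parent="projects/$1/locations/$2"
  cat <<EOF
{
  "name": "${parent}",
  "process_with_doc_ai_pipeline": {
    "documents": ["${parent}/documents/$3"],
    "export_folder_path": "gs://$4",
    "processor_info": {
      "processor_name": "${parent}/processors/$5"
    },
    "processor_results_folder_path": "gs://$6"
  },
  "request_metadata": {
    "user_info": {"id": "user:$7"}
  }
}
EOF
}

# Example (commented out so that sourcing this file makes no network call):
# AUTH_TOKEN="$(gcloud auth print-access-token)"
# build_pipeline_request 123 us doc-1 my-bucket/export proc-1 my-bucket/processed me@example.com |
#   curl --location --request POST \
#     "https://contentwarehouse.googleapis.com/v1/projects/123/locations/us:runPipeline" \
#     --header 'Content-Type: application/json' \
#     --header "Authorization: Bearer ${AUTH_TOKEN}" \
#     --data @-
```

Building the parent path once keeps the project number and location consistent across the name, documents, and processor_name fields, which is an easy place to make a copy-and-paste mistake.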

This command returns the resource name of a long-running operation. Use this resource name to track the progress of the pipeline, as described in the next section.
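To reuse the returned resource name in a script, it has to be pulled out of the JSON response. The sketch below assumes the response follows the usual long-running operation shape, with a "name" field holding the operation resource name. The helper names and the sample response in the usage comment are hypothetical. A JSON-aware tool such as jq would be more robust than sed if it is available.

```shell
#!/bin/sh
# Sketch: read an operation JSON response on stdin and print the value of the
# first "name" field found. Assumes "name" appears before any nested object
# that also has a "name" key; use a real JSON parser if that is not guaranteed.
extract_operation_name() {
  sed -n 's/.*"name"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Print only the last path segment of an operation resource name, which is the
# value to substitute for the OPERATION placeholder.
operation_id() {
  printf '%s\n' "${1##*/}"
}

# Usage with an illustrative response:
# name="$(printf '%s\n' '{"name": "projects/123/locations/us/operations/op-456"}' | extract_operation_name)"
# operation_id "$name"
```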

Get long-running operation result

REST

curl --location --request GET 'https://contentwarehouse.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/operations/OPERATION' \
--header "Authorization: Bearer ${AUTH_TOKEN}"
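Because the pipeline runs asynchronously, a script usually has to repeat this GET request until the operation finishes. The following polling loop is a sketch under stated assumptions: it assumes the response is a standard long-running operation that contains "done": true once it finishes, and the function names, retry count, and sleep interval are illustrative choices rather than values from the API documentation.

```shell
#!/bin/sh
# Succeed when the operation JSON on stdin contains "done": true.
operation_is_done() {
  grep -Eq '"done"[[:space:]]*:[[:space:]]*true'
}

# wait_for_operation FETCH_CMD [MAX_ATTEMPTS] [SLEEP_SECONDS]
# FETCH_CMD names a command or function that prints the operation JSON.
# Prints the final response and returns 0 once the operation is done;
# returns 1 if it is still running after MAX_ATTEMPTS polls.
wait_for_operation() {
  fetch_cmd="$1"
  max_attempts="${2:-30}"
  sleep_seconds="${3:-10}"
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    response="$("$fetch_cmd")"
    if printf '%s\n' "$response" | operation_is_done; then
      printf '%s\n' "$response"
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$sleep_seconds"
  done
  echo "operation not done after ${max_attempts} attempts" >&2
  return 1
}

# Example fetch function wrapping the GET request (hypothetical variables):
# fetch_operation() {
#   curl --silent --location --request GET \
#     "https://contentwarehouse.googleapis.com/v1/${OPERATION_NAME}" \
#     --header "Authorization: Bearer ${AUTH_TOKEN}"
# }
# wait_for_operation fetch_operation 60 5
```

A finished operation is not necessarily a successful one, so the printed response should still be checked for an error field before relying on the results.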

Next steps

Go to the Document AI Warehouse UI or call the document:get API to check whether the documents were updated successfully.
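For the API route, the request can follow the same pattern as the other calls on this page. The sketch below is an assumption rather than a confirmed sample: it assumes the get method is exposed as a POST to the document resource name with a `:get` suffix and takes the same request_metadata body as the pipeline request. Check the API documentation before relying on it. The function name and argument values are hypothetical.

```shell
#!/bin/sh
# Sketch: print the assumed document:get URL for a document.
# Arguments: $1 PROJECT_NUMBER  $2 LOCATION  $3 DOCUMENT (illustrative values).
document_get_url() {
  printf 'https://contentwarehouse.googleapis.com/v1/projects/%s/locations/%s/documents/%s:get\n' \
    "$1" "$2" "$3"
}

# Example (commented out; assumes POST with a request_metadata body):
# curl --location --request POST "$(document_get_url 123 us doc-1)" \
#   --header 'Content-Type: application/json' \
#   --header "Authorization: Bearer ${AUTH_TOKEN}" \
#   --data '{"request_metadata": {"user_info": {"id": "user:me@example.com"}}}'
```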