Document AI: Node.js Client


Document AI client for Node.js

A comprehensive list of changes in each version may be found in the CHANGELOG.

Read more about the client libraries for Cloud APIs, including the older Google APIs Client Libraries, in Client Libraries Explained.


Quickstart

Before you begin

  1. Select or create a Cloud Platform project.
  2. Enable billing for your project.
  3. Enable the Document AI API.
  4. Set up authentication with a service account so you can access the API from your local workstation.
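Step 4 relies on Application Default Credentials. One common setup is to download a service account JSON key and point the GOOGLE_APPLICATION_CREDENTIALS environment variable at it. The sketch below, assuming that key-file setup, checks the variable before you create a client. The helper name findCredentialsFile is hypothetical and not part of the library. Credentials can also come from other sources (for example gcloud user credentials, or the metadata server when running on Google Cloud), so a null result is a hint, not a hard failure.

```javascript
const fs = require('fs');

/**
 * Hypothetical helper: returns the key file path named by
 * GOOGLE_APPLICATION_CREDENTIALS, or null when the variable is unset
 * or the file does not exist.
 */
function findCredentialsFile(env = process.env) {
  const keyPath = env.GOOGLE_APPLICATION_CREDENTIALS;
  if (!keyPath) {
    return null;
  }
  return fs.existsSync(keyPath) ? keyPath : null;
}

const keyFile = findCredentialsFile();
if (keyFile) {
  console.log(`Using service account key: ${keyFile}`);
} else {
  console.log('GOOGLE_APPLICATION_CREDENTIALS is unset or points to a missing file.');
}
```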

Installing the client library

 npm install @google-cloud/documentai 

Using the client library

 /**
 * TODO(developer): Uncomment these variables before running the sample.
 */
// const projectId = 'YOUR_PROJECT_ID';
// const location = 'YOUR_PROJECT_LOCATION'; // Format is 'us' or 'eu'
// const processorId = 'YOUR_PROCESSOR_ID'; // Create processor in Cloud Console
// const filePath = '/path/to/local/pdf';

const {DocumentProcessorServiceClient} =
  require('@google-cloud/documentai').v1;

// Instantiates a client
// apiEndpoint regions available: eu-documentai.googleapis.com, us-documentai.googleapis.com (an explicit endpoint is required for processors in the 'eu' location)
// const client = new DocumentProcessorServiceClient({apiEndpoint: 'eu-documentai.googleapis.com'});
const client = new DocumentProcessorServiceClient();

async function quickstart() {
  // The full resource name of the processor, e.g.:
  // projects/project-id/locations/location/processors/processor-id
  // You must create new processors in the Cloud Console first
  const name = `projects/${projectId}/locations/${location}/processors/${processorId}`;

  // Read the file into memory.
  const fs = require('fs').promises;
  const imageFile = await fs.readFile(filePath);

  // fs.readFile already returns a Buffer; base64-encode it for the request.
  const encodedImage = imageFile.toString('base64');

  const request = {
    name,
    rawDocument: {
      content: encodedImage,
      mimeType: 'application/pdf',
    },
  };

  // Recognizes text entities in the PDF document
  const [result] = await client.processDocument(request);
  const {document} = result;

  // Get all of the document text as one big string
  const {text} = document;

  // Extract shards from the text field
  const getText = textAnchor => {
    if (!textAnchor.textSegments || textAnchor.textSegments.length === 0) {
      return '';
    }

    // First shard in document doesn't have startIndex property
    const startIndex = textAnchor.textSegments[0].startIndex || 0;
    const endIndex = textAnchor.textSegments[0].endIndex;

    return text.substring(startIndex, endIndex);
  };

  // Read the text recognition output from the processor
  console.log('The document contains the following paragraphs:');
  const [page1] = document.pages;
  const {paragraphs} = page1;

  for (const paragraph of paragraphs) {
    const paragraphText = getText(paragraph.layout.textAnchor);
    console.log(`Paragraph text:\n${paragraphText}`);
  }
}

quickstart().catch(console.error);
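The getText helper in the quickstart reads only the first entry in textSegments. A text anchor can list several segments, each an index range into document.text, so a layout element may span more than one. The sketch below joins every segment. getFullText and the sample data are hypothetical, and the offsets are passed through Number() on the assumption that 64-bit integer fields may be decoded as strings, depending on how the client decodes them.

```javascript
// Hypothetical helper: joins the text of every segment in a text anchor.
// Each segment is assumed to hold startIndex/endIndex offsets into the full
// document text; an omitted startIndex or endIndex is treated as 0.
function getFullText(text, textAnchor) {
  if (!textAnchor || !Array.isArray(textAnchor.textSegments)) {
    return '';
  }
  return textAnchor.textSegments
    .map(segment => {
      // int64 offsets may arrive as strings, so normalize them to numbers.
      const start = Number(segment.startIndex || 0);
      const end = Number(segment.endIndex || 0);
      return end > start ? text.substring(start, end) : '';
    })
    .join('');
}

// Hypothetical document text with one element split across two segments.
const sampleText = 'Invoice 42\nTotal: $10\nPaid';
const anchor = {
  textSegments: [{endIndex: '8'}, {startIndex: '11', endIndex: '21'}],
};
console.log(getFullText(sampleText, anchor)); // prints "Invoice Total: $10"
```

Inside the quickstart loop, getFullText(text, paragraph.layout.textAnchor) could stand in for getText(paragraph.layout.textAnchor).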

Samples

Samples are in the samples/ directory. Each sample's README.md has instructions for running its sample.

Sample                        Source Code   Try it
Batch-parse-form.v1beta2      source code   Open in Cloud Shell
Batch-parse-table.v1beta2     source code   Open in Cloud Shell
Batch-process-document        source code   Open in Cloud Shell
Parse-form.v1beta2            source code   Open in Cloud Shell
Parse-table.v1beta2           source code   Open in Cloud Shell
Parse-with-model.v1beta2      source code   Open in Cloud Shell
Process-document-form         source code   Open in Cloud Shell
Process-document-ocr          source code   Open in Cloud Shell
Process-document-quality      source code   Open in Cloud Shell
Process-document-specialized  source code   Open in Cloud Shell
Process-document-splitter     source code   Open in Cloud Shell
Process-document              source code   Open in Cloud Shell
Quickstart                    source code   Open in Cloud Shell
Set-endpoint.v1beta2          source code   Open in Cloud Shell

The Document AI Node.js Client API Reference documentation also contains samples.
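The Batch-process-document sample processes files stored in Cloud Storage rather than sending inline content as the quickstart does. As a rough sketch, assuming the v1 request shape that sample uses (input files listed under inputDocuments.gcsDocuments, results written under documentOutputConfig.gcsOutputConfig), the code below only assembles the request object and derives the regional endpoint following the quickstart's comment. buildBatchRequest, endpointFor, the project values, and the bucket URIs are all hypothetical; check the sample source before relying on the field names.

```javascript
// Hypothetical helper: regional endpoint in the form listed in the quickstart,
// e.g. 'eu' -> 'eu-documentai.googleapis.com'.
function endpointFor(location) {
  return `${location}-documentai.googleapis.com`;
}

// Hypothetical helper: batch request for PDFs that are already in Cloud Storage.
function buildBatchRequest({projectId, location, processorId, inputUris, outputUri}) {
  return {
    name: `projects/${projectId}/locations/${location}/processors/${processorId}`,
    inputDocuments: {
      gcsDocuments: {
        documents: inputUris.map(gcsUri => ({gcsUri, mimeType: 'application/pdf'})),
      },
    },
    documentOutputConfig: {
      gcsOutputConfig: {gcsUri: outputUri},
    },
  };
}

const batchRequest = buildBatchRequest({
  projectId: 'my-project',
  location: 'eu',
  processorId: 'my-processor',
  inputUris: ['gs://my-bucket/invoices/a.pdf', 'gs://my-bucket/invoices/b.pdf'],
  outputUri: 'gs://my-bucket/results/',
});
console.log(endpointFor('eu')); // prints "eu-documentai.googleapis.com"
console.log(JSON.stringify(batchRequest, null, 2));
```

With the library installed, the endpoint would go into the client constructor as {apiEndpoint: endpointFor(location)} and the request into client.batchProcessDocuments(), which starts a long-running operation; that part is not shown here.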

Supported Node.js Versions

Our client libraries follow the Node.js release schedule. Libraries are compatible with all current active and maintenance versions of Node.js. If you are using an end-of-life version of Node.js, we recommend that you update as soon as possible to an actively supported LTS version.

Google's client libraries support legacy versions of Node.js runtimes on a best-efforts basis with the following warnings:

  • Legacy versions are not tested in continuous integration.
  • Some security patches and features cannot be backported.
  • Dependencies cannot be kept up-to-date.

Client libraries targeting some end-of-life versions of Node.js are available and can be installed through npm dist-tags. The dist-tags follow the naming convention legacy-(version). For example, npm install @google-cloud/documentai@legacy-8 installs client libraries for versions compatible with Node.js 8.
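The legacy-(version) convention can be applied mechanically from the running Node.js major version. In the sketch below, installCommand is a hypothetical helper, and minSupportedMajor is a hypothetical input: this README does not state the minimum supported version, so read it from the package's "engines" field. A legacy tag is published only for some end-of-life versions, so the generated tag may not exist.

```javascript
// Hypothetical helper: builds the install command, switching to the
// legacy-(version) dist-tag when the Node.js major version is below
// minSupportedMajor.
function installCommand(nodeVersion, minSupportedMajor) {
  const major = Number(nodeVersion.replace(/^v/, '').split('.')[0]);
  const pkg = '@google-cloud/documentai';
  return major < minSupportedMajor
    ? `npm install ${pkg}@legacy-${major}`
    : `npm install ${pkg}`;
}

console.log(installCommand('v8.17.0', 10)); // npm install @google-cloud/documentai@legacy-8
console.log(installCommand(process.version, 10));
```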

Versioning

This library follows Semantic Versioning.

This library is considered to be stable. The code surface will not change in backwards-incompatible ways unless absolutely necessary (e.g. because of critical security issues) or with an extensive deprecation period. Issues and requests against stable libraries are addressed with the highest priority.
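Under Semantic Versioning, backwards-incompatible changes ship only in a new major version, so an upgrade within the same major version should not break callers; npm's caret ranges (^x.y.z) encode the same rule for versions 1.0.0 and above. A minimal sketch of that check for plain MAJOR.MINOR.PATCH strings follows. isSafeUpgrade is a hypothetical name, the version numbers are illustrative, and pre-release or build suffixes are deliberately not handled.

```javascript
// Parses a plain MAJOR.MINOR.PATCH string into numbers.
function parseVersion(version) {
  const [major, minor, patch] = version.split('.').map(Number);
  return {major, minor, patch};
}

// True when `to` keeps the major version of `from` and is not older.
function isSafeUpgrade(from, to) {
  const a = parseVersion(from);
  const b = parseVersion(to);
  // SemVer makes no compatibility promise for 0.y.z or across major versions.
  if (a.major === 0 || a.major !== b.major) return false;
  if (a.minor !== b.minor) return b.minor > a.minor;
  return b.patch >= a.patch;
}

console.log(isSafeUpgrade('2.1.0', '2.3.2')); // true
console.log(isSafeUpgrade('2.3.2', '3.0.0')); // false
```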

More Information: Google Cloud Platform Launch Stages

Contributing

Contributions welcome! See the Contributing Guide.

Please note that this README.md, the samples/README.md, and a variety of configuration files in this repository (including .nycrc and tsconfig.json) are generated from a central template. To edit one of these files, make an edit to its template in the central template directory.

License

Apache Version 2.0

See LICENSE
