Document AI: Node.js Client

release level npm version codecov

Document AI client for Node.js

A comprehensive list of changes in each version may be found in the CHANGELOG .

Read more about the client libraries for Cloud APIs, including the older Google APIs Client Libraries, in Client Libraries Explained .

Table of contents:

Quickstart

Before you begin

  1. Select or create a Cloud Platform project .
  2. Enable billing for your project .
  3. Enable the Document AI API .
  4. Set up authentication with a service account so you can access the API from your local workstation.

Installing the client library

 npm install @google-cloud/documentai 

Using the client library

 /**
 * TODO(developer): Uncomment these variables before running the sample.
 */
// const projectId = 'YOUR_PROJECT_ID';
// const location = 'YOUR_PROJECT_LOCATION'; // Format is 'us' or 'eu'
// const processorId = 'YOUR_PROCESSOR_ID'; // Create processor in Cloud Console
// const filePath = '/path/to/local/pdf';

const {DocumentProcessorServiceClient} =
  require(' @google-cloud/documentai 
').v1;

// Instantiates a client
// apiEndpoint regions available: eu-documentai.googleapis.com, us-documentai.googleapis.com (Required if using eu based processor)
// const client = new DocumentProcessorServiceClient({apiEndpoint: 'eu-documentai.googleapis.com'});
const client = new DocumentProcessorServiceClient 
();

async function quickstart() {
  // The full resource name of the processor, e.g.:
  // projects/project-id/locations/location/processor/processor-id
  // You must create new processors in the Cloud Console first
  const name = `projects/${projectId}/locations/${location}/processors/${processorId}`;

  // Read the file into memory.
  const fs = require('fs').promises;
  const imageFile = await fs.readFile(filePath);

  // Convert the image data to a Buffer and base64 encode it.
  const encodedImage = Buffer.from(imageFile).toString('base64');

  const request = {
    name,
    rawDocument: {
      content: encodedImage,
      mimeType: 'application/pdf',
    },
  };

  // Recognizes text entities in the PDF document
  const [result] = await client.processDocument(request);
  const {document} = result;

  // Get all of the document text as one big string
  const {text} = document;

  // Extract shards from the text field
  const getText = textAnchor => {
    if (!textAnchor.textSegments || textAnchor.textSegments.length === 0) {
      return '';
    }

    // First shard in document doesn't have startIndex property
    const startIndex = textAnchor.textSegments[0].startIndex || 0;
    const endIndex = textAnchor.textSegments[0].endIndex;

    return text.substring(startIndex, endIndex);
  };

  // Read the text recognition output from the processor
  console.log('The document contains the following paragraphs:');
  const [page1] = document.pages;
  const {paragraphs} = page1;

  for (const paragraph of paragraphs) {
    const paragraphText = getText(paragraph.layout.textAnchor);
    console.log(`Paragraph text:\n${paragraphText}`);
  }
} 

Samples

Samples are in the samples/ directory. Each sample's README.md has instructions for running its sample.

Sample Source Code Try it
Batch-parse-form.v1beta2
source code Open in Cloud Shell
Batch-parse-table.v1beta2
source code Open in Cloud Shell
Batch-process-document
source code Open in Cloud Shell
Parse-form.v1beta2
source code Open in Cloud Shell
Parse-table.v1beta2
source code Open in Cloud Shell
Parse-with-model.v1beta2
source code Open in Cloud Shell
Process-document
source code Open in Cloud Shell
Quickstart
source code Open in Cloud Shell
Set-endpoint.v1beta2
source code Open in Cloud Shell

The Document AI Node.js Client API Reference documentation also contains samples.

Supported Node.js Versions

Our client libraries follow the Node.js release schedule . Libraries are compatible with all current active and maintenance versions of Node.js.

Client libraries targeting some end-of-life versions of Node.js are available, and can be installed via npm dist-tags . The dist-tags follow the naming convention legacy-(version) .

Legacy Node.js versions are supported as a best effort:

  • Legacy versions will not be tested in continuous integration.
  • Some security patches may not be able to be backported.
  • Dependencies will not be kept up-to-date, and features will not be backported.

Legacy tags available

  • legacy-8 : install client libraries from this dist-tag for versions compatible with Node.js 8.

Versioning

This library follows Semantic Versioning .

This library is considered to be in beta. This means it is expected to be mostly stable while we work toward a general availability release; however, complete stability is not guaranteed. We will address issues and requests against beta libraries with a high priority.

More Information: Google Cloud Platform Launch Stages

Contributing

Contributions welcome! See the Contributing Guide .

Please note that this README.md , the samples/README.md , and a variety of configuration files in this repository (including .nycrc and tsconfig.json ) are generated from a central template. To edit one of these files, make an edit to its templates in directory .

License

Apache Version 2.0

See LICENSE

Create a Mobile Website
View Site in Mobile | Classic
Share by: