Summary of entries of Methods for documentai-toolbox.
google.cloud.documentai_toolbox.utilities.gcs_utilities._get_client_info
_get_client_info(module: typing.Optional[str] = None) -> google.api_core.gapic_v1.client_info.ClientInfo
Returns a custom user agent header.
See more: google.cloud.documentai_toolbox.utilities.gcs_utilities._get_client_info
google.cloud.documentai_toolbox.utilities.gcs_utilities._get_storage_client
_get_storage_client(module: typing.Optional[str] = None) -> google.cloud.storage.client.Client
Returns a Storage client with custom user agent header.
See more: google.cloud.documentai_toolbox.utilities.gcs_utilities._get_storage_client
google.cloud.documentai_toolbox.utilities.gcs_utilities.create_batches
create_batches(gcs_bucket_name: str, gcs_prefix: str, batch_size: int = 1000) -> typing.List[google.cloud.documentai_v1.types.document_io.BatchDocumentsInputConfig]
Create batches of documents in Cloud Storage to process with batch_process_documents().
See more: google.cloud.documentai_toolbox.utilities.gcs_utilities.create_batches
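To illustrate the batching idea only, here is a minimal sketch that groups a list of Cloud Storage URIs into chunks of at most batch_size items. The helper name chunk_uris and the sample URIs are hypothetical. The real create_batches returns BatchDocumentsInputConfig objects, which this sketch does not attempt to reproduce.

```python
from typing import List


def chunk_uris(uris: List[str], batch_size: int = 1000) -> List[List[str]]:
    """Group URIs into consecutive batches of at most batch_size items.

    Hypothetical helper for illustration; not part of documentai-toolbox.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [uris[i : i + batch_size] for i in range(0, len(uris), batch_size)]


# Hypothetical sample data: 2,500 document URIs in one folder.
sample_uris = [f"gs://example-bucket/docs/file-{n}.pdf" for n in range(2500)]
batches = chunk_uris(sample_uris)
print([len(batch) for batch in batches])  # [1000, 1000, 500]
```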
google.cloud.documentai_toolbox.utilities.gcs_utilities.create_gcs_uri
create_gcs_uri(gcs_bucket_name: str, gcs_prefix: str) -> str
Creates a Cloud Storage URI from the bucket_name and prefix.
See more: google.cloud.documentai_toolbox.utilities.gcs_utilities.create_gcs_uri
google.cloud.documentai_toolbox.utilities.gcs_utilities.get_blob
get_blob(gcs_uri: str, module: typing.Optional[str] = "get-bytes") -> google.cloud.storage.blob.Blob
Returns a blob from Cloud Storage.
See more: google.cloud.documentai_toolbox.utilities.gcs_utilities.get_blob
google.cloud.documentai_toolbox.utilities.gcs_utilities.get_blobs
get_blobs(gcs_uri: typing.Optional[str] = None, gcs_bucket_name: typing.Optional[str] = None, gcs_prefix: typing.Optional[str] = "/", module: typing.Optional[str] = "get-bytes") -> typing.List[google.cloud.storage.blob.Blob]
Returns a list of blobs from Cloud Storage.
See more: google.cloud.documentai_toolbox.utilities.gcs_utilities.get_blobs
google.cloud.documentai_toolbox.utilities.gcs_utilities.get_bytes
get_bytes(gcs_bucket_name: str, gcs_prefix: str) -> typing.List[bytes]
Returns a list of bytes of JSON files from Cloud Storage.
See more: google.cloud.documentai_toolbox.utilities.gcs_utilities.get_bytes
google.cloud.documentai_toolbox.utilities.gcs_utilities.list_gcs_document_tree
list_gcs_document_tree(gcs_bucket_name: str, gcs_prefix: str) -> typing.Dict[str, typing.List[str]]
Returns the paths to files in a Cloud Storage folder.
See more: google.cloud.documentai_toolbox.utilities.gcs_utilities.list_gcs_document_tree
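The Dict[str, List[str]] return type suggests a mapping from folders to the files they contain. The sketch below shows one way such a mapping could be built from a flat list of object names. The helper name document_tree_sketch, the grouping rule, and the sample names are assumptions; the library's actual grouping may differ.

```python
import posixpath
from collections import defaultdict
from typing import Dict, List


def document_tree_sketch(blob_names: List[str]) -> Dict[str, List[str]]:
    """Group object names by their parent folder (hypothetical helper)."""
    tree: Dict[str, List[str]] = defaultdict(list)
    for name in blob_names:
        # Cloud Storage object names use "/" separators, so posixpath applies.
        directory, file_name = posixpath.split(name)
        tree[directory].append(file_name)
    return dict(tree)


names = [
    "output/123/0/doc-0.json",
    "output/123/0/doc-1.json",
    "output/123/1/doc-0.json",
]
print(document_tree_sketch(names))
# {'output/123/0': ['doc-0.json', 'doc-1.json'], 'output/123/1': ['doc-0.json']}
```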
google.cloud.documentai_toolbox.utilities.gcs_utilities.print_gcs_document_tree
print_gcs_document_tree(gcs_bucket_name: str, gcs_prefix: str, files_to_display: int = 4) -> None
Prints a tree of filenames in a Cloud Storage folder.
See more: google.cloud.documentai_toolbox.utilities.gcs_utilities.print_gcs_document_tree
google.cloud.documentai_toolbox.utilities.gcs_utilities.split_gcs_uri
split_gcs_uri(gcs_uri: str) -> typing.Tuple[str, str]
Splits a Cloud Storage URI into the bucket_name and prefix.
See more: google.cloud.documentai_toolbox.utilities.gcs_utilities.split_gcs_uri
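As a rough illustration of how split_gcs_uri and create_gcs_uri relate to each other, the sketch below re-creates both behaviors in plain Python. The function names split_uri_sketch and create_uri_sketch, the regular expression, and the error handling are assumptions made for this example, not the library's implementation.

```python
import re
from typing import Tuple


def split_uri_sketch(gcs_uri: str) -> Tuple[str, str]:
    """Split 'gs://bucket/prefix' into (bucket, prefix). Hypothetical stand-in."""
    match = re.fullmatch(r"gs://([^/]+)/?(.*)", gcs_uri)
    if match is None:
        raise ValueError(f"Not a Cloud Storage URI: {gcs_uri!r}")
    return match.group(1), match.group(2)


def create_uri_sketch(gcs_bucket_name: str, gcs_prefix: str) -> str:
    """Join a bucket name and prefix into a 'gs://' URI. Hypothetical stand-in."""
    return f"gs://{gcs_bucket_name}/{gcs_prefix}"


bucket, prefix = split_uri_sketch("gs://example-bucket/output/123/0")
print(bucket, prefix)                     # example-bucket output/123/0
print(create_uri_sketch(bucket, prefix))  # gs://example-bucket/output/123/0
```

Under these assumptions the two helpers are inverses of each other for any URI that has a non-empty prefix.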
google.cloud.documentai_toolbox.utilities.gcs_utilities.upload_file
upload_file(gcs_output_directory: str, file_name: str, file_content: str, content_type: str = "application/json", module: typing.Optional[str] = "upload-file") -> None
Uploads the converted docproto to Cloud Storage.
See more: google.cloud.documentai_toolbox.utilities.gcs_utilities.upload_file
google.cloud.documentai_toolbox.wrappers.document._apply_text_offset
_apply_text_offset(documentai_object: typing.Union[typing.Dict[str, typing.Dict], typing.List], text_offset: int) -> None
Applies a text offset to all text_segments in documentai_object.
See more: google.cloud.documentai_toolbox.wrappers.document._apply_text_offset
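A minimal sketch of the idea follows: walk a nested dictionary or list and shift every text segment by a fixed amount, in place. The function name apply_text_offset_sketch and the field names start_index and end_index are assumptions for this example, and the real helper may treat edge cases differently. Such a shift would be useful, for instance, when text from several shards is concatenated and later segments need to point into the combined text.

```python
from typing import Any


def apply_text_offset_sketch(obj: Any, text_offset: int) -> None:
    """Shift every text segment found in a nested dict/list structure, in place."""
    if isinstance(obj, list):
        for item in obj:
            apply_text_offset_sketch(item, text_offset)
    elif isinstance(obj, dict):
        for key, value in obj.items():
            if key == "text_segments":
                for segment in value:
                    # Assumed field names; a missing index is treated as 0.
                    segment["start_index"] = int(segment.get("start_index", 0)) + text_offset
                    segment["end_index"] = int(segment.get("end_index", 0)) + text_offset
            else:
                apply_text_offset_sketch(value, text_offset)


# Hypothetical fragment of a document shard in dictionary form.
shard = {"layout": {"text_anchor": {"text_segments": [{"start_index": 0, "end_index": 5}]}}}
apply_text_offset_sketch(shard, 100)
print(shard["layout"]["text_anchor"]["text_segments"])
# [{'start_index': 100, 'end_index': 105}]
```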
google.cloud.documentai_toolbox.wrappers.document._bigquery_column_name
_bigquery_column_name(input_string: str) -> str
Converts a string into a BigQuery column name.
See more: google.cloud.documentai_toolbox.wrappers.document._bigquery_column_name
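For illustration, one possible normalization is sketched below, assuming BigQuery's traditional naming rules (letters, digits, and underscores only, not starting with a digit). The function name column_name_sketch and the specific replacement rules are assumptions; the library's own character mapping may differ.

```python
import re


def column_name_sketch(input_string: str) -> str:
    """Lowercase the input and replace unsupported characters with underscores."""
    name = re.sub(r"[^0-9a-zA-Z_]+", "_", input_string.strip()).strip("_").lower()
    # Under the traditional rules a name may not start with a digit (or be empty),
    # so prefix an underscore in those cases.
    if not name or name[0].isdigit():
        name = "_" + name
    return name


print(column_name_sketch("Invoice Date (UTC)"))  # invoice_date_utc
print(column_name_sketch("2nd Address/Line"))    # _2nd_address_line
```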
google.cloud.documentai_toolbox.wrappers.document._dict_to_bigquery
_dict_to_bigquery(dic: typing.Dict[str, typing.Union[str, typing.List[str]]], dataset_name: str, table_name: str, project_id: typing.Optional[str]) -> google.cloud.bigquery.job.load.LoadJob
Loads a dictionary into a BigQuery table.
See more: google.cloud.documentai_toolbox.wrappers.document._dict_to_bigquery
google.cloud.documentai_toolbox.wrappers.document._entities_from_shards
_entities_from_shards(shards: typing.List[google.cloud.documentai_v1.types.document.Document]) -> typing.List[google.cloud.documentai_toolbox.wrappers.entity.Entity]
Returns a list of Entities and Properties from a list of documentai.Document shards.
See more: google.cloud.documentai_toolbox.wrappers.document._entities_from_shards
google.cloud.documentai_toolbox.wrappers.document._get_batch_process_metadata
_get_batch_process_metadata(operation_name: str, location: typing.Optional[str] = None, timeout: typing.Optional[float] = None) -> google.cloud.documentai_v1.types.document_processor_service.BatchProcessMetadata
Get BatchProcessMetadata from a batch_process_documents() long-running operation.
See more: google.cloud.documentai_toolbox.wrappers.document._get_batch_process_metadata
google.cloud.documentai_toolbox.wrappers.document._get_shards
_get_shards(gcs_bucket_name: str, gcs_prefix: str) -> typing.List[google.cloud.documentai_v1.types.document.Document]
Returns a list of documentai.Document shards from a Cloud Storage folder.
See more: google.cloud.documentai_toolbox.wrappers.document._get_shards
google.cloud.documentai_toolbox.wrappers.document._insert_into_dictionary_with_list
_insert_into_dictionary_with_list(dic: typing.Dict[str, typing.Union[str, typing.List[str]]], key: str, value: str) -> typing.Dict[str, typing.Union[str, typing.List[str]]]
Inserts value into a dictionary that can contain lists.
See more: google.cloud.documentai_toolbox.wrappers.document._insert_into_dictionary_with_list
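The Dict[str, Union[str, List[str]]] type implies that a key holds either one string or a list of strings. A plausible reading, sketched below, is that the first value for a key is stored as a plain string and repeated keys are promoted to a list. The function name insert_with_list_sketch and this promotion rule are assumptions, not confirmed library behavior.

```python
from typing import Dict, List, Union

EntityDict = Dict[str, Union[str, List[str]]]


def insert_with_list_sketch(dic: EntityDict, key: str, value: str) -> EntityDict:
    """Insert value under key, promoting repeated keys to a list (hypothetical)."""
    existing = dic.get(key)
    if existing is None:
        dic[key] = value             # first occurrence: store the bare string
    elif isinstance(existing, list):
        existing.append(value)       # third and later occurrences: extend the list
    else:
        dic[key] = [existing, value]  # second occurrence: promote to a list
    return dic


result: EntityDict = {}
for k, v in [("supplier", "Acme"), ("line_item", "Widget"), ("line_item", "Gadget"), ("line_item", "Bolt")]:
    insert_with_list_sketch(result, k, v)
print(result)
# {'supplier': 'Acme', 'line_item': ['Widget', 'Gadget', 'Bolt']}
```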
google.cloud.documentai_toolbox.wrappers.document._pages_from_shards
_pages_from_shards(shards: typing.List[google.cloud.documentai_v1.types.document.Document]) -> typing.List[google.cloud.documentai_toolbox.wrappers.page.Page]
Returns a list of Pages from a list of documentai.Document shards.
See more: google.cloud.documentai_toolbox.wrappers.document._pages_from_shards
google.cloud.documentai_toolbox.wrappers.page._get_hocr_bounding_box
_get_hocr_bounding_box(element_with_layout: typing.Union[google.cloud.documentai_v1.types.document.Document.Page.Paragraph, google.cloud.documentai_v1.types.document.Document.Page, google.cloud.documentai_v1.types.document.Document.Page.Token, google.cloud.documentai_v1.types.document.Document.Page.Block, google.cloud.documentai_v1.types.document.Document.Page.Symbol], page_dimension: google.cloud.documentai_v1.types.document.Document.Page.Dimension) -> typing.Optional[str]
Returns a hOCR bounding box string.
See more: google.cloud.documentai_toolbox.wrappers.page._get_hocr_bounding_box
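In hOCR, a bounding box is written as the property string "bbox x0 y0 x1 y1" in pixel coordinates. The sketch below shows how such a string could be derived from normalized (0 to 1) polygon vertices and the page dimensions. The function name hocr_bbox_sketch, the tuple-based inputs, and the truncation to integers are assumptions for this example; the real helper takes Document AI layout objects.

```python
from typing import List, Optional, Tuple


def hocr_bbox_sketch(
    normalized_vertices: List[Tuple[float, float]],
    page_width: float,
    page_height: float,
) -> Optional[str]:
    """Build an hOCR 'bbox' string from normalized vertices (hypothetical helper)."""
    if not normalized_vertices:
        return None  # mirrors the Optional[str] return type: no geometry, no box
    xs = [x for x, _ in normalized_vertices]
    ys = [y for _, y in normalized_vertices]
    # Scale the extreme coordinates to pixels and truncate to integers.
    x0, x1 = int(min(xs) * page_width), int(max(xs) * page_width)
    y0, y1 = int(min(ys) * page_height), int(max(ys) * page_height)
    return f"bbox {x0} {y0} {x1} {y1}"


corners = [(0.125, 0.25), (0.5, 0.25), (0.5, 0.375), (0.125, 0.375)]
print(hocr_bbox_sketch(corners, page_width=800, page_height=1000))  # bbox 100 250 400 375
```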
google.cloud.documentai_toolbox.wrappers.page._text_from_layout
_text_from_layout(layout: google.cloud.documentai_v1.types.document.Document.Page.Layout, text: str) -> str
Returns the text from a single layout element.
See more: google.cloud.documentai_toolbox.wrappers.page._text_from_layout
google.cloud.documentai_toolbox.wrappers.page._trim_text
_trim_text(text: str) -> str
Removes extra whitespace characters (blank, newline, tab, etc.) from text.
See more: google.cloud.documentai_toolbox.wrappers.page._trim_text
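One simple way to get this kind of cleanup in plain Python is shown below. The function name trim_text_sketch is hypothetical, and collapsing every run of whitespace to a single space is an assumption about what "extra" means here; the library may handle newlines or tabs differently.

```python
def trim_text_sketch(text: str) -> str:
    """Collapse runs of whitespace into single spaces and strip both ends."""
    # str.split() with no arguments splits on any whitespace and drops empty parts.
    return " ".join(text.split())


print(repr(trim_text_sketch("  Total:\t42 \n USD\n")))  # 'Total: 42 USD'
```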
google.cloud.documentai_toolbox.wrappers.document.Document.convert_document_to_annotate_file_json_response
convert_document_to_annotate_file_json_response() -> str
Convert OCR data from Document.proto to JSON str of AnnotateFileResponse for Vision API.
google.cloud.documentai_toolbox.wrappers.document.Document.convert_document_to_annotate_file_response
convert_document_to_annotate_file_response() -> google.cloud.vision_v1.types.image_annotator.AnnotateFileResponse
Convert OCR data from Document.proto to AnnotateFileResponse.proto for Vision API.
google.cloud.documentai_toolbox.wrappers.document.Document.entities_to_bigquery
entities_to_bigquery(dataset_name: str, table_name: str, project_id: typing.Optional[str] = None) -> google.cloud.bigquery.job.load.LoadJob
Adds extracted entities to a BigQuery table.
See more: google.cloud.documentai_toolbox.wrappers.document.Document.entities_to_bigquery
google.cloud.documentai_toolbox.wrappers.document.Document.entities_to_dict
entities_to_dict() -> typing.Dict[str, typing.Union[str, typing.List[str]]]
Returns a dictionary of entities in the document.
See more: google.cloud.documentai_toolbox.wrappers.document.Document.entities_to_dict
google.cloud.documentai_toolbox.wrappers.document.Document.export_hocr_str
export_hocr_str(title: str) -> str
Exports a string hOCR version of the Document.
See more: google.cloud.documentai_toolbox.wrappers.document.Document.export_hocr_str
google.cloud.documentai_toolbox.wrappers.document.Document.export_images
export_images(output_path: str, output_file_prefix: str, output_file_extension: str) -> typing.List[str]
Exports images from Document.entities to files.
See more: google.cloud.documentai_toolbox.wrappers.document.Document.export_images
google.cloud.documentai_toolbox.wrappers.document.Document.form_fields_to_bigquery
form_fields_to_bigquery(dataset_name: str, table_name: str, project_id: typing.Optional[str] = None) -> google.cloud.bigquery.job.load.LoadJob
Adds extracted form fields to a BigQuery table.
See more: google.cloud.documentai_toolbox.wrappers.document.Document.form_fields_to_bigquery
google.cloud.documentai_toolbox.wrappers.document.Document.form_fields_to_dict
form_fields_to_dict() -> typing.Dict[str, typing.Union[str, typing.List[str]]]
Returns a dictionary of form fields in the document.
See more: google.cloud.documentai_toolbox.wrappers.document.Document.form_fields_to_dict
google.cloud.documentai_toolbox.wrappers.document.Document.from_batch_process_metadata
from_batch_process_metadata(metadata: google.cloud.documentai_v1.types.document_processor_service.BatchProcessMetadata) -> typing.List[google.cloud.documentai_toolbox.wrappers.document.Document]
Loads Documents from Cloud Storage, using the output from BatchProcessMetadata.
See more: google.cloud.documentai_toolbox.wrappers.document.Document.from_batch_process_metadata
google.cloud.documentai_toolbox.wrappers.document.Document.from_batch_process_operation
from_batch_process_operation(location: str, operation_name: str, timeout: typing.Optional[float] = None) -> typing.List[google.cloud.documentai_toolbox.wrappers.document.Document]
Loads Documents from Cloud Storage, using the operation name returned from batch_process_documents().
See more: google.cloud.documentai_toolbox.wrappers.document.Document.from_batch_process_operation
google.cloud.documentai_toolbox.wrappers.document.Document.from_document_path
from_document_path(document_path: str) -> google.cloud.documentai_toolbox.wrappers.document.Document
Loads Document from local document_path.
See more: google.cloud.documentai_toolbox.wrappers.document.Document.from_document_path
google.cloud.documentai_toolbox.wrappers.document.Document.from_documentai_document
from_documentai_document(documentai_document: google.cloud.documentai_v1.types.document.Document) -> google.cloud.documentai_toolbox.wrappers.document.Document
Loads Document from local documentai_document.
See more: google.cloud.documentai_toolbox.wrappers.document.Document.from_documentai_document
google.cloud.documentai_toolbox.wrappers.document.Document.from_gcs
from_gcs(gcs_bucket_name: str, gcs_prefix: str, gcs_input_uri: typing.Optional[str] = None) -> google.cloud.documentai_toolbox.wrappers.document.Document
Loads a Document from a Cloud Storage directory.
See more: google.cloud.documentai_toolbox.wrappers.document.Document.from_gcs
google.cloud.documentai_toolbox.wrappers.document.Document.from_gcs_uri
from_gcs_uri(gcs_uri: str, gcs_input_uri: typing.Optional[str] = None) -> google.cloud.documentai_toolbox.wrappers.document.Document
Loads a Document from a Cloud Storage URI.
See more: google.cloud.documentai_toolbox.wrappers.document.Document.from_gcs_uri
google.cloud.documentai_toolbox.wrappers.document.Document.get_entity_by_type
get_entity_by_type(target_type: str) -> typing.List[google.cloud.documentai_toolbox.wrappers.entity.Entity]
Returns the list of Entities of target_type.
See more: google.cloud.documentai_toolbox.wrappers.document.Document.get_entity_by_type
google.cloud.documentai_toolbox.wrappers.document.Document.get_form_field_by_name
get_form_field_by_name(target_field: str) -> typing.List[google.cloud.documentai_toolbox.wrappers.page.FormField]
Returns the list of FormFields named target_field.
See more: google.cloud.documentai_toolbox.wrappers.document.Document.get_form_field_by_name
google.cloud.documentai_toolbox.wrappers.document.Document.search_pages
search_pages(target_string: typing.Optional[str] = None, pattern: typing.Optional[str] = None) -> typing.List[google.cloud.documentai_toolbox.wrappers.page.Page]
Returns the list of Pages containing target_string or text matching pattern.
See more: google.cloud.documentai_toolbox.wrappers.document.Document.search_pages
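The two search modes can be sketched over plain page text as follows. The function name search_pages_sketch, the use of page indices instead of Page objects, and the rule that exactly one of the two arguments must be given are assumptions made for this example.

```python
import re
from typing import List, Optional


def search_pages_sketch(
    page_texts: List[str],
    target_string: Optional[str] = None,
    pattern: Optional[str] = None,
) -> List[int]:
    """Return indices of pages containing target_string or matching pattern."""
    # Assumption: exactly one search criterion must be supplied.
    if (target_string is None) == (pattern is None):
        raise ValueError("Specify exactly one of target_string or pattern.")
    if target_string is not None:
        return [i for i, text in enumerate(page_texts) if target_string in text]
    regex = re.compile(pattern)
    return [i for i, text in enumerate(page_texts) if regex.search(text)]


# Hypothetical page text for a three-page document.
pages = ["Invoice 2023-001\nTotal: $40", "Terms and conditions", "Invoice 2023-002"]
print(search_pages_sketch(pages, target_string="Terms"))         # [1]
print(search_pages_sketch(pages, pattern=r"Invoice \d{4}-\d+"))  # [0, 2]
```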
google.cloud.documentai_toolbox.wrappers.document.Document.split_pdf
split_pdf(pdf_path: str, output_path: str) -> typing.List[str]
Splits local PDF file into multiple PDF files based on output from a Splitter processor.
See more: google.cloud.documentai_toolbox.wrappers.document.Document.split_pdf
google.cloud.documentai_toolbox.wrappers.document.Document.to_merged_documentai_document
to_merged_documentai_document() -> google.cloud.documentai_v1.types.document.Document
Exports a documentai.Document from the wrapped document with shards merged.
See more: google.cloud.documentai_toolbox.wrappers.document.Document.to_merged_documentai_document
google.cloud.documentai_toolbox.wrappers.entity.Entity.crop_image
crop_image(documentai_page: google.cloud.documentai_v1.types.document.Document.Page) -> typing.Optional[PIL.Image.Image]
Return image cropped from page image for detected entity.
See more: google.cloud.documentai_toolbox.wrappers.entity.Entity.crop_image
google.cloud.documentai_toolbox.wrappers.page.Page._get_elements
_get_elements(element_type: typing.Type, attribute_name: str) -> typing.List
Helper method to create elements based on specified type.
See more: google.cloud.documentai_toolbox.wrappers.page.Page._get_elements
google.cloud.documentai_toolbox.wrappers.page.Table._extract_table_rows
_extract_table_rows(table_rows: typing.Iterable[google.cloud.documentai_v1.types.document.Document.Page.Table.TableRow]) -> typing.List[typing.List[str]]
Returns a list of rows from table_rows.
See more: google.cloud.documentai_toolbox.wrappers.page.Table._extract_table_rows
google.cloud.documentai_toolbox.wrappers.page.Table.to_dataframe
to_dataframe() -> pandas.core.frame.DataFrame
Returns pd.DataFrame from documentai.table.
See more: google.cloud.documentai_toolbox.wrappers.page.Table.to_dataframe
google.cloud.documentai_toolbox.wrappers.page._BasePageElement._get_children_of_element
_get_children_of_element(potential_children: typing.List[google.cloud.documentai_toolbox.wrappers.page._BasePageElement]) -> typing.List[google.cloud.documentai_toolbox.wrappers.page._BasePageElement]
Filters potential child elements to identify only those fully contained within this element.
See more: google.cloud.documentai_toolbox.wrappers.page._BasePageElement._get_children_of_element
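The containment filter can be pictured with a small stand-in model. In the sketch below each element is reduced to a character span, and a child counts as fully contained when its span lies inside the parent's. The Span class, the children_of function, and the choice of text spans as the containment criterion are all assumptions for illustration; the library may compare elements differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Span:
    """Hypothetical stand-in for a page element, reduced to its text span."""
    start: int
    end: int


def children_of(parent: Span, potential_children: List[Span]) -> List[Span]:
    """Keep only the candidates whose span lies entirely inside the parent's."""
    return [
        child
        for child in potential_children
        if parent.start <= child.start and child.end <= parent.end
    ]


paragraph = Span(10, 50)
tokens = [Span(0, 9), Span(10, 20), Span(45, 50), Span(48, 55)]
print(children_of(paragraph, tokens))
# [Span(start=10, end=20), Span(start=45, end=50)]
```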