The schema defines the output that a processor produces for a processed document.
| JSON representation |
|---|
{ "displayName" : string , "description" : string , "entityTypes" : [ { object ( |
| Fields | |
|---|---|
| `displayName` | Display name to show users. |
| `description` | Description of the schema. |
| `entityTypes[]` | Entity types of the schema. |
| `metadata` | Metadata of the schema. |
EntityType
EntityType is the wrapper of a label of the corresponding model with detailed attributes and limitations for entity-based processors. Multiple types can also compose a dependency tree to represent nested types.
| JSON representation |
|---|
{ "displayName" : string , "name" : string , "baseTypes" : [ string ] , "properties" : [ { object ( |
displayName
string
User defined name for the type.
name
string
Name of the type. It must be unique within the schema file and cannot be a "Common Type". The following naming conventions are used:
- Use `snake_casing`.
- Name matching is case-sensitive.
- Maximum 64 characters.
- Must start with a letter.
- Allowed characters: ASCII letters `[a-z0-9_-]`. (For backward compatibility, internal infrastructure and tooling can handle any ASCII character.)
- The `/` is sometimes used to denote a property of a type. For example, `line_item/amount`. This convention is deprecated, but will still be honored for backward compatibility.
baseTypes[]
string
The entity type that this type is derived from. For now, one and only one should be set.
properties[]
object (Property)
Describes the nested structure, or composition, of an entity.
Union field `value_source`. `value_source` can be only one of the following:
enumValues
object (EnumValues)
If specified, lists all the possible values for this entity. This should not be more than a handful of values. If the number of values is >10 or could change frequently, use the `EntityType.value_ontology` field and specify a list of all possible values in a value ontology file.
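The naming conventions listed for `name` above can be checked mechanically. The following is a minimal sketch, not the service's own validation logic. It assumes the character class `[a-z0-9_-]` is meant literally (so uppercase letters are rejected) and it deliberately rejects the deprecated `/` convention. The function name `is_valid_type_name` is hypothetical.

```python
import re

# Assumption: [a-z0-9_-] is taken literally, so uppercase letters fail.
# The first character must be a letter, as the conventions require.
_NAME_PATTERN = re.compile(r"[a-z][a-z0-9_-]*")
MAX_NAME_LENGTH = 64


def is_valid_type_name(name: str) -> bool:
    """Return True if `name` follows the documented naming conventions.

    This strict check also rejects the deprecated `/` separator
    (for example `line_item/amount`), which the service still honors.
    """
    if len(name) > MAX_NAME_LENGTH:
        return False
    return _NAME_PATTERN.fullmatch(name) is not None
```

A stricter check like this one is most useful for new schemas. Existing schemas may contain names that only the backward-compatible handling accepts.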
EnumValues
Defines a list of enum values.
| JSON representation |
|---|
{ "values" : [ string ] } |
| Fields | |
|---|---|
| `values[]` | The individual values that this enum values type can include. |
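As an illustration of how `enumValues` fits into an entity type, the sketch below builds a fragment with an inline value list and refuses lists longer than 10 entries, following the guidance that larger or frequently changing lists belong in a value ontology file. The helper name, the type name `currency_code`, and the sample values are hypothetical. The fragment omits `baseTypes` because this section does not list the valid base type names.

```python
import json

# Threshold from the guidance: more than 10 values should move to a
# value ontology file instead of an inline enum.
MAX_INLINE_ENUM_VALUES = 10


def make_enum_entity_type(name: str, display_name: str, values: list) -> dict:
    """Build an EntityType fragment whose value source is `enumValues`."""
    if len(values) > MAX_INLINE_ENUM_VALUES:
        raise ValueError(
            f"{len(values)} values is too many for an inline enum; "
            "consider a value ontology file instead"
        )
    return {
        "name": name,
        "displayName": display_name,
        "enumValues": {"values": list(values)},
    }


# Hypothetical example type and values, for illustration only.
currency_type = make_enum_entity_type(
    "currency_code", "Currency code", ["USD", "EUR", "JPY"]
)
print(json.dumps(currency_type, indent=2))
```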
Property
Defines properties that can be part of the entity type.
| JSON representation |
|---|
{ "name" : string , "displayName" : string , "valueType" : string , "occurrenceType" : enum ( |
| Fields | |
|---|---|
| `name` | The name of the property. Follows the same guidelines as the EntityType name. |
| `displayName` | User defined name for the property. |
| `valueType` | A reference to the value type of the property. This type is subject to the same conventions as the |
| `occurrenceType` | Limits the number of instances of an entity type that can appear in the document. |
| `method` | Specifies how the entity's value is obtained. |
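To show how the `Property` fields combine with an `EntityType`, here is a small sketch that assembles one entity type as a Python dictionary and serializes it to JSON. Only the field names come from this reference. Every concrete value (`invoice`, `document`, `string`, `line_item`, and the property names) is a hypothetical placeholder, and `property_names` is an illustrative helper.

```python
import json

# All concrete values below are hypothetical placeholders; only the
# field names are taken from the reference.
invoice_type = {
    "displayName": "Invoice",
    "name": "invoice",
    "baseTypes": ["document"],  # exactly one base type is set
    "properties": [
        {
            "name": "invoice_id",
            "displayName": "Invoice ID",
            "valueType": "string",
            "occurrenceType": "REQUIRED_ONCE",
            "method": "EXTRACT",
        },
        {
            "name": "line_item",
            "displayName": "Line item",
            "valueType": "line_item",  # refers to another entity type
            "occurrenceType": "OPTIONAL_MULTIPLE",
            "method": "EXTRACT",
        },
    ],
}


def property_names(entity_type: dict) -> list:
    """Return the names of all properties declared on an entity type."""
    return [prop["name"] for prop in entity_type.get("properties", [])]


print(json.dumps(invoice_type, indent=2))
```

Pointing `valueType` at another entity type's name is how nested types would be composed in this sketch, matching the dependency-tree description of `EntityType`.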
OccurrenceType
Types of occurrences of the entity type in the document. This represents the number of instances, not mentions, of an entity. For example, a bank statement might only have one `account_number`, but this account number can be mentioned in several places on the document. In this case, the `account_number` is considered a `REQUIRED_ONCE` entity type. If, on the other hand, it's expected that a bank statement contains the status of multiple different accounts for the customers, the occurrence type is set to `REQUIRED_MULTIPLE`.
| Enums | |
|---|---|
| `OCCURRENCE_TYPE_UNSPECIFIED` | Unspecified occurrence type. |
| `OPTIONAL_ONCE` | There will be zero or one instance of this entity type. The same entity instance may be mentioned multiple times. |
| `OPTIONAL_MULTIPLE` | The entity type will appear zero or more times. |
| `REQUIRED_ONCE` | The entity type will appear exactly once. The same entity instance may be mentioned multiple times. |
| `REQUIRED_MULTIPLE` | The entity type will appear one or more times. |
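The occurrence types above amount to lower and upper bounds on the number of distinct instances (not mentions) of an entity. A minimal sketch of that reading follows. The function name is hypothetical, and raising an error for `OCCURRENCE_TYPE_UNSPECIFIED` is an assumption, since this reference does not say how an unspecified value is interpreted.

```python
# (minimum, maximum) number of instances; None means no upper bound.
_INSTANCE_BOUNDS = {
    "OPTIONAL_ONCE": (0, 1),
    "OPTIONAL_MULTIPLE": (0, None),
    "REQUIRED_ONCE": (1, 1),
    "REQUIRED_MULTIPLE": (1, None),
}


def instance_count_allowed(occurrence_type: str, instance_count: int) -> bool:
    """Check a count of distinct instances against an occurrence type.

    `instance_count` counts instances, not mentions: one account number
    mentioned three times on a statement is a single instance.
    """
    if occurrence_type not in _INSTANCE_BOUNDS:
        # Assumption: unspecified or unknown values are treated as errors.
        raise ValueError(f"no bounds defined for {occurrence_type!r}")
    minimum, maximum = _INSTANCE_BOUNDS[occurrence_type]
    if instance_count < minimum:
        return False
    return maximum is None or instance_count <= maximum
```

For the bank statement example, a single `account_number` instance satisfies `REQUIRED_ONCE`, while a statement covering several accounts would need `REQUIRED_MULTIPLE`.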
Method
Specifies how the entity's value is obtained from the document.
| Enums | |
|---|---|
| `METHOD_UNSPECIFIED` | Unspecified method. It defaults to `EXTRACT`. |
| `EXTRACT` | The entity's value is directly extracted as-is from the document text. |
| `DERIVE` | The entity's value is derived through inference and is not necessarily an exact text extraction from the document. |
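Because `METHOD_UNSPECIFIED` defaults to `EXTRACT`, client code that reads a schema may want to normalize the field before branching on it. The helper below is a small sketch of that normalization. Its name is hypothetical, and treating a missing `method` key the same as `METHOD_UNSPECIFIED` is an assumption about how the JSON is serialized.

```python
def effective_method(prop: dict) -> str:
    """Return the method that applies to a Property dictionary.

    Assumption: an absent "method" key means METHOD_UNSPECIFIED.
    """
    method = prop.get("method", "METHOD_UNSPECIFIED")
    if method == "METHOD_UNSPECIFIED":
        return "EXTRACT"  # documented default
    return method
```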
Metadata
Metadata for global schema behavior.
| JSON representation |
|---|
{ "documentSplitter" : boolean , "documentAllowMultipleLabels" : boolean , "prefixedNamingOnProperties" : boolean , "skipNamingValidation" : boolean } |
| Fields | |
|---|---|
| `documentSplitter` | If true, a |
| `documentAllowMultipleLabels` | If true, on a given page, there can be multiple |
| `prefixedNamingOnProperties` | If set, all the nested entities must be prefixed with the parents. |
| `skipNamingValidation` | If set, this will skip the naming format validation in the schema. So the string values in |

