Resource: DataScan
Represents a user-visible job that provides insights into the related data source.
For example:
- Data quality: generates queries based on the rules and runs them against the data to get data quality check results. For more information, see Auto data quality overview.
- Data profile: analyzes the data in tables and generates insights about its structure, content, and relationships (such as null percentage, cardinality, and min/max/mean). For more information, see About data profiling.
- Data discovery: scans data in Cloud Storage buckets to extract and then catalog metadata. For more information, see Discover and catalog Cloud Storage data.
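JSON representation:
{
  "name": string,
  "uid": string,
  "description": string,
  "displayName": string,
  "labels": { string: string, ... },
  "state": enum (State),
  "createTime": string,
  "updateTime": string,
  "data": { object (DataSource) },
  "executionSpec": { object (ExecutionSpec) },
  "executionStatus": { object (ExecutionStatus) },
  "type": enum (DataScanType),

  // Union field spec can be only one of the following:
  "dataQualitySpec": { object (DataQualitySpec) },
  "dataProfileSpec": { object (DataProfileSpec) },
  "dataDiscoverySpec": { object (DataDiscoverySpec) }
  // End of list of possible types for union field spec.

  // Union field result can be only one of the following:
  "dataQualityResult": { object (DataQualityResult) },
  "dataProfileResult": { object (DataProfileResult) },
  "dataDiscoveryResult": { object (DataDiscoveryResult) }
  // End of list of possible types for union field result.
}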
name
string
Output only. Identifier. The relative resource name of the scan, of the form projects/{project}/locations/{locationId}/dataScans/{datascanId}, where project refers to a projectId or project_number and locationId refers to a Google Cloud region.
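The resource name above can be assembled and split with a small helper. The following is a minimal sketch, assuming each path segment is any non-empty run of characters other than "/"; the class name `DataScanName` is hypothetical and not part of any Google client library.

```python
import re
from dataclasses import dataclass

# projects/{project}/locations/{locationId}/dataScans/{datascanId}
# Assumption: every variable segment is a non-empty run of non-"/" characters.
_DATASCAN_NAME = re.compile(
    r"projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)/dataScans/(?P<scan>[^/]+)"
)


@dataclass(frozen=True)
class DataScanName:
    """Hypothetical holder for the three variable parts of a DataScan name."""

    project: str   # projectId or project_number
    location: str  # Google Cloud region, e.g. "us-central1"
    scan: str      # datascanId

    def __str__(self) -> str:
        return f"projects/{self.project}/locations/{self.location}/dataScans/{self.scan}"

    @classmethod
    def parse(cls, name: str) -> "DataScanName":
        # fullmatch rejects trailing text, including a stray newline.
        match = _DATASCAN_NAME.fullmatch(name)
        if match is None:
            raise ValueError(f"not a DataScan resource name: {name!r}")
        return cls(match["project"], match["location"], match["scan"])


parsed = DataScanName.parse("projects/my-project/locations/us-central1/dataScans/orders-dq")
print(parsed.location)  # us-central1
```

Parsing against the full pattern, rather than splitting on "/", keeps a malformed name (for example, one missing the locations segment) from being silently accepted.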
uid
string
Output only. System-generated, globally unique ID for the scan. If the scan is deleted and re-created with the same name, the new scan gets a different ID.
description
string
Optional. Description of the scan.
- Must be between 1 and 1024 characters.
displayName
string
Optional. User friendly display name.
- Must be between 1 and 256 characters.
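A client can check these length limits before sending a request. This is a minimal sketch; `validate_scan_text` is a hypothetical helper, and it counts Unicode code points with `len()`, which is an assumption that may not match exactly how the service counts characters.

```python
from typing import List, Optional

# Limits from the description and displayName fields; both fields are optional.
DESCRIPTION_MAX = 1024
DISPLAY_NAME_MAX = 256


def validate_scan_text(description: Optional[str] = None,
                       display_name: Optional[str] = None) -> List[str]:
    """Return a list of problems; an empty list means both values pass."""
    problems = []
    # An omitted (None) field is fine; a present one must be 1..limit long.
    if description is not None and not 1 <= len(description) <= DESCRIPTION_MAX:
        problems.append(f"description must be 1-{DESCRIPTION_MAX} characters")
    if display_name is not None and not 1 <= len(display_name) <= DISPLAY_NAME_MAX:
        problems.append(f"displayName must be 1-{DISPLAY_NAME_MAX} characters")
    return problems
```

Note that an empty string is rejected while an omitted field is accepted, which mirrors "Optional" combined with a minimum length of 1.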
labels
map (key: string, value: string)
Optional. User-defined labels for the scan.
An object containing a list of "key": value pairs. Example: { "name": "wrench", "mass": "1.3kg", "count": "3" }.
state
enum (State)
Output only. Current state of the DataScan.
createTime
string (Timestamp format)
Output only. The time when the scan was created.
Uses RFC 3339, where generated output will always be Z-normalized and uses 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30".
updateTime
string (Timestamp format)
Output only. The time when the scan was last updated.
Uses RFC 3339, where generated output will always be Z-normalized and uses 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30".
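Because generated timestamps may carry 0, 3, 6 or 9 fractional digits while Python's `datetime` stores only microseconds, a client has to decide what to do with nanoseconds. The sketch below (the name `parse_timestamp` is hypothetical) accepts the three example shapes above and truncates to microseconds. It assumes uppercase "T" and "Z" and no leap seconds, so it is not a complete RFC 3339 parser.

```python
import re
from datetime import datetime, timedelta, timezone

# Date, time, an optional fraction of 1-9 digits, then "Z" or a +HH:MM / -HH:MM offset.
_RFC3339 = re.compile(
    r"(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})"
    r"(?:\.(\d{1,9}))?"
    r"(Z|[+-]\d{2}:\d{2})"
)


def parse_timestamp(value: str) -> datetime:
    """Parse a Timestamp string into a timezone-aware datetime."""
    m = _RFC3339.fullmatch(value)
    if m is None:
        raise ValueError(f"not an RFC 3339 timestamp: {value!r}")
    year, month, day, hour, minute, second = (int(g) for g in m.groups()[:6])
    fraction, offset = m.group(7), m.group(8)
    # Right-pad to 6 digits, then drop anything finer than a microsecond.
    micros = int((fraction or "").ljust(6, "0")[:6])
    if offset == "Z":
        tz = timezone.utc
    else:
        sign = 1 if offset[0] == "+" else -1
        tz = timezone(sign * timedelta(hours=int(offset[1:3]), minutes=int(offset[4:6])))
    return datetime(year, month, day, hour, minute, second, micros, tzinfo=tz)
```

Returning aware datetimes means values with different offsets, such as "2014-10-02T15:01:23+05:30" and its Z-normalized form, compare correctly.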
data
object (DataSource)
Required. The data source for the DataScan.
executionSpec
object (ExecutionSpec)
Optional. DataScan execution settings. If not specified, its fields use their default values.
executionStatus
object (ExecutionStatus)
Output only. Status of the data scan execution.
type
enum (DataScanType)
Output only. The type of DataScan.
Union field spec. Data scan related settings. The settings are required and immutable: after you configure the settings for one type of data scan, you can't change the data scan to a different type. spec can be only one of the following:
dataQualitySpec
object (DataQualitySpec)
Settings for a data quality scan.
dataProfileSpec
object (DataProfileSpec)
Settings for a data profile scan.
dataDiscoverySpec
object (DataDiscoverySpec)
Settings for a data discovery scan.
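Since spec is a union field, a request body should set exactly one of its members. A minimal client-side check might look like this; `spec_kind` is a hypothetical helper operating on a DataScan body represented as a plain Python dict.

```python
from typing import Any, Dict

# The three members of union field `spec`, as listed above.
SPEC_FIELDS = ("dataQualitySpec", "dataProfileSpec", "dataDiscoverySpec")


def spec_kind(data_scan: Dict[str, Any]) -> str:
    """Return the name of the single `spec` member set in a DataScan body.

    Raises ValueError when none or several members are present, mirroring the
    "can be only one of the following" rule (the spec is also required).
    """
    present = [field for field in SPEC_FIELDS if field in data_scan]
    if len(present) != 1:
        raise ValueError(f"expected exactly one of {SPEC_FIELDS}, found {present}")
    return present[0]
```

Checking this before an update is useful because the spec type cannot be changed after creation.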
Union field result. The result of the data scan. result can be only one of the following:
dataQualityResult
object (DataQualityResult)
Output only. The result of a data quality scan.
dataProfileResult
object (DataProfileResult)
Output only. The result of a data profile scan.
dataDiscoveryResult
object (DataDiscoveryResult)
Output only. The result of a data discovery scan.
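When reading a DataScan response, a client typically wants the result that belongs to the scan's spec. The sketch below assumes that each spec member pairs with the result member of the same name stem (for example, dataProfileSpec with dataProfileResult); this pairing is inferred from the field names, not stated on this page. It also assumes a result may be absent, for instance before any job has completed. `read_result` is a hypothetical helper.

```python
from typing import Any, Dict, Optional

# Assumed pairing between `spec` and `result` union members, inferred from names.
RESULT_FOR_SPEC = {
    "dataQualitySpec": "dataQualityResult",
    "dataProfileSpec": "dataProfileResult",
    "dataDiscoverySpec": "dataDiscoveryResult",
}


def read_result(data_scan: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the result object matching the scan's spec, or None if absent."""
    specs = [s for s in RESULT_FOR_SPEC if s in data_scan]
    if len(specs) != 1:
        raise ValueError(f"expected exactly one spec member, found {specs}")
    result_field = RESULT_FOR_SPEC[specs[0]]
    # A result member for a different scan type would contradict the pairing.
    others = [r for r in RESULT_FOR_SPEC.values() if r != result_field and r in data_scan]
    if others:
        raise ValueError(f"result {others} does not match spec {specs[0]}")
    return data_scan.get(result_field)
```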
DataSource
The data source for DataScan.
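JSON representation:
{
  // Union field source can be only one of the following:
  "entity": string,
  "resource": string
  // End of list of possible types for union field source.
}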
Union field source. The source is required and immutable. Once it is set, it cannot be changed to another source. source can be only one of the following:
entity
string
Immutable. The Dataplex Universal Catalog entity that represents the data source (for example, a BigQuery table) for the DataScan, of the form: projects/{project_number}/locations/{locationId}/lakes/{lakeId}/zones/{zoneId}/entities/{entityId}.
resource
string
Immutable. The service-qualified full resource name of the cloud resource for a DataScan job to scan against. The field can be either:
- a Cloud Storage bucket, for DataDiscoveryScan. Format: //storage.googleapis.com/projects/PROJECT_ID/buckets/BUCKET_ID
- a BigQuery table of type "TABLE", for DataProfileScan/DataQualityScan. Format: //bigquery.googleapis.com/projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_ID
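The two resource formats above can be produced by small string builders. This is a sketch: the helper names `bucket_resource`, `table_resource`, and `data_source` are hypothetical, and `data_source` only enforces the rule that exactly one member of the source union is set.

```python
from typing import Dict, Optional


def bucket_resource(project_id: str, bucket_id: str) -> str:
    """Full resource name of a Cloud Storage bucket (for data discovery scans)."""
    return f"//storage.googleapis.com/projects/{project_id}/buckets/{bucket_id}"


def table_resource(project_id: str, dataset_id: str, table_id: str) -> str:
    """Full resource name of a BigQuery table (for data profile and quality scans)."""
    return (f"//bigquery.googleapis.com/projects/{project_id}"
            f"/datasets/{dataset_id}/tables/{table_id}")


def data_source(entity: Optional[str] = None,
                resource: Optional[str] = None) -> Dict[str, str]:
    """Build a DataSource body that sets exactly one member of union `source`."""
    if (entity is None) == (resource is None):
        raise ValueError("set exactly one of entity or resource")
    return {"entity": entity} if entity is not None else {"resource": resource}
```

For example, `data_source(resource=table_resource("my-project", "sales", "orders"))` yields a body suitable for a profile or quality scan; the project, dataset, and table names here are illustrative.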
ExecutionSpec
DataScan execution settings.
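JSON representation:
{
  "trigger": { object (Trigger) },
  // Union field incremental can be only one of the following:
  "field": string
  // End of list of possible types for union field incremental.
}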
trigger
object (Trigger)
Optional. Spec related to how often and when a scan should be triggered. If not specified, the default is OnDemand, which means the scan will not run until the user calls the dataScans.run API.
Union field incremental. Spec related to incremental scans of the data. When an option is selected for incremental scan, it cannot be unset or changed. If not specified, a data scan will run for all data in the table. incremental can be only one of the following:
field
string
Immutable. The unnested field (of type Date or Timestamp) that contains values which monotonically increase over time. If not specified, a data scan will run for all data in the table.
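Putting the trigger and the incremental field together, an ExecutionSpec body might be built as below. This is a sketch under one loud assumption: this page names the Trigger variants only as the types OnDemand and Schedule, so the JSON keys "onDemand" and "schedule" used here are assumed and should be checked against the Trigger reference. `execution_spec` is a hypothetical helper, and "event_time" is an illustrative column name.

```python
import json
from typing import Any, Dict, Optional


def execution_spec(cron: Optional[str] = None,
                   incremental_field: Optional[str] = None) -> Dict[str, Any]:
    """Build an ExecutionSpec body (JSON keys for Trigger members are assumed)."""
    spec: Dict[str, Any] = {}
    if cron is not None:
        # Periodic runs: Schedule carries a single required `cron` string.
        spec["trigger"] = {"schedule": {"cron": cron}}
    else:
        # OnDemand has no fields; it is also the default when trigger is omitted.
        spec["trigger"] = {"onDemand": {}}
    if incremental_field is not None:
        # Member of union `incremental`; immutable once set. It should name an
        # unnested Date or Timestamp column whose values increase monotonically.
        spec["field"] = incremental_field
    return spec


print(json.dumps(execution_spec("0 3 * * *", "event_time"), sort_keys=True))
# {"field": "event_time", "trigger": {"schedule": {"cron": "0 3 * * *"}}}
```

Because the incremental choice cannot be unset or changed later, it is worth deciding on the column before the scan is created.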
Trigger
DataScan scheduling and trigger settings.
JSON representation:
{ // Union field
OnDemand
This type has no fields.
The scan runs once via the dataScans.run API.
Schedule
The scan is scheduled to run periodically.
JSON representation:
{ "cron" : string }
cron
string
Required. Cron schedule for running scans periodically. To explicitly set a time zone in the cron tab, apply a prefix in the cron tab: "CRON_TZ=${IANA_TIME_ZONE}" or "TZ=${IANA_TIME_ZONE}". The ${IANA_TIME_ZONE} may only be a valid string from the IANA time zone database. This field is required for Schedule scans.
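The time zone prefix described above can be added programmatically. This sketch assumes the standard five-field cron form (minute, hour, day of month, month, day of week); `scheduled_cron` is a hypothetical helper, and it does not check the zone name against the IANA database, so an invalid name would only be caught by the service.

```python
from typing import Optional


def scheduled_cron(cron: str, iana_time_zone: Optional[str] = None) -> str:
    """Return a Schedule.cron value, optionally prefixed with CRON_TZ=<zone>."""
    fields = cron.split()
    # Assumed five-field form: minute, hour, day of month, month, day of week.
    if len(fields) != 5:
        raise ValueError(f"expected 5 cron fields, got {len(fields)}: {cron!r}")
    expr = " ".join(fields)  # normalize runs of whitespace to single spaces
    return f"CRON_TZ={iana_time_zone} {expr}" if iana_time_zone else expr
```

For instance, `scheduled_cron("0 3 * * *", "America/New_York")` returns "CRON_TZ=America/New_York 0 3 * * *", that is, a daily run at 03:00 New York time; the zone and times are illustrative.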
ExecutionStatus
Status of the data scan execution.
JSON representation:
{ "latestJobStartTime" : string , "latestJobEndTime" : string , "latestJobCreateTime" : string }
latestJobStartTime
string (Timestamp format)
Optional. The time when the latest DataScanJob started.
Uses RFC 3339, where generated output will always be Z-normalized and uses 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30".
latestJobEndTime
string (Timestamp format)
Optional. The time when the latest DataScanJob ended.
Uses RFC 3339, where generated output will always be Z-normalized and uses 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30".
latestJobCreateTime
string (Timestamp format)
Optional. The time when the DataScanJob execution was created.
Uses RFC 3339, where generated output will always be Z-normalized and uses 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30".
Methods
- Creates a DataScan resource.
- Deletes a DataScan resource.
- Generates recommended data quality rules based on the results of a data profiling scan.
- Gets a DataScan resource.
- Gets the access control policy for a resource.
- Lists DataScans.
- Updates a DataScan resource.
- Runs an on-demand execution of a DataScan.
- Sets the access control policy on the specified resource.
- Returns permissions that a caller has on the specified resource.
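The method list above does not show HTTP bindings. As a hedged sketch, assuming the Dataplex v1 REST base URL `https://dataplex.googleapis.com/v1` and the usual resource-oriented paths (with custom methods such as run appended as `:run`), request lines for three of the methods could be built as follows. The helper names are hypothetical; verify each path on the method's own reference page, and note that authentication and request bodies are out of scope here.

```python
from typing import Tuple
from urllib.parse import urlencode

# Assumption: Dataplex v1 REST base URL and resource-oriented paths.
BASE = "https://dataplex.googleapis.com/v1"


def create_request(parent: str, data_scan_id: str) -> Tuple[str, str]:
    """Create: POST to the collection under projects/{project}/locations/{locationId}."""
    query = urlencode({"dataScanId": data_scan_id})
    return "POST", f"{BASE}/{parent}/dataScans?{query}"


def get_request(name: str) -> Tuple[str, str]:
    """Get: GET on the scan's relative resource name."""
    return "GET", f"{BASE}/{name}"


def run_request(name: str) -> Tuple[str, str]:
    """Run (dataScans.run): a custom method, so ":run" follows the resource name."""
    return "POST", f"{BASE}/{name}:run"
```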