This document describes how to use Dataplex Universal Catalog data quality scans to measure, monitor, and manage the quality of your data. Data quality scans help you automate the process of validating your data for completeness, validity, and consistency.
With data quality scans, you can define rules to check for missing values, ensure values match a regular expression or belong to a set, verify uniqueness, or use custom SQL for more complex validations like anomaly detection. This document explains how to create and manage data quality scans.
To learn more about data quality scans, see About auto data quality.
Before you begin
- Enable the Dataplex API (an example command follows this list).

  To enable APIs, you need the Service Usage Admin IAM role (`roles/serviceusage.serviceUsageAdmin`), which contains the `serviceusage.services.enable` permission. Learn how to grant roles.

- Optional: If you want Dataplex Universal Catalog to generate recommendations for data quality rules based on the results of a data profile scan, create and run the data profile scan.
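For example, if you have the Service Usage Admin role, you can enable the API with the gcloud CLI. This is a minimal sketch; run it in the project that will contain your data quality scans:

```bash
# Enable the Dataplex API in the currently configured project.
gcloud services enable dataplex.googleapis.com
```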
Required roles
- To run a data quality scan on a BigQuery table, you need permission to read the BigQuery table and permission to create a BigQuery job in the project used to scan the table.
- If the BigQuery table and the data quality scan are in different projects, then you need to give the Dataplex Universal Catalog service account of the project containing the data quality scan read permission for the corresponding BigQuery table.
- If the data quality rules refer to additional tables, then the scan project's service account must have read permissions on those tables.
- To get the permissions that you need to export the scan results to a BigQuery table, ask your administrator to grant the Dataplex Universal Catalog service account the BigQuery Data Editor (`roles/bigquery.dataEditor`) IAM role on the results dataset and table (see the example command after this list). This grants the following permissions:
  - `bigquery.datasets.get`
  - `bigquery.tables.create`
  - `bigquery.tables.get`
  - `bigquery.tables.getData`
  - `bigquery.tables.update`
  - `bigquery.tables.updateData`
- If the BigQuery data is organized in a Dataplex Universal Catalog lake, grant the Dataplex Universal Catalog service account the Dataplex Metadata Reader (`roles/dataplex.metadataReader`) and Dataplex Viewer (`roles/dataplex.viewer`) IAM roles. Alternatively, you need all of the following permissions:
  - `dataplex.lakes.list`
  - `dataplex.lakes.get`
  - `dataplex.zones.list`
  - `dataplex.zones.get`
  - `dataplex.entities.list`
  - `dataplex.entities.get`
  - `dataplex.operations.get`
- If you're scanning a BigQuery external table from Cloud Storage, grant the Dataplex Universal Catalog service account the Storage Object Viewer (`roles/storage.objectViewer`) role for the bucket. Alternatively, assign the Dataplex Universal Catalog service account the following permissions:
  - `storage.buckets.get`
  - `storage.objects.get`
- If you want to publish the data quality scan results as Dataplex Universal Catalog metadata, you must be granted the BigQuery Data Editor (`roles/bigquery.dataEditor`) IAM role for the table, and the `dataplex.entryGroups.useDataQualityScorecardAspect` permission on the `@bigquery` entry group in the same location as the table. Alternatively, you can be granted the Dataplex Catalog Editor (`roles/dataplex.catalogEditor`) role for the `@bigquery` entry group in the same location as the table.

  Alternatively, you need all of the following permissions:
  - `bigquery.tables.update` on the table
  - `dataplex.entryGroups.useDataQualityScorecardAspect` on the `@bigquery` entry group

  Or, you need all of the following permissions:
  - `dataplex.entries.update` on the `@bigquery` entry group
  - `dataplex.entryGroups.useDataQualityScorecardAspect` on the `@bigquery` entry group
- If you need to access columns protected by BigQuery column-level access policies, then assign the Dataplex Universal Catalog service account permissions for those columns. The user creating or updating a data scan also needs permissions for the columns.
- If a table has BigQuery row-level access policies enabled, then you can only scan rows visible to the Dataplex Universal Catalog service account. An individual user's access privileges are not evaluated for row-level policies.
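As an illustration, granting the BigQuery Data Editor role to the Dataplex Universal Catalog service agent could look like the following sketch. The project-level binding and the service agent address format (`service-PROJECT_NUMBER@gcp-sa-dataplex.iam.gserviceaccount.com`) are assumptions to verify for your environment; as noted above, granting the role on the results dataset and table is the narrower option.

```bash
# Placeholder project ID and project number; grant at the narrowest scope
# that works for your organization.
gcloud projects add-iam-policy-binding RESULTS_PROJECT_ID \
  --member="serviceAccount:service-SCAN_PROJECT_NUMBER@gcp-sa-dataplex.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataEditor"
```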
Required data scan roles
To use auto data quality, ask your administrator to grant you one of the following IAM roles:
- Full access to `DataScan` resources: Dataplex DataScan Administrator (`roles/dataplex.dataScanAdmin`)
- To create `DataScan` resources: Dataplex DataScan Creator (`roles/dataplex.dataScanCreator`) on the project
- Write access to `DataScan` resources: Dataplex DataScan Editor (`roles/dataplex.dataScanEditor`)
- Read access to `DataScan` resources, excluding rules and results: Dataplex DataScan Viewer (`roles/dataplex.dataScanViewer`)
- Read access to `DataScan` resources, including rules and results: Dataplex DataScan DataViewer (`roles/dataplex.dataScanDataViewer`)
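For example, an administrator could grant one of these roles at the project level with the gcloud CLI. This is a minimal sketch; the project ID and user email are placeholders:

```bash
# Grant write access to DataScan resources in the project to a single user.
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="user:USER_EMAIL" \
  --role="roles/dataplex.dataScanEditor"
```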
The following table lists the `DataScan` permissions:

| Permission name | Grants permission to do the following: |
|---|---|
| `dataplex.datascans.create` | Create a `DataScan` |
| `dataplex.datascans.delete` | Delete a `DataScan` |
| `dataplex.datascans.get` | View operational metadata such as ID or schedule, but not results and rules |
| `dataplex.datascans.getData` | View `DataScan` details, including rules and results |
| `dataplex.datascans.list` | List `DataScan`s |
| `dataplex.datascans.run` | Run a `DataScan` |
| `dataplex.datascans.update` | Update the description of a `DataScan` |
| `dataplex.datascans.getIamPolicy` | View the current IAM permissions on the scan |
| `dataplex.datascans.setIamPolicy` | Set IAM permissions on the scan |
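If you want to confirm which of these permissions a predefined role bundles, you can inspect the role with the gcloud CLI, for example:

```bash
# List the permissions contained in the Dataplex DataScan Editor role.
gcloud iam roles describe roles/dataplex.dataScanEditor
```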
Define data quality rules
You can define data quality rules by using built-in rules or custom SQL checks. If you're using the Google Cloud CLI, you can define these rules in a JSON or YAML file.
The examples in the following sections show how to define a variety of data quality rules. The rules validate a sample table that contains data about customer transactions. Assume the table has the following schema:
| Column name | Column type | Column description |
|---|---|---|
| `transaction_timestamp` | Timestamp | Timestamp of the transaction. The table is partitioned on this field. |
| `customer_id` | String | A customer ID in the format of 8 letters followed by 16 digits. |
| `transaction_id` | String | The transaction ID needs to be unique across the table. |
| `currency_id` | String | One of the supported currencies. The currency type must match one of the available currencies in the dimension table `dim_currency`. |
| `amount` | Float | Transaction amount. |
| `discount_pct` | Float | Discount percentage. This value must be between 0 and 100. |
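If you want to try the examples in this document against your own data, a table with this shape could be created with BigQuery DDL, as in the following sketch. The dataset name is a placeholder, and the partitioning choice simply mirrors the schema description above:

```bash
# Create an empty sample table partitioned on the transaction timestamp.
bq query --nouse_legacy_sql '
CREATE TABLE IF NOT EXISTS my_dataset.transactions_table (
  transaction_timestamp TIMESTAMP,
  customer_id STRING,
  transaction_id STRING,
  currency_id STRING,
  amount FLOAT64,
  discount_pct FLOAT64
)
PARTITION BY DATE(transaction_timestamp)'
```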
Define data quality rules using built-in rule types
The following example rules are based on built-in rule types. You can create rules based on built-in rule types using the Google Cloud console or the API. Dataplex Universal Catalog might recommend some of these rules.
| Column name | Rule type | Suggested dimension | Rule parameters |
|---|---|---|---|
| `transaction_id` | Uniqueness check | Uniqueness | Threshold: Not Applicable |
| `amount` | Null check | Completeness | Threshold: 100% |
| `customer_id` | Regex (regular expression) check | Validity | Regular expression: `^[0-9]{8}[a-zA-Z]{16}$`, Threshold: 100% |
| `currency_id` | Value set check | Validity | Set of: `USD,JPY,INR,GBP,CAN`, Threshold: 100% |
Define data quality rules using custom SQL rules
To build custom SQL rules, use the following framework:
- When you create a rule that evaluates one row at a time, create an expression that generates the number of successful rows when Dataplex Universal Catalog evaluates the query `SELECT COUNTIF(CUSTOM_SQL_EXPRESSION) FROM TABLE`. Dataplex Universal Catalog checks the number of successful rows against the threshold (see the sketch after this list).
- When you create a rule that evaluates across the rows or uses a table condition, create an expression that returns success or failure when Dataplex Universal Catalog evaluates the query `SELECT IF(CUSTOM_SQL_EXPRESSION) FROM TABLE`.
- When you create a rule that evaluates the invalid state of a dataset, provide a statement that returns invalid rows. If any rows are returned, the rule fails. Omit the trailing semicolon from the SQL statement.
- You can refer to a data source table and all of its precondition filters by using the data reference parameter `${data()}` in a rule, instead of explicitly mentioning the source table and its filters. Examples of precondition filters include row filters, sampling percents, and incremental filters. The `${data()}` parameter is case-sensitive.
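To get a feel for how a row-condition expression behaves before you attach it to a scan, you can run the same COUNTIF wrapper yourself in BigQuery. The following is a minimal sketch using the bq CLI; the dataset and table names are placeholders based on the sample schema in this document, and the threshold comparison is something you would do by eye here rather than something the query performs:

```bash
# Count how many rows would pass the row-condition expression, next to the
# total row count, so you can judge whether a given threshold would be met.
bq query --nouse_legacy_sql '
SELECT
  COUNTIF(0 < discount_pct AND discount_pct < 100) AS passing_rows,
  COUNT(*) AS total_rows
FROM `my_project_id.my_dataset.transactions_table`'
```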
The following example rules are based on custom SQL rules.
| Rule type | Rule description | SQL expression |
|---|---|---|
| Row condition | Checks if the value of `discount_pct` is between 0 and 100. | `0 < discount_pct AND discount_pct < 100` |
| Row condition | Reference check to validate that `currency_id` is one of the supported currencies. | `currency_id in (select id from my_project_id.dim_dataset.dim_currency)` |
| Table condition | Aggregate SQL expression that checks if the average `discount_pct` is between 30% and 50%. | `30 < avg(discount_pct) AND avg(discount_pct) < 50` |
| Row condition | Checks if a date is not in the future. | `TIMESTAMP(transaction_timestamp) < CURRENT_TIMESTAMP()` |
| Table condition | A BigQuery user-defined function (UDF) to check that the average transaction amount is less than a predefined value per country. Create the (JavaScript) UDF by running the following command: `CREATE OR REPLACE FUNCTION myProject.myDataset.average_by_country(country STRING, average FLOAT64) RETURNS BOOL LANGUAGE js AS R""" if (country == "CAN" && average < 5000) { return true } else if (country == "IND" && average < 1000) { return true } else { return false } """;` | Example rule to check the average transaction amount for `country=CAN`: `myProject.myDataset.average_by_country("CAN", (SELECT avg(amount) FROM myProject.myDataset.transactions_table WHERE currency_id = 'CAN'))` |
| Table condition | A BigQuery ML predict clause to identify anomalies in `discount_pct`. It checks if a discount should be applied based on `customer`, `currency`, and `transaction`. The rule checks if the prediction matches the actual value at least 99% of the time. Assumption: the ML model is created before using the rule. Create the ML model by running the following command: ``CREATE MODEL model-project-id.dataset-id.model-name OPTIONS(model_type='logistic_reg') AS SELECT IF(discount_pct IS NULL, 0, 1) AS label, IFNULL(customer_id, "") AS customer, IFNULL(currency_id, "") AS currency, IFNULL(amount, 0.0) AS amount FROM `data-project-id.dataset-id.table-names` WHERE transaction_timestamp < '2022-01-01';`` | The following rule checks if the prediction accuracy is greater than 99%: `SELECT accuracy > 0.99 FROM ML.EVALUATE(MODEL model-project-id.dataset-id.model-name, (SELECT customer_id, currency_id, amount, discount_pct FROM data-project-id.dataset-id.table-names WHERE transaction_timestamp > '2022-01-01'))` |
| Row condition | A BigQuery ML predict function to identify anomalies in `discount_pct`. The function checks if a discount should be applied based on `customer`, `currency`, and `transaction`. The rule identifies all the occurrences where the prediction didn't match. Assumption: the ML model is created before using the rule. Create the ML model by running the following command: ``CREATE MODEL model-project-id.dataset-id.model-name OPTIONS(model_type='logistic_reg') AS SELECT IF(discount_pct IS NULL, 0, 1) AS label, IFNULL(customer_id, "") AS customer, IFNULL(currency_id, "") AS currency, IFNULL(amount, 0.0) AS amount FROM `data-project-id.dataset-id.table-names` WHERE transaction_timestamp < '2022-01-01';`` | The following rule checks if the discount prediction matches the actual value for every row: `IF(discount_pct > 0, 1, 0) = (SELECT predicted_label FROM ML.PREDICT(MODEL model-project-id.dataset-id.model-name, (SELECT customer_id, currency_id, amount, discount_pct FROM data-project-id.dataset-id.table-names AS t WHERE t.transaction_timestamp = transaction_timestamp LIMIT 1)))` |
| SQL assertion | Validates if `discount_pct` is greater than 30% for today by checking whether any rows exist with a discount percent less than or equal to 30. | `SELECT * FROM my_project_id.dim_dataset.dim_currency WHERE discount_pct <= 30 AND transaction_timestamp >= current_date()` |
| SQL assertion | Checks if `discount_pct` is greater than 30% for today. The date filter `transaction_timestamp >= current_date()` is applied as a row filter on the data source table, and the data reference parameter `${data()}` stands in for the filtered source table. | `SELECT * FROM ${data()} WHERE discount_pct > 30` |
Define data quality rules using the gcloud CLI
The following example YAML file uses some of the same rules as the sample rules using built-in types and the sample custom SQL rules. This YAML file also contains other specifications for the data quality scan, such as filters and sampling percent. When you use the gcloud CLI to create or update a data quality scan, you can use a YAML file like this as input to the `--data-quality-spec-file` argument.
```yaml
rules:
- uniquenessExpectation: {}
  column: transaction_id
  dimension: UNIQUENESS
- nonNullExpectation: {}
  column: amount
  dimension: COMPLETENESS
  threshold: 1
- regexExpectation:
    regex: '^[0-9]{8}[a-zA-Z]{16}$'
  column: customer_id
  ignoreNull: true
  dimension: VALIDITY
  threshold: 1
- setExpectation:
    values:
    - 'USD'
    - 'JPY'
    - 'INR'
    - 'GBP'
    - 'CAN'
  column: currency_id
  ignoreNull: true
  dimension: VALIDITY
  threshold: 1
- rangeExpectation:
    minValue: '0'
    maxValue: '100'
  column: discount_pct
  ignoreNull: true
  dimension: VALIDITY
  threshold: 1
- rowConditionExpectation:
    sqlExpression: 0 < `discount_pct` AND `discount_pct` < 100
  column: discount_pct
  dimension: VALIDITY
  threshold: 1
- rowConditionExpectation:
    sqlExpression: currency_id in (select id from `my_project_id.dim_dataset.dim_currency`)
  column: currency_id
  dimension: VALIDITY
  threshold: 1
- tableConditionExpectation:
    sqlExpression: 30 < avg(discount_pct) AND avg(discount_pct) < 50
  dimension: VALIDITY
- rowConditionExpectation:
    sqlExpression: TIMESTAMP(transaction_timestamp) < CURRENT_TIMESTAMP()
  column: transaction_timestamp
  dimension: VALIDITY
  threshold: 1
- sqlAssertion:
    sqlStatement: SELECT * FROM `my_project_id.dim_dataset.dim_currency` WHERE discount_pct > 100
  dimension: VALIDITY
samplingPercent: 50
rowFilter: discount_pct > 100
postScanActions:
  bigqueryExport:
    resultsTable: projects/my_project_id/datasets/dim_dataset/tables/dim_currency
  notificationReport:
    recipients:
      emails:
      - '222larabrown@gmail.com'
      - 'cloudysanfrancisco@gmail.com'
    scoreThresholdTrigger:
      scoreThreshold: 50
    jobFailureTrigger: {}
    jobEndTrigger: {}
catalogPublishingEnabled: true
```
Create a data quality scan
Console
- In the Google Cloud console, go to the Dataplex Universal Catalog Data profiling & quality page.
- Click Create data quality scan.
- In the Define scan window, fill in the following fields:
  - Optional: Enter a Display name.
  - Enter an ID. See the resource naming conventions.
  - Optional: Enter a Description.
  - In the Table field, click Browse. Choose the table to scan, and then click Select. Only standard BigQuery tables are supported.

    For tables in multi-region datasets, choose a region in which to create the data scan.

    To browse the tables organized within Dataplex Universal Catalog lakes, click Browse within Dataplex Lakes.
  - In the Scope field, choose Incremental or Entire data.
    - If you choose Incremental: In the Timestamp column field, select a column of type `DATE` or `TIMESTAMP` from your BigQuery table that increases as new records are added and that can be used to identify new records. It can be a column that partitions the table.
  - To filter your data, select the Filter rows checkbox. Provide a row filter consisting of a valid SQL expression that can be used as part of a `WHERE` clause in GoogleSQL syntax. For example, `col1 >= 0`. The filter can be a combination of multiple column conditions. For example, `col1 >= 0 AND col2 < 10`.
  - To sample your data, in the Sampling size list, select a sampling percentage. Choose a percentage value between 0.0% and 100.0%, with up to 3 decimal digits. For larger datasets, choose a lower sampling percentage. For example, for a 1 PB table, if you enter a value between 0.1% and 1.0%, the data quality scan samples between 1 and 10 TB of data. For incremental data scans, the data quality scan applies sampling to the latest increment.
  - To publish the data quality scan results as Dataplex Universal Catalog metadata, select the Publish results to Dataplex Catalog checkbox.

    You can view the latest scan results on the Data quality tab in the BigQuery and Dataplex Universal Catalog pages for the source table. To enable users to access the published scan results, see the Grant access to data quality scan results section of this document.
  - In the Schedule section, choose one of the following options:
    - Repeat: Run the data quality scan on a schedule: hourly, daily, weekly, monthly, or custom. Specify how often the scan runs and at what time. If you choose custom, use cron format to specify the schedule.
    - On-demand: Run the data quality scan on demand.
    - One-time: Run the data quality scan once now, and remove the scan after the time-to-live period.
      - Time to live: The time-to-live value is the time span between when the scan runs and when the scan is deleted. A data quality scan without a specified time-to-live is automatically deleted 24 hours after it runs. The time-to-live can range from 0 seconds (immediate deletion) to 365 days.
  - Click Continue.
- In the Data quality rules window, define the rules to configure for this data quality scan.
  - Click Add rules, and then choose from the following options:
    - Profile based recommendations: Build rules from the recommendations based on an existing data profiling scan.
      - Choose columns: Select the columns to get recommended rules for.
      - Choose scan project: If the data profiling scan is in a different project than the project where you are creating the data quality scan, then select the project to pull profile scans from.
      - Choose profile results: Select one or more profile results, and then click OK. This populates a list of suggested rules that you can use as a starting point.
      - Select the checkbox for the rules that you want to add, and then click Select. Once selected, the rules are added to your current rule list. Then, you can edit the rules.
    - Built-in rule types: Build rules from predefined rules. See the list of predefined rules.
      - Choose columns: Select the columns to select rules for.
      - Choose rule types: Select the rule types that you want to choose from, and then click OK. The rule types that appear depend on the columns that you selected.
      - Select the checkbox for the rules that you want to add, and then click Select. Once selected, the rules are added to your current rule list. Then, you can edit the rules.
    - SQL row check rule: Create a custom SQL rule to apply to each row.
      - In Dimension, choose one dimension.
      - In Passing threshold, choose a percentage of records that must pass the check.
      - In Column name, choose a column.
      - In the Provide a SQL expression field, enter a SQL expression that evaluates to a boolean `true` (pass) or `false` (fail). For more information, see Supported custom SQL rule types and the examples in Define data quality rules.
      - Click Add.
    - SQL aggregate check rule: Create a custom SQL table condition rule.
      - In Dimension, choose one dimension.
      - In Column name, choose a column.
      - In the Provide a SQL expression field, enter a SQL expression that evaluates to a boolean `true` (pass) or `false` (fail). For more information, see Supported custom SQL rule types and the examples in Define data quality rules.
      - Click Add.
    - SQL assertion rule: Create a custom SQL assertion rule to check for an invalid state of the data.
      - In Dimension, choose one dimension.
      - Optional: In Column name, choose a column.
      - In the Provide a SQL statement field, enter a SQL statement that returns rows that match the invalid state. If any rows are returned, this rule fails. Omit the trailing semicolon from the SQL statement. For more information, see Supported custom SQL rule types and the examples in Define data quality rules.
      - Click Add.
  - Optional: For any data quality rule, you can assign a custom rule name to use for monitoring and alerting, and a description. To do this, edit a rule and specify the following details:
    - Rule name: Enter a custom rule name with up to 63 characters. The rule name can include letters (a-z, A-Z), digits (0-9), and hyphens (-), and must start with a letter and end with a number or a letter.
    - Description: Enter a rule description with a maximum length of 1,024 characters.
  - Repeat the previous steps to add additional rules to the data quality scan. When finished, click Continue.
- Optional: Export the scan results to a BigQuery standard table. In the Export scan results to BigQuery table section, do the following:
  - In the Select BigQuery dataset field, click Browse. Select a BigQuery dataset to store the data quality scan results.
  - In the BigQuery table field, specify the table to store the data quality scan results. If you're using an existing table, make sure that it is compatible with the export table schema. If the specified table doesn't exist, Dataplex Universal Catalog creates it for you.
- Optional: Add labels. Labels are key-value pairs that let you group related objects together or with other Google Cloud resources.
- Optional: Set up email notification reports to alert people about the status and results of a data quality scan job. In the Notification report section, click Add email ID and enter up to five email addresses. Then, select the scenarios that you want to send reports for:
  - Quality score (<=): Sends a report when a job succeeds with a data quality score that is lower than the specified target score. Enter a target quality score between 0 and 100.
  - Job failures: Sends a report when the job itself fails, regardless of the data quality results.
  - Job completion (success or failure): Sends a report when the job ends, regardless of the data quality results.
- Click Create.

After the scan is created, you can run it at any time by clicking Run now.
gcloud
To create a data quality scan, use the `gcloud dataplex datascans create data-quality` command.

If the source data is organized in a Dataplex Universal Catalog lake, include the `--data-source-entity` flag:

```bash
gcloud dataplex datascans create data-quality DATASCAN \
  --location=LOCATION \
  --data-quality-spec-file=DATA_QUALITY_SPEC_FILE \
  --data-source-entity=DATA_SOURCE_ENTITY
```

If the source data isn't organized in a Dataplex Universal Catalog lake, include the `--data-source-resource` flag:

```bash
gcloud dataplex datascans create data-quality DATASCAN \
  --location=LOCATION \
  --data-quality-spec-file=DATA_QUALITY_SPEC_FILE \
  --data-source-resource=DATA_SOURCE_RESOURCE
```

Replace the following variables:

- `DATASCAN`: The name of the data quality scan.
- `LOCATION`: The Google Cloud region in which to create the data quality scan.
- `DATA_QUALITY_SPEC_FILE`: The path to the JSON or YAML file containing the specifications for the data quality scan. The file can be a local file or a Cloud Storage path with the prefix `gs://`. Use this file to specify the data quality rules for the scan. You can also specify additional details in this file, such as filters, sampling percent, and post-scan actions like exporting to BigQuery or sending email notification reports. See the documentation for JSON representation and the example YAML representation.
- `DATA_SOURCE_ENTITY`: The Dataplex Universal Catalog entity that contains the data for the data quality scan. For example, `projects/test-project/locations/test-location/lakes/test-lake/zones/test-zone/entities/test-entity`.
- `DATA_SOURCE_RESOURCE`: The name of the resource that contains the data for the data quality scan. For example, `//bigquery.googleapis.com/projects/test-project/datasets/test-dataset/tables/test-table`.
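For instance, a filled-in invocation against a BigQuery table might look like the following sketch. The scan name, region, Cloud Storage spec path, and table path are hypothetical values, not resources defined in this document:

```bash
# Create a data quality scan for a BigQuery table using a spec file in Cloud Storage.
gcloud dataplex datascans create data-quality transactions-dq-scan \
  --location=us-central1 \
  --data-quality-spec-file=gs://my-bucket/dq-spec.yaml \
  --data-source-resource="//bigquery.googleapis.com/projects/my-project/datasets/my_dataset/tables/transactions_table"
```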
C#
Before trying this sample, follow the C# setup instructions in the Dataplex Universal Catalog quickstart using client libraries. For more information, see the Dataplex Universal Catalog C# API reference documentation.
To authenticate to Dataplex Universal Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

