Create and use data profile scans
Dataplex Universal Catalog lets you identify common statistical
characteristics (common values, data distribution, null counts) of the columns
in your BigQuery tables. This information helps you to understand
and analyze your data more effectively.
For more information about Dataplex Universal Catalog data profile scans, see About data profiling.
Before you begin
Enable the Dataplex API.
Roles required to enable APIs
To enable APIs, you need the Service Usage Admin IAM
role (roles/serviceusage.serviceUsageAdmin), which
contains the serviceusage.services.enable permission. Learn how to grant
roles.
To get the permissions that
you need to create and manage data profile scans,
ask your administrator to grant you the
following IAM roles on your resource, such as the project or table:
To create, run, update, and delete data profile scans: Dataplex DataScan Editor (roles/dataplex.dataScanEditor) role on the project containing the data scan.
To allow Dataplex Universal Catalog to run data profile scans against BigQuery data, grant the following roles to the Dataplex Universal Catalog service account: the BigQuery Job User (roles/bigquery.jobUser) role on the project running the scan, and the BigQuery Data Viewer (roles/bigquery.dataViewer) role on the tables being scanned.
To run data profile scans for BigQuery external tables that use Cloud Storage data:
grant the Dataplex Universal Catalog service account the Storage Object Viewer (roles/storage.objectViewer) and Storage Legacy Bucket Reader (roles/storage.legacyBucketReader) roles on the Cloud Storage bucket.
To view data profile scan results, jobs, and history: Dataplex DataScan Viewer (roles/dataplex.dataScanViewer) role on the project containing the data scan.
To export data profile scan results to a BigQuery table: BigQuery Data Editor (roles/bigquery.dataEditor) role on the table.
To publish data profile scan results to Dataplex Universal Catalog: Dataplex Catalog Editor (roles/dataplex.catalogEditor) role on the @bigquery entry group.
To view published data profile scan results in BigQuery on the Data profile tab: BigQuery Data Viewer (roles/bigquery.dataViewer) role on the table.
dataplex.datascans.update on data scan: Update the description of a DataScan
dataplex.datascans.delete on data scan: Delete a DataScan
dataplex.datascans.run on data scan: Run a DataScan
dataplex.datascans.get on data scan: View DataScan details, excluding results
dataplex.datascans.list on project: List DataScans
dataplex.dataScanJobs.get on data scan job: Read DataScan job resources
dataplex.dataScanJobs.list on data scan: List DataScan job resources in a project
To allow Dataplex Universal Catalog to run data profile scans against BigQuery data:
bigquery.jobs.create on project: Run jobs
bigquery.tables.get on table: Get table metadata
bigquery.tables.getData on table: Get table data
To run data profile scans for BigQuery external tables that use Cloud Storage data:
storage.buckets.get on bucket: Read bucket metadata
storage.objects.get on object: Read object data
To view data profile scan results, jobs, and history:
dataplex.datascans.getData on data scan: View DataScan details, including results
dataplex.datascans.list on project: List DataScans
dataplex.dataScanJobs.get on data scan job: Read DataScan job resources
dataplex.dataScanJobs.list on data scan: List DataScan job resources in a project
To export data profile scan results to a BigQuery table:
bigquery.tables.create on dataset: Create tables
bigquery.tables.updateData on table: Write data to tables
To publish data profile scan results to Dataplex Universal Catalog:
dataplex.entryGroups.useDataProfileAspect on entry group: Allows Dataplex Universal Catalog data profile scans to save their results to Dataplex Universal Catalog
Additionally, you need one of the following permissions:
To view published data profile results for a table in BigQuery or Dataplex Universal Catalog:
bigquery.tables.get on table: Get table metadata
bigquery.tables.getData on table: Get table data
If a table uses BigQuery row-level
security, then Dataplex Universal Catalog
can only scan the rows that are visible to the Dataplex Universal Catalog service account. To
allow Dataplex Universal Catalog to scan all rows, add its service account to a row
filter where the predicate is TRUE.
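One way to set this up is a row access policy whose predicate is always true for the service account. The following bq command is a sketch only; the project, dataset, table, and policy names are hypothetical placeholders, and the service account address must be replaced with your project's actual Dataplex service agent:

```shell
# Hypothetical example: create a row access policy that exposes every row
# to the Dataplex Universal Catalog service account.
bq query --use_legacy_sql=false '
CREATE ROW ACCESS POLICY all_rows_for_dataplex
ON `my-project.my_dataset.my_table`
GRANT TO ("serviceAccount:service-PROJECT_NUMBER@gcp-sa-dataplex.iam.gserviceaccount.com")
FILTER USING (TRUE)'
```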
If a table uses BigQuery column-level security, then Dataplex Universal Catalog
requires access to scan the protected columns. To grant access, give the
Dataplex Universal Catalog service account the Data Catalog Fine-Grained Reader (roles/datacatalog.fineGrainedReader)
role on all policy tags used in the table. The user creating or updating a data
scan also needs permissions on the protected columns.
Grant roles to the Dataplex Universal Catalog service account
To run data profile scans, Dataplex Universal Catalog uses a service account that
requires permissions to run BigQuery jobs and read
BigQuery table data. To grant the required roles, follow
these steps:
Get the Dataplex Universal Catalog service account email address. If you haven't
created a data profile or data quality scan in this project before,
run the following gcloud command to generate the service identity:
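The exact invocation may vary by gcloud version; the following sketch uses a placeholder project ID (verify the current syntax in the gcloud reference):

```shell
# Generate the Dataplex service identity for the project (placeholder project ID).
gcloud beta services identity create \
    --service=dataplex.googleapis.com \
    --project=PROJECT_ID
```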
The command returns the service account email, which has the following format: service-PROJECT_NUMBER@gcp-sa-dataplex.iam.gserviceaccount.com.
If the service account already exists, you can find its email by viewing
principals with the Dataplex name on the IAM page in the Google Cloud console.
Grant the service account the BigQuery Job User (roles/bigquery.jobUser) role on your project. This role lets the
service account run BigQuery jobs for the scan.
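This grant can be made with gcloud projects add-iam-policy-binding; the following is a sketch with placeholder values:

```shell
# Hypothetical values: replace PROJECT_ID and PROJECT_NUMBER with your own.
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:service-PROJECT_NUMBER@gcp-sa-dataplex.iam.gserviceaccount.com" \
    --role="roles/bigquery.jobUser"
```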
service-PROJECT_NUMBER@gcp-sa-dataplex.iam.gserviceaccount.com: the email of the Dataplex Universal Catalog service account.
Grant the service account the BigQuery Data Viewer (roles/bigquery.dataViewer) role for each table that you want to
profile. This role grants read-only access to the tables.
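A table-level grant can be made with the bq command-line tool; this sketch uses hypothetical placeholder identifiers:

```shell
# Hypothetical values: replace the project, dataset, and table identifiers.
bq add-iam-policy-binding \
    --member="serviceAccount:service-PROJECT_NUMBER@gcp-sa-dataplex.iam.gserviceaccount.com" \
    --role="roles/bigquery.dataViewer" \
    PROJECT_ID:DATASET.TABLE
```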
In the Table field, click Browse. Choose the table to scan, and
then click Select.
For tables in multi-region datasets, choose a region in which to create
the data scan.
To browse the tables organized within Dataplex Universal Catalog lakes,
click Browse within Dataplex Lakes.
In the Scope field, choose Incremental or Entire data.
If you choose Incremental data, in the Timestamp column field,
select a column of type DATE or TIMESTAMP from your
BigQuery table that increases as new records are added,
and that can be used to identify new records. For tables partitioned on a
column of type DATE or TIMESTAMP, we recommend using the partition
column as the timestamp field.
Optional: To filter your data, do any of the following:
To filter by rows, select the Filter rows checkbox.
Enter a valid SQL expression that can be used in a WHERE clause in GoogleSQL syntax.
For example: col1 >= 0.
The filter can be a combination of SQL conditions over multiple
columns. For example: col1 >= 0 AND col2 < 10.
To filter by columns, select the Filter columns checkbox.
To include columns in the profile scan, in the Include columns field, click Browse. Select the columns to include, and then
click Select.
To exclude columns from the profile scan, in the Exclude columns field, click Browse. Select the columns to exclude, and then
click Select.
To apply sampling to your data profile scan, in the Sampling size list, select a sampling percentage. Choose a percentage value
between 0.0% and 100.0%, with up to 3 decimal digits.
For larger datasets, choose a lower sampling percentage. For example,
for a 1 PB table, if you enter a value between 0.1% and 1.0%,
the data profile scan samples between 1 TB and 10 TB of data.
There must be at least 100 records in the sampled data to return a result.
For incremental data scans, the data profile scan applies sampling to
the latest increment.
Optional: Publish the data profile scan results in the
BigQuery and Dataplex Universal Catalog pages in the
Google Cloud console for the source table. Select the Publish results to BigQuery and Dataplex Catalog checkbox.
You can view the latest scan results on the Data profile tab in the
BigQuery and Dataplex Universal Catalog pages for the source
table. To enable users to access the published scan results, see the Grant access to data profile scan results section
of this document.
The publishing option might not be available in the following cases:
You don't have the required permissions on the table.
Another data quality scan is set to publish results.
In the Schedule section, choose one of the following options:
Repeat: Run the data profile scan on a schedule: hourly, daily,
weekly, monthly, or custom. Specify how often the scan should run and
at what time. If you choose custom, use cron format to specify the
schedule (for example, 0 */6 * * * runs the scan every six hours).
On-demand: Run the data profile scan on demand.
Click Continue.
Optional: Export the scan results to a BigQuery standard
table. In the Export scan results to BigQuery table section, do the
following:
In the Select BigQuery dataset field, click Browse. Select a
BigQuery dataset to store the data profile scan results.
In the BigQuery table field, specify the table to store the data
profile scan results. If you're using an existing table, make sure
that it is compatible with the export table schema.
If the specified table doesn't exist, Dataplex Universal Catalog creates
it for you.
Optional: Add labels. Labels are key-value pairs that let you group
related objects together or with other Google Cloud resources.
To create the scan, click Create.
If you set the schedule to on-demand, you can also run the scan now
by clicking Run scan.
LOCATION: The Google Cloud region in which to create
the data profile scan.
DATA_SOURCE_ENTITY: The Dataplex Universal Catalog
entity that contains the data for the data profile scan. For example, projects/test-project/locations/test-location/lakes/test-lake/zones/test-zone/entities/test-entity.
DATA_SOURCE_RESOURCE: The name of the resource
that contains the data for the data profile scan. For example, //bigquery.googleapis.com/projects/test-project/datasets/test-dataset/tables/test-table.
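As a hedged sketch, a scan over a BigQuery resource can also be created from the command line with gcloud; the scan ID and other values below are placeholders, and the available flags should be verified in the gcloud dataplex reference:

```shell
# Create a data profile scan over a BigQuery table (placeholder values).
gcloud dataplex datascans create data-profile DATASCAN_ID \
    --project=PROJECT_ID \
    --location=LOCATION \
    --data-source-resource="//bigquery.googleapis.com/projects/test-project/datasets/test-dataset/tables/test-table"
```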
using Google.Api.Gax.ResourceNames;
using Google.Cloud.Dataplex.V1;
using Google.LongRunning;

public sealed partial class GeneratedDataScanServiceClientSnippets
{
    /// <summary>Snippet for CreateDataScan</summary>
    /// <remarks>
    /// This snippet has been automatically generated and should be regarded as a code template only.
    /// It will require modifications to work:
    /// - It may require correct/in-range values for request initialization.
    /// - It may require specifying regional endpoints when creating the service client as shown in
    /// https://cloud.google.com/dotnet/docs/reference/help/client-configuration#endpoint.
    /// </remarks>
    public void CreateDataScanRequestObject()
    {
        // Create client
        DataScanServiceClient dataScanServiceClient = DataScanServiceClient.Create();
        // Initialize request argument(s)
        CreateDataScanRequest request = new CreateDataScanRequest
        {
            ParentAsLocationName = LocationName.FromProjectLocation("[PROJECT]", "[LOCATION]"),
            DataScan = new DataScan(),
            DataScanId = "",
            ValidateOnly = false,
        };
        // Make the request
        Operation<DataScan, OperationMetadata> response = dataScanServiceClient.CreateDataScan(request);

        // Poll until the returned long-running operation is complete
        Operation<DataScan, OperationMetadata> completedResponse = response.PollUntilCompleted();
        // Retrieve the operation result
        DataScan result = completedResponse.Result;

        // Or get the name of the operation
        string operationName = response.Name;
        // This name can be stored, then the long-running operation retrieved later by name
        Operation<DataScan, OperationMetadata> retrievedResponse = dataScanServiceClient.PollOnceCreateDataScan(operationName);
        // Check if the retrieved long-running operation has completed
        if (retrievedResponse.IsCompleted)
        {
            // If it has completed, then access the result
            DataScan retrievedResult = retrievedResponse.Result;
        }
    }
}
package main

import (
	"context"

	dataplex "cloud.google.com/go/dataplex/apiv1"
	dataplexpb "cloud.google.com/go/dataplex/apiv1/dataplexpb"
)

func main() {
	ctx := context.Background()
	// This snippet has been automatically generated and should be regarded as a code template only.
	// It will require modifications to work:
	// - It may require correct/in-range values for request initialization.
	// - It may require specifying regional endpoints when creating the service client as shown in:
	// https://pkg.go.dev/cloud.google.com/go#hdr-Client_Options
	c, err := dataplex.NewDataScanClient(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	defer c.Close()

	req := &dataplexpb.CreateDataScanRequest{
		// TODO: Fill request struct fields.
		// See https://pkg.go.dev/cloud.google.com/go/dataplex/apiv1/dataplexpb#CreateDataScanRequest.
	}
	op, err := c.CreateDataScan(ctx, req)
	if err != nil {
		// TODO: Handle error.
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	// TODO: Use resp.
	_ = resp
}
import com.google.cloud.dataplex.v1.CreateDataScanRequest;
import com.google.cloud.dataplex.v1.DataScan;
import com.google.cloud.dataplex.v1.DataScanServiceClient;
import com.google.cloud.dataplex.v1.LocationName;

public class SyncCreateDataScan {

  public static void main(String[] args) throws Exception {
    syncCreateDataScan();
  }

  public static void syncCreateDataScan() throws Exception {
    // This snippet has been automatically generated and should be regarded as a code template only.
    // It will require modifications to work:
    // - It may require correct/in-range values for request initialization.
    // - It may require specifying regional endpoints when creating the service client as shown in
    // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
    try (DataScanServiceClient dataScanServiceClient = DataScanServiceClient.create()) {
      CreateDataScanRequest request =
          CreateDataScanRequest.newBuilder()
              .setParent(LocationName.of("[PROJECT]", "[LOCATION]").toString())
              .setDataScan(DataScan.newBuilder().build())
              .setDataScanId("dataScanId1260787906")
              .setValidateOnly(true)
              .build();
      DataScan response = dataScanServiceClient.createDataScanAsync(request).get();
    }
  }
}
# This snippet has been automatically generated and should be regarded as a
# code template only.
# It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
#   client as shown in:
#   https://googleapis.dev/python/google-api-core/latest/client_options.html
from google.cloud import dataplex_v1


def sample_create_data_scan():
    # Create a client
    client = dataplex_v1.DataScanServiceClient()

    # Initialize request argument(s)
    data_scan = dataplex_v1.DataScan()
    data_scan.data_quality_spec.rules.dimension = "dimension_value"
    data_scan.data.entity = "entity_value"

    request = dataplex_v1.CreateDataScanRequest(
        parent="parent_value",
        data_scan=data_scan,
        data_scan_id="data_scan_id_value",
    )

    # Make the request
    operation = client.create_data_scan(request=request)

    print("Waiting for operation to complete...")

    response = operation.result()

    # Handle the response
    print(response)
require "google/cloud/dataplex/v1"

##
# Snippet for the create_data_scan call in the DataScanService service
#
# This snippet has been automatically generated and should be regarded as a code
# template only. It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
#   client as shown in https://cloud.google.com/ruby/docs/reference.
#
# This is an auto-generated example demonstrating basic usage of
# Google::Cloud::Dataplex::V1::DataScanService::Client#create_data_scan.
#
def create_data_scan
  # Create a client object. The client can be reused for multiple calls.
  client = Google::Cloud::Dataplex::V1::DataScanService::Client.new

  # Create a request. To set request fields, pass in keyword arguments.
  request = Google::Cloud::Dataplex::V1::CreateDataScanRequest.new

  # Call the create_data_scan method.
  result = client.create_data_scan request

  # The returned object is of type Gapic::Operation. You can use it to
  # check the status of an operation, cancel it, or wait for results.
  # Here is how to wait for a response.
  result.wait_until_done! timeout: 60
  if result.response?
    p result.response
  else
    puts "No response received."
  end
end
(Valid only if the column type is numeric - integer/float)
max_value (float, nullable): valid only if the column type is numeric (integer/float).
average_value (float, nullable): valid only if the column type is numeric (integer/float).
standard_deviation (float, nullable): valid only if the column type is numeric (integer/float).
quartile_lower (integer, nullable): valid only if the column type is numeric (integer/float).
quartile_median (integer, nullable): valid only if the column type is numeric (integer/float).
quartile_upper (integer, nullable): valid only if the column type is numeric (integer/float).
top_n (struct/record, repeated), with the following fields:
value (string, nullable): for example, "4009".
count (integer, nullable): for example, 20.
percent (float, nullable): for example, 10 (indicates 10%).
Export table setup
When you export scan results to BigQuery tables, follow these guidelines:
For the field resultsTable, use the format: //bigquery.googleapis.com/projects/{project-id}/datasets/{dataset-id}/tables/{table-id}.
Use a BigQuery standard table.
If the table doesn't exist when the scan is created or updated,
Dataplex Universal Catalog creates the table for you.
By default, the table is partitioned daily on the job_start_time column.
If you want the table to be partitioned in a different configuration, or if
you don't want partitioning, then recreate the table with the required
schema and configuration, and then provide the pre-created table as the
results table.
Make sure the results table is in the same location as the source table.
If VPC-SC is configured on the project, then the results table must be in the
same VPC-SC perimeter as the source table.
If the table is modified during the scan execution stage, then the currently
running job exports to the previous results table, and the table change
takes effect from the next scan job.
Don't modify the table schema. If you need customized columns, create a view
on the table.
To reduce costs, set an expiration on the partition based on your use case.
For more information, see how to set the partition expiration.
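For example, a partition expiration can be set on an existing results table with the bq tool; the 30-day value and table identifier below are hypothetical placeholders:

```shell
# Expire partitions older than 30 days (2592000 seconds) on the results table.
bq update --time_partitioning_expiration 2592000 PROJECT_ID:DATASET.RESULTS_TABLE
```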
Create multiple data profile scans
You can configure data profile scans for multiple tables in a
BigQuery dataset at the same time by using the Google Cloud console.
In the Google Cloud console, go to the Dataplex Universal Catalog Data profiling & quality page.
Enter an ID prefix. Dataplex Universal Catalog automatically generates scan
IDs by using the provided prefix and unique suffixes.
Enter a Description for all of the data profile scans.
In the Dataset field, click Browse. Select a dataset to pick tables
from. Click Select.
If the dataset is multi-regional, select a Region in which to create the
data profile scans.
Configure the common settings for the scans:
In the Scope field, choose Incremental or Entire data.
To apply sampling to the data profile scans, in the Sampling size list, select a sampling percentage.
Choose a percentage value between 0.0% and 100.0%, with up to 3 decimal
digits.
Optional: Publish the data profile scan results in the
BigQuery and Dataplex Universal Catalog pages in the
Google Cloud console for the source table. Select the Publish results to BigQuery and Dataplex Catalog checkbox.
You can view the latest scan results on the Data profile tab in the
BigQuery and Dataplex Universal Catalog pages for the source
table. To enable users to access the published scan results, see the Grant access to data profile scan
results section of this document.
In the Schedule section, choose one of the following options:
Repeat: Run the data profile scans on a schedule: hourly, daily,
weekly, monthly, or custom. Specify how often the scans should run and
at what time. If you choose custom, use cron format to specify the
schedule.
On-demand: Run the data profile scans on demand.
Click Continue.
In the Choose tables field, click Browse. Choose one or more tables
to scan, and then click Select.
Click Continue.
Optional: Export the scan results to a BigQuery standard
table. In the Export scan results to BigQuery table section, do the
following:
In the Select BigQuery dataset field, click Browse. Select a
BigQuery dataset to store the data profile scan results.
In the BigQuery table field, specify the table to store the data
profile scan results. If you're using an existing table, make sure that
it is compatible with the export table schema.
If the specified table doesn't exist, Dataplex Universal Catalog creates it
for you.
Dataplex Universal Catalog uses the same results table for all of the data
profile scans.
Optional: Add labels. Labels are key-value pairs that let you group related
objects together or with other Google Cloud resources.
To create the scans, click Create.
If you set the schedule to on-demand, you can also run the scans now by
clicking Run scan.
Run a data profile scan
Console
In the Google Cloud console, go to the Dataplex Universal Catalog Data profiling & quality page.
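A scan can also be triggered from the command line; the following is a sketch with placeholder values (verify the syntax in the gcloud dataplex reference):

```shell
# Run an existing data profile scan on demand (placeholder values).
gcloud dataplex datascans run DATASCAN_ID \
    --project=PROJECT_ID \
    --location=LOCATION
```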
using Google.Cloud.Dataplex.V1;

public sealed partial class GeneratedDataScanServiceClientSnippets
{
    /// <summary>Snippet for RunDataScan</summary>
    /// <remarks>
    /// This snippet has been automatically generated and should be regarded as a code template only.
    /// It will require modifications to work:
    /// - It may require correct/in-range values for request initialization.
    /// - It may require specifying regional endpoints when creating the service client as shown in
    /// https://cloud.google.com/dotnet/docs/reference/help/client-configuration#endpoint.
    /// </remarks>
    public void RunDataScanRequestObject()
    {
        // Create client
        DataScanServiceClient dataScanServiceClient = DataScanServiceClient.Create();
        // Initialize request argument(s)
        RunDataScanRequest request = new RunDataScanRequest
        {
            DataScanName = DataScanName.FromProjectLocationDataScan("[PROJECT]", "[LOCATION]", "[DATASCAN]"),
        };
        // Make the request
        RunDataScanResponse response = dataScanServiceClient.RunDataScan(request);
    }
}
package main

import (
	"context"

	dataplex "cloud.google.com/go/dataplex/apiv1"
	dataplexpb "cloud.google.com/go/dataplex/apiv1/dataplexpb"
)

func main() {
	ctx := context.Background()
	// This snippet has been automatically generated and should be regarded as a code template only.
	// It will require modifications to work:
	// - It may require correct/in-range values for request initialization.
	// - It may require specifying regional endpoints when creating the service client as shown in:
	// https://pkg.go.dev/cloud.google.com/go#hdr-Client_Options
	c, err := dataplex.NewDataScanClient(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	defer c.Close()

	req := &dataplexpb.RunDataScanRequest{
		// TODO: Fill request struct fields.
		// See https://pkg.go.dev/cloud.google.com/go/dataplex/apiv1/dataplexpb#RunDataScanRequest.
	}
	resp, err := c.RunDataScan(ctx, req)
	if err != nil {
		// TODO: Handle error.
	}
	// TODO: Use resp.
	_ = resp
}
import com.google.cloud.dataplex.v1.DataScanName;
import com.google.cloud.dataplex.v1.DataScanServiceClient;
import com.google.cloud.dataplex.v1.RunDataScanRequest;
import com.google.cloud.dataplex.v1.RunDataScanResponse;

public class SyncRunDataScan {

  public static void main(String[] args) throws Exception {
    syncRunDataScan();
  }

  public static void syncRunDataScan() throws Exception {
    // This snippet has been automatically generated and should be regarded as a code template only.
    // It will require modifications to work:
    // - It may require correct/in-range values for request initialization.
    // - It may require specifying regional endpoints when creating the service client as shown in
    // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
    try (DataScanServiceClient dataScanServiceClient = DataScanServiceClient.create()) {
      RunDataScanRequest request =
          RunDataScanRequest.newBuilder()
              .setName(DataScanName.of("[PROJECT]", "[LOCATION]", "[DATASCAN]").toString())
              .build();
      RunDataScanResponse response = dataScanServiceClient.runDataScan(request);
    }
  }
}
# This snippet has been automatically generated and should be regarded as a
# code template only.
# It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
#   client as shown in:
#   https://googleapis.dev/python/google-api-core/latest/client_options.html
from google.cloud import dataplex_v1


def sample_run_data_scan():
    # Create a client
    client = dataplex_v1.DataScanServiceClient()

    # Initialize request argument(s)
    request = dataplex_v1.RunDataScanRequest(
        name="name_value",
    )

    # Make the request
    response = client.run_data_scan(request=request)

    # Handle the response
    print(response)
require "google/cloud/dataplex/v1"

##
# Snippet for the run_data_scan call in the DataScanService service
#
# This snippet has been automatically generated and should be regarded as a code
# template only. It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
#   client as shown in https://cloud.google.com/ruby/docs/reference.
#
# This is an auto-generated example demonstrating basic usage of
# Google::Cloud::Dataplex::V1::DataScanService::Client#run_data_scan.
#
def run_data_scan
  # Create a client object. The client can be reused for multiple calls.
  client = Google::Cloud::Dataplex::V1::DataScanService::Client.new

  # Create a request. To set request fields, pass in keyword arguments.
  request = Google::Cloud::Dataplex::V1::RunDataScanRequest.new

  # Call the run_data_scan method.
  result = client.run_data_scan request

  # The returned object is of type Google::Cloud::Dataplex::V1::RunDataScanResponse.
  p result
end
The Overview section displays information about the most recent
jobs, including when the scan was run, the number of table records
scanned, and the job status.
The Data profile scan configuration section displays details about
the scan.
To see detailed information about a job, such as the scanned table's
columns, statistics about the columns that were found in the scan, and the
job logs, click the Jobs history tab. Then, click a job ID.
using Google.Cloud.Dataplex.V1;

public sealed partial class GeneratedDataScanServiceClientSnippets
{
    /// <summary>Snippet for GetDataScan</summary>
    /// <remarks>
    /// This snippet has been automatically generated and should be regarded as a code template only.
    /// It will require modifications to work:
    /// - It may require correct/in-range values for request initialization.
    /// - It may require specifying regional endpoints when creating the service client as shown in
    /// https://cloud.google.com/dotnet/docs/reference/help/client-configuration#endpoint.
    /// </remarks>
    public void GetDataScanRequestObject()
    {
        // Create client
        DataScanServiceClient dataScanServiceClient = DataScanServiceClient.Create();
        // Initialize request argument(s)
        GetDataScanRequest request = new GetDataScanRequest
        {
            DataScanName = DataScanName.FromProjectLocationDataScan("[PROJECT]", "[LOCATION]", "[DATASCAN]"),
            View = GetDataScanRequest.Types.DataScanView.Unspecified,
        };
        // Make the request
        DataScan response = dataScanServiceClient.GetDataScan(request);
    }
}
package main

import (
	"context"

	dataplex "cloud.google.com/go/dataplex/apiv1"
	dataplexpb "cloud.google.com/go/dataplex/apiv1/dataplexpb"
)

func main() {
	ctx := context.Background()
	// This snippet has been automatically generated and should be regarded as a code template only.
	// It will require modifications to work:
	// - It may require correct/in-range values for request initialization.
	// - It may require specifying regional endpoints when creating the service client as shown in:
	// https://pkg.go.dev/cloud.google.com/go#hdr-Client_Options
	c, err := dataplex.NewDataScanClient(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	defer c.Close()

	req := &dataplexpb.GetDataScanRequest{
		// TODO: Fill request struct fields.
		// See https://pkg.go.dev/cloud.google.com/go/dataplex/apiv1/dataplexpb#GetDataScanRequest.
	}
	resp, err := c.GetDataScan(ctx, req)
	if err != nil {
		// TODO: Handle error.
	}
	// TODO: Use resp.
	_ = resp
}
import com.google.cloud.dataplex.v1.DataScan;
import com.google.cloud.dataplex.v1.DataScanName;
import com.google.cloud.dataplex.v1.DataScanServiceClient;
import com.google.cloud.dataplex.v1.GetDataScanRequest;

public class SyncGetDataScan {

  public static void main(String[] args) throws Exception {
    syncGetDataScan();
  }

  public static void syncGetDataScan() throws Exception {
    // This snippet has been automatically generated and should be regarded as a code template only.
    // It will require modifications to work:
    // - It may require correct/in-range values for request initialization.
    // - It may require specifying regional endpoints when creating the service client as shown in
    // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
    try (DataScanServiceClient dataScanServiceClient = DataScanServiceClient.create()) {
      GetDataScanRequest request =
          GetDataScanRequest.newBuilder()
              .setName(DataScanName.of("[PROJECT]", "[LOCATION]", "[DATASCAN]").toString())
              .build();
      DataScan response = dataScanServiceClient.getDataScan(request);
    }
  }
}
# This snippet has been automatically generated and should be regarded as a
# code template only.
# It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
#   client as shown in:
#   https://googleapis.dev/python/google-api-core/latest/client_options.html
from google.cloud import dataplex_v1


def sample_get_data_scan():
    # Create a client
    client = dataplex_v1.DataScanServiceClient()

    # Initialize request argument(s)
    request = dataplex_v1.GetDataScanRequest(
        name="name_value",
    )

    # Make the request
    response = client.get_data_scan(request=request)

    # Handle the response
    print(response)
require "google/cloud/dataplex/v1"

##
# Snippet for the get_data_scan call in the DataScanService service
#
# This snippet has been automatically generated and should be regarded as a code
# template only. It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
#   client as shown in https://cloud.google.com/ruby/docs/reference.
#
# This is an auto-generated example demonstrating basic usage of
# Google::Cloud::Dataplex::V1::DataScanService::Client#get_data_scan.
#
def get_data_scan
  # Create a client object. The client can be reused for multiple calls.
  client = Google::Cloud::Dataplex::V1::DataScanService::Client.new

  # Create a request. To set request fields, pass in keyword arguments.
  request = Google::Cloud::Dataplex::V1::GetDataScanRequest.new

  # Call the get_data_scan method.
  result = client.get_data_scan request

  # The returned object is of type Google::Cloud::Dataplex::V1::DataScan.
  p result
end
If the data profile scan results are published to the BigQuery
and Dataplex Universal Catalog pages in the Google Cloud console, then you can
see the latest scan results on the source table's Data profile tab.
In the Google Cloud console, go to the Dataplex Universal Catalog Search page.
The Latest job results tab, when there is at least one successfully
completed run, provides information about the most recent job. It lists the scanned
table's columns and statistics about the columns that were found in the scan.
The Jobs history tab provides information about past jobs, such as
the number of records scanned in each job, the job status, and the time the
job was run.
To view detailed information about a job, click any of the jobs in the Job ID column.
using Google.Api.Gax;
using Google.Cloud.Dataplex.V1;
using System;

public sealed partial class GeneratedDataScanServiceClientSnippets
{
    /// <summary>Snippet for ListDataScanJobs</summary>
    /// <remarks>
    /// This snippet has been automatically generated and should be regarded as a code template only.
    /// It will require modifications to work:
    /// - It may require correct/in-range values for request initialization.
    /// - It may require specifying regional endpoints when creating the service client as shown in
    /// https://cloud.google.com/dotnet/docs/reference/help/client-configuration#endpoint.
    /// </remarks>
    public void ListDataScanJobsRequestObject()
    {
        // Create client
        DataScanServiceClient dataScanServiceClient = DataScanServiceClient.Create();
        // Initialize request argument(s)
        ListDataScanJobsRequest request = new ListDataScanJobsRequest
        {
            ParentAsDataScanName = DataScanName.FromProjectLocationDataScan("[PROJECT]", "[LOCATION]", "[DATASCAN]"),
            Filter = "",
        };
        // Make the request
        PagedEnumerable<ListDataScanJobsResponse, DataScanJob> response = dataScanServiceClient.ListDataScanJobs(request);

        // Iterate over all response items, lazily performing RPCs as required
        foreach (DataScanJob item in response)
        {
            // Do something with each item
            Console.WriteLine(item);
        }

        // Or iterate over pages (of server-defined size), performing one RPC per page
        foreach (ListDataScanJobsResponse page in response.AsRawResponses())
        {
            // Do something with each page of items
            Console.WriteLine("A page of results:");
            foreach (DataScanJob item in page)
            {
                // Do something with each item
                Console.WriteLine(item);
            }
        }

        // Or retrieve a single page of known size (unless it's the final page), performing as many RPCs as required
        int pageSize = 10;
        Page<DataScanJob> singlePage = response.ReadPage(pageSize);
        // Do something with the page of items
        Console.WriteLine($"A page of {pageSize} results (unless it's the final page):");
        foreach (DataScanJob item in singlePage)
        {
            // Do something with each item
            Console.WriteLine(item);
        }
        // Store the pageToken, for when the next page is required.
        string nextPageToken = singlePage.NextPageToken;
    }
}
package main

import (
	"context"

	dataplex "cloud.google.com/go/dataplex/apiv1"
	dataplexpb "cloud.google.com/go/dataplex/apiv1/dataplexpb"
	"google.golang.org/api/iterator"
)

func main() {
	ctx := context.Background()
	// This snippet has been automatically generated and should be regarded as a code template only.
	// It will require modifications to work:
	// - It may require correct/in-range values for request initialization.
	// - It may require specifying regional endpoints when creating the service client as shown in:
	//   https://pkg.go.dev/cloud.google.com/go#hdr-Client_Options
	c, err := dataplex.NewDataScanClient(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	defer c.Close()

	req := &dataplexpb.ListDataScanJobsRequest{
		// TODO: Fill request struct fields.
		// See https://pkg.go.dev/cloud.google.com/go/dataplex/apiv1/dataplexpb#ListDataScanJobsRequest.
	}
	it := c.ListDataScanJobs(ctx, req)
	for {
		resp, err := it.Next()
		if err == iterator.Done {
			break
		}
		if err != nil {
			// TODO: Handle error.
		}
		// TODO: Use resp.
		_ = resp

		// If you need to access the underlying RPC response,
		// you can do so by casting the `Response` as below.
		// Otherwise, remove this line. Only populated after
		// first call to Next(). Not safe for concurrent access.
		_ = it.Response.(*dataplexpb.ListDataScanJobsResponse)
	}
}
import com.google.cloud.dataplex.v1.DataScanJob;
import com.google.cloud.dataplex.v1.DataScanName;
import com.google.cloud.dataplex.v1.DataScanServiceClient;
import com.google.cloud.dataplex.v1.ListDataScanJobsRequest;

public class SyncListDataScanJobs {

  public static void main(String[] args) throws Exception {
    syncListDataScanJobs();
  }

  public static void syncListDataScanJobs() throws Exception {
    // This snippet has been automatically generated and should be regarded as a code template only.
    // It will require modifications to work:
    // - It may require correct/in-range values for request initialization.
    // - It may require specifying regional endpoints when creating the service client as shown in
    //   https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
    try (DataScanServiceClient dataScanServiceClient = DataScanServiceClient.create()) {
      ListDataScanJobsRequest request =
          ListDataScanJobsRequest.newBuilder()
              .setParent(DataScanName.of("[PROJECT]", "[LOCATION]", "[DATASCAN]").toString())
              .setPageSize(883849137)
              .setPageToken("pageToken873572522")
              .setFilter("filter-1274492040")
              .build();
      for (DataScanJob element : dataScanServiceClient.listDataScanJobs(request).iterateAll()) {
        // doThingsWith(element);
      }
    }
  }
}
# This snippet has been automatically generated and should be regarded as a
# code template only.
# It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
#   client as shown in:
#   https://googleapis.dev/python/google-api-core/latest/client_options.html
from google.cloud import dataplex_v1


def sample_list_data_scan_jobs():
    # Create a client
    client = dataplex_v1.DataScanServiceClient()

    # Initialize request argument(s)
    request = dataplex_v1.ListDataScanJobsRequest(
        parent="parent_value",
    )

    # Make the request
    page_result = client.list_data_scan_jobs(request=request)

    # Handle the response
    for response in page_result:
        print(response)
require "google/cloud/dataplex/v1"

##
# Snippet for the list_data_scan_jobs call in the DataScanService service
#
# This snippet has been automatically generated and should be regarded as a code
# template only. It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
#   client as shown in https://cloud.google.com/ruby/docs/reference.
#
# This is an auto-generated example demonstrating basic usage of
# Google::Cloud::Dataplex::V1::DataScanService::Client#list_data_scan_jobs.
#
def list_data_scan_jobs
  # Create a client object. The client can be reused for multiple calls.
  client = Google::Cloud::Dataplex::V1::DataScanService::Client.new

  # Create a request. To set request fields, pass in keyword arguments.
  request = Google::Cloud::Dataplex::V1::ListDataScanJobsRequest.new

  # Call the list_data_scan_jobs method.
  result = client.list_data_scan_jobs request

  # The returned object is of type Gapic::PagedEnumerable. You can iterate
  # over elements, and API calls will be issued to fetch pages as needed.
  result.each do |item|
    # Each element is of type ::Google::Cloud::Dataplex::V1::DataScanJob.
    p item
  end
end
Click the data profile scan whose results you want to share.
Click the Permissions tab.
Do the following:
To grant access to a principal, click person_add Grant access. Grant the
Dataplex DataScan DataViewer role to the associated principal.
To remove access from a principal, select the principal that you
want to remove the Dataplex DataScan DataViewer role from. Click person_remove Remove access, and then confirm when prompted.
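The grant above can also be scripted. The sketch below only assembles the two pieces such a script needs: the scan's fully qualified resource name and the IAM binding shape used by setIamPolicy-style calls. The role ID (assumed here to be roles/dataplex.dataScanDataViewer), project, scan, and principal names are all illustrative assumptions; verify the role ID against the IAM roles reference before using it.

```python
# Minimal sketch: build the DataScan resource name and an IAM binding
# for the viewer role. This does not call any API by itself.

# Assumed role ID for "Dataplex DataScan DataViewer"; confirm in the
# IAM predefined roles reference.
DATASCAN_VIEWER_ROLE = "roles/dataplex.dataScanDataViewer"


def data_scan_name(project: str, location: str, scan_id: str) -> str:
    """Return the fully qualified DataScan resource name."""
    return f"projects/{project}/locations/{location}/dataScans/{scan_id}"


def viewer_binding(members: list) -> dict:
    """Return an IAM binding granting the viewer role to the given members."""
    return {"role": DATASCAN_VIEWER_ROLE, "members": members}


name = data_scan_name("my-project", "us-central1", "orders-profile")
binding = viewer_binding(["user:analyst@example.com"])
```

Passing a policy containing such a binding to the scan's setIamPolicy call (via a client library or REST) grants the role; removing the member from the binding and writing the policy back revokes it.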
Manage data profile scans for a specific table
The steps in this document show how to manage data profile scans across your
project by using the Dataplex Universal Catalog Data profiling & quality page
in the Google Cloud console.
You can also create and manage data profile scans when working with a
specific table. In the Google Cloud console, on the Dataplex Universal Catalog
page for the table, use the Data profile tab. Do the following:
In the Google Cloud console, go to the Dataplex Universal Catalog Search page, and then search for and click the table.
Depending on whether the table has a data profile scan whose results are
published, you can work with the table's data profile scans in the following ways:
Data profile scan results are published: the latest published scan
results are displayed on the page.
To manage the data profile scans for this table, click Data profile
scan, and then select from the following options:
Create new scan: create a new data profile scan. For more
information, see the Create a data profile scan section
of this document. When you create a scan from a table's details page, the
table is preselected.
Run now: run the scan.
Edit scan configuration: edit settings including the display name,
filters, sampling size, and schedule.
View all scans: view a list of data profile scans that apply to this
table.
Data profile scan results aren't published: click the menu next to Quick data profile, and then select from the following options:
Customize data profiling: create a new data profile scan. For more
information, see the Create a data profile scan section
of this document. When you create a scan from a table's details page, the
table is preselected.
View previous profiles: view a list of data profile scans that
apply to this table.
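When several profile scans apply to one table, "the latest published scan results" means the output of the most recently completed job. A minimal sketch of that selection, assuming job records expose an ISO 8601 end-time field (the field names here are illustrative, not the API's):

```python
from datetime import datetime


def latest_job(jobs: list) -> dict:
    """Return the job dict with the most recent end_time.

    Each job is assumed to be a dict with an ISO 8601 "end_time"
    string; these field names are illustrative only.
    """
    return max(jobs, key=lambda j: datetime.fromisoformat(j["end_time"]))


jobs = [
    {"name": "jobs/a", "end_time": "2025-01-01T00:00:00+00:00"},
    {"name": "jobs/b", "end_time": "2025-03-01T00:00:00+00:00"},
]
newest = latest_job(jobs)  # the job that finished most recently
```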
Update a data profile scan
Console
In the Google Cloud console, go to the Dataplex Universal Catalog Data profiling & quality page.
using Google.Cloud.Dataplex.V1;
using Google.LongRunning;
using Google.Protobuf.WellKnownTypes;

public sealed partial class GeneratedDataScanServiceClientSnippets
{
    /// <summary>Snippet for UpdateDataScan</summary>
    /// <remarks>
    /// This snippet has been automatically generated and should be regarded as a code template only.
    /// It will require modifications to work:
    /// - It may require correct/in-range values for request initialization.
    /// - It may require specifying regional endpoints when creating the service client as shown in
    /// https://cloud.google.com/dotnet/docs/reference/help/client-configuration#endpoint.
    /// </remarks>
    public void UpdateDataScanRequestObject()
    {
        // Create client
        DataScanServiceClient dataScanServiceClient = DataScanServiceClient.Create();
        // Initialize request argument(s)
        UpdateDataScanRequest request = new UpdateDataScanRequest
        {
            DataScan = new DataScan(),
            UpdateMask = new FieldMask(),
            ValidateOnly = false,
        };
        // Make the request
        Operation<DataScan, OperationMetadata> response = dataScanServiceClient.UpdateDataScan(request);

        // Poll until the returned long-running operation is complete
        Operation<DataScan, OperationMetadata> completedResponse = response.PollUntilCompleted();
        // Retrieve the operation result
        DataScan result = completedResponse.Result;

        // Or get the name of the operation
        string operationName = response.Name;
        // This name can be stored, then the long-running operation retrieved later by name
        Operation<DataScan, OperationMetadata> retrievedResponse = dataScanServiceClient.PollOnceUpdateDataScan(operationName);
        // Check if the retrieved long-running operation has completed
        if (retrievedResponse.IsCompleted)
        {
            // If it has completed, then access the result
            DataScan retrievedResult = retrievedResponse.Result;
        }
    }
}
package main

import (
	"context"

	dataplex "cloud.google.com/go/dataplex/apiv1"
	dataplexpb "cloud.google.com/go/dataplex/apiv1/dataplexpb"
)

func main() {
	ctx := context.Background()
	// This snippet has been automatically generated and should be regarded as a code template only.
	// It will require modifications to work:
	// - It may require correct/in-range values for request initialization.
	// - It may require specifying regional endpoints when creating the service client as shown in:
	//   https://pkg.go.dev/cloud.google.com/go#hdr-Client_Options
	c, err := dataplex.NewDataScanClient(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	defer c.Close()

	req := &dataplexpb.UpdateDataScanRequest{
		// TODO: Fill request struct fields.
		// See https://pkg.go.dev/cloud.google.com/go/dataplex/apiv1/dataplexpb#UpdateDataScanRequest.
	}
	op, err := c.UpdateDataScan(ctx, req)
	if err != nil {
		// TODO: Handle error.
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	// TODO: Use resp.
	_ = resp
}
import com.google.cloud.dataplex.v1.DataScan;
import com.google.cloud.dataplex.v1.DataScanServiceClient;
import com.google.cloud.dataplex.v1.UpdateDataScanRequest;
import com.google.protobuf.FieldMask;

public class SyncUpdateDataScan {

  public static void main(String[] args) throws Exception {
    syncUpdateDataScan();
  }

  public static void syncUpdateDataScan() throws Exception {
    // This snippet has been automatically generated and should be regarded as a code template only.
    // It will require modifications to work:
    // - It may require correct/in-range values for request initialization.
    // - It may require specifying regional endpoints when creating the service client as shown in
    //   https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
    try (DataScanServiceClient dataScanServiceClient = DataScanServiceClient.create()) {
      UpdateDataScanRequest request =
          UpdateDataScanRequest.newBuilder()
              .setDataScan(DataScan.newBuilder().build())
              .setUpdateMask(FieldMask.newBuilder().build())
              .setValidateOnly(true)
              .build();
      DataScan response = dataScanServiceClient.updateDataScanAsync(request).get();
    }
  }
}
# This snippet has been automatically generated and should be regarded as a
# code template only.
# It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
#   client as shown in:
#   https://googleapis.dev/python/google-api-core/latest/client_options.html
from google.cloud import dataplex_v1


def sample_update_data_scan():
    # Create a client
    client = dataplex_v1.DataScanServiceClient()

    # Initialize request argument(s)
    data_scan = dataplex_v1.DataScan()
    data_scan.data_quality_spec.rules.dimension = "dimension_value"
    data_scan.data.entity = "entity_value"

    request = dataplex_v1.UpdateDataScanRequest(
        data_scan=data_scan,
    )

    # Make the request
    operation = client.update_data_scan(request=request)

    print("Waiting for operation to complete...")

    response = operation.result()

    # Handle the response
    print(response)
require "google/cloud/dataplex/v1"

##
# Snippet for the update_data_scan call in the DataScanService service
#
# This snippet has been automatically generated and should be regarded as a code
# template only. It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
#   client as shown in https://cloud.google.com/ruby/docs/reference.
#
# This is an auto-generated example demonstrating basic usage of
# Google::Cloud::Dataplex::V1::DataScanService::Client#update_data_scan.
#
def update_data_scan
  # Create a client object. The client can be reused for multiple calls.
  client = Google::Cloud::Dataplex::V1::DataScanService::Client.new

  # Create a request. To set request fields, pass in keyword arguments.
  request = Google::Cloud::Dataplex::V1::UpdateDataScanRequest.new

  # Call the update_data_scan method.
  result = client.update_data_scan request

  # The returned object is of type Gapic::Operation. You can use it to
  # check the status of an operation, cancel it, or wait for results.
  # Here is how to wait for a response.
  result.wait_until_done! timeout: 60
  if result.response?
    p result.response
  else
    puts "No response received."
  end
end
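Outside a client library, UpdateDataScan follows the standard Google API update pattern: an HTTP PATCH on the scan's resource name with an updateMask query parameter listing the fields to change. The sketch below only assembles that URL; the host and field paths shown are assumptions to check against the DataScan REST reference before use.

```python
# Minimal sketch: build the PATCH URL for an UpdateDataScan call.
# The API root and example mask paths are assumptions; verify them
# against the Dataplex REST reference.
from urllib.parse import urlencode

API_ROOT = "https://dataplex.googleapis.com/v1"


def update_data_scan_url(scan_name: str, field_paths: list) -> str:
    """Return the PATCH URL with an updateMask naming the fields to change.

    scan_name is the full resource name, e.g.
    projects/p/locations/l/dataScans/s.
    """
    query = urlencode({"updateMask": ",".join(field_paths)})
    return f"{API_ROOT}/{scan_name}?{query}"


url = update_data_scan_url(
    "projects/my-project/locations/us-central1/dataScans/orders-profile",
    ["description", "executionSpec.trigger"],  # illustrative mask paths
)
```

Only the fields named in the mask are changed; everything else on the scan is left as-is, which is why the console's "Edit scan configuration" flow can update the display name or schedule without touching the rest of the scan definition.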