Knowledge Catalog (formerly Dataplex Universal Catalog) lets you identify common statistical characteristics (common values, data distribution, null counts) of the columns in your BigQuery tables. This information helps you to understand and analyze your data more effectively.
For more information about Knowledge Catalog data profile scans, see About data profiling.
Before you begin
Enable the Dataplex API.
Roles required to enable APIs
To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.
Required roles and permissions
This section describes the IAM roles and permissions needed to use Knowledge Catalog data profile scans.
User roles and permissions
To get the permissions that you need to create and manage data profile scans, ask your administrator to grant you the following IAM roles:
- Create, run, update, and delete data profile scans: Dataplex DataScan Editor (roles/dataplex.dataScanEditor) on the project containing the data scan
- View data profile scan results, jobs, and history: Dataplex DataScan Viewer (roles/dataplex.dataScanViewer) on the project containing the data scan
- Publish data profile scan results to Knowledge Catalog: Dataplex Catalog Editor (roles/dataplex.catalogEditor) on the @bigquery entry group
- View published data profile scan results in BigQuery on the Data profile tab: BigQuery Data Viewer (roles/bigquery.dataViewer) on the table
For more information about granting roles, see Manage access to projects, folders, and organizations.
These predefined roles contain the permissions required to create and manage data profile scans. To see the exact permissions that are required, expand the Required permissions section:
Required permissions
The following permissions are required to create and manage data profile scans:
- Create, run, update, and delete data profile scans:
  - dataplex.datascans.create on project
  - dataplex.datascans.update on data scan
  - dataplex.datascans.delete on data scan
  - dataplex.datascans.run on data scan
  - dataplex.datascans.get on data scan
  - dataplex.datascans.list on project
  - dataplex.dataScanJobs.get on data scan job
  - dataplex.dataScanJobs.list on data scan
- View data profile scan results, jobs, and history:
  - dataplex.datascans.getData on data scan
  - dataplex.datascans.list on project
  - dataplex.dataScanJobs.get on data scan job
  - dataplex.dataScanJobs.list on data scan
- Publish data profile scan results to Knowledge Catalog:
  - dataplex.entryGroups.useDataProfileAspect on entry group
  - bigquery.tables.update on table
  - dataplex.entries.update on entry
- View published data profile results for a table in BigQuery or Knowledge Catalog:
  - bigquery.tables.get on table
  - bigquery.tables.getData on table
You might also be able to get these permissions with custom roles or other predefined roles.
Knowledge Catalog service account roles and permissions
To ensure that the Knowledge Catalog service account has the necessary permissions to run data profile scans and export results, ask your administrator to grant the following IAM roles to the Knowledge Catalog service account:
- Run data profile scans against BigQuery data:
  - BigQuery Job User (roles/bigquery.jobUser) on the project running the scan
  - BigQuery Data Viewer (roles/bigquery.dataViewer) on the tables being scanned
- Run data profile scans for BigQuery external tables that use Cloud Storage data:
  - Storage Object Viewer (roles/storage.objectViewer) on the Cloud Storage bucket
  - Storage Legacy Bucket Reader (roles/storage.legacyBucketReader) on the Cloud Storage bucket
- Run data profile scans for Iceberg REST Catalog tables on Google Cloud Lakehouse: BigLake Viewer (roles/biglake.viewer) on the Iceberg REST Catalog tables being scanned
- Export data profile scan results to a BigQuery table: BigQuery Data Editor (roles/bigquery.dataEditor) on the table
For more information about granting roles, see Manage access to projects, folders, and organizations.
These predefined roles contain the permissions required to run data profile scans and export results. To see the exact permissions that are required, expand the Required permissions section:
Required permissions
The following permissions are required to run data profile scans and export results:
- Run data profile scans against BigQuery data:
  - bigquery.jobs.create on project
  - bigquery.tables.get on table
  - bigquery.tables.getData on table
- Run data profile scans for BigQuery external tables that use Cloud Storage data:
  - storage.buckets.get on bucket
  - storage.objects.get on object
- Export data profile scan results to a BigQuery table:
  - bigquery.tables.create on dataset
  - bigquery.tables.updateData on table
Your administrator might also be able to give the Knowledge Catalog service account these permissions with custom roles or other predefined roles.
If a table uses BigQuery row-level security, then Knowledge Catalog can only scan rows visible to the Knowledge Catalog service account. To allow Knowledge Catalog to scan all rows, add its service account to a row filter where the predicate is TRUE.
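As a rough illustration of the row filter described above, the following sketch assembles a BigQuery CREATE ROW ACCESS POLICY statement whose predicate is TRUE and prints it for review. The policy name, dataset, table, and project number are hypothetical placeholders, and the sketch only builds the statement; it doesn't run it.

```shell
# Hypothetical placeholder values; replace them with your own.
PROJECT_NUMBER="123456789012"
DATASET_ID="my_dataset"
TABLE_ID="my_table"
POLICY_NAME="dataplex_scan_all_rows"   # illustrative name, not required by the service

# Knowledge Catalog service account, in the format used elsewhere on this page.
SA_EMAIL="service-${PROJECT_NUMBER}@gcp-sa-dataplex.iam.gserviceaccount.com"

# A row filter with the predicate TRUE makes every row visible to the grantee.
SQL="CREATE ROW ACCESS POLICY ${POLICY_NAME}
ON \`${DATASET_ID}.${TABLE_ID}\`
GRANT TO (\"serviceAccount:${SA_EMAIL}\")
FILTER USING (TRUE);"

# Print the statement for review before running it in BigQuery.
printf '%s\n' "$SQL"
```

One possible way to run the printed statement is to pass it to the bq query command with --use_legacy_sql=false, or to paste it into the BigQuery query editor.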
If a table uses BigQuery column-level security, then Knowledge Catalog requires access to scan protected columns. To grant access, give the Knowledge Catalog service account the Data Catalog Fine-Grained Reader (roles/datacatalog.fineGrainedReader) role on all policy tags used in the table. The user creating or updating a data scan also needs permissions on protected columns.
Grant roles to the Knowledge Catalog service account
To run data profile scans, Knowledge Catalog uses a service account that requires permissions to run BigQuery jobs and read BigQuery table data. To grant the required roles, follow these steps:
- Get the Knowledge Catalog service account email address. If you haven't created a data profile or data quality scan in this project before, run the following gcloud command to generate the service identity:

      gcloud beta services identity create --service=dataplex.googleapis.com

  The command returns the service account email, which has the following format: service-PROJECT_NUMBER@gcp-sa-dataplex.iam.gserviceaccount.com.

  If the service account already exists, you can find its email by viewing principals with the Dataplex name on the IAM page in the Google Cloud console.
- Grant the service account the BigQuery Job User (roles/bigquery.jobUser) role on your project. This role lets the service account run BigQuery jobs for the scan.

      gcloud projects add-iam-policy-binding PROJECT_ID \
        --member="serviceAccount:service-PROJECT_NUMBER@gcp-sa-dataplex.iam.gserviceaccount.com" \
        --role="roles/bigquery.jobUser"

  Replace the following:
  - PROJECT_ID: your Google Cloud project ID.
  - service-PROJECT_NUMBER@gcp-sa-dataplex.iam.gserviceaccount.com: the email of the Knowledge Catalog service account.
- Grant the service account the BigQuery Data Viewer (roles/bigquery.dataViewer) role for each table that you want to profile. This role grants read-only access to the tables.

      gcloud bigquery tables add-iam-policy-binding DATASET_ID.TABLE_ID \
        --member="serviceAccount:service-PROJECT_NUMBER@gcp-sa-dataplex.iam.gserviceaccount.com" \
        --role="roles/bigquery.dataViewer"

  Replace the following:
  - DATASET_ID: the ID of the dataset containing the table.
  - TABLE_ID: the ID of the table to profile.
  - service-PROJECT_NUMBER@gcp-sa-dataplex.iam.gserviceaccount.com: the email of the Knowledge Catalog service account.
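When several tables need the same grants, the steps above can be combined into a small script. The following is a minimal sketch, not an official tool: the project, dataset, and table values are hypothetical placeholders, and the run helper and DRY_RUN variable are conveniences introduced here so that the commands can be reviewed before anything is changed. The gcloud invocations mirror the ones shown in the steps above.

```shell
set -eu

# Hypothetical placeholder values; replace them with your own.
PROJECT_ID="my-project"
PROJECT_NUMBER="123456789012"
DATASET_ID="my_dataset"
TABLES="orders customers"   # space-separated table IDs to profile

# Member string for the Knowledge Catalog service account.
MEMBER="serviceAccount:service-${PROJECT_NUMBER}@gcp-sa-dataplex.iam.gserviceaccount.com"

# Commands are only printed while DRY_RUN is 1 (the default);
# set DRY_RUN=0 to execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Project-level role that lets the service account run BigQuery jobs.
run gcloud projects add-iam-policy-binding "$PROJECT_ID" \
  --member="$MEMBER" --role="roles/bigquery.jobUser"

# Table-level read access for each table that will be profiled.
for TABLE_ID in $TABLES; do
  run gcloud bigquery tables add-iam-policy-binding "${DATASET_ID}.${TABLE_ID}" \
    --member="$MEMBER" --role="roles/bigquery.dataViewer"
done
```

Running the script as written only prints the three commands. After checking the output, run it again with DRY_RUN=0 in the environment to apply the bindings.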
Configure execution identity
By default, data profile scans run using the Knowledge Catalog Service Agent. You can override this to use a custom service account or your own End-User Credentials (EUC).
Using a custom execution identity changes how you are billed for the scan. When you specify a custom execution identity, the compute and storage costs associated with the scan are billed directly to your BigQuery project, bypassing the standard Knowledge Catalog Premium SKUs.
Required permissions for custom execution identities
To configure a custom service account or use end-user credentials, you must have the following additional IAM permissions:
- To use a custom service account, you need the following permissions:
  - The iam.serviceAccounts.actAs permission on the project that contains the service account (for example, through the roles/iam.serviceAccountUser role).
  - Your project's Service Agent (service-PROJECT_NUMBER@gcp-sa-dataplex.iam.gserviceaccount.com) needs the iam.serviceAccounts.getAccessToken permission on the custom service account (for example, by having the roles/iam.serviceAccountTokenCreator role).
  - The custom service account needs bigquery.tables.getData on the table to scan, bigquery.jobs.insert in the scan project, and bigquery.dataEditor on the export dataset (if using export).
- To use End-User Credentials, you need:
  - bigquery.tables.getData on the table to scan.
  - bigquery.jobs.insert in the scan project.
  - bigquery.dataEditor on the export dataset (if using export).
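For the custom service account case, the first two requirements above could be granted along the following lines. This is a sketch under stated assumptions: the project, user, and custom service account values are hypothetical, the run helper and DRY_RUN variable are introduced here only so that the commands can be reviewed first, and the two roles are the example roles named in the list above. Your administrator might prefer narrower or custom roles.

```shell
set -eu

# Hypothetical placeholder values; replace them with your own.
PROJECT_ID="my-project"
PROJECT_NUMBER="123456789012"
USER_EMAIL="analyst@example.com"
CUSTOM_SA="profile-scanner@my-project.iam.gserviceaccount.com"

# The project's Service Agent, in the format shown above.
SERVICE_AGENT="service-${PROJECT_NUMBER}@gcp-sa-dataplex.iam.gserviceaccount.com"

# Commands are only printed while DRY_RUN is 1 (the default);
# set DRY_RUN=0 to execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Give the user iam.serviceAccounts.actAs through the Service Account User
# role on the project that contains the custom service account.
run gcloud projects add-iam-policy-binding "$PROJECT_ID" \
  --member="user:${USER_EMAIL}" --role="roles/iam.serviceAccountUser"

# Let the Service Agent obtain access tokens for the custom service account.
run gcloud iam service-accounts add-iam-policy-binding "$CUSTOM_SA" \
  --member="serviceAccount:${SERVICE_AGENT}" \
  --role="roles/iam.serviceAccountTokenCreator"
```

The BigQuery access that the custom service account itself needs on the scanned table, the scan project, and the export dataset is not covered by this sketch.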
Console
To configure the execution identity in the Google Cloud console, select the identity when you create your data profile scan.
- In the Execution Identity section, select one of the following:
  - Dataplex service account: The default behavior.
  - Specific service account: Enter the email address of the service account you want to use.
  - User Credentials: Use your own credentials to run the scan.

