Analyze data profiles stored in BigQuery

If you configured the sensitive data discovery service to send all successfully generated data profiles to BigQuery, you can query those data profiles to gain insights about your data. You can also use visualization tools like Looker Studio to build custom reports tailored to your business needs. Alternatively, you can use a premade report that Sensitive Data Protection provides, tune it, and share it as needed.

This page provides example SQL queries that you can use to learn more about your data profiles. It also shows you how you can visualize data profiles in in Looker Studio.

For more information on data profiles, see Data profiles .

Before you begin

This page assumes that you have configured profiling at the organization, folder, or project level. In your discovery scan configuration, make sure that the Save data profile copies to BigQueryaction is enabled. For more information about how to create a discovery scan configuration, see Create a scan configuration .

In this document, the table that contains the exported data profiles is called the output table .

Make sure you have the project ID, dataset ID, and table ID of the output table readily available. You need them to perform the procedures on this page.

The latest view

When Sensitive Data Protection exports data profiles to your output table, it also creates the latest view . This view is a prefiltered virtual table that includes only the latest snapshots of your data profiles. The latest view has the same schema as the output table, so you can use the two interchangeably in your SQL queries and Looker Studio reports. The results can differ because the output table contains older snapshots of the data profiles.

The latest view is stored in the same location as the output table. Its name has the following format:

  OUTPUT_TABLE 
_latest_ VERSION 
 

Replace the following:

  • OUTPUT_TABLE : the ID of the table that contains the exported data profiles.
  • VERSION : the version number of the view.

For example, if your output table's name is table-profile , then the latest view has a name like table-profile_latest_v1 .

Latest view

When using the latest view in SQL queries, use the view's full name, which includes the project ID, dataset ID, table ID, and suffix—for example, myproject.mydataset.table-profile_latest_v1 .

  PROJECT_ID 
. DATASET_ID 
. OUTPUT_TABLE 
_latest_ VERSION 
 

Choose between the output table and the latest view

The latest view includes only the latest data profile snapshots, while the output table has all data profile snapshots, including snapshots that are outdated. For example, a query on the output table can return multiple column data profiles for the same column—one for each time that column was profiled.

When choosing between using the output table and the latest view in your SQL queries or Looker Studio reports, consider the following:

  • The latest view is useful if you have data assets that have been reprofiled and you only want to see the latest profiles—not their earlier versions. That is, you want to see the current state of your profiled data.

  • The output table is useful if you want to get a historical view of your profiled data. For example, you're trying to determine whether your organization has ever stored a particular infoType, or you want to see the changes a particular data profile has undergone.

Sample SQL queries

This section provides example queries that you can use when analyzing data profiles. To run these queries, see Running interactive queries .

In the following examples, replace TABLE_OR_VIEW with either of the following:

  • The name of the output table, which is the table that contains the exported data profiles—for example, myproject.mydataset.table-profile .
  • The name of the latest view of the output table—for example, myproject.mydataset.table-profile_latest_v1 .

In either case, you must include the project ID and the dataset ID.

For more information, see Choose between the output table and the latest view on this page.

To troubleshoot any errors you encounter, see Error messages .

List all columns that have a high free-text score and evidence of other infoType matches

  SELECT 
  
 column_profile 
 . 
 table_full_resource 
 , 
  
 column_profile 
 . 
 COLUMN 
 , 
  
 other_matches 
 . 
 info_type 
 . 
 name 
 , 
  
 column_profile 
 . 
 profile_last_generated 
 FROM 
  
 ` 
  TABLE_OR_VIEW 
 
 ` 
  
 LEFT 
  
 JOIN 
  
 UNNEST 
 ( 
 column_profile 
 . 
 other_matches 
 ) 
  
 AS 
  
 other_matches 
 WHERE 
  
 column_profile 
 . 
 free_text_score 
  
 = 
  
 1 
  
 AND 
  
 ( 
  
 column_profile 
 . 
 column_info_type 
 . 
 info_type 
 . 
 name 
> "" 
  
 OR 
  
 ARRAY_LENGTH 
 ( 
 column_profile 
 . 
 other_matches 
 ) 
> 0 
  
 ) 
 

For information about how to remediate these findings, see Recommended strategies for mitigating data risk .

For more information about the Free text scoreand Other infoTypesmetrics, see Column data profiles .

List all tables that contain a column of credit card numbers

  SELECT 
  
 column_profile 
 . 
 table_full_resource 
 , 
  
 column_profile 
 . 
 profile_last_generated 
 FROM 
  
 ` 
  TABLE_OR_VIEW 
 
 ` 
 WHERE 
  
 column_profile 
 . 
 column_info_type 
 . 
 info_type 
 . 
 name 
 = 
 "CREDIT_CARD_NUMBER" 
 

CREDIT_CARD_NUMBER is a built-in infoType that represents a credit card number.

For information about how to remediate these findings, see Recommended strategies for mitigating data risk .

List table profiles that contain columns of credit card numbers, US Social Security numbers, and person names

  SELECT 
  
 table_full_resource 
 , 
  
 COUNT 
 ( 
 * 
 ) 
  
 AS 
  
 count_findings 
 FROM 
  
 ( 
  
 SELECT 
  
 DISTINCT 
  
 column_profile 
 . 
 table_full_resource 
 , 
  
 column_profile 
 . 
 column_info_type 
 . 
 info_type 
 . 
 name 
  
 FROM 
  
 ` 
  TABLE_OR_VIEW 
 
 ` 
  
 WHERE 
  
 column_profile 
 . 
 column_info_type 
 . 
 info_type 
 . 
 name 
  
 IN 
  
 ( 
 'PERSON_NAME' 
 , 
  
 'CREDIT_CARD_NUMBER' 
 , 
  
 'US_SOCIAL_SECURITY_NUMBER' 
 ) 
  
 ORDER 
  
 BY 
  
 column_profile 
 . 
 table_full_resource 
  
 ) 
  
 ot1 
 GROUP 
  
 BY 
  
 table_full_resource 
  
 # 
 increase 
  
 this 
  
 number 
  
 to 
  
 match 
  
 the 
  
 total 
  
 distinct 
  
 infoTypes 
  
 that 
  
 must 
  
 be 
  
 present 
 HAVING 
  
 count_findings 
> = 
 3 
 

This query uses the following built-in infoTypes :

  • CREDIT_CARD_NUMBER : represents a credit card number
  • PERSON_NAME : represents the full name of a person
  • US_SOCIAL_SECURITY_NUMBER represents a US Social Security number

For information about how to remediate these findings, see Recommended strategies for mitigating data risk .

List buckets where the sensitivity score is SENSITIVITY_HIGH

  SELECT 
  
 file_store_profile 
 . 
 file_store_path 
 , 
  
 file_store_profile 
 . 
 resource_visibility 
 , 
  
 file_store_profile 
 . 
 sensitivity_score 
 FROM 
  
 ` 
  TABLE_OR_VIEW 
 
 ` 
 WHERE 
  
 file_store_profile 
 . 
 sensitivity_score 
 . 
 score 
  
 = 
 'SENSITIVITY_HIGH' 
 ; 
 

For more information, see File store data profiles .

List all bucket paths, clusters, and file extensions scanned where the sensitivity score is SENSITIVITY_HIGH

  SELECT 
  
 file_store_profile 
 . 
 file_store_path 
 , 
  
 summaries 
 . 
 file_cluster_type 
 . 
 cluster 
 , 
  
 STRING_AGG 
 ( 
 scanned_file_extensions 
 . 
 file_extension 
 ) 
  
 AS 
  
 scanned_extensions 
 , 
  
 file_store_profile 
 . 
 profile_last_generated 
 . 
 timestamp 
 FROM 
  
 ` 
  TABLE_OR_VIEW 
 
 ` 
 LEFT 
  
 JOIN 
  
 UNNEST 
 ( 
 file_store_profile 
 . 
 file_cluster_summaries 
 ) 
  
 as 
  
 summaries 
 LEFT 
  
 JOIN 
  
 UNNEST 
 ( 
 summaries 
 . 
 file_store_info_type_summaries 
 ) 
  
 as 
  
 info_types 
 LEFT 
  
 JOIN 
  
 UNNEST 
 ( 
 summaries 
 . 
 file_extensions_scanned 
 ) 
  
 as 
  
 scanned_file_extensions 
 WHERE 
  
 file_store_profile 
 . 
 data_source_type 
 . 
 data_source 
  
 = 
  
 'google/storage/bucket' 
 AND 
  
 summaries 
 . 
 sensitivity_score 
 . 
 score 
  
 = 
 'SENSITIVITY_HIGH' 
 GROUP 
  
 BY 
  
 1 
 , 
  
 2 
 , 
  
 4 
 ; 
 

For more information, see File store data profiles .

List all bucket paths, clusters, and file extensions scanned where credit card numbers were detected

  SELECT 
  
 file_store_profile 
 . 
 file_store_path 
 , 
  
 summaries 
 . 
 file_cluster_type 
 . 
 cluster 
 , 
  
 STRING_AGG 
 ( 
 scanned_file_extensions 
 . 
 file_extension 
 ) 
  
 AS 
  
 scanned_extensions 
 FROM 
  
 ` 
  TABLE_OR_VIEW 
 
 ` 
 LEFT 
  
 JOIN 
  
 UNNEST 
 ( 
 file_store_profile 
 . 
 file_cluster_summaries 
 ) 
  
 as 
  
 summaries 
 LEFT 
  
 JOIN 
  
 UNNEST 
 ( 
 summaries 
 . 
 file_store_info_type_summaries 
 ) 
  
 as 
  
 info_types 
 LEFT 
  
 JOIN 
  
 UNNEST 
 ( 
 summaries 
 . 
 file_extensions_scanned 
 ) 
  
 as 
  
 scanned_file_extensions 
 WHERE 
  
 file_store_profile 
 . 
 data_source_type 
 . 
 data_source 
  
 = 
  
 'google/storage/bucket' 
 AND 
  
 info_types 
 . 
 info_type 
 . 
 name 
 = 
 'CREDIT_CARD_NUMBER' 
 GROUP 
  
 BY 
  
 1 
 , 
  
 2 
 ; 
 

CREDIT_CARD_NUMBER is a built-in infoType that represents a credit card number.

For more information, see File store data profiles .

List all bucket paths, clusters, and file extensions scanned where a credit card number, person name, or US Social Security number was detected

  SELECT 
  
 file_store_profile 
 . 
 file_store_path 
 , 
  
 summaries 
 . 
 file_cluster_type 
 . 
 cluster 
 , 
  
 STRING_AGG 
 ( 
 scanned_file_extensions 
 . 
 file_extension 
 ) 
  
 AS 
  
 scanned_extensions 
 FROM 
  
 ` 
  TABLE_OR_VIEW 
 
 ` 
 LEFT 
  
 JOIN 
  
 UNNEST 
 ( 
 file_store_profile 
 . 
 file_cluster_summaries 
 ) 
  
 as 
  
 summaries 
 LEFT 
  
 JOIN 
  
 UNNEST 
 ( 
 summaries 
 . 
 file_store_info_type_summaries 
 ) 
  
 as 
  
 info_types 
 LEFT 
  
 JOIN 
  
 UNNEST 
 ( 
 summaries 
 . 
 file_extensions_scanned 
 ) 
  
 as 
  
 scanned_file_extensions 
 WHERE 
  
 file_store_profile 
 . 
 data_source_type 
 . 
 data_source 
  
 = 
  
 'google/storage/bucket' 
 AND 
  
 info_types 
 . 
 info_type 
 . 
 name 
  
 IN 
  
 ( 
 'CREDIT_CARD_NUMBER' 
 , 
  
 'PERSON_NAME' 
 , 
  
 'US_SOCIAL_SECURITY_NUMBER' 
 ) 
 GROUP 
  
 BY 
  
 1 
 , 
  
 2 
 ; 
 

This query uses the following built-in infoTypes :

  • CREDIT_CARD_NUMBER : represents a credit card number
  • PERSON_NAME : represents the full name of a person
  • US_SOCIAL_SECURITY_NUMBER represents a US Social Security number

For more information, see File store data profiles .

Work with data profiles in Looker Studio

To visualize your data profiles in Looker Studio, you can use a premade report, or you can create your own .

Use a premade report

Sensitive Data Protection provides a premade Looker Studio report that highlights the rich insights of data profiles. The Sensitive Data Protection Dashboardis a multi-page report that gives you a quick high-level view of your data profiles, including breakdowns by risk, by infoType, and by location. Explore the other tabs to see views by geographic region and posture risk, or drill down to specific metrics. You can use this premade report as is, or you can customize it as needed. This is the recommended version of the premade report.

To view the premade report with your data, enter the required values in the following URL. Then, copy the resulting URL to your browser.

https://lookerstudio.google.com/c/u/0/reporting/create?c.reportId=c9826374-e016-4c96-a495-7281328375c6&ds.connector=BIG_QUERY&ds.projectId= PROJECT_ID 
&ds.datasetId= DATASET_ID 
&ds.tableId= TABLE_OR_VIEW 
&ds.type=TABLE&ds.useFreshSchema=false

Replace the following:

  • PROJECT_ID : the project that contains the output table.
  • DATASET_ID : the dataset that contains the output table.
  • TABLE_OR_VIEW : either of the following:

    • The name of the output table, which is the table that contains the exported data profiles—for example, myproject.mydataset.table-profile .
    • The name of the latest view of the output table—for example, myproject.mydataset.table-profile_latest_v1 .

    For more information, see Choose between the output table and the latest view on this page.

It can take a few minutes for Looker Studio to load the report with your data. If you encounter errors or if the report doesn't load, see Troubleshoot errors with the premade report on this page.

In the following example, the dashboard shows that low-sensitivity and high-sensitivity data are present in multiple countries around the world.

Premade report

Earlier version of the premade report

The first version of the premade report is still available at the following address:

https://lookerstudio.google.com/c/u/0/reporting/create?c.reportId=907a2b73-ffe4-40b2-b9a1-c2aa0bbd69fd&ds.connector=BIG_QUERY&ds.projectId= PROJECT_ID 
&ds.datasetId= DATASET_ID 
&ds.tableId= TABLE_OR_VIEW 
&ds.type=TABLE&ds.useFreshSchema=false

Create a report

Looker Studio lets you create interactive reports. In this section, you create a simple table report in Looker Studio that is based on the data profiles exported to your output table in BigQuery.

Make sure you have the project ID, dataset ID, and table ID of the output table or the latest view readily available. You need them to perform this procedure.

This example shows you how to create a report containing a table that shows each infoType reported in your data profiles and its corresponding frequency.

In general, you incur BigQuery usage costs when accessing BigQuery through Looker Studio. For more information, see Visualizing BigQuery data using Looker Studio .

To create a report, do the following:

  1. Open Looker Studio and log in.
  2. Click Blank Report.
  3. On the Connect to datatab, click the BigQuerycard.
  4. If prompted, authorize Looker Studio to access your BigQuery projects.
  5. Connect to your BigQuery data:

    1. For Project, select the project that contains the output table. You can search for the project in the Recent projects, My projects, and Shared projectstabs.
    2. For Dataset, select the dataset that contains the output table.
    3. For Table, select either the output table or the latest view of the output table.

      For more information, see Choose between the output table and the latest view on this page.

    4. Click Add.

    5. In the dialog that appears, click Add to report.

  6. To add a table that shows each infoType reported and its corresponding frequency (record count), follow these steps:

    1. Click Add a chart.
    2. Select a table style.
    3. Click the area where you want to position the chart.

      The chart appears in table format.

    4. Resize the table as needed.

      As long as the table is selected, its properties appear in the Chartpane.

    5. In the Chartpane, on the Setuptab, remove any preselected dimensions and metrics.

    6. For Dimension, add column_profile.column_info_type.info_type.name or file_store_profile.file_cluster_summaries.file_store_info_type_summaries.info_type.name .

      These examples provide data at the column and file cluster levels. You can try other dimensions as well. For example, you can use table-level and bucket-level dimensions.

    7. For Metric, add Record Count.

    The resulting table looks similar to the following:

    A table showing the infoTypes detected and their corresponding
record counts

Learn more about tables in Looker Studio .

Troubleshoot errors with the premade report

If you see any errors, missing controls, or missing charts when loading the premade report, make sure that the premade report is using the latest fields:

  • If your premade report is connected to the output table, confirm that this table is attached to an active discovery scan configuration. To view the settings of your scan configurations, see View a scan configuration .

  • If your premade report is connected to the latest view, confirm that this view is still present in BigQuery. If it is present, try making a change to the view. Alternatively, make a copy of the view and connect the premade report to that copy. For more information about the latest view, see The latest view on this page.

If you continue to see errors after trying these steps, contact Cloud Customer Care .

What's next

Learn about actions you can take to remediate data profile findings.

Design a Mobile Site
View Site in Mobile | Classic
Share by: