Query Sensitive Data Protection findings in BigQuery

This page provides example queries that you can use to analyze Sensitive Data Protection findings that were exported to BigQuery.

You can configure an inspection job or job trigger to save the findings to BigQuery. Doing so lets you query the findings for further analysis. When your findings are exported to BigQuery, the data is written to either a new or existing table.

For more information about all the actions that Sensitive Data Protection can perform after inspection, see the Actions conceptual topic.

For more information about running queries, see the following:

Columns of the BigQuery table

The columns of the table of exported findings are based on the attributes of the Finding object.

If you are configuring inspection of a BigQuery table , you can set up the job or job trigger such that the exported findings will contain the row's identifiers. Doing so lets you link the inspection findings back to the rows that contain them.

In the inspection job or job trigger, set the following fields to the names of the columns that uniquely identify each row in the table—that is, the columns that serve the purpose of a primary key:

  • If you're using the Google Cloud console, set the Identifying fields (comma separated)field.
  • If you're using the DLP API, set the identifyingFields property.

When the inspection is done and the findings are exported to BigQuery, each finding will contain the corresponding values of the columns that you specified. Those values will be in the location.content_locations.record_location.record_key.id_values field. You can then use those values to link the finding back to the specific row in the inspected BigQuery table.

Sample queries

You can use the following sample queries to analyze your findings. You can also use the queries in a visualization tool such as Looker Studio . These queries are provided to help you get started querying your findings data.

In each of the following queries, replace the following:

  • PROJECT_ID : the project identifier
  • DATASET : the BigQuery dataset name
  • TABLE_ID : the table ID

Select the count of each infoType

Google Cloud console

 SELECT 
  
 info_type 
 . 
 name 
 , 
 COUNT 
 ( 
 info_type 
 . 
 name 
 ) 
  
 AS 
  
 count 
 FROM 
  
 ` 
  PROJECT_ID 
 
 . 
  DATASET 
 
 . 
  TABLE_ID 
 
 ` 
 GROUP 
  
 BY 
  
 info_type 
 . 
 name 
 ORDER 
  
 BY 
  
 count 
  
 DESC 
 ; 

Command-line

bq  
query  
--use_legacy_sql = 
 false 
  
 ' SELECT info_type.name, 
 COUNT(info_type.name) AS count 
 FROM ` PROJECT_ID 
. DATASET 
. TABLE_ID 
` 
 GROUP BY info_type.name ORDER BY count DESC;' 

Select the count of each infoType by day

Google Cloud console

 SELECT 
  
 info_type 
 . 
 name 
 , 
  
 cast 
 ( 
 TIMESTAMP_SECONDS 
 ( 
 create_time 
 . 
 seconds 
 ) 
  
 as 
  
 date 
 ) 
  
 as 
  
 day 
 , 
 COUNT 
 ( 
 locations 
 . 
 container_name 
 ) 
  
 AS 
  
 count 
 FROM 
  
 ` 
  PROJECT_ID 
 
 . 
  DATASET 
 
 . 
  TABLE_ID 
 
 ` 
 , 
 UNNEST 
 ( 
 location 
 . 
 content_locations 
 ) 
  
 AS 
  
 locations 
 GROUP 
  
 BY 
  
 info_type 
 . 
 name 
 , 
  
 day 
 ORDER 
  
 BY 
  
 count 
  
 DESC 
 ; 

Command-line

bq  
query  
--use_legacy_sql = 
 false 
  
 ' SELECT info_type.name, 
 cast(TIMESTAMP_SECONDS(create_time.seconds) as date) as day, 
 COUNT(locations.container_name) AS count FROM ` PROJECT_ID 
. DATASET 
. TABLE_ID 
`, 
 UNNEST(location.content_locations) AS locations 
 GROUP BY info_type.name, day ORDER BY count DESC;' 

Selects the count of each infoType in each container

Google Cloud console

 SELECT 
  
 info_type 
 . 
 name 
 , 
  
 locations 
 . 
 container_name 
 , 
 COUNT 
 ( 
 locations 
 . 
 container_name 
 ) 
  
 AS 
  
 count 
 FROM 
  
 ` 
  PROJECT_ID 
 
 . 
  DATASET 
 
 . 
  TABLE_ID 
 
 ` 
 , 
 UNNEST 
 ( 
 location 
 . 
 content_locations 
 ) 
  
 AS 
  
 locations 
 GROUP 
  
 BY 
  
 locations 
 . 
 container_name 
 , 
  
 info_type 
 . 
 name 
 ORDER 
  
 BY 
  
 count 
  
 DESC 
 ; 

Command-line

bq  
query  
--use_legacy_sql = 
 false 
  
 ' SELECT info_type.name, locations.container_name, 
 COUNT(locations.container_name) AS count FROM ` PROJECT_ID 
. DATASET 
. TABLE_ID 
`, 
 UNNEST(location.content_locations) AS locations 
 GROUP BY locations.container_name,info_type.name ORDER BY count DESC;' 

Selects the finding types found for each column of a table

This query will group all the findings by column name and is intended to work on findings from a BigQuery inspection job. This query is useful if you are trying to identify the likely types for a given column. You can adjust settings by modifying the WHERE and HAVING clauses. For example, if multiple table results are included in your findings table, you can limit these to just one job run or one table name.

Google Cloud console

 SELECT 
  
 table_counts 
 . 
 field_name 
 , 
  
 STRING_AGG 
 ( 
  
 CONCAT 
 ( 
 " " 
 , 
 table_counts 
 . 
 name 
 , 
 " [count: " 
 , 
 CAST 
 ( 
 table_counts 
 . 
 count_total 
  
 AS 
  
 String 
 ), 
 "]" 
 ) 
  
 ORDER 
  
 BY 
  
 table_counts 
 . 
 count_total 
  
 DESC 
 ) 
  
 AS 
  
 infoTypes 
 FROM 
  
 ( 
  
 SELECT 
  
 locations 
 . 
 record_location 
 . 
 field_id 
 . 
 name 
  
 AS 
  
 field_name 
 , 
  
 info_type 
 . 
 name 
 , 
  
 COUNT 
 ( 
 * 
 ) 
  
 AS 
  
 count_total 
  
 FROM 
  
 ` 
  PROJECT_ID 
 
 . 
  DATASET 
 
 . 
  TABLE_ID 
 
 ` 
 , 
  
 UNNEST 
 ( 
 location 
 . 
 content_locations 
 ) 
  
 AS 
  
 locations 
  
 WHERE 
  
 ( 
 likelihood 
  
 = 
  
 'LIKELY' 
  
 OR 
  
 likelihood 
  
 = 
  
 'VERY_LIKELY' 
  
 OR 
  
 likelihood 
  
 = 
  
 'POSSIBLE' 
 ) 
  
 GROUP 
  
 BY 
  
 locations 
 . 
 record_location 
 . 
 field_id 
 . 
 name 
 , 
  
 info_type 
 . 
 name 
  
 HAVING 
  
 count_total 
 > 
 200 
  
 ) 
  
 AS 
  
 table_counts 
 GROUP 
  
 BY 
  
 table_counts 
 . 
 field_name 
 ORDER 
  
 BY 
  
 table_counts 
 . 
 field_name 

The above query might produce a result like this for a sample table, where the infoTypes column tells us how many instances of each infoType was found for that given column.

field_name infoTypes
field1 CUSTOM_USER_US [count: 7004] , CUSTOM_USER_EU [count: 2996]
field2 US_VEHICLE_IDENTIFICATION_NUMBER [count: 9597]
field3 EMAIL_ADDRESS [count: 10000]
field4 IP_ADDRESS [count: 10000]
field5 PHONE_NUMBER [count: 7815]
field6 US_SOCIAL_SECURITY_NUMBER [count: 10000]
field7 CREDIT_CARD_NUMBER [count: 10000]
Design a Mobile Site
View Site in Mobile | Classic
Share by: