Analyze the causes of a personally identifiable information (PII) leak

In this scenario, you receive an alert that sensitive consumer data (specifically first and last names) appears in a view visible to the entire organization.

This information is originally intended only for specific functional purposes, such as account creation, invoicing, and shipping. However, through a series of transformations and the creation of an analytics view, the Personally Identifiable Information (PII) leaks into a broader analytics schema.

In this tutorial, you use data lineage to trace the flow of sensitive data back to the process that moves it from a trusted to a non-trusted location.

Get started

To complete the use case, first set up the environment and run the data transformations. Use the prerequisites and setup page to connect a remote repository to Dataform. This repository contains the code necessary to set up the dataset and transform the data.

After you set up the environment, use BigQuery and Lineage Explorer to identify where PII crosses a security boundary.

Analyze personal information leak with Lineage Explorer

After you prepare the dataset, trace the personal information leak using the BigQuery Lineage tab.

In this example, you trace the user_email column from the public view back to its source:

  1. In the Google Cloud console, go to the BigQuery page.
  2. Use the search field to find the order_status_stats table.
  3. Click the Lineage tab.
  4. In the Lineage Explorer pane, do the following:
    1. In the Column Level Lineage section, select the user_email column name from the list.
    2. In the Direction section, select the Upstream direction.
    3. Click Apply.
  5. Follow the graph back one step. The graph shows that the email is pulled from the status_counts_by_user_v intermediate view.
  6. Click the process node between the view and its upstream dependencies. The process node shows that a join operation occurs between anonymized order data and a table containing identity information.

The lineage proves that personal information crosses from a restricted functional table into a broader analytics schema, where unauthorized users can see it.
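
The upstream trace that the steps above perform in the console can be sketched in code. The following Python example is a minimal illustration, not the Lineage Explorer implementation or a Google Cloud API: it models column-level lineage as a plain dictionary and walks it breadth-first. Only order_status_stats, status_counts_by_user_v, and user_email come from this tutorial. The schema prefixes (analytics, restricted), the source tables (users, orders_anonymized), the status_count column, and the helper names are hypothetical and exist only for the sketch.

```python
from collections import deque

# Hypothetical column-level lineage edges: each key is a downstream column and
# each value lists the upstream columns it is derived from. The schema names
# and source tables are assumptions made for illustration.
UPSTREAM = {
    "analytics.order_status_stats.user_email": [
        "analytics.status_counts_by_user_v.user_email",
    ],
    "analytics.status_counts_by_user_v.user_email": [
        "restricted.users.email",
    ],
    "analytics.status_counts_by_user_v.status_count": [
        "analytics.orders_anonymized.status",
    ],
}

# Assumed convention: anything under these prefixes is a trusted, restricted location.
RESTRICTED_PREFIXES = ("restricted.",)


def trace_upstream(column, edges):
    """Walk the lineage breadth-first and return (downstream, upstream) hops."""
    hops = []
    seen = {column}
    queue = deque([column])
    while queue:
        current = queue.popleft()
        for parent in edges.get(current, []):
            hops.append((current, parent))
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return hops


def boundary_crossings(hops, restricted_prefixes):
    """Keep only hops where data leaves a restricted location for an unrestricted one."""
    return [
        (down, up)
        for down, up in hops
        if up.startswith(restricted_prefixes) and not down.startswith(restricted_prefixes)
    ]


hops = trace_upstream("analytics.order_status_stats.user_email", UPSTREAM)
crossings = boundary_crossings(hops, RESTRICTED_PREFIXES)

for down, up in crossings:
    print(f"PII crosses the boundary: {up} -> {down}")
```

In this sketch, the single reported crossing corresponds to the process node in step 6: the hop where identity data from a restricted table is joined into the intermediate view.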

For more information about visualizing data with the data lineage graph, see Lineage graph view.
