In this scenario, you receive an alert that sensitive consumer data (specifically first and last names) appears in a view visible to the entire organization.
This information was originally intended only for specific functional purposes, such as account creation, invoicing, and shipping. However, through a series of transformations and the creation of an analytics view, this Personally Identifiable Information (PII) leaks into a broader analytics schema.
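The following Python sketch illustrates how a join can reintroduce PII into an analytics view. It uses hypothetical `Order` and `Identity` records and a hypothetical `status_counts_by_user` aggregate. The tutorial's actual tables, columns, and Dataform transformations may differ. The order rows carry no PII on their own. Copying `user_email` from the identity table into the joined rows is what moves it out of the restricted table.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical record types; the tutorial's real tables may differ.
@dataclass(frozen=True)
class Order:
    user_id: int  # pseudonymous key, no PII on its own
    status: str


@dataclass(frozen=True)
class Identity:
    user_id: int
    first_name: str
    last_name: str
    user_email: str  # PII that should stay in the restricted table


def join_orders_with_identity(orders, identities):
    """Inner-join orders to identities on user_id using a dict lookup."""
    by_id = {person.user_id: person for person in identities}
    rows = []
    for order in orders:
        person = by_id.get(order.user_id)
        if person is None:
            continue  # inner join: drop orders with no matching identity
        # Copying user_email into the output row is the step that leaks PII.
        rows.append({"user_email": person.user_email, "status": order.status})
    return rows


def status_counts_by_user(rows):
    """Hypothetical analytics aggregate: order count per (email, status)."""
    return Counter((row["user_email"], row["status"]) for row in rows)


orders = [Order(1, "SHIPPED"), Order(1, "SHIPPED"), Order(2, "PENDING"), Order(3, "SHIPPED")]
identities = [
    Identity(1, "Ada", "Lovelace", "ada@example.com"),
    Identity(2, "Alan", "Turing", "alan@example.com"),
]

counts = status_counts_by_user(join_orders_with_identity(orders, identities))
print(counts)
```

The aggregate only counts orders, but every key still contains an email address. Anyone who can read the resulting view can therefore read the PII.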
In this tutorial, you use data lineage to trace the flow of sensitive data back to the process that moves it from a trusted to a non-trusted location.
Get started
To complete the use case, first set up the environment and run the data transformations. Use the prerequisites and setup page to connect a remote repository to Dataform. This repository contains the code necessary to set up the dataset and transform the data.
After you set up the environment, use BigQuery and Lineage Explorer to identify where PII crosses a security boundary.
Analyze personal information leak with Lineage Explorer
After you prepare the dataset, trace the personal information leak using the BigQuery Lineage tab.
In this example, you trace the user_email column from the public view back to its source:
- In the Google Cloud console, go to the BigQuery page.
- Use the search field to find the order_status_stats table.
- Click the Lineage tab.
- In the Lineage Explorer pane, do the following:
  - In the Column Level Lineage section, select the user_email column name from the list.
  - In the Direction section, select the Upstream direction.
  - Click Apply.
- Follow the graph back one step. The graph shows that the email is pulled from the status_counts_by_user_v intermediate view.
- Click the process node between the view and its upstream dependencies. The process node shows that a join operation occurs between anonymized order data and a table containing identity information.
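The upstream trace in the steps above amounts to a breadth-first walk over column-level lineage edges. The following Python sketch models that walk on a small in-memory graph. Several names are hypothetical placeholders, not values taken from the tutorial or from the Data Lineage API: the dataset names `analytics` and `restricted`, the `customer_identity` table, and the process labels.

```python
from collections import deque

# Hypothetical column-level lineage: each column maps to the columns it is
# directly derived from, paired with a label for the process in between.
UPSTREAM = {
    "analytics.order_status_stats.user_email": [
        ("analytics.status_counts_by_user_v.user_email", "view query"),
    ],
    "analytics.status_counts_by_user_v.user_email": [
        ("restricted.customer_identity.user_email", "join"),
    ],
}


def trace_upstream(column, upstream):
    """Walk upstream from `column` breadth-first.

    Returns (target, process, source) edges in the order they are found.
    Each column is expanded at most once, so cycles cannot loop forever.
    """
    seen = {column}
    edges = []
    queue = deque([column])
    while queue:
        current = queue.popleft()
        for source, process in upstream.get(current, []):
            edges.append((current, process, source))
            if source not in seen:
                seen.add(source)
                queue.append(source)
    return edges


for target, process, source in trace_upstream(
    "analytics.order_status_stats.user_email", UPSTREAM
):
    print(f"{target} <- [{process}] <- {source}")
```

Breadth-first order reports the nearest dependencies first. This mirrors following the graph back one step at a time in Lineage Explorer.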
The lineage proves that personal information crosses from a restricted functional table into a broader analytics schema, where unauthorized users can see it.
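Once you have the lineage edges, finding the leak reduces to checking which edges move data from a restricted dataset into a broadly visible one. The following Python sketch assumes a hypothetical classification of datasets by name. In practice, your access policies determine which datasets are restricted, not their names. The edge list and dataset names are placeholders.

```python
# Hypothetical access classification keyed by dataset name.
RESTRICTED_DATASETS = {"restricted"}
BROAD_DATASETS = {"analytics"}

# (target, process, source) column-level edges; names are placeholders.
EDGES = [
    ("analytics.order_status_stats.user_email", "view query",
     "analytics.status_counts_by_user_v.user_email"),
    ("analytics.status_counts_by_user_v.user_email", "join",
     "restricted.customer_identity.user_email"),
]


def dataset_of(column):
    """Return the dataset part of a dataset.table.column name."""
    return column.split(".", 1)[0]


def boundary_crossings(edges):
    """Return edges whose source is restricted and whose target is broad."""
    return [
        (target, process, source)
        for target, process, source in edges
        if dataset_of(source) in RESTRICTED_DATASETS
        and dataset_of(target) in BROAD_DATASETS
    ]


for target, process, source in boundary_crossings(EDGES):
    print(f"{process}: {source} -> {target}")
```

Only the join edge is flagged. The view-to-view edge stays inside the analytics dataset, so it does not cross the boundary.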
For more information about visualizing data with data lineage graphs, see Lineage graph view.

