In this scenario, you maintain a database that stores records about how various services provided by a healthcare provider are used. To make the data easier to use, you browse the tables to identify potential changes. Before you implement the changes, you identify if any improvements affect existing workflows and if additional adjustments are necessary.
In this tutorial, use data lineage to identify how data transformations affect downstream resources and the workflows the resources are part of.
Get started
To complete the use case, first set up the environment and run the data transformations. Use the prerequisites and setup page to connect a remote repository to Dataform. This repository contains the code necessary to set up the dataset and transform the data.
After you finish the environment setup, use BigQuery and Lineage Explorerto track data transformations and the effect they have on the workflows.
Analyze data transformations with Lineage Explorer
After you prepare the dataset, analyze the impact of the data transformation using the BigQuery Lineagetab.
Verify data integrity
In this example, examine the medicare_participation_indicator
column that indicates whether a physician or supplier agrees to provide services for Medicare. The lineage graph shows how data transformations between the derived tables result in column data type changes:
- In Google Cloud console, go to the BigQuerypage.
- Use the search field to find the
physicians_and_other_supplier_2012_originaltable. - Click the Lineagetab.
- In the Lineage Explorerpane, do the following:
- In the Column Level Lineagesection, select the
medicare_participation_indicatorcolumn name from the list. - In the Directionsection, select the Downstreamdirection.
- Click Apply.
- In the Column Level Lineagesection, select the
- Expand the lineage path until you reach
vertex_ai_model_final_features. -
Analyze the path changes between the
supplier_stg3table and thesupplier_transform1table:
Lineage tracking visualization for medicare_participation_indicatorcolumn- Exact Copypath marking indicates that the column is passing through unchanged.
- Otherpath marking indicates a transformation. In this path, the data type
Stringis treated likeBoolean.
The path shows that the column data types change, which might require adjustments in the workflows that use these tables.
Identify redundant columns
This example examines the nppes_credentials
column that lists National Provider Identifiers the practitioners hold in the National Plan and Provider Enumeration System (NPPES):
- In Google Cloud console, go to the BigQuerypage.
- Use the search field to find the
physicians_and_other_supplier_2012_originaltable. - Click the Lineagetab.
- In the Lineage Explorerpane, do the following:
- In the Column Level Lineagesection, select the
nppes_credentialscolumn name from the list. - In the Directionsection, select the Downstreamdirection.
- Click Apply.
- In the Column Level Lineagesection, select the
- Expand the path to check if downstream lineage exists leading to
vertex_ai_model_final_features.
No lineage means that this column might not be relevant in this particular workflow, and can even be deleted.
For more information on visualizing data with data lineage graph, see Lineage graph view .

