Analyze the impact of data changes on workflows

In this scenario, you maintain a database that stores records about how various services provided by a healthcare provider are used. To make the data easier to use, you browse the tables to identify potential changes. Before you implement the changes, you identify if any improvements affect existing workflows and if additional adjustments are necessary.

In this tutorial, use data lineage to identify how data transformations affect downstream resources and the workflows the resources are part of.

Get started

To complete the use case, first set up the environment and run the data transformations. Use the prerequisites and setup page to connect a remote repository to Dataform. This repository contains the code necessary to set up the dataset and transform the data.

After you finish the environment setup, use BigQuery and Lineage Explorerto track data transformations and the effect they have on the workflows.

Analyze data transformations with Lineage Explorer

After you prepare the dataset, analyze the impact of the data transformation using the BigQuery Lineagetab.

Verify data integrity

In this example, examine the medicare_participation_indicator column that indicates whether a physician or supplier agrees to provide services for Medicare. The lineage graph shows how data transformations between the derived tables result in column data type changes:

  1. In Google Cloud console, go to the BigQuerypage.
  2. Use the search field to find the physicians_and_other_supplier_2012_original table.
  3. Click the Lineagetab.
  4. In the Lineage Explorerpane, do the following:
    1. In the Column Level Lineagesection, select the medicare_participation_indicator column name from the list.
    2. In the Directionsection, select the Downstreamdirection.
    3. Click Apply.
  5. Expand the lineage path until you reach vertex_ai_model_final_features .
  6. Analyze the path changes between the supplier_stg3 table and the supplier_transform1 table:

    Lineage tracking for medicare_participation_indicator column
    Lineage tracking visualization for medicare_participation_indicator column
    • Exact Copypath marking indicates that the column is passing through unchanged.
    • Otherpath marking indicates a transformation. In this path, the data type String is treated like Boolean .

The path shows that the column data types change, which might require adjustments in the workflows that use these tables.

Identify redundant columns

This example examines the nppes_credentials column that lists National Provider Identifiers the practitioners hold in the National Plan and Provider Enumeration System (NPPES):

  1. In Google Cloud console, go to the BigQuerypage.
  2. Use the search field to find the physicians_and_other_supplier_2012_original table.
  3. Click the Lineagetab.
  4. In the Lineage Explorerpane, do the following:
    1. In the Column Level Lineagesection, select the nppes_credentials column name from the list.
    2. In the Directionsection, select the Downstreamdirection.
    3. Click Apply.
  5. Expand the path to check if downstream lineage exists leading to vertex_ai_model_final_features .

No lineage means that this column might not be relevant in this particular workflow, and can even be deleted.

For more information on visualizing data with data lineage graph, see Lineage graph view .

Create a Mobile Website
View Site in Mobile | Classic
Share by: