Introduction to data governance in BigQuery
This document provides an introduction to BigQuery data governance and explains how you can use BigQuery features to implement and enforce BigQuery data governance policies. For a more comprehensive overview of data governance in Google Cloud, see What is data governance?
Data governance is the management of data security and quality throughout the data lifecycle, so that access to data and the accuracy of data comply with organizational policies and regulations. These data governance priorities can be broken down into three categories:
- Access control
- Data stewardship
- Data quality
The following sections define these data governance categories, discuss how BigQuery features support them, and recommend next steps for you.
Access control
Data access management is the process of defining, enforcing, and monitoring the rules and policies governing who has access to data. Access management ensures that data is only accessible to those who are authorized to access it. BigQuery provides the following features to help you with data access:
- Identity and Access Management (IAM). IAM lets you control who has access to your BigQuery resources such as projects, datasets, tables, and views. You can grant IAM roles to users, groups, and service accounts. These roles define what they can do with your resources.
- Column-level access controls and row-level access controls. Column-level and row-level access controls let you restrict access to specific columns and rows in a table, based on user attributes or data values. This control lets you implement fine-grained access to help protect sensitive data from unauthorized access.
- Data transfer management. VPC Service Controls let you create perimeters around Google Cloud resources and control access to those resources based on your organization's policies.
- Audit logs. Audit logs provide you with a detailed record of user activity and system events in your organization. These logs help you enforce data governance policies and identify potential security risks.
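To make the IAM and row-level controls more concrete, the following minimal sketch builds the text of two GoogleSQL statements: a GRANT statement that assigns an IAM role on a table, and a CREATE ROW ACCESS POLICY statement that limits which rows a principal can read. The helper names, project, dataset, table, policy, and principal values are hypothetical and exist only for illustration. The helpers only produce strings and don't call BigQuery, so check the generated statements against the BigQuery DCL and DDL references before you run them.

```python
from typing import Sequence


def grant_role_sql(role: str, table: str, members: Sequence[str]) -> str:
    """Build a GRANT statement that gives `role` on `table` to `members`.

    Hypothetical helper: plain string formatting, so pass trusted values only.
    """
    member_list = ", ".join(f'"{m}"' for m in members)
    return f"GRANT `{role}` ON TABLE `{table}` TO {member_list};"


def row_access_policy_sql(
    policy: str, table: str, members: Sequence[str], filter_expr: str
) -> str:
    """Build a CREATE ROW ACCESS POLICY statement.

    Only rows for which `filter_expr` is true are visible to `members`.
    Hypothetical helper: plain string formatting, so pass trusted values only.
    """
    grantees = ", ".join(f'"{m}"' for m in members)
    return (
        f"CREATE ROW ACCESS POLICY {policy}\n"
        f"ON `{table}`\n"
        f"GRANT TO ({grantees})\n"
        f"FILTER USING ({filter_expr});"
    )


if __name__ == "__main__":
    # All identifiers below are made up for the example.
    table = "my-project.sales.orders"
    print(grant_role_sql("roles/bigquery.dataViewer", table,
                         ["user:analyst@example.com"]))
    print(row_access_policy_sql("us_only", table,
                                ["group:us-sales@example.com"],
                                "region = 'US'"))
```

Granting a narrowly scoped role on a single table, and then filtering rows for a specific group, is one way to combine coarse-grained IAM with fine-grained controls.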
Next steps for access control
The following list outlines next steps that you can take to learn more about access control features:
- Take a look at predefined roles in BigQuery and consider how to assign them based on the principle of least privilege.
- For greater flexibility and granularity in managing your permissions, consider creating custom roles that match your needs.
- Add row-level and column-level access controls to restrict access to specific rows and columns in your tables.
- Establish an access perimeter around your Google Cloud resources by setting up VPC Service Controls.
Data stewardship
Data stewardship helps safeguard sensitive data by appropriately categorizing, masking, redacting, or encrypting it during querying, transit, or storage. This approach enhances data protection and organization. BigQuery provides the following features to help you with data stewardship:
- Data masking. Data masking lets you obscure sensitive data in a table while still permitting authorized users to access the surrounding data. It can also mask data that matches sensitive data patterns, safeguarding against accidental data disclosure.
- Encryption. BigQuery automatically encrypts all data at rest and in transit, while letting you customize your encryption settings to meet your specific needs and requirements.
- Metadata management. Metadata management lets you tag resources, which in turn helps you with data search, organization, and categorization.
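As a conceptual illustration of data masking, the following sketch shows how a masking rule can hide a column value from unauthorized readers while authorized readers still see the original. It is plain Python, not BigQuery's implementation or API. The function names and sample values are hypothetical, and the exact behavior of BigQuery's built-in masking rules might differ from these simplified versions.

```python
import hashlib
from typing import Callable, Optional

# A masking rule takes the real value and returns what a masked reader sees.
MaskingRule = Callable[[str], Optional[str]]


def mask_sha256(value: str) -> str:
    """Replace the value with its SHA-256 hex digest (keeps joinability)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def mask_last_four(value: str) -> str:
    """Keep only the last four characters; fully mask short values."""
    if len(value) <= 4:
        return "X" * len(value)
    return "X" * (len(value) - 4) + value[-4:]


def mask_nullify(value: str) -> Optional[str]:
    """Hide the value entirely by returning None."""
    return None


def read_column(value: str, rule: MaskingRule, authorized: bool) -> Optional[str]:
    """Return the raw value to authorized readers, the masked value otherwise."""
    return value if authorized else rule(value)


if __name__ == "__main__":
    card = "4111111111111111"  # made-up sample value
    print(read_column(card, mask_last_four, authorized=True))
    print(read_column(card, mask_last_four, authorized=False))
```

Choosing between rules is a trade-off: a hash hides the value but still lets masked readers group or join on it, while nullifying removes the value completely.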
Next steps for data stewardship
The following list outlines next steps that you can take to learn more about data stewardship features:
- Learn how Google encrypts your data at rest and in transit by default.
- Add column-level data masking to your table to make it easier to share information through your organization without revealing sensitive data.
- Use Sensitive Data Protection to scan your data for sensitive and high-risk information, such as personally identifiable information (PII), financial data, and health information.
Data quality
Data quality management is the process of tracing data lineage and ensuring that data meets your standards for accuracy, completeness, and consistency. BigQuery provides the following features to help you with data quality:
- Data lineage. Data lineage lets you track how your data moves through your system, providing insights into the data's origin, how it changes over time, and its final destination.
- Data profile scans. Data profile scans let you analyze the statistical characteristics of your data, such as average and unique values.
- Data quality scans. Data quality scans let you perform data checks, validate your data against defined rules, and troubleshoot data quality issues.
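To illustrate the difference between profiling data and validating it against rules, the following sketch computes simple profile statistics for one column and then checks the same column against two rules. It is a conceptual example in plain Python and doesn't use or represent the scan service's API. The function names, rule names, thresholds, and sample values are assumptions made for the example.

```python
from statistics import mean
from typing import Callable, Dict, List, Optional

Value = Optional[float]
Rule = Callable[[Value], bool]


def profile_column(values: List[Value]) -> Dict[str, float]:
    """Summarize a column. Assumes at least one non-null value."""
    present = [v for v in values if v is not None]
    return {
        "null_ratio": (len(values) - len(present)) / len(values),
        "distinct": len(set(present)),
        "min": min(present),
        "max": max(present),
        "mean": mean(present),
    }


def non_null(value: Value) -> bool:
    """Rule: the value must be present."""
    return value is not None


def in_range(low: float, high: float) -> Rule:
    """Rule factory: the value must be present and within [low, high]."""
    return lambda value: value is not None and low <= value <= high


def passing_ratio(values: List[Value], rule: Rule) -> float:
    """Fraction of rows that satisfy the rule."""
    return sum(1 for v in values if rule(v)) / len(values)


def check_rule(values: List[Value], rule: Rule, threshold: float = 1.0) -> bool:
    """A rule passes when enough rows satisfy it."""
    return passing_ratio(values, rule) >= threshold


if __name__ == "__main__":
    prices = [10.0, 20.0, None, 20.0]  # made-up sample column
    print(profile_column(prices))
    print(check_rule(prices, non_null))                  # strict: every row
    print(check_rule(prices, non_null, threshold=0.7))   # tolerate some nulls
    print(check_rule(prices, in_range(0, 100), threshold=0.7))
```

A profile describes what the data looks like, while a rule with a passing threshold turns an expectation about the data into a pass or fail result that you can alert on.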
Next steps for data quality
The following list outlines next steps that you can take to learn more about data quality features:
- Run a data profile scan to gain insights about your data, including the limits or averages of your data.
- Enable data lineage in your BigQuery project to automatically record lineage information for BigQuery operations like load, copy, and data modifications.
- Set up a recurring data quality scan to alert you to possible data issues with predefined scan rules.
- Set up custom data rules for your data quality scans so that your scans are tailored to your specific needs.
What's next
- Learn about authentication at Google.
- Learn about data deletion on Google Cloud.
- Learn more about IAM best practices.
- Learn the resource hierarchy on Google Cloud.
- Learn about IAM on Google Cloud.