Best practices for Knowledge Catalog

This document provides guidance and best practices for using Knowledge Catalog (formerly Dataplex Universal Catalog).

Choose a project for your lake

When you select the project in which to host your lake, consider the following factors:

  • The project must belong to the same VPC Service Controls perimeter as the data that the lake will contain.

  • The lake service account requires administrator permissions on the Cloud Storage buckets and BigQuery datasets that the lake manages. Knowledge Catalog creates external tables in BigQuery for tables discovered in Cloud Storage. Knowledge Catalog also makes BigQuery table metadata, and the tables discovered in Cloud Storage buckets, available in a Dataproc Metastore service. This Dataproc Metastore service is located in the lake project.
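You can check the perimeter requirement before you create the lake. The following is a minimal sketch, not an official tool. It assumes that you have already exported perimeter membership (for example, from Access Context Manager) into the hypothetical `ServicePerimeter` structure. It also assumes that perimeter resources are listed as `projects/PROJECT_NUMBER` and that each project belongs to at most one regular perimeter. The function names are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ServicePerimeter:
    """Hypothetical, simplified view of a regular VPC Service Controls perimeter."""
    name: str
    # Perimeter resources are assumed to be listed as "projects/<PROJECT_NUMBER>".
    resources: set[str] = field(default_factory=set)


def perimeter_for(project_number: str, perimeters: list[ServicePerimeter]) -> str | None:
    """Return the name of the perimeter that contains the project, or None."""
    resource = f"projects/{project_number}"
    for perimeter in perimeters:
        if resource in perimeter.resources:
            return perimeter.name
    return None


def mismatched_data_projects(lake_project: str, data_projects: list[str],
                             perimeters: list[ServicePerimeter]) -> list[str]:
    """List data projects that aren't in the same perimeter as the lake project.

    If neither the lake project nor a data project is in any perimeter,
    both resolve to None and the pair is treated as matching.
    """
    lake_perimeter = perimeter_for(lake_project, perimeters)
    return [p for p in data_projects if perimeter_for(p, perimeters) != lake_perimeter]
```

An empty result means that every data project shares the lake project's perimeter.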

Cloud Storage settings and limitations

  • Region: Knowledge Catalog supports single-region and multi-region buckets in certain Google Cloud regions.

  • Storage class: Cloud Storage buckets of all storage classes (Standard, Nearline, Coldline, and Archive) are supported. Accessing or scanning Nearline, Coldline, or Archive data might incur additional data retrieval costs.

  • Bucket ACL: Knowledge Catalog supports Cloud Storage buckets with uniform access controls only. Fine-grained access controls aren't supported.

  • Requester Pays: Cloud Storage buckets with the Requester Pays feature enabled aren't supported.
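You can screen a bucket against the limitations listed above before you attach it to a lake. The following is a minimal sketch. It uses a hypothetical `BucketSettings` snapshot instead of calling any API, and it omits the region check because the list of supported regions isn't reproduced here.

```python
from __future__ import annotations

from dataclasses import dataclass

# Storage classes that add retrieval costs when their data is read or scanned.
RETRIEVAL_FEE_CLASSES = {"NEARLINE", "COLDLINE", "ARCHIVE"}


@dataclass
class BucketSettings:
    """Hypothetical snapshot of the bucket properties that matter for a lake."""
    name: str
    storage_class: str
    uniform_bucket_level_access: bool
    requester_pays: bool


def check_bucket(bucket: BucketSettings) -> tuple[list[str], list[str]]:
    """Return (blocking problems, cost warnings) for attaching the bucket to a lake."""
    problems: list[str] = []
    warnings: list[str] = []
    if not bucket.uniform_bucket_level_access:
        problems.append(f"{bucket.name}: fine-grained access control isn't supported")
    if bucket.requester_pays:
        problems.append(f"{bucket.name}: Requester Pays isn't supported")
    if bucket.storage_class.upper() in RETRIEVAL_FEE_CLASSES:
        warnings.append(f"{bucket.name}: {bucket.storage_class} data incurs retrieval costs")
    return problems, warnings
```

If you use the `google-cloud-storage` Python library, you can populate the snapshot from a `Bucket` object's `storage_class`, `iam_configuration.uniform_bucket_level_access_enabled`, and `requester_pays` properties.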

Security and permissions guidance

Knowledge Catalog requires that you add its service accounts as administrators on the buckets and datasets that it manages.

Knowledge Catalog enables analysts to access Cloud Storage buckets and BigQuery datasets across many projects. To provide this access, the Knowledge Catalog service accounts must be granted administrative permissions in those projects.
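Granting a service account administrative access comes down to adding a role binding to the resource's IAM policy. The following sketch performs that change on a policy represented as a plain dictionary in the IAM JSON shape. It is idempotent, so running it twice doesn't duplicate the member. Several details are assumptions and aren't taken from this page: the role `roles/storage.admin` and the service-agent address format. The address format shown is the legacy Dataplex one, `service-PROJECT_NUMBER@gcp-sa-dataplex.iam.gserviceaccount.com`. Verify the exact service account and role for your project before you apply the change.

```python
from __future__ import annotations

import copy


def add_binding(policy: dict, role: str, member: str) -> dict:
    """Return a copy of an IAM policy with `member` granted `role` (idempotent)."""
    updated = copy.deepcopy(policy)
    bindings = updated.setdefault("bindings", [])
    for binding in bindings:
        # Only extend unconditional bindings; conditional bindings are left untouched.
        if binding.get("role") == role and "condition" not in binding:
            members = binding.setdefault("members", [])
            if member not in members:
                members.append(member)
            return updated
    bindings.append({"role": role, "members": [member]})
    return updated


def service_agent(project_number: str) -> str:
    """Assumed service-agent address (legacy Dataplex format); verify before use."""
    return f"serviceAccount:service-{project_number}@gcp-sa-dataplex.iam.gserviceaccount.com"
```

In practice, you can apply the same binding directly with `gcloud storage buckets add-iam-policy-binding`, which also leaves existing bindings in place.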

For Discovery, Knowledge Catalog adds the Dataproc Metastore service account to the Cloud Storage buckets. If you already run your own Dataproc Metastore service, you can have the lake use it instead. This option is available when you create the lake.

If you add a Cloud Storage bucket that uses fine-grained access control to a lake, Knowledge Catalog provides full access to that bucket through the lake, because Knowledge Catalog permissions are propagated to all objects in the bucket. If you require fine-grained access, we recommend splitting the data into multiple buckets, each with its own uniform permissions.
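One way to plan such a split is to group objects by the exact set of principals that are allowed to read them. Each group then becomes a candidate bucket whose uniform permissions reproduce the original per-object access. The following is a minimal planning sketch. The input mapping of object names to reader sets is hypothetical; you would build it yourself, for example from the bucket's existing object ACLs. The sketch doesn't move any data.

```python
from __future__ import annotations

from collections import defaultdict


def plan_bucket_split(object_readers: dict[str, frozenset[str]]) -> dict[frozenset[str], list[str]]:
    """Group object names by the exact set of principals allowed to read them.

    Each key (a distinct reader set) corresponds to one candidate destination
    bucket; its value is the sorted list of objects to move there.
    """
    groups: dict[frozenset[str], list[str]] = defaultdict(list)
    for name, readers in object_readers.items():
        groups[readers].append(name)
    # Sort object names so the plan is deterministic.
    return {readers: sorted(names) for readers, names in groups.items()}
```

Grouping by exact reader sets can produce many small buckets. Merging groups with similar audiences is a judgment call that this sketch leaves to you.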

What's next
