Manage metadata of lakes, zones, and assets

This guide describes Dataplex Universal Catalog metadata for lakes, zones, and assets, and how you can use the Dataplex API to manage it.

Overview

Dataplex Universal Catalog scans the following:

  • Structured and semi-structured data assets within data lakes, to extract table metadata into table entities
  • Unstructured data, such as images and text, to extract fileset metadata into fileset entities

You can use the Dataplex Universal Catalog Metadata API to do the following:

  • View, edit, and delete table and fileset entity metadata
  • Create your own table or fileset entity metadata

You can analyze Dataplex Universal Catalog metadata using the following:

  • Data Catalog (Deprecated) for searching and tagging
  • Dataproc Metastore and BigQuery for table metadata querying and analytics processing

Dataplex API

This section summarizes the lake, zone, and asset resources in the Dataplex API and the key resources within them.

Control plane API

The Dataplex Universal Catalog control plane API lets you create and manage lake, zone, and asset resources.

  • Lake: A Dataplex Universal Catalog service instance that allows managing storage resources across projects within an organization.

  • Zone: A logical grouping of assets within a lake. Use multiple zones within a lake to organize data based on readiness, workload, or organization structure.

  • Asset: A storage resource, with data stored in Cloud Storage buckets or BigQuery datasets, that is attached to a zone within a lake.
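The lake, zone, and asset nesting described above can be sketched as a chain of resource names. This is a minimal illustration, assuming the usual Google Cloud naming pattern of `projects/.../locations/.../lakes/.../zones/.../assets/...`. The class name `AssetPath` and all identifier values are hypothetical placeholders, so check the API reference for the exact name format before relying on it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssetPath:
    """Identifiers for one asset and its parent zone and lake (placeholder values)."""

    project: str
    location: str
    lake: str
    zone: str
    asset: str

    def lake_name(self) -> str:
        # Assumed pattern: a lake lives under a project and a location.
        return f"projects/{self.project}/locations/{self.location}/lakes/{self.lake}"

    def zone_name(self) -> str:
        # A zone is a logical grouping inside exactly one lake.
        return f"{self.lake_name()}/zones/{self.zone}"

    def asset_name(self) -> str:
        # An asset is attached to exactly one zone.
        return f"{self.zone_name()}/assets/{self.asset}"


path = AssetPath("my-project", "us-central1", "sales-lake", "raw-zone", "orders-bucket")
print(path.lake_name())
print(path.zone_name())
print(path.asset_name())
```

Building each name from its parent keeps the hierarchy explicit: an asset name always contains the names of its zone and lake.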

Metadata API

Use the Dataplex Universal Catalog Metadata API to create and manage metadata within table and fileset entities and partitions. Dataplex Universal Catalog scans data assets, either in a lake or provided by you, to create entities and partitions. Entities and partitions maintain references to associated assets and physical storage locations.

Key concepts

Table entity:

Metadata for structured data with well-defined schemas. Table entities are uniquely identified by entity ID and data location. Table entity metadata is queryable in BigQuery and Dataproc Metastore:

  • Cloud Storage objects: Metadata for Cloud Storage objects, which are accessed through the Cloud Storage APIs.
  • BigQuery tables: Metadata for BigQuery tables, which are accessed through the BigQuery APIs.

Fileset entity:

Metadata about unstructured, typically schema-less, data. Filesets are uniquely identified by entity ID and data location. Each fileset has a data format.

Partitions:

Metadata for a subset of data within a table or fileset entity, identified by a set of key-value pairs and a data location.
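As a rough illustration of how a partition pairs key-value pairs with a data location, the sketch below derives a partition location from an entity's location. It assumes a Hive-style `key=value` directory layout, which is one common convention and is not stated in this guide. The `Partition` class, the `hive_style_partition` helper, and the bucket path are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Partition:
    """A subset of an entity's data: ordered key-value pairs plus a location."""

    values: Tuple[Tuple[str, str], ...]
    location: str


def hive_style_partition(entity_location: str, pairs: Sequence[Tuple[str, str]]) -> Partition:
    # Assumption: partition directories are named key=value, nested in key order.
    suffix = "/".join(f"{key}={value}" for key, value in pairs)
    return Partition(tuple(pairs), f"{entity_location.rstrip('/')}/{suffix}")


partition = hive_style_partition(
    "gs://example-bucket/orders/",
    [("year", "2024"), ("month", "01")],
)
print(partition.location)
```

Other layouts are possible, so treat the derived path as an example of the identifying pair (key-value set, location) and not as the format the service requires.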

Try the API

Use the Dataplex Universal Catalog lakes.zones.entities and lakes.zones.partitions API reference documentation pages to view the parameters and fields associated with each API. Use the Try this API panel that accompanies the reference documentation for each API method to make API requests using different parameters and fields. You can construct, view, and submit your requests without the need to generate credentials, and then view responses returned by the service.
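To see what a request built in the panel might look like outside the browser, the following sketch assembles a URL for listing the entities in a zone. It only constructs the string and sends nothing. The base URL, the `/entities` path, and the `view` query parameter with the values `TABLES` and `FILESETS` are assumptions drawn from the v1 REST reference and should be verified there. The function name and identifier values are hypothetical.

```python
from urllib.parse import urlencode

# Assumed service endpoint and API version; confirm against the reference pages.
BASE_URL = "https://dataplex.googleapis.com/v1"

# Assumed values for the entity view filter.
ALLOWED_VIEWS = {"TABLES", "FILESETS"}


def list_entities_url(project: str, location: str, lake: str, zone: str,
                      view: str = "TABLES") -> str:
    """Build (but do not send) a URL that lists entities in one zone."""
    if view not in ALLOWED_VIEWS:
        raise ValueError(f"view must be one of {sorted(ALLOWED_VIEWS)}")
    parent = f"projects/{project}/locations/{location}/lakes/{lake}/zones/{zone}"
    return f"{BASE_URL}/{parent}/entities?{urlencode({'view': view})}"


url = list_entities_url("my-project", "us-central1", "sales-lake", "raw-zone")
print(url)
```

Unlike the panel, a request sent from your own code needs credentials, typically an OAuth 2.0 access token in the `Authorization` header.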

The following sections provide information to help you understand and use the Dataplex Universal Catalog Metadata APIs.

Entities

Partitions

What's next
