This document describes how to create a Dataplex Universal Catalog lake. You can create a lake in any of the regions that support Dataplex Universal Catalog.
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
  Roles required to select or create a project
  - Select a project: Selecting a project doesn't require a specific IAM role. You can select any project that you've been granted a role on.
  - Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
- Verify that billing is enabled for your Google Cloud project.
- Enable the Dataplex, Dataproc, Dataproc Metastore, BigQuery, and Cloud Storage APIs.
  Roles required to enable APIs
  To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.
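The API-enabling step above can also be done from the command line. The following is a minimal sketch, assuming the standard service names for the five APIs; `PROJECT_ID`, `DRY_RUN`, and the helper `enable_apis_cmd` are hypothetical names introduced here for illustration. By default it only prints the command so that you can review it first.

```shell
# Sketch: enable the APIs listed above with a single gcloud call.
# PROJECT_ID and DRY_RUN are hypothetical variables, not part of the original page.
PROJECT_ID="${PROJECT_ID:-my-project}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the command, 0 = actually run it

# Assumed service names for Dataplex, Dataproc, Dataproc Metastore,
# BigQuery, and Cloud Storage.
APIS="dataplex.googleapis.com dataproc.googleapis.com metastore.googleapis.com bigquery.googleapis.com storage.googleapis.com"

# Print the command line that would enable the APIs in the given project.
enable_apis_cmd() {
  echo "gcloud services enable $APIS --project $1"
}

if [ "$DRY_RUN" = "1" ]; then
  enable_apis_cmd "$PROJECT_ID"
else
  # Word splitting of $APIS is intentional: one argument per service.
  gcloud services enable $APIS --project "$PROJECT_ID"
fi
```

Set `DRY_RUN=0` to run the command; the caller still needs the Service Usage Admin role described above.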
Access control
- To create and manage your lake, make sure you have been granted one of the predefined roles roles/dataplex.admin or roles/dataplex.editor. For more information, see Grant a single role.
- To attach a Cloud Storage bucket from another project to your lake, grant the Dataplex Universal Catalog service account an administrator role on the bucket by running the following command:
  gcloud alpha dataplex lakes authorize \
      --project PROJECT_ID_OF_LAKE \
      --storage-bucket-resource BUCKET_NAME
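One way to grant either predefined role is a project-level IAM binding. The sketch below only builds and prints the `gcloud projects add-iam-policy-binding` command; the helper `grant_role_cmd`, the project name, and the member address are hypothetical values chosen for illustration, and your organization may prefer granting the role at a narrower scope.

```shell
# Sketch: build the command that grants a Dataplex role on a project.
# grant_role_cmd is a hypothetical helper, not part of gcloud.
grant_role_cmd() {
  project="$1"; member="$2"; role="$3"
  # Accept only the two predefined roles named in this section.
  case "$role" in
    roles/dataplex.admin|roles/dataplex.editor) ;;
    *) echo "not one of the roles named in this section: $role" >&2; return 1 ;;
  esac
  echo "gcloud projects add-iam-policy-binding $project --member=$member --role=$role"
}

# Example values are placeholders; replace them before running the printed command.
grant_role_cmd "my-project" "user:alice@example.com" "roles/dataplex.editor"
```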
Create a metastore
You can access Dataplex Universal Catalog metadata using Hive Metastore in Spark queries by associating a Dataproc Metastore service instance with your Dataplex Universal Catalog lake. You need to have a gRPC-enabled Dataproc Metastore (version 3.1.2 or higher) associated with the Dataplex Universal Catalog lake.
- Create a Dataproc Metastore service.
- Configure the Dataproc Metastore service instance to expose a gRPC endpoint (instead of the default Thrift Metastore endpoint):
  curl -X PATCH \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      "https://metastore.googleapis.com/v1beta/projects/PROJECT_ID/locations/LOCATION/services/SERVICE_ID?updateMask=hiveMetastoreConfig.endpointProtocol" \
      -d '{"hiveMetastoreConfig": {"endpointProtocol": "GRPC"}}'
- View the gRPC endpoint:
  gcloud metastore services describe SERVICE_ID \
      --project PROJECT_ID \
      --location LOCATION \
      --format "value(endpointUri)"
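Because the PATCH URL above embeds three placeholders, it is easy to mistype. The following sketch builds the URL from variables and prints a follow-up command for checking which protocol the service reports. The helper names are hypothetical, and the `hiveMetastoreConfig.endpointProtocol` format projection is an assumption based on the field name used in the update mask.

```shell
# Sketch: assemble the PATCH URL from its three parts.
# metastore_patch_url and protocol_check_cmd are hypothetical helpers.
metastore_patch_url() {
  # $1 = project ID, $2 = location, $3 = Dataproc Metastore service ID
  echo "https://metastore.googleapis.com/v1beta/projects/$1/locations/$2/services/$3?updateMask=hiveMetastoreConfig.endpointProtocol"
}

# Print a describe command that reads back the endpoint protocol.
# Assumption: the field is exposed under the same path as in the update mask.
protocol_check_cmd() {
  echo "gcloud metastore services describe $3 --project $1 --location $2 --format value(hiveMetastoreConfig.endpointProtocol)"
}

# Placeholder values; replace with your own project, region, and service.
metastore_patch_url "my-project" "us-central1" "my-metastore"
protocol_check_cmd "my-project" "us-central1" "my-metastore"
```

When you paste the printed describe command into a shell, quote the `--format` value so that the parentheses are not interpreted by the shell.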
Create a lake
Console
- In the Google Cloud console, go to the Dataplex Universal Catalog Lakes page.
- Click Create.
- Enter a Display name.
- The lake ID is automatically generated for you. If you prefer, you can provide your own ID. See Resource naming convention.
- Optional: Enter a Description.
- Specify the Region in which to create the lake.
  For lakes created in a given region (for example, us-central1), you can attach both single-region (us-central1) data and multi-region (us multi-region) data depending on the zone settings.
- Optional: Add labels to your lake.
- Optional: In the Metastore section, click the Metastore service menu, and select the service you created in the Before you begin section.
- Click Create.
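If you provide your own lake ID, it can help to check it locally before submitting the form. The sketch below assumes the ID rules are: lowercase letters, digits, and hyphens only; starts with a letter; ends with a letter or digit; 1 to 63 characters long. These rules are an assumption here, so confirm them against the Resource naming convention page. `is_valid_lake_id` is a hypothetical helper.

```shell
# Sketch: validate a candidate lake ID against the assumed naming rules.
LC_ALL=C   # make the a-z range match ASCII lowercase only

is_valid_lake_id() {
  id="$1"
  # Length must be between 1 and 63 characters.
  [ "${#id}" -ge 1 ] && [ "${#id}" -le 63 ] || return 1
  case "$id" in
    *[!a-z0-9-]*) return 1 ;;   # contains a character outside a-z, 0-9, hyphen
    [!a-z]*)      return 1 ;;   # does not start with a letter
    *-)           return 1 ;;   # ends with a hyphen
  esac
  return 0
}

# Example: report on a placeholder ID.
if is_valid_lake_id "sales-lake-01"; then
  echo "sales-lake-01 looks valid"
fi
```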
gcloud
To create a lake, use the gcloud alpha dataplex lakes create command:
gcloud alpha dataplex lakes create LAKE \
    --location=LOCATION \
    --labels=k1=v1,k2=v2,k3=v3 \
    --metastore-service=METASTORE_SERVICE
Replace the following:
- LAKE: the name of the new lake
- LOCATION: the Google Cloud region in which to create the lake
- k1=v1,k2=v2,k3=v3: the labels to apply to the lake, if any
- METASTORE_SERVICE: the Dataproc Metastore service, if you created one
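Labels and the metastore service are optional, so a script that wraps the command should omit those flags when no value is supplied. The following is a minimal sketch that builds the command line accordingly; `lake_create_cmd` and the example values are hypothetical.

```shell
# Sketch: build the lake creation command, adding optional flags only when set.
# lake_create_cmd is a hypothetical helper, not part of gcloud.
lake_create_cmd() {
  lake="$1"; location="$2"; labels="$3"; metastore="$4"
  cmd="gcloud alpha dataplex lakes create $lake --location=$location"
  if [ -n "$labels" ]; then
    cmd="$cmd --labels=$labels"
  fi
  if [ -n "$metastore" ]; then
    cmd="$cmd --metastore-service=$metastore"
  fi
  echo "$cmd"
}

# With labels but without a metastore service (placeholder values).
lake_create_cmd "sales-lake" "us-central1" "env=dev,team=data" ""
```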
REST
To create a lake, use the lakes.create method.
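As a rough sketch of what a lakes.create call might look like with curl: the request below assumes the v1 endpoint `https://dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/lakes` with a `lakeId` query parameter and a JSON body carrying `displayName` and `description`. Check the lakes.create reference for the authoritative request shape. The helper names, `DRY_RUN`, and the example values are hypothetical.

```shell
# Sketch: build the URL and body for a lakes.create request.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the request, 0 = send it

lake_create_url() {
  # $1 = project ID, $2 = location, $3 = lake ID
  echo "https://dataplex.googleapis.com/v1/projects/$1/locations/$2/lakes?lakeId=$3"
}

lake_create_body() {
  # $1 = display name, $2 = description
  # Note: values are not JSON-escaped; avoid quotes and backslashes in them.
  printf '{"displayName": "%s", "description": "%s"}\n' "$1" "$2"
}

URL="$(lake_create_url "my-project" "us-central1" "sales-lake")"
BODY="$(lake_create_body "Sales lake" "Lake for sales data")"

if [ "$DRY_RUN" = "1" ]; then
  echo "POST $URL"
  echo "$BODY"
else
  curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "$URL" -d "$BODY"
fi
```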
What's next?
- Learn how to add zones to a lake.
- Learn how to attach assets to a zone.
- Learn how to secure your lake.
- Learn how to manage your lake.

