Transition to business glossary on Dataplex Universal Catalog

This document provides instructions for migrating in a single step from the preview version of business glossary, which supported Data Catalog metadata, to the generally available version of business glossary, which supports Dataplex Universal Catalog metadata.

Before you begin

  1. Install the gcloud CLI and the required Python packages. Authenticate your user account and the Application Default Credentials (ADC) that the Python libraries use. Run the following commands and follow the browser-based prompts:

     gcloud init
     gcloud auth login
     gcloud auth application-default login

  2. Enable the following APIs:
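
     Because the migration reads Data Catalog metadata and writes Dataplex Universal Catalog metadata, the services involved are presumably the Data Catalog API and the Dataplex API. A minimal sketch of enabling them with gcloud:

     gcloud services enable datacatalog.googleapis.com
     gcloud services enable dataplex.googleapis.com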

  3. Create one or more Cloud Storage buckets in any of your projects. The buckets are used as a temporary location for the import files. The more buckets you provide, the faster the import runs. Grant the Storage Admin IAM role to the service account running the migration:

     service-MIGRATION_PROJECT_ID@gcp-sa-dataplex.iam.gserviceaccount.com

    Replace MIGRATION_PROJECT_ID with the project from which you are migrating the glossaries.
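
    As a sketch, creating a bucket and granting the role with gcloud might look like the following, where BUCKET1 is a placeholder bucket name:

     gcloud storage buckets create gs://BUCKET1 --project=MIGRATION_PROJECT_ID
     gcloud storage buckets add-iam-policy-binding gs://BUCKET1 \
         --member="serviceAccount:service-MIGRATION_PROJECT_ID@gcp-sa-dataplex.iam.gserviceaccount.com" \
         --role="roles/storage.admin"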

  4. Set up the repository:

    1. Clone the repository and change to the import scripts directory:

       git clone https://github.com/GoogleCloudPlatform/dataplex-labs.git
       cd dataplex-labs/dataplex-quickstart-labs/00-resources/scripts/python/business-glossary-import

    2. Install the required packages and change to the migration directory:

       pip3 install -r requirements.txt
       cd migration

Required roles
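
As noted in the Troubleshoot section, the user or service account running the migration needs at least the Dataplex Universal Catalog Editor role on the destination project and the Data Catalog Viewer role on the source project.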

Run the migration script

To start the migration, run the following command:

python3 run.py \
    --project=MIGRATION_PROJECT_ID \
    --user-project=USER_PROJECT_ID \
    --buckets=BUCKET1,BUCKET2

Replace the following:

  • MIGRATION_PROJECT_ID : the project ID of the source project that contains the Data Catalog glossaries you want to export.

  • USER_PROJECT_ID : the project ID of the project used for billing and quota attribution for the API calls that the script generates.

  • BUCKET1 and BUCKET2 : the Cloud Storage bucket IDs to be used for the import.

    You can provide one or more buckets. For the bucket arguments, provide a comma-separated list of bucket names without spaces (for example, --buckets=bucket-one,bucket-two ). A one-to-one mapping between buckets and glossaries isn't required; the script runs the import jobs in parallel, which speeds up the migration.

If permission issues prevent the script from automatically discovering your organization IDs, use the --orgIds flag to specify the organizations that the script can use to search for data assets linked to glossary terms.
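
For example, a hypothetical invocation that restricts the search to two organizations (the comma-separated value format and the ORG_ID placeholders are assumptions, not confirmed by the script's documentation):

python3 run.py \
    --project=MIGRATION_PROJECT_ID \
    --user-project=USER_PROJECT_ID \
    --buckets=BUCKET1,BUCKET2 \
    --orgIds=ORG_ID1,ORG_ID2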

Scope glossaries in migration

To migrate only specific glossaries, define their scope by providing their respective URLs.

python3 run.py \
    --project=MIGRATION_PROJECT_ID \
    --user-project=USER_PROJECT_ID \
    --buckets=BUCKET1,BUCKET2 \
    --glossaries="GLOSSARY_URL1","GLOSSARY_URL2"

Replace GLOSSARY_URL1 (and GLOSSARY_URL2 ) with the URLs of the glossaries you are migrating. You can provide one or more glossary URLs.

When the migration runs, the number of import jobs can be less than the number of exported glossaries. This happens because empty glossaries are created directly and don't require a background import job.

Resume migration for import job failures

If files remain in the Cloud Storage buckets after the migration, some import jobs failed. To resume the migration, run the following command:

python3 run.py \
    --project=MIGRATION_PROJECT_ID \
    --user-project=USER_PROJECT_ID \
    --buckets=BUCKET1,BUCKET2 \
    --resume-import

If you encounter further failures, run the resume command again. The script processes only the files that weren't successfully imported and deleted in earlier runs.

The script enforces dependency checks for entry links and inter-glossary links. An entry link file is imported only if its parent glossary was successfully imported. Similarly, a link between terms is imported only if all referenced terms have been successfully imported.

Troubleshoot

This section provides solutions to common errors.

  • Permission Denied / 403 Error: Ensure the user or service account has the Dataplex Universal Catalog Editor role on the destination project and the Data Catalog Viewer role on the source project.
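
    A sketch of granting those roles with gcloud, assuming a user account and hypothetical DESTINATION_PROJECT_ID, SOURCE_PROJECT_ID, and USER_EMAIL placeholders:

    gcloud projects add-iam-policy-binding DESTINATION_PROJECT_ID \
        --member="user:USER_EMAIL" --role="roles/dataplex.editor"
    gcloud projects add-iam-policy-binding SOURCE_PROJECT_ID \
        --member="user:USER_EMAIL" --role="roles/datacatalog.viewer"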

  • ModuleNotFoundError: Ensure you have activated your Python virtual environment and installed the required packages using pip3 install -r requirements.txt .
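
    For example, creating and activating a virtual environment before installing (the .venv directory name is arbitrary):

    python3 -m venv .venv
    source .venv/bin/activate
    pip3 install -r requirements.txt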

  • TimeoutError / ssl.SSLError: These network-level errors might be caused by firewalls, proxies, or slow connections. The script has a 5-minute timeout; if issues persist, check your local network configuration.

  • Method not found (Cannot fetch entries): This error often indicates that your user project is not allow-listed to call the API, preventing the retrieval of necessary entries.
