This document provides instructions for migrating in a single step from the preview version of business glossary, which supported Data Catalog metadata, to the generally available version of business glossary, which supports Dataplex Universal Catalog metadata.
Before you begin
-   Install the gcloud CLI and the required Python packages. Authenticate your user account and the Application Default Credentials (ADC) that the Python libraries use. Run the following commands and follow the browser-based prompts:

        gcloud init
        gcloud auth login
        gcloud auth application-default login

-   Enable the following APIs:
-   Create one or more Cloud Storage buckets in any of your projects. The buckets are used as a temporary location for the import files. The more buckets you provide, the faster the import runs. Grant the Storage Admin IAM role to the service account running the migration:

        service-MIGRATION_PROJECT_ID@gcp-sa-dataplex.iam.gserviceaccount.com

    Replace MIGRATION_PROJECT_ID with the ID of the project from which you are migrating the glossaries.
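    For example, the grant for one bucket might look like the following `gcloud` command. This is a sketch only: `gs://BUCKET1` is a placeholder for one of your buckets, and `MIGRATION_PROJECT_ID` must be replaced as described above.

    ```shell
    # Grant the migration service account Storage Admin on an import bucket.
    gcloud storage buckets add-iam-policy-binding gs://BUCKET1 \
        --member="serviceAccount:service-MIGRATION_PROJECT_ID@gcp-sa-dataplex.iam.gserviceaccount.com" \
        --role="roles/storage.admin"
    ```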
-   Set up the repository:

    -   Clone the repository:

            git clone https://github.com/GoogleCloudPlatform/dataplex-labs.git
            cd dataplex-labs/dataplex-quickstart-labs/00-resources/scripts/python/business-glossary-import

    -   Install the required packages:

            pip3 install -r requirements.txt
            cd migration
Required roles
Run the migration script
    python3 run.py --project=MIGRATION_PROJECT_ID --user-project=USER_PROJECT_ID --buckets=BUCKET1,BUCKET2
Replace the following:
-   MIGRATION_PROJECT_ID: the ID of the source project that contains the Data Catalog glossaries you want to export.
-   USER_PROJECT_ID: the ID of the project used for billing and quota attribution for the API calls generated by the script.
-   BUCKET1 and BUCKET2: the Cloud Storage bucket IDs to use for the import. You can provide one or more buckets as a comma-separated list of bucket names without spaces (for example, --buckets=bucket-one,bucket-two). A one-to-one mapping between buckets and glossaries is not required; the script runs the import jobs in parallel, which speeds up the migration.
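The expected `--buckets` format can be sanity-checked with a small helper. This is an illustrative sketch only; `parse_buckets` is a hypothetical name, not part of the migration script:

```python
def parse_buckets(arg: str) -> list[str]:
    """Split a --buckets value into bucket names, rejecting spaces and empties."""
    if " " in arg:
        raise ValueError("bucket list must not contain spaces")
    buckets = arg.split(",")
    if not all(buckets):
        raise ValueError("empty bucket name in list")
    return buckets

print(parse_buckets("bucket-one,bucket-two"))  # ['bucket-one', 'bucket-two']
```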
If permission issues prevent the script from automatically discovering your organization IDs, use the --orgIds flag to specify the organizations that the script can use to search for data assets linked to glossary terms.
Scope glossaries in migration
To migrate only specific glossaries, define their scope by providing their respective URLs.
    python3 run.py --project=MIGRATION_PROJECT_ID --user-project=USER_PROJECT_ID --buckets=BUCKET1,BUCKET2 --glossaries="GLOSSARY_URL1","GLOSSARY_URL2"
Replace GLOSSARY_URL1 and GLOSSARY_URL2 with the URLs of the glossaries that you are migrating. You can provide one or more glossary URLs.
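Conceptually, scoping behaves like a filter over the discovered glossaries: only the listed URLs are migrated, and no list means everything is migrated. The sketch below is illustrative only (`scope_glossaries` is a hypothetical name and the URLs are placeholders):

```python
def scope_glossaries(discovered, scoped_urls=None):
    """Keep only glossaries whose URL was requested; an empty scope keeps all."""
    if not scoped_urls:
        return list(discovered)
    wanted = set(scoped_urls)
    return [g for g in discovered if g in wanted]

all_found = ["https://example.com/g1", "https://example.com/g2"]
print(scope_glossaries(all_found, ["https://example.com/g2"]))  # ['https://example.com/g2']
print(len(scope_glossaries(all_found)))  # 2
```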
When the migration runs, the number of import jobs can be fewer than the number of exported glossaries. This happens because empty glossaries are created directly and don't require a background import job.
Resume migration for import job failures
If import files remain in the Cloud Storage buckets after the migration, some import jobs failed. To resume the migration, run the following command:
    python3 run.py --project=MIGRATION_PROJECT_ID --user-project=USER_PROJECT_ID --buckets=BUCKET1,BUCKET2 --resume-import
If you encounter further failures, run the resume command again. The script processes only the files that were not successfully imported and deleted.
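The resume behavior can be pictured as a retry loop over whatever files are still in the buckets, deleting each file once its import succeeds, so that a later pass sees only the failures. This is a simplified sketch with a plain dict standing in for a bucket listing, not the script's actual implementation:

```python
def resume_import(bucket_files, import_file):
    """Retry the import for every file still present.

    bucket_files: dict of file name -> contents (a stand-in for the
    Cloud Storage bucket listing). import_file: callable returning True
    on success. Successfully imported files are deleted, so a later
    --resume-import pass processes only the remaining failures.
    """
    for name in list(bucket_files):
        if import_file(name, bucket_files[name]):
            del bucket_files[name]
    return bucket_files


# First pass: one failure leaves its file behind.
bucket = {"glossary-a.json": "...", "glossary-b.json": "..."}
resume_import(bucket, lambda name, _: name != "glossary-b.json")
print(sorted(bucket))  # ['glossary-b.json']

# Resume pass: only the failed file is retried, then deleted.
resume_import(bucket, lambda name, _: True)
print(sorted(bucket))  # []
```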
The script enforces dependency checks for entry links and inter-glossary links. An entry link file is imported only if its parent glossary was successfully imported. Similarly, a link between terms is imported only if all referenced terms have been successfully imported.
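These dependency checks amount to simple membership gates, sketched below for illustration (the function names and the glossary and term names are hypothetical, not part of the script):

```python
def can_import_entry_links(parent_glossary, imported_glossaries):
    """An entry-link file is eligible only after its parent glossary imported."""
    return parent_glossary in imported_glossaries

def can_import_term_link(referenced_terms, imported_terms):
    """A term-to-term link is eligible only when every referenced term imported."""
    return all(t in imported_terms for t in referenced_terms)

print(can_import_entry_links("sales-glossary", {"sales-glossary"}))  # True
print(can_import_term_link(["revenue", "churn"], {"revenue"}))       # False
```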
Troubleshoot
This section provides solutions to common errors.
-   Permission Denied / 403 Error: ensure that the user or service account has the Dataplex Universal Catalog Editor role on the destination project and the Data Catalog Viewer role on the source project.
-   ModuleNotFoundError: ensure that you have activated your Python virtual environment and installed the required packages by running pip3 install -r requirements.txt.
-   TimeoutError / ssl.SSLError: these network-level errors might be caused by firewalls, proxies, or slow connections. The script has a 5-minute timeout; persistent issues might require checking your local network configuration.
-   Method not found (Cannot fetch entries): this error often indicates that your user project is not allowlisted to call the API, which prevents the retrieval of the necessary entries.

