# Prerequisites for managed migration

This page shows you how to set up your Google Cloud project to prepare for a
Dataproc Metastore managed migration.

## Before you begin

- Understand how [managed migration works](/dataproc-metastore/docs/about-managed-migration).
- Set up or have access to the following services:
  - A Dataproc Metastore configured with the [Spanner database type](/dataproc-metastore/docs/database-type).
  - A [Cloud SQL for MySQL](/sql/docs/mysql/introduction) database instance configured with [Private IP](/sql/docs/mysql/configure-private-ip#existing-private-instance).

For the Cloud SQL instance, ensure the following:

- The Cloud SQL instance is configured with a VPC network that uses
  the [required subnets](/dataproc-metastore/docs/about-managed-migrations#proxy-and-pipeline-considerations).
- The Cloud SQL instance uses a database schema that is [compatible with the
  Hive Metastore version](/dataproc-metastore/docs/database-type#mysql) that
  runs on the Dataproc Metastore service (the destination of the migration).
- The Cloud SQL instance contains the appropriate users to establish
  connectivity between Datastream and Dataproc Metastore, and between
  Dataproc Metastore and Cloud SQL.
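For example, the Datastream connection typically uses a dedicated MySQL user with replication privileges. The following is a minimal sketch; the user name is a placeholder, and the exact privilege set should be confirmed against the Datastream documentation for your setup:

```sql
-- Hypothetical user for the Datastream connection; adjust the name,
-- host pattern, and privileges to match your environment.
CREATE USER 'datastream'@'%' IDENTIFIED BY 'PASSWORD';
GRANT REPLICATION SLAVE, REPLICATION CLIENT, SELECT ON *.* TO 'datastream'@'%';
FLUSH PRIVILEGES;
```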
### Required Roles

To get the permissions that you need to create a Dataproc Metastore and start
a managed migration, ask your administrator to grant you the following
IAM roles:
- To grant full access to all Dataproc Metastore resources, including setting
  IAM permissions: Dataproc Metastore Admin (`roles/metastore.admin`) on the
  Dataproc Metastore user account or service account
- To grant full control of Dataproc Metastore resources: Dataproc Metastore
  Editor (`roles/metastore.editor`) on the Dataproc Metastore user account or
  service account
- To grant permission to start a migration: Migration Admin
  (`roles/metastore.migrationAdmin`) on the Dataproc Metastore
  [service agent](/dataproc-metastore/docs/iam-and-access-control#service-accounts)
  in the service project

For more information about granting roles, see
[Manage access to projects, folders, and organizations](/iam/docs/granting-changing-revoking-access).
You might also be able to get the required permissions through
[custom roles](/iam/docs/creating-custom-roles) or other
[predefined roles](/iam/docs/roles-overview#predefined).
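As a sketch, granting one of these roles to a user follows the standard IAM binding pattern; the project ID and email address here are placeholders:

```
# Grant the Dataproc Metastore Editor role to a user (placeholder values).
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="user:USER_EMAIL" \
    --role="roles/metastore.editor"
```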
#### Grant additional roles depending on your project settings

Depending on how your project is configured, you might need to grant the
following additional roles. Examples of how to grant these roles to the
appropriate accounts are shown in the [prerequisites](#prerequisites) section
later on this page.
- Grant the Network User (`roles/compute.networkUser`) role to the
  Dataproc Metastore [service agent](/iam/docs/service-agents#cloud-datastream-service-account)
  and the [Google APIs Service Agent](/iam/docs/service-agents#google-apis-service-agent)
  on the service project.
- Grant the Network Admin (`roles/compute.networkAdmin`) role to the
  Datastream Service Agent on the host project.
If your Cloud SQL instance is in a different project than the
Dataproc Metastore service project:

- Grant the `roles/cloudsql.client` role and the `roles/cloudsql.instanceUser`
  role to the Dataproc Metastore service agent on the Cloud SQL instance
  project.
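A sketch of these cross-project grants with the gcloud CLI; the project IDs and the service-agent address are placeholders to substitute for your own:

```
# Grant Cloud SQL access to the Dataproc Metastore service agent
# on the Cloud SQL instance project (placeholder values).
gcloud projects add-iam-policy-binding CLOUD_SQL_PROJECT \
    --member="serviceAccount:service-SERVICE_PROJECT_NUMBER@gcp-sa-metastore.iam.gserviceaccount.com" \
    --role="roles/cloudsql.client"

gcloud projects add-iam-policy-binding CLOUD_SQL_PROJECT \
    --member="serviceAccount:service-SERVICE_PROJECT_NUMBER@gcp-sa-metastore.iam.gserviceaccount.com" \
    --role="roles/cloudsql.instanceUser"
```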
If the Cloud Storage bucket for the change data capture pipeline is in a
different project than your Dataproc Metastore service project:

- Make sure your Datastream service agent has the required permissions to
  write to the bucket. Typically, these are the `roles/storage.objectViewer`,
  `roles/storage.objectCreator`, and `roles/storage.legacyBucketReader` roles.
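As a sketch, these bucket-level roles can be granted with the gcloud CLI; the bucket name and the Datastream service-agent address are placeholders:

```
# Grant one of the bucket roles to the Datastream service agent
# (placeholder values); repeat for each required role.
gcloud storage buckets add-iam-policy-binding gs://BUCKET_NAME \
    --member="serviceAccount:service-PROJECT_NUMBER@gcp-sa-datastream.iam.gserviceaccount.com" \
    --role="roles/storage.objectCreator"
```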
## Managed migration prerequisites

Dataproc Metastore uses [proxies and a change data capture
pipeline](/dataproc-metastore/docs/about-managed-migrations#proxy-and-pipeline-considerations)
to facilitate the data transfer. It's important to understand how these work
before starting a transfer.
**Key terms**

- **Service Project**: A service project is the Google Cloud project where you
  created your Dataproc Metastore service.
- **Host Project**: A host project is the Google Cloud project that holds your
  Shared VPC networks. One or more service projects can be linked to your host
  project to use these shared networks. For more information, see
  [Shared VPC](/vpc/docs/shared-vpc).

**Note:** If you don't use a Shared VPC, then the service and host project are
the same project, where the Dataproc Metastore service is created.
If you can't grant the `roles/compute.networkAdmin` role, create a custom role
with the permissions listed in [Shared VPC
prerequisites](/datastream/docs/create-a-private-connectivity-configuration#shared-vpc).

- These permissions are required at the start of the migration to establish
  peering between the VPC network in the host project and Datastream.
- You can remove this role as soon as the migration has started. However, if
  you remove the role before the migration is complete, Dataproc Metastore
  can't clean up the peering job. In that case, you must clean up the job
  yourself.
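As a sketch, such a custom role can be created with the gcloud CLI. The role ID here is hypothetical, and the permissions list is a placeholder; take the actual permissions from the Shared VPC prerequisites page linked above:

```
# Hypothetical custom role for Datastream peering (placeholder values).
gcloud iam roles create datastreamNetworkRole \
    --project=HOST_PROJECT \
    --title="Datastream networking" \
    --permissions=PERMISSION_1,PERMISSION_2
```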
To set up the migration prerequisites:

1. [Enable the Datastream API](/datastream/docs/use-the-datastream-api) in
   your service project.

2. Grant the `roles/metastore.migrationAdmin` role to the Dataproc Metastore
   service agent in your service project.

   ```
   gcloud projects add-iam-policy-binding SERVICE_PROJECT --role "roles/metastore.migrationAdmin" --member "serviceAccount:service-SERVICE_PROJECT@gcp-sa-metastore.iam.gserviceaccount.com"
   ```

3. Add the following firewall rules to establish a connection between
   Dataproc Metastore and your private IP Cloud SQL instance.

   - A firewall rule to allow traffic from the [health check
     probe](/load-balancing/docs/health-check-concepts) to the network load
     balancer of the SOCKS5 proxy. For example:

     ```
     gcloud compute firewall-rules create RULE_NAME --direction=INGRESS --priority=1000 --network=CLOUD_SQL_NETWORK --allow=tcp:1080 --source-ranges=35.191.0.0/16,130.211.0.0/22
     ```

     Port `1080` is where the SOCKS5 proxy server runs.

   - A firewall rule to allow traffic from the load balancer to the SOCKS5
     proxy managed instance group. For example:

     ```
     gcloud compute firewall-rules create RULE_NAME --direction=INGRESS --priority=1000 --network=CLOUD_SQL_NETWORK --action=ALLOW --rules=all --source-ranges=PROXY_SUBNET_RANGE
     ```

   - A firewall rule to allow traffic from the Private Service Connect
     service attachment to the load balancer. For example:

     ```
     gcloud compute firewall-rules create RULE_NAME --direction=INGRESS --priority=1000 --network=CLOUD_SQL_NETWORK --allow=tcp:1080 --source-ranges=NAT_SUBNET_RANGE
     ```

   - A firewall rule to allow Datastream to use the `/29` CIDR IP range to
     create a private IP connection. For example:

     ```
     gcloud compute firewall-rules create RULE_NAME --direction=INGRESS --priority=1000 --network=CLOUD_SQL_NETWORK --action=ALLOW --rules=all --source-ranges=CIDR_RANGE
     ```

## (Optional) Add roles to Shared VPC

Follow these steps if you use a Shared VPC. For more details about a Shared
VPC, see [Service Project Admins](/vpc/docs/provisioning-shared-vpc#sa-as-spa).

**Note:** If you can't assign these roles at the host project level, you can
assign them at the individual subnet level.

1. Grant the `roles/compute.networkUser` role to the Dataproc Metastore
   service agent and the Google APIs Service Agent on the host project.

   ```
   gcloud projects add-iam-policy-binding HOST_PROJECT --role "roles/compute.networkUser" --member "serviceAccount:service-SERVICE_ACCOUNT@gcp-sa-metastore.iam.gserviceaccount.com"
   gcloud projects add-iam-policy-binding HOST_PROJECT --role "roles/compute.networkUser" --member "serviceAccount:SERVICE_PROJECT@cloudservices.gserviceaccount.com"
   ```

2. Grant the `roles/compute.networkAdmin` role to the Datastream Service Agent
   on the host project.

   ```
   gcloud projects add-iam-policy-binding HOST_PROJECT --role "roles/compute.networkAdmin" --member "serviceAccount:service-SERVICE_PROJECT@gcp-sa-datastream.iam.gserviceaccount.com"
   ```

## What's next

- [Use managed migration](/dataproc-metastore/docs/use-managed-migration)