Data plane Identity
Dataproc on GKE uses GKE workload identity to allow pods within the Dataproc on GKE cluster to act with
the authority of the default Dataproc VM service account (data plane identity).
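Workload identity requires the following permissions
to update IAM policies on the Google service account (GSA) used by your Dataproc on GKE
virtual cluster:

- `compute.projects.get`
- `iam.serviceAccounts.getIamPolicy`
- `iam.serviceAccounts.setIamPolicy`

Note: You can use workload identity with a different GSA; see Custom IAM configuration.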
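GKE workload identity links the following
GKE service accounts (KSAs) to the Dataproc VM service account:

1. `agent` KSA (interacts with the Dataproc control plane):
   `serviceAccount:${PROJECT}.svc.id.goog[${DPGKE_NAMESPACE}/agent]`
2. `spark-driver` KSA (runs Spark drivers):
   `serviceAccount:${PROJECT}.svc.id.goog[${DPGKE_NAMESPACE}/spark-driver]`
3. `spark-executor` KSA (runs Spark executors):
   `serviceAccount:${PROJECT}.svc.id.goog[${DPGKE_NAMESPACE}/spark-executor]`

Use the `--setup-workload-identity` flag of `gcloud dataproc clusters gke create` when you create a Dataproc on GKE cluster to create the workload identity bindings required for the cluster.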
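The workload identity member strings for the `agent`, `spark-driver`, and `spark-executor` KSAs follow a single pattern, so they can be generated rather than typed by hand. The following is a minimal sketch: `ksa_member` is a hypothetical helper (not part of gcloud), and the fallback project and namespace values are placeholders used only for illustration.

```shell
# ksa_member is a hypothetical helper that formats the workload identity
# member string for one KSA. It only builds a string; it calls no Google API.
ksa_member() {
  # $1 = project ID, $2 = GKE namespace, $3 = KSA name
  printf 'serviceAccount:%s.svc.id.goog[%s/%s]\n' "$1" "$2" "$3"
}

# Print the member string for each KSA.
# The fallback values after ":-" are placeholders, not real resources.
for ksa in agent spark-driver spark-executor; do
  ksa_member "${PROJECT:-my-project}" "${DPGKE_NAMESPACE:-my-dpgke-cluster}" "${ksa}"
done
```

Printing the strings first is a cheap way to check the project ID and namespace before they are used in an IAM binding.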
Assign roles

Grant permissions to the Dataproc VM service account to allow the `spark-driver` and `spark-executor` KSAs to access project resources,
data sources, data sinks, and any other services required by your workload.
Example:
The following example assigns roles to the default Dataproc VM service account to allow Spark workloads running on
Dataproc on GKE cluster VMs to access Cloud Storage buckets and
BigQuery datasets in the project.

```
# Replace project-number with your project number.
# add-iam-policy-binding takes one role per invocation, so loop over the roles.
for role in roles/storage.objectAdmin roles/bigquery.dataEditor; do
  gcloud projects add-iam-policy-binding "${PROJECT}" \
    --role="${role}" \
    --member="serviceAccount:project-number-compute@developer.gserviceaccount.com"
done
```

Custom IAM configuration

Dataproc on GKE uses
[GKE workload identity](/kubernetes-engine/docs/how-to/workload-identity)
to link the default
[Dataproc VM service account (data plane identity)](/dataproc/docs/concepts/iam/dataproc-principals#vm_service_account_data_plane_identity)
to the three GKE service accounts (KSAs).

To create and use a different Google service account (GSA) to link to
the KSAs:

1. Create the GSA (see
   [Creating and managing service accounts](/iam/docs/creating-managing-service-accounts)).

   gcloud CLI example:

   ```
   gcloud iam service-accounts create "dataproc-${USER}" \
     --description "Used by Dataproc on GKE workloads."
   ```

   Notes:

   - The example sets the GSA name to `dataproc-${USER}`, but you can use a different name.
2. Set environment variables:

   ```
   # Replace project-id and gke-namespace with your own values.
   PROJECT=project-id
   DPGKE_GSA="dataproc-${USER}@${PROJECT}.iam.gserviceaccount.com"
   DPGKE_NAMESPACE=gke-namespace
   ```

   Notes:

   - `DPGKE_GSA`: The examples set and use `DPGKE_GSA` as the name of the variable that contains the email address of your GSA. You can set and use a different variable name.
   - `DPGKE_NAMESPACE`: The default [GKE namespace](/dataproc/docs/reference/rest/v1/GkeClusterConfig#NamespacedGkeDeploymentTarget.FIELDS.cluster_namespace) is the name of your Dataproc on GKE cluster.
3. When you create the Dataproc on GKE cluster, add the following properties
   for Dataproc to use your GSA instead of the default GSA:

   ```
   --properties "dataproc:dataproc.gke.agent.google-service-account=${DPGKE_GSA}" \
   --properties "dataproc:dataproc.gke.spark.driver.google-service-account=${DPGKE_GSA}" \
   --properties "dataproc:dataproc.gke.spark.executor.google-service-account=${DPGKE_GSA}" \
   ```
4. Run the following commands to assign the necessary
   [Workload Identity](/kubernetes-engine/docs/how-to/workload-identity)
   permissions to the service accounts:

   1. Assign your GSA the `dataproc.worker` role to allow it to act as agent:

      ```
      gcloud projects add-iam-policy-binding \
        --role=roles/dataproc.worker \
        --member="serviceAccount:${DPGKE_GSA}" \
        "${PROJECT}"
      ```
   2. Assign the `agent` KSA the `iam.workloadIdentityUser` role to
      allow it to act as your GSA:

      ```
      gcloud iam service-accounts add-iam-policy-binding \
        --role=roles/iam.workloadIdentityUser \
        --member="serviceAccount:${PROJECT}.svc.id.goog[${DPGKE_NAMESPACE}/agent]" \
        "${DPGKE_GSA}"
      ```
   3. Assign the `spark-driver` KSA the `iam.workloadIdentityUser` role to
      allow it to act as your GSA:

      ```
      gcloud iam service-accounts add-iam-policy-binding \
        --role=roles/iam.workloadIdentityUser \
        --member="serviceAccount:${PROJECT}.svc.id.goog[${DPGKE_NAMESPACE}/spark-driver]" \
        "${DPGKE_GSA}"
      ```
   4. Assign the `spark-executor` KSA the `iam.workloadIdentityUser` role to
      allow it to act as your GSA:

      ```
      gcloud iam service-accounts add-iam-policy-binding \
        --role=roles/iam.workloadIdentityUser \
        --member="serviceAccount:${PROJECT}.svc.id.goog[${DPGKE_NAMESPACE}/spark-executor]" \
        "${DPGKE_GSA}"
      ```

Last updated 2025-09-04 UTC.
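The three `iam.workloadIdentityUser` bindings for the `agent`, `spark-driver`, and `spark-executor` KSAs differ only in the KSA name, so they can be collapsed into one loop. The following is a sketch under stated assumptions, not the documented procedure: `bind_ksas` is a hypothetical helper, and the `GCLOUD` variable is an assumed override that lets you preview the commands with `echo` before running them for real.

```shell
# bind_ksas is a hypothetical helper that grants roles/iam.workloadIdentityUser
# on a GSA to each of the three Dataproc on GKE KSAs.
# GCLOUD is an assumed override: set GCLOUD=echo to print the commands
# instead of executing them.
bind_ksas() {
  # $1 = project ID, $2 = GKE namespace, $3 = GSA email
  for ksa in agent spark-driver spark-executor; do
    "${GCLOUD:-gcloud}" iam service-accounts add-iam-policy-binding "$3" \
      --role=roles/iam.workloadIdentityUser \
      --member="serviceAccount:${1}.svc.id.goog[${2}/${ksa}]" || return 1
  done
}

# Preview the three commands without calling gcloud:
#   GCLOUD=echo bind_ksas "${PROJECT}" "${DPGKE_NAMESPACE}" "${DPGKE_GSA}"
# Apply the bindings:
#   bind_ksas "${PROJECT}" "${DPGKE_NAMESPACE}" "${DPGKE_GSA}"
```

After applying the bindings, `gcloud iam service-accounts get-iam-policy "${DPGKE_GSA}"` should list the three KSA members under `roles/iam.workloadIdentityUser`.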