# Scale Dataproc on GKE clusters

To scale a Dataproc on GKE cluster, update the autoscaler configuration
of the [node pool(s)](/dataproc/docs/guides/dpgke/dataproc-gke-nodepools)
associated with the Spark driver or Spark executor roles. You specify
Dataproc on GKE [node pools and their associated roles](/dataproc/docs/guides/dpgke/dataproc-gke-nodepools#role_to_node_pool_mapping)
when you [create a Dataproc on GKE cluster](/dataproc/docs/guides/dpgke/quickstarts/dataproc-gke-quickstart-create-cluster#create_a_on_cluster).

Set node pool autoscaling
-------------------------

You can set the bounds for Dataproc on GKE node pool autoscaling when you
[create a Dataproc on GKE virtual cluster](/dataproc/docs/guides/dpgke/quickstarts/dataproc-gke-quickstart-create-cluster).
If not specified, Dataproc on GKE node pools are autoscaled with default
values (at the Dataproc on GKE GA release, the defaults were minimum = 1 and
maximum = 10, and they are subject to change). To use specific minimum and
maximum node pool autoscaling values, set them when you create your
Dataproc on GKE virtual cluster.
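For example, a creation-time command might look like the following sketch. The pool name and machine type are illustrative placeholders, and the exact `--pools` key names should be confirmed against the `gcloud dataproc clusters gke create` reference:

```shell
# Sketch: set node pool autoscaling bounds (min/max) at virtual cluster
# creation time. VIRTUAL_CLUSTER_NAME, REGION, GKE_CLUSTER_NAME, and the
# pool settings below are placeholders, not values from this document.
gcloud dataproc clusters gke create VIRTUAL_CLUSTER_NAME \
    --region=REGION \
    --gke-cluster=GKE_CLUSTER_NAME \
    --pools="name=dp-default,roles=default,machineType=e2-standard-4,min=1,max=10"
```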
Update node pool autoscaling
----------------------------

**Note:** Updating a Dataproc on GKE node pool configuration to
[disable autoscaling](/kubernetes-engine/docs/concepts/cluster-autoscaler#autoscaling_limits)
is not recommended.

Use the following GKE
[`gcloud container node-pools update`](/sdk/gcloud/reference/container/node-pools/update)
command to change the autoscaling configuration of a Dataproc on GKE node
pool. `MIN_NODES` must be less than or equal to `MAX_NODES`.

```
gcloud container node-pools update NODE_POOL_NAME \
    --cluster=GKE_CLUSTER_NAME \
    --region=REGION \
    --enable-autoscaling \
    --min-nodes=MIN_NODES \
    --max-nodes=MAX_NODES
```
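After an update, you can read the bounds back to confirm they took effect. One possible check is sketched below; the `autoscaling.minNodeCount` and `autoscaling.maxNodeCount` field paths are assumed from the GKE NodePool API and should be verified against the reference:

```shell
# Sketch: read back a node pool's autoscaling bounds after an update.
# NODE_POOL_NAME, GKE_CLUSTER_NAME, and REGION are placeholders.
gcloud container node-pools describe NODE_POOL_NAME \
    --cluster=GKE_CLUSTER_NAME \
    --region=REGION \
    --format="value(autoscaling.minNodeCount,autoscaling.maxNodeCount)"
```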
How Spark autoscaling works
---------------------------
1. When a job is submitted, the driver pod is scheduled to run on the node
   pool associated with the [Spark driver role](/dataproc/docs/reference/rest/v1/projects.regions.clusters#role).
2. The driver pod calls the GKE scheduler to create executor pods.
3. Executor pods are scheduled on the node pool associated with the
   [Spark executor role](/dataproc/docs/reference/rest/v1/projects.regions.clusters#role).
4. If the node pools have capacity for the pods, the pods start running
   immediately. If there is insufficient capacity, the GKE cluster autoscaler
   scales up the node pool to provide the requested resources, up to the
   user-specified limit. When node pools have excess capacity, the GKE
   cluster autoscaler scales down the node pool, staying within the
   user-specified limit.
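The sizing decision in the last step can be reasoned about as a back-of-the-envelope calculation: the autoscaler grows the pool enough to fit the pending executor pods, clamped to the configured bounds. The sketch below is illustrative only (all numbers are hypothetical, and real GKE scheduling also accounts for CPU, memory, and system pods):

```shell
# Illustrative sketch, not a Dataproc tool: estimate the node count the
# autoscaler would target for a batch of pending executor pods.
min_nodes=1        # user-specified autoscaling minimum (hypothetical)
max_nodes=10       # user-specified autoscaling maximum (hypothetical)
pods_per_node=4    # executor pods that fit on one node (hypothetical)
pending_pods=14    # executor pods waiting to be scheduled (hypothetical)

# Nodes needed to fit all pending pods (ceiling division).
needed=$(( (pending_pods + pods_per_node - 1) / pods_per_node ))

# Clamp the target to the user-specified bounds.
target=$needed
[ "$target" -lt "$min_nodes" ] && target=$min_nodes
[ "$target" -gt "$max_nodes" ] && target=$max_nodes

echo "target nodes: $target"
```

With these numbers, 14 pods at 4 pods per node need 4 nodes, which falls inside the 1–10 bounds, so no clamping occurs.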
Last updated 2025-09-04 UTC.