When you create a Dataproc cluster, cluster resources use a regional endpoint based on Compute Engine zones. When you choose a region, you can select a zone within that region, or you can omit the zone to have the Dataproc Auto Zone feature select a zone for you in the region you choose. Once a zone is selected, all nodes for that cluster are deployed to that zone.
You can exclude zones from Auto Zone selection criteria provided that the cluster region includes at least two non-excluded zones. For more information, see Use Auto Zone placement.
Auto Zone and resource reservations
Auto Zone prioritizes creating a cluster in a zone with resource reservations, as follows:
- If requested cluster resources can be fully satisfied by reserved, plus, if necessary, on-demand resources in a zone, Auto Zone will consume the reserved and on-demand resources, and create the cluster in that zone.
- Auto Zone prioritizes zones for selection according to total CPU core (vCPU) reservations in a zone.
  Example: A cluster creation request specifies 20 n2-standard-2 and 1 n2-standard-64 (40 + 64 vCPUs requested). Auto Zone will prioritize the following zones for selection according to the total vCPU reservations available in the zone:
  - zone-c available reservations: 3 n2-standard-2 and 1 n2-standard-64 (70 vCPUs)
  - zone-b available reservations: 1 n2-standard-64 (64 vCPUs)
  - zone-a available reservations: 25 n2-standard-2 (50 vCPUs)
  Assuming each of these zones has additional on-demand vCPU and other resources sufficient to satisfy the cluster request, Auto Zone will select zone-c for cluster creation.
- If requested cluster resources cannot be fully satisfied by reserved plus on-demand resources in a zone, Auto Zone will create the cluster in a zone that is most likely to satisfy the request using on-demand resources.
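The arithmetic in the reservation example can be checked with a short Python sketch. It only models the ranking described above (zones ordered by total reserved vCPUs) and is not Dataproc's actual selection logic. The VCPUS table and the helper name total_vcpus are illustrative assumptions.

```python
# Illustrative model of the documented example; not Dataproc's internal logic.

# vCPUs per machine type used in the example (the number in the type name).
VCPUS = {"n2-standard-2": 2, "n2-standard-64": 64}


def total_vcpus(machines):
    """Sum vCPUs over a {machine_type: count} mapping."""
    return sum(VCPUS[mtype] * count for mtype, count in machines.items())


# The cluster request from the example: 20 n2-standard-2 and 1 n2-standard-64.
request = {"n2-standard-2": 20, "n2-standard-64": 1}

# Available reservations per zone, as listed in the example.
reservations = {
    "zone-a": {"n2-standard-2": 25},
    "zone-b": {"n2-standard-64": 1},
    "zone-c": {"n2-standard-2": 3, "n2-standard-64": 1},
}

# Rank zones by total reserved vCPUs, largest first.
ranked = sorted(
    reservations,
    key=lambda zone: total_vcpus(reservations[zone]),
    reverse=True,
)

requested = total_vcpus(request)
print(requested)  # 104 (40 + 64)
print(ranked)     # ['zone-c', 'zone-b', 'zone-a']
```

No zone's reservations cover all 104 requested vCPUs, which is why the example assumes that on-demand resources make up the difference.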
Use Auto Zone placement
Console
To create a Dataproc cluster that uses Auto Zone placement:
- In the Google Cloud console, open the Dataproc Create a Dataproc cluster on Compute Engine page. The Set up cluster panel is selected.
- In the Location section, do the following:
  - Select a Region for your cluster.
  - Under Zone, select "Any".
Exclude zones: Specifying zones to exclude from Auto Zone placement is not supported through the Google Cloud console. This feature is available using the Google Cloud CLI and the REST API.
gcloud CLI
To create a Dataproc cluster that uses Auto Zone placement, use the gcloud dataproc clusters create command. Set the --region flag to a region, then either omit the --zone flag or set the --zone flag to an empty string (--zone="").
To exclude zones, use the --auto-zone-exclude-zones flag to specify a comma-separated list of zones. Auto Zone selects a zone from the specified region, but excludes the listed zones from its selection criteria. Note that there must be at least two non-excluded zones available in the cluster region.
Examples:
Basic Auto Zone usage:
gcloud dataproc clusters create CLUSTER_NAME \
    --region=REGION \
    other args ...
Auto Zone with excluded zones:
gcloud dataproc clusters create CLUSTER_NAME \
    --region=REGION \
    --auto-zone-exclude-zones=ZONE_1,ZONE_2 \
    other args ...
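If you script cluster creation, one option is to check the "at least two non-excluded zones" rule before calling gcloud. The following is a minimal Python sketch under stated assumptions: the function name build_create_command is hypothetical, the list of zones in the region is supplied by the caller rather than looked up, and the zone names are placeholders. It uses only the flags described above.

```python
def build_create_command(cluster_name, region, region_zones, exclude_zones=()):
    """Return the gcloud argument list for an Auto Zone cluster creation.

    region_zones is a caller-supplied list of the zones in the region
    (an assumption of this sketch; nothing is queried from Google Cloud).
    """
    excluded = set(exclude_zones)
    remaining = [zone for zone in region_zones if zone not in excluded]
    # Documented constraint: at least two non-excluded zones must remain.
    if excluded and len(remaining) < 2:
        raise ValueError("at least two non-excluded zones are required")

    # Omitting --zone asks Auto Zone to pick a zone in the region.
    args = [
        "gcloud", "dataproc", "clusters", "create", cluster_name,
        f"--region={region}",
    ]
    if exclude_zones:
        args.append("--auto-zone-exclude-zones=" + ",".join(exclude_zones))
    return args


# Placeholder zone names, for illustration only.
zones = ["example-region-a", "example-region-b", "example-region-c"]
command = build_create_command(
    "my-cluster", "example-region", zones, exclude_zones=["example-region-a"]
)
print(" ".join(command))
```

The returned list can be passed to subprocess.run(command, check=True) on a machine where the gcloud CLI is installed and authenticated.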
REST API
To create a Dataproc cluster that uses Auto Zone placement, construct a JSON clusters.create API request, leaving the gceClusterConfig.zoneUri field empty. In the REST endpoint, https://dataproc.googleapis.com/v1/projects/projectId/regions/region/clusters, insert a region name. Dataproc Auto Zone will choose a zone for the cluster within the specified region.
To exclude specific zones, you can populate the gceClusterConfig.autoZoneExcludeZoneUris field with a list of zone names to exclude. Note that there must be at least two non-excluded zones available in the cluster region.
Use short resource names with Auto Zone placement: When specifying a resource URI, such as machineTypeUri or acceleratorTypeUri, in an Auto Zone placement REST API cluster creation request, use a short resource name without a zone specification, for example, "n1-standard-2" or "nvidia-tesla-t4".
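As a rough illustration of the REST guidance above, the following Python sketch assembles the endpoint URL and a JSON request body without sending anything. The function name build_create_request, the instance counts, and the placeholder project, region, and zone names are assumptions for illustration. The nesting of gceClusterConfig, masterConfig, and workerConfig under config is assumed from the field names mentioned above, so check it against the clusters.create reference before use.

```python
import json


def build_create_request(project_id, region, cluster_name, exclude_zones=()):
    """Return (url, body) for an Auto Zone clusters.create request."""
    url = (
        "https://dataproc.googleapis.com/v1/"
        f"projects/{project_id}/regions/{region}/clusters"
    )
    # An empty zoneUri leaves zone selection to Auto Zone.
    gce_config = {"zoneUri": ""}
    if exclude_zones:
        gce_config["autoZoneExcludeZoneUris"] = list(exclude_zones)

    body = {
        "projectId": project_id,
        "clusterName": cluster_name,
        "config": {
            "gceClusterConfig": gce_config,
            # Short machine type names, with no zone in the URI.
            "masterConfig": {"numInstances": 1, "machineTypeUri": "n1-standard-2"},
            "workerConfig": {"numInstances": 2, "machineTypeUri": "n1-standard-2"},
        },
    }
    return url, body


# Placeholder identifiers, for illustration only.
url, body = build_create_request(
    "my-project", "example-region", "my-cluster",
    exclude_zones=["example-region-a"],
)
print(url)
print(json.dumps(body, indent=2))
```

Sending the request requires an authenticated HTTP POST of the body to the URL, which this sketch deliberately leaves out.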

