- JSON representation
- ClusterType
- ClusterTier
- GceClusterConfig
- PrivateIpv6GoogleAccess
- ReservationAffinity
- Type
- NodeGroupAffinity
- ShieldedInstanceConfig
- ConfidentialInstanceConfig
- SoftwareConfig
- Component
- NodeInitializationAction
- EncryptionConfig
- AutoscalingConfig
- SecurityConfig
- KerberosConfig
- IdentityConfig
- LifecycleConfig
- EndpointConfig
- DataprocMetricConfig
- Metric
- MetricSource
- AuxiliaryNodeGroup
The cluster config.
| JSON representation | 
|---|
| { "clusterType" : enum ( ClusterType ) , "clusterTier" : enum ( ClusterTier ) , "configBucket" : string , "tempBucket" : string , ... } |
| Fields | |
|---|---|
| clusterType |   Optional. The type of the cluster. | 
| clusterTier |   Optional. The cluster tier. | 
| configBucket |   Optional. A Cloud Storage bucket used to stage job dependencies, config files, and job driver console output. If you do not specify a staging bucket, Cloud Dataproc will determine a Cloud Storage location (US, ASIA, or EU) for your cluster's staging bucket according to the Compute Engine zone where your cluster is deployed, and then create and manage this project-level, per-location bucket (see Dataproc staging and temp buckets ). This field requires a Cloud Storage bucket name, not a gs://... URI to a Cloud Storage bucket. |
| tempBucket |   Optional. A Cloud Storage bucket used to store ephemeral cluster and jobs data, such as Spark and MapReduce history files. If you do not specify a temp bucket, Dataproc will determine a Cloud Storage location (US, ASIA, or EU) for your cluster's temp bucket according to the Compute Engine zone where your cluster is deployed, and then create and manage this project-level, per-location bucket. The default bucket has a TTL of 90 days, but you can use any TTL (or none) if you specify a bucket (see Dataproc staging and temp buckets ). This field requires a Cloud Storage bucket name, not a gs://... URI to a Cloud Storage bucket. |
| gceClusterConfig |   Optional. The shared Compute Engine config settings for all instances in a cluster. | 
| masterConfig |   Optional. The Compute Engine config settings for the cluster's master instance. | 
| workerConfig |   Optional. The Compute Engine config settings for the cluster's worker instances. | 
| secondaryWorkerConfig |   Optional. The Compute Engine config settings for a cluster's secondary worker instances | 
| softwareConfig |   Optional. The config settings for cluster software. | 
| initializationActions[] |   Optional. Commands to execute on each node after config is completed. By default, executables are run on master and all worker nodes. You can test a node's role metadata to run an executable only on the master node or only on worker nodes. |
| encryptionConfig |   Optional. Encryption settings for the cluster. | 
| autoscalingConfig |   Optional. Autoscaling config for the policy associated with the cluster. Cluster does not autoscale if this field is unset. | 
| securityConfig |   Optional. Security settings for the cluster. | 
| lifecycleConfig |   Optional. Lifecycle setting for the cluster. | 
| endpointConfig |   Optional. Port/endpoint configuration for this cluster | 
| metastoreConfig |   Optional. Metastore configuration. | 
| dataprocMetricConfig |   Optional. The config for Dataproc metrics. | 
| auxiliaryNodeGroups[] |   Optional. The node group settings. | 
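Putting a few of the fields above together, a minimal ClusterConfig request body might look like the following sketch. The bucket, zone, and numInstances values are illustrative placeholders, not values from this reference:

```python
import json

# Hypothetical ClusterConfig body; field names come from the reference above,
# but all values are illustrative placeholders.
cluster_config = {
    "configBucket": "my-staging-bucket",   # bucket name, not a gs:// URI
    "tempBucket": "my-temp-bucket",        # bucket name, not a gs:// URI
    "gceClusterConfig": {"zoneUri": "us-central1-a"},
    "softwareConfig": {"imageVersion": "2.2"},
    "workerConfig": {"numInstances": 2},   # assumed InstanceGroupConfig field
}

body = json.dumps(cluster_config, indent=2)
print(body)
```

Because the REST representation is plain JSON, a dict like this can be serialized directly as the `config` portion of a cluster create request body.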
ClusterType
The type of the cluster.
| Enums | |
|---|---|
| CLUSTER_TYPE_UNSPECIFIED | Not set. | 
| STANDARD | Standard Dataproc cluster with a minimum of two primary workers. | 
| SINGLE_NODE | Single node cluster with only one node, which acts as both master and worker (see Single node clusters: https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/single-node-clusters ). | 
| ZERO_SCALE | Clusters that can use only secondary workers and be scaled down to zero secondary worker nodes. | 
ClusterTier
The cluster tier.
| Enums | |
|---|---|
| CLUSTER_TIER_UNSPECIFIED | Not set. Works the same as CLUSTER_TIER_STANDARD. | 
| CLUSTER_TIER_STANDARD | Standard Dataproc cluster. | 
| CLUSTER_TIER_PREMIUM | Premium Dataproc cluster. | 
GceClusterConfig
Common config settings for resources of Compute Engine cluster instances, applicable to all instances in the cluster.
| JSON representation | 
|---|
| { "zoneUri" : string , "networkUri" : string , "subnetworkUri" : string , "privateIpv6GoogleAccess" : enum ( PrivateIpv6GoogleAccess ) , "serviceAccount" : string , "serviceAccountScopes" : [ string ] , ... } |
zoneUri 
 string 
Optional. The Compute Engine zone where the Dataproc cluster will be located. If omitted, the service will pick a zone in the cluster's Compute Engine region. On a get request, zone will always be present.
A full URL, partial URI, or short name are valid. Examples:
-  https://www.googleapis.com/compute/v1/projects/[projectId]/zones/[zone]
-  projects/[projectId]/zones/[zone]
-  [zone]
networkUri 
 string 
Optional. The Compute Engine network to be used for machine communications. Cannot be specified with subnetworkUri. If neither networkUri nor subnetworkUri is specified, the "default" network of the project is used, if it exists. Cannot be a "Custom Subnet Network" (see Using Subnetworks for more information).
A full URL, partial URI, or short name are valid. Examples:
-  https://www.googleapis.com/compute/v1/projects/[projectId]/global/networks/default
-  projects/[projectId]/global/networks/default
-  default
subnetworkUri 
 string 
Optional. The Compute Engine subnetwork to be used for machine communications. Cannot be specified with networkUri.
A full URL, partial URI, or short name are valid. Examples:
-  https://www.googleapis.com/compute/v1/projects/[projectId]/regions/[region]/subnetworks/sub0
-  projects/[projectId]/regions/[region]/subnetworks/sub0
-  sub0
privateIpv6GoogleAccess 
 enum ( PrivateIpv6GoogleAccess )
Optional. The type of IPv6 access for a cluster.
serviceAccount 
 string 
Optional. The Dataproc service account (also see VM Data Plane identity ) used by Dataproc cluster VM instances to access Google Cloud Platform services.
If not specified, the Compute Engine default service account is used.
serviceAccountScopes[] 
 string 
Optional. The URIs of service account scopes to be included in Compute Engine instances. The following base set of scopes is always included:
- https://www.googleapis.com/auth/cloud.useraccounts.readonly
- https://www.googleapis.com/auth/devstorage.read_write
- https://www.googleapis.com/auth/logging.write
If no scopes are specified, the following defaults are also provided:
- https://www.googleapis.com/auth/bigquery
- https://www.googleapis.com/auth/bigtable.admin.table
- https://www.googleapis.com/auth/bigtable.data
- https://www.googleapis.com/auth/devstorage.full_control
reservationAffinity 
 object ( ReservationAffinity )
Optional. Reservation Affinity for consuming Zonal reservation.
nodeGroupAffinity 
 object ( NodeGroupAffinity )
Optional. Node Group Affinity for sole-tenant clusters.
shieldedInstanceConfig 
 object ( ShieldedInstanceConfig )
Optional. Shielded Instance Config for clusters using Compute Engine Shielded VMs .
confidentialInstanceConfig 
 object ( ConfidentialInstanceConfig )
Optional. Confidential Instance Config for clusters using Confidential VMs .
internalIpOnly 
 boolean 
Optional. This setting applies to subnetwork-enabled networks. It is set to true by default in clusters created with image versions 2.2.x.
When set to true :
- All cluster VMs have internal IP addresses.
- Private Google Access must be enabled to access Dataproc and other Google Cloud APIs.
- Off-cluster dependencies must be configured to be accessible without external IP addresses.
When set to false :
- Cluster VMs are not restricted to internal IP addresses.
- Ephemeral external IP addresses are assigned to each cluster VM.
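As a sketch of how these fields combine, the following dict models a GceClusterConfig for an internal-IP-only cluster. The subnetwork and service account names are placeholders:

```python
import json

# Hypothetical GceClusterConfig for an internal-IP-only cluster; the
# subnetworkUri and serviceAccount values are placeholders.
gce_cluster_config = {
    "subnetworkUri": "projects/my-project/regions/us-central1/subnetworks/sub0",
    "internalIpOnly": True,  # VMs get internal IP addresses only
    "serviceAccount": "dataproc-vm@my-project.iam.gserviceaccount.com",
    "serviceAccountScopes": [
        "https://www.googleapis.com/auth/cloud-platform",
    ],
    "privateIpv6GoogleAccess": "INHERIT_FROM_SUBNETWORK",
}

# networkUri and subnetworkUri are mutually exclusive per the reference.
assert not ("networkUri" in gce_cluster_config
            and "subnetworkUri" in gce_cluster_config)
print(json.dumps(gce_cluster_config))
```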
PrivateIpv6GoogleAccess
PrivateIpv6GoogleAccess controls whether and how Dataproc cluster nodes can communicate with Google Services through gRPC over IPv6. These values are directly mapped to corresponding values in the Compute Engine Instance fields .
| Enums | |
|---|---|
| PRIVATE_IPV6_GOOGLE_ACCESS_UNSPECIFIED | If unspecified, Compute Engine default behavior will apply, which is the same as INHERIT_FROM_SUBNETWORK . | 
| INHERIT_FROM_SUBNETWORK | Private access to and from Google Services configuration inherited from the subnetwork configuration. This is the default Compute Engine behavior. | 
| OUTBOUND | Enables outbound private IPv6 access to Google Services from the Dataproc cluster. | 
| BIDIRECTIONAL | Enables bidirectional private IPv6 access between Google Services and the Dataproc cluster. | 
ReservationAffinity
Reservation Affinity for consuming Zonal reservation.
| JSON representation | 
|---|
| { "consumeReservationType" : enum ( Type ) , "key" : string , "values" : [ string ] } |
| Fields | |
|---|---|
| consumeReservationType |   Optional. Type of reservation to consume. | 
| key |   Optional. Corresponds to the label key of reservation resource. | 
| values[] |   Optional. Corresponds to the label values of reservation resource. | 
Type
Indicates whether to consume capacity from a reservation or not.
| Enums | |
|---|---|
| TYPE_UNSPECIFIED | |
| NO_RESERVATION | Do not consume from any allocated capacity. | 
| ANY_RESERVATION | Consume any reservation available. | 
| SPECIFIC_RESERVATION | Must consume from a specific reservation. The key and values fields must be specified to identify the reservation. | 
NodeGroupAffinity
Node Group Affinity for clusters using sole-tenant node groups. The Dataproc NodeGroupAffinity resource is not related to the Dataproc NodeGroup resource.
| JSON representation | 
|---|
| { "nodeGroupUri" : string } | 
nodeGroupUri 
 string 
Required. The URI of a sole-tenant node group resource that the cluster will be created on.
A full URL, partial URI, or node group name are valid. Examples:
-  https://www.googleapis.com/compute/v1/projects/[projectId]/zones/[zone]/nodeGroups/node-group-1
-  projects/[projectId]/zones/[zone]/nodeGroups/node-group-1
-  node-group-1
ShieldedInstanceConfig
Shielded Instance Config for clusters using Compute Engine Shielded VMs .
| JSON representation | 
|---|
| { "enableSecureBoot" : boolean , "enableVtpm" : boolean , "enableIntegrityMonitoring" : boolean } | 
| Fields | |
|---|---|
| enableSecureBoot |   Optional. Defines whether instances have Secure Boot enabled. | 
| enableVtpm |   Optional. Defines whether instances have the vTPM enabled. | 
| enableIntegrityMonitoring |   Optional. Defines whether instances have integrity monitoring enabled. | 
ConfidentialInstanceConfig
Confidential Instance Config for clusters using Confidential VMs
| JSON representation | 
|---|
| { "enableConfidentialCompute" : boolean } | 
| Fields | |
|---|---|
| enableConfidentialCompute |   Optional. Defines whether the instance should have confidential compute enabled. | 
SoftwareConfig
Specifies the selection and config of software inside the cluster.
| JSON representation | 
|---|
| { "imageVersion" : string , "properties" : { string : string , ... } , "optionalComponents" : [ enum ( Component ) ] } |
imageVersion 
 string 
Optional. The version of software inside the cluster. It must be one of the supported Dataproc Versions , such as "1.2" (including a subminor version, such as "1.2.29"), or the "preview" version . If unspecified, it defaults to the latest Debian version.
properties 
 map (key: string, value: string) 
Optional. The properties to set on daemon config files.
Property keys are specified in prefix:property 
format, for example core:hadoop.tmp.dir 
. The following are supported prefixes and their mappings:
- capacity-scheduler: capacity-scheduler.xml
- core: core-site.xml
- distcp: distcp-default.xml
- hdfs: hdfs-site.xml
- hive: hive-site.xml
- mapred: mapred-site.xml
- pig: pig.properties
- spark: spark-defaults.conf
- yarn: yarn-site.xml
For more information, see Cluster properties .
An object containing a list of "key": value 
pairs. Example: { "name": "wrench", "mass": "1.3kg", "count": "3" } 
.
optionalComponents[] 
 enum ( Component )
Optional. The set of components to activate on the cluster.
Component
Cluster components that can be activated.
| Enums | |
|---|---|
| COMPONENT_UNSPECIFIED | Unspecified component. Specifying this will cause Cluster creation to fail. | 
| ANACONDA | The Anaconda component is no longer supported or applicable to supported Dataproc on Compute Engine image versions . It cannot be activated on clusters created with supported Dataproc on Compute Engine image versions. | 
| DELTA | Delta Lake. | 
| DOCKER | Docker | 
| DRUID | The Druid query engine. (alpha) | 
| FLINK | Flink | 
| HBASE | HBase. (beta) | 
| HIVE_WEBHCAT | The Hive Web HCatalog (the REST service for accessing HCatalog). | 
| HUDI | Hudi. | 
| ICEBERG | Iceberg. | 
| JUPYTER | The Jupyter Notebook. | 
| PRESTO | The Presto query engine. | 
| TRINO | The Trino query engine. | 
| RANGER | The Ranger service. | 
| SOLR | The Solr service. | 
| ZEPPELIN | The Zeppelin notebook. | 
| ZOOKEEPER | The Zookeeper service. | 
| JUPYTER_KERNEL_GATEWAY | The Jupyter Kernel Gateway. | 
NodeInitializationAction
Specifies an executable to run on a fully configured node and a timeout period for executable completion.
| JSON representation | 
|---|
| { "executableFile" : string , "executionTimeout" : string } | 
| Fields | |
|---|---|
| executableFile |   Required. Cloud Storage URI of executable file. | 
| executionTimeout |   Optional. Amount of time executable has to complete. Default is 10 minutes (see JSON representation of Duration ). Cluster creation fails with an explanatory error message (the name of the executable that caused the error and the exceeded timeout period) if the executable does not complete by the end of the timeout period. | 
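A sketch of an initializationActions entry follows; note that executionTimeout uses the JSON Duration format, a decimal number of seconds with an "s" suffix. The script URI is a placeholder:

```python
# Hypothetical NodeInitializationAction list; executableFile is a
# placeholder Cloud Storage URI.
init_actions = [
    {
        "executableFile": "gs://my-bucket/scripts/setup.sh",
        "executionTimeout": "600s",  # 10 minutes, the documented default
    }
]

# JSON Duration strings are seconds with an "s" suffix.
timeout = init_actions[0]["executionTimeout"]
seconds = float(timeout.rstrip("s"))
print(seconds)
```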
EncryptionConfig
Encryption settings for the cluster.
| JSON representation | 
|---|
| { "gcePdKmsKeyName" : string , "kmsKey" : string } | 
gcePdKmsKeyName 
 string 
Optional. The Cloud KMS key resource name to use for persistent disk encryption for all instances in the cluster. See Use CMEK with cluster data for more information.
kmsKey 
 string 
Optional. The Cloud KMS key resource name to use for cluster persistent disk and job argument encryption. See Use CMEK with cluster data for more information.
When this key resource name is provided, the following job arguments of the following job types submitted to the cluster are encrypted using CMEK:
- FlinkJob args
- HadoopJob args
- SparkJob args
- SparkRJob args
- PySparkJob args
- SparkSqlJob scriptVariables and queryList.queries
- HiveJob scriptVariables and queryList.queries
- PigJob scriptVariables and queryList.queries
- PrestoJob scriptVariables and queryList.queries
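As a sketch, an EncryptionConfig using kmsKey (which covers persistent disks and the job argument types listed above) looks like the following; the key resource name is a placeholder:

```python
# Hypothetical EncryptionConfig; the Cloud KMS key resource name below is a
# placeholder following projects/*/locations/*/keyRings/*/cryptoKeys/*.
encryption_config = {
    "kmsKey": (
        "projects/my-project/locations/us-central1/"
        "keyRings/my-ring/cryptoKeys/my-key"
    ),
}

# Sanity-check the resource-name shape: collection names alternate with IDs.
parts = encryption_config["kmsKey"].split("/")
assert parts[0::2] == ["projects", "locations", "keyRings", "cryptoKeys"]
print(parts[-1])
```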
AutoscalingConfig
Autoscaling Policy config associated with the cluster.
| JSON representation | 
|---|
| { "policyUri" : string } | 
policyUri 
 string 
Optional. The autoscaling policy used by the cluster.
Only resource names that include the project ID and location (region) are valid. Examples:
-  https://www.googleapis.com/compute/v1/projects/[projectId]/locations/[dataproc_region]/autoscalingPolicies/[policyId]
-  projects/[projectId]/locations/[dataproc_region]/autoscalingPolicies/[policyId]
Note that the policy must be in the same project and Dataproc region.
SecurityConfig
Security related configuration, including encryption, Kerberos, etc.
| JSON representation | 
|---|
| { "kerberosConfig" : { object ( KerberosConfig ) } , "identityConfig" : { object ( IdentityConfig ) } } |
| Fields | |
|---|---|
| kerberosConfig |   Optional. Kerberos related configuration. | 
| identityConfig |   Optional. Identity related configuration, including service account based secure multi-tenancy user mappings. | 
KerberosConfig
Specifies Kerberos related configuration.
| JSON representation | 
|---|
| { "enableKerberos" : boolean , "rootPrincipalPasswordUri" : string , "kmsKeyUri" : string , "keystoreUri" : string , "truststoreUri" : string , "keystorePasswordUri" : string , "keyPasswordUri" : string , "truststorePasswordUri" : string , "crossRealmTrustRealm" : string , "crossRealmTrustKdc" : string , "crossRealmTrustAdminServer" : string , "crossRealmTrustSharedPasswordUri" : string , "kdcDbKeyUri" : string , "tgtLifetimeHours" : integer , "realm" : string } | 
| Fields | |
|---|---|
| enableKerberos |   Optional. Flag to indicate whether to Kerberize the cluster (default: false). Set this field to true to enable Kerberos on a cluster. | 
| rootPrincipalPasswordUri |   Optional. The Cloud Storage URI of a KMS encrypted file containing the root principal password. | 
| kmsKeyUri |   Optional. The URI of the KMS key used to encrypt sensitive files. | 
| keystoreUri |   Optional. The Cloud Storage URI of the keystore file used for SSL encryption. If not provided, Dataproc will provide a self-signed certificate. | 
| truststoreUri |   Optional. The Cloud Storage URI of the truststore file used for SSL encryption. If not provided, Dataproc will provide a self-signed certificate. | 
| keystorePasswordUri |   Optional. The Cloud Storage URI of a KMS encrypted file containing the password to the user provided keystore. For the self-signed certificate, this password is generated by Dataproc. | 
| keyPasswordUri |   Optional. The Cloud Storage URI of a KMS encrypted file containing the password to the user provided key. For the self-signed certificate, this password is generated by Dataproc. | 
| truststorePasswordUri |   Optional. The Cloud Storage URI of a KMS encrypted file containing the password to the user provided truststore. For the self-signed certificate, this password is generated by Dataproc. | 
| crossRealmTrustRealm |   Optional. The remote realm the Dataproc on-cluster KDC will trust, should the user enable cross realm trust. | 
| crossRealmTrustKdc |   Optional. The KDC (IP or hostname) for the remote trusted realm in a cross realm trust relationship. | 
| crossRealmTrustAdminServer |   Optional. The admin server (IP or hostname) for the remote trusted realm in a cross realm trust relationship. | 
| crossRealmTrustSharedPasswordUri |   Optional. The Cloud Storage URI of a KMS encrypted file containing the shared password between the on-cluster Kerberos realm and the remote trusted realm, in a cross realm trust relationship. | 
| kdcDbKeyUri |   Optional. The Cloud Storage URI of a KMS encrypted file containing the master key of the KDC database. | 
| tgtLifetimeHours |   Optional. The lifetime of the ticket granting ticket, in hours. If not specified, or if the user specifies 0, the default value of 10 will be used. | 
| realm |   Optional. The name of the on-cluster Kerberos realm. If not specified, the uppercased domain of hostnames will be the realm. | 
IdentityConfig
Identity related configuration, including service account based secure multi-tenancy user mappings.
| JSON representation | 
|---|
| { "userServiceAccountMapping" : { string : string , ... } } | 
| Fields | |
|---|---|
| userServiceAccountMapping |   Required. Map of user to service account. An object containing a list of "key": value pairs. Example: { "name": "wrench", "mass": "1.3kg", "count": "3" } . | 
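Since userServiceAccountMapping is a plain string-to-string map, it can be modeled directly; the user and service account names below are placeholders:

```python
# Hypothetical IdentityConfig: each user maps to exactly one service account.
identity_config = {
    "userServiceAccountMapping": {
        "alice@example.com": "alice-sa@my-project.iam.gserviceaccount.com",
        "bob@example.com": "bob-sa@my-project.iam.gserviceaccount.com",
    }
}

mapping = identity_config["userServiceAccountMapping"]
print(len(mapping))
```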
LifecycleConfig
Specifies the cluster auto-delete schedule configuration.
| JSON representation | 
|---|
| { "idleDeleteTtl" : string , "idleStopTtl" : string , "idleStartTime" : string , // Union field ttl can be only one of the following: "autoDeleteTime" : string , "autoDeleteTtl" : string , // End of list of possible types for union field ttl . // Union field stop_ttl can be only one of the following: "autoStopTime" : string , "autoStopTtl" : string // End of list of possible types for union field stop_ttl . } |
idleDeleteTtl 
 string ( Duration format)
Optional. The duration to keep the cluster alive while idling (when no jobs are running). Passing this threshold will cause the cluster to be deleted (see JSON representation of Duration ).
idleStopTtl 
 string ( Duration format)
Optional. The duration to keep the cluster started while idling (when no jobs are running). Passing this threshold will cause the cluster to be stopped (see JSON representation of Duration ).
idleStartTime 
 string ( Timestamp format)
Output only. The time when the cluster became idle (most recent job finished) and became eligible for deletion due to idleness (see JSON representation of Timestamp ).
Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z" , "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30" .
Union field ttl . Either the exact time the cluster should be deleted at or the cluster maximum age. ttl can be only one of the following:
autoDeleteTime 
 string ( Timestamp format)
Optional. The time when the cluster will be auto-deleted (see JSON representation of Timestamp ).
Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z" , "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30" .
autoDeleteTtl 
 string ( Duration format)
Optional. The lifetime duration of the cluster. The cluster will be auto-deleted at the end of this period, calculated from the time of submission of the create or update cluster request. Minimum value is 10 minutes; maximum value is 14 days (see JSON representation of Duration ).
Union field stop_ttl . Either the exact time the cluster should be stopped at or the cluster maximum age. stop_ttl can be only one of the following:
autoStopTime 
 string ( Timestamp format)
Optional. The time when the cluster will be auto-stopped (see JSON representation of Timestamp ).
Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z" , "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30" .
autoStopTtl 
 string ( Duration format)
Optional. The lifetime duration of the cluster. The cluster will be auto-stopped at the end of this period, calculated from the time of submission of the create or update cluster request. Minimum value is 10 minutes; maximum value is 14 days (see JSON representation of Duration ).
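Because ttl and stop_ttl are union fields, at most one member of each pair may be set. The sketch below models a LifecycleConfig and checks that constraint; the TTL values are placeholders:

```python
# Hypothetical LifecycleConfig: ttl and stop_ttl are union fields, so
# autoDeleteTime/autoDeleteTtl (and autoStopTime/autoStopTtl) are mutually
# exclusive. Duration strings are seconds with an "s" suffix.
lifecycle_config = {
    "idleDeleteTtl": "1800s",   # delete after 30 idle minutes
    "autoDeleteTtl": "86400s",  # or after 1 day, whichever comes first
}

# Enforce the union-field constraints locally before sending the request.
assert not ({"autoDeleteTime", "autoDeleteTtl"} <= lifecycle_config.keys())
assert not ({"autoStopTime", "autoStopTtl"} <= lifecycle_config.keys())
print(lifecycle_config["idleDeleteTtl"])
```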
EndpointConfig
Endpoint config for this cluster
| JSON representation | 
|---|
| { "httpPorts" : { string : string , ... } , "enableHttpPortAccess" : boolean } | 
| Fields | |
|---|---|
| httpPorts |   Output only. The map of port descriptions to URLs. Will only be populated if enableHttpPortAccess is true. An object containing a list of "key": value pairs. Example: { "name": "wrench", "mass": "1.3kg", "count": "3" } . | 
| enableHttpPortAccess |   Optional. If true, enable http access to specific ports on the cluster from external sources. Defaults to false. | 
DataprocMetricConfig
Dataproc metric config.
| JSON representation | 
|---|
| { "metrics" : [ { object ( Metric ) } ] } |
| Fields | |
|---|---|
| metrics[] |   Required. Metrics sources to enable. | 
Metric
A Dataproc custom metric.
| JSON representation | 
|---|
| { "metricSource" : enum ( MetricSource ) , "metricOverrides" : [ string ] } |
metricSource 
 enum ( MetricSource )
Required. A standard set of metrics is collected unless metricOverrides are specified for the metric source (see Custom metrics for more information).
metricOverrides[] 
 string 
Optional. Specify one or more Custom metrics to collect for the metric source (for the SPARK metric source, any Spark metric can be specified).
Provide metrics in the following format:
  METRIC_SOURCE : INSTANCE : GROUP : METRIC 
Use camelcase as appropriate.
Examples:
 yarn:ResourceManager:QueueMetrics:AppsCompleted
spark:driver:DAGScheduler:job.allJobs
sparkHistoryServer:JVM:Memory:NonHeapMemoryUsage.committed
hiveserver2:JVM:Memory:NonHeapMemoryUsage.used 
 
Notes:
- Only the specified overridden metrics are collected for the metric source. For example, if one or more spark:executive metrics are listed as metric overrides, other SPARK metrics are not collected. The collection of metrics for other enabled custom metric sources is unaffected. For example, if both SPARK and YARN metric sources are enabled, and overrides are provided for Spark metrics only, all YARN metrics are collected.
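The override format above can be sketched and checked as follows; the SPARK override is taken from the examples earlier, while the second metric entry is an illustrative addition:

```python
# Hypothetical DataprocMetricConfig: each metricOverrides entry follows
# METRIC_SOURCE:INSTANCE:GROUP:METRIC, per the format described above.
metric_config = {
    "metrics": [
        {
            "metricSource": "SPARK",
            "metricOverrides": [
                "spark:driver:DAGScheduler:job.allJobs",
            ],
        },
        # No overrides here, so the standard YARN metric set is collected.
        {"metricSource": "YARN"},
    ]
}

for metric in metric_config["metrics"]:
    for override in metric.get("metricOverrides", []):
        # Four colon-separated components per override string.
        assert len(override.split(":")) == 4
print(len(metric_config["metrics"]))
```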
MetricSource
A source for the collection of Dataproc custom metrics (see Custom metrics ).
| Enums | |
|---|---|
| METRIC_SOURCE_UNSPECIFIED | Required unspecified metric source. | 
| MONITORING_AGENT_DEFAULTS | Monitoring agent metrics. If this source is enabled, Dataproc enables the monitoring agent in Compute Engine, and collects monitoring agent metrics, which are published with an agent.googleapis.com prefix. | 
| HDFS | HDFS metric source. | 
| SPARK | Spark metric source. | 
| YARN | YARN metric source. | 
| SPARK_HISTORY_SERVER | Spark History Server metric source. | 
| HIVESERVER2 | Hiveserver2 metric source. | 
| HIVEMETASTORE | Hive Metastore metric source. | 
| FLINK | Flink metric source. | 
AuxiliaryNodeGroup
Node group identification and configuration information.
| JSON representation | 
|---|
| { "nodeGroup" : { object ( NodeGroup ) } , "nodeGroupId" : string } |
| Fields | |
|---|---|
| nodeGroup |   Required. Node group configuration. | 
| nodeGroupId |   Optional. A node group ID. Generated if not specified. The ID must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), and hyphens (-). Cannot begin or end with underscore or hyphen. Must consist of from 3 to 33 characters. | 
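The nodeGroupId character and length rules above can be expressed as a regular expression; the sketch below validates a placeholder ID (the nodeGroup object itself is omitted for brevity, though the reference marks it required):

```python
import re

# nodeGroupId rules from the reference: 3-33 characters of letters, digits,
# underscores, and hyphens, and it cannot begin or end with _ or -.
NODE_GROUP_ID_RE = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9_-]{1,31}[a-zA-Z0-9]$")

node_group_id = "gpu-workers-1"  # placeholder ID
print(bool(NODE_GROUP_ID_RE.match(node_group_id)))
```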

