Dataproc sets special metadata values for the instances that run in your cluster:
| Metadata key | Value |
|---|---|
| `dataproc-bucket` | Name of the cluster's staging bucket |
| `dataproc-region` | Region of the cluster's endpoint |
| `dataproc-worker-count` | Number of worker nodes in the cluster. The value is `0` for single node clusters. |
| `dataproc-cluster-name` | Name of the cluster |
| `dataproc-cluster-uuid` | UUID of the cluster |
| `dataproc-role` | Instance's role, either `Master` or `Worker` |
| `dataproc-master` | Hostname of the first master node. The value is either `[CLUSTER_NAME]-m` in a standard or single node cluster, or `[CLUSTER_NAME]-m-0` in a high-availability cluster, where `[CLUSTER_NAME]` is the name of your cluster. |
| `dataproc-master-additional` | Comma-separated list of hostnames for the additional master nodes in a high-availability cluster, for example, `[CLUSTER_NAME]-m-1,[CLUSTER_NAME]-m-2` in a cluster that has 3 master nodes. |
| `SPARK_BQ_CONNECTOR_VERSION` or `SPARK_BQ_CONNECTOR_URL` | The version, or a URL that points to a Spark BigQuery connector version, to use in Spark applications, for example, `0.42.1` or `gs://spark-lib/bigquery/spark-3.5-bigquery-0.42.1.jar`. A default Spark BigQuery connector version is pre-installed on clusters created with image versions 2.1 and later. For more information, see Use the Spark BigQuery connector. |
You can use these values to customize the behavior of initialization actions.
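For example, an initialization action can branch on the instance's role. The following is a minimal sketch (the echoed setup steps are placeholders, not Dataproc code) that reads the `dataproc-role` value from the Compute Engine metadata server:

```bash
#!/bin/bash
# Sketch of an initialization action that branches on the instance's
# Dataproc role by querying the Compute Engine metadata server.
set -euo pipefail

ROLE=$(curl -s -H "Metadata-Flavor: Google" \
  "http://metadata.google.internal/computeMetadata/v1/instance/attributes/dataproc-role")

if [[ "${ROLE}" == "Master" ]]; then
  # Runs only on master nodes (placeholder for master-only setup).
  echo "Running master-only setup"
else
  # Runs only on worker nodes (placeholder for worker-only setup).
  echo "Running worker-only setup"
fi
```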
You can use the `--metadata` flag in the `gcloud dataproc clusters create` command to provide your own metadata:
```bash
gcloud dataproc clusters create CLUSTER_NAME \
    --region=REGION \
    --metadata=name1=value1,name2=value2... \
    ... other flags ...
```
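For example, the following command sketch sets the `SPARK_BQ_CONNECTOR_VERSION` key described in the table above (the cluster name and region are placeholders):

```bash
gcloud dataproc clusters create my-cluster \
    --region=us-central1 \
    --metadata=SPARK_BQ_CONNECTOR_VERSION=0.42.1
```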

