Dataproc sets special metadata values for the instances that run in your cluster:
| Metadata key | Value |
|---|---|
| `dataproc-bucket` | Name of the cluster's staging bucket |
| `dataproc-region` | Region of the cluster's endpoint |
| `dataproc-worker-count` | Number of worker nodes in the cluster. The value is `0` for single node clusters. |
| `dataproc-cluster-name` | Name of the cluster |
| `dataproc-cluster-uuid` | UUID of the cluster |
| `dataproc-role` | Instance's role, either `Master` or `Worker` |
| `dataproc-master` | Hostname of the first master node. The value is either `[CLUSTER_NAME]-m` in a standard or single node cluster, or `[CLUSTER_NAME]-m-0` in a high-availability cluster, where `[CLUSTER_NAME]` is the name of your cluster. |
| `dataproc-master-additional` | Comma-separated list of hostnames for the additional master nodes in a high-availability cluster, for example, `[CLUSTER_NAME]-m-1,[CLUSTER_NAME]-m-2` in a cluster that has 3 master nodes. |
| `SPARK_BQ_CONNECTOR_VERSION` or `SPARK_BQ_CONNECTOR_URL` | The version, or a URL that points to a Spark BigQuery connector version, to use in Spark applications, for example, `0.42.1` or `gs://spark-lib/bigquery/spark-3.5-bigquery-0.42.1.jar`. A default Spark BigQuery connector version is pre-installed on clusters created with image versions 2.1 and later. For more information, see Use the Spark BigQuery connector. |
You can use these values to customize the behavior of initialization actions.
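For example, an initialization action can branch on the instance's role. The following is a minimal sketch (the echoed setup steps are placeholders, not Dataproc code) that reads the `dataproc-role` value from the Compute Engine metadata server:

```bash
#!/bin/bash
# Sketch of an initialization action that branches on the instance's
# Dataproc role by querying the Compute Engine metadata server.
set -euo pipefail

ROLE=$(curl -s -H "Metadata-Flavor: Google" \
  "http://metadata.google.internal/computeMetadata/v1/instance/attributes/dataproc-role")

if [[ "${ROLE}" == "Master" ]]; then
  # Runs only on master nodes (placeholder for master-only setup).
  echo "Running master-only setup"
else
  # Runs only on worker nodes (placeholder for worker-only setup).
  echo "Running worker-only setup"
fi
```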
You can use the `--metadata` flag in the `gcloud dataproc clusters create` command to provide your own metadata:
```bash
gcloud dataproc clusters create CLUSTER_NAME \
    --region=REGION \
    --metadata=name1=value1,name2=value2... \
    ... other flags ...
```
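For example, the following command sketch sets the `SPARK_BQ_CONNECTOR_VERSION` key described in the table above (the cluster name and region are placeholders):

```bash
gcloud dataproc clusters create my-cluster \
    --region=us-central1 \
    --metadata=SPARK_BQ_CONNECTOR_VERSION=0.42.1
```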

