You can copy an existing Dataproc on GKE virtual cluster's configuration, update the copied configuration, and then create a new Dataproc on GKE cluster using the updated configuration.
Recreate and update a Dataproc on GKE cluster
gcloud
- Set environment variables:

  CLUSTER=existing Dataproc on GKE cluster name
  REGION=region
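  For example, with hypothetical values (substitute your own cluster name and region):

  CLUSTER=my-dataproc-gke-cluster
  REGION=us-central1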
- Export the existing Dataproc on GKE cluster configuration to a YAML file.

  gcloud dataproc clusters export $CLUSTER \
      --region=$REGION > "${CLUSTER}-config.yaml"
- Update the configuration:

  - Remove the kubernetesNamespace field. Removing this field is necessary to avoid a namespace conflict when you create the updated cluster. Sample sed command to remove the kubernetesNamespace field (see the note after this list):

    sed -E "s/kubernetesNamespace: .+$//g" ${CLUSTER}-config.yaml

  - Make additional changes to update Dataproc on GKE virtual cluster configuration settings, such as changing the Spark componentVersion.
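  Note: the sample sed command above writes the edited configuration to standard output; it does not change the file. A minimal sketch that applies the same edit to the file in place, assuming GNU sed:

  sed -i -E "s/kubernetesNamespace: .+$//g" "${CLUSTER}-config.yaml"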
- Delete the existing Dataproc on GKE virtual cluster if the new cluster will have the same name as the existing cluster (that is, if you are replacing the original cluster).
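  For example, a sketch of this step using the standard gcloud dataproc clusters delete command with the variables set earlier:

  gcloud dataproc clusters delete $CLUSTER \
      --region=$REGION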
- Wait for the previous delete operation to finish, and then import the updated cluster configuration to create a new Dataproc on GKE virtual cluster with the updated config settings.

  gcloud dataproc clusters import $CLUSTER \
      --region=$REGION \
      --source="${CLUSTER}-config.yaml"
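  To confirm that the new virtual cluster was created with the updated settings, you can describe it (a sketch using the standard gcloud dataproc clusters describe command):

  gcloud dataproc clusters describe $CLUSTER \
      --region=$REGION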
API
- Set environment variables (the curl commands below also use PROJECT for your project ID):

  PROJECT=project ID
  CLUSTER=existing Dataproc on GKE cluster name
  REGION=region
- Export the existing Dataproc on GKE cluster configuration to a JSON file.

  curl -X GET \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      "https://dataproc.googleapis.com/v1/projects/${PROJECT}/regions/${REGION}/clusters/${CLUSTER}?alt=json" \
      > "${CLUSTER}-config.json"
- Update the configuration:

  - Remove the kubernetesNamespace field. Removing this field is necessary to avoid a namespace conflict when you create the updated cluster. Sample jq command to remove the kubernetesNamespace field from the exported configuration (see the note after this list):

    jq 'del(.virtualClusterConfig.kubernetesClusterConfig.kubernetesNamespace)' "${CLUSTER}-config.json"

  - Make additional changes to update Dataproc on GKE virtual cluster configuration settings, such as changing the Spark componentVersion.
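  Note: jq writes the filtered document to standard output. A minimal sketch that saves the result back to the configuration file (the temporary file name here is arbitrary):

  jq 'del(.virtualClusterConfig.kubernetesClusterConfig.kubernetesNamespace)' \
      "${CLUSTER}-config.json" > "${CLUSTER}-config.tmp.json" \
      && mv "${CLUSTER}-config.tmp.json" "${CLUSTER}-config.json"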
- Delete the existing Dataproc on GKE virtual cluster if the new cluster will have the same name as the existing cluster (that is, if you are replacing the original cluster).

  curl -X DELETE \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      "https://dataproc.googleapis.com/v1/projects/${PROJECT}/regions/${REGION}/clusters/${CLUSTER}"
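  The DELETE request returns a long-running operation. One way to wait for it to finish is to poll the operation name returned in the response body until the response contains "done": true (a sketch; OPERATION_NAME is a placeholder for the projects/.../regions/.../operations/... name from the delete response):

  curl -X GET \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      "https://dataproc.googleapis.com/v1/OPERATION_NAME"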
- Wait for the previous delete operation to finish, and then import the updated cluster configuration to create a new Dataproc on GKE virtual cluster with the updated settings.

  curl -i -X POST \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json; charset=utf-8" \
      -d "@${CLUSTER}-config.json" \
      "https://dataproc.googleapis.com/v1/projects/${PROJECT}/regions/${REGION}/clusters?alt=json"
Console
The Google Cloud console does not support recreating a Dataproc on GKE virtual cluster by importing an existing cluster's configuration.

