gcloud alpha dataproc clusters gke create

NAME
gcloud alpha dataproc clusters gke create - create a GKE-based virtual cluster
SYNOPSIS
gcloud alpha dataproc clusters gke create ( CLUSTER : --region = REGION ) --spark-engine-version = SPARK_ENGINE_VERSION ( --gke-cluster = GKE_CLUSTER : --gke-cluster-location = GKE_CLUSTER_LOCATION ) [ --async ] [ --namespace = NAMESPACE ] [ --pools =[ KEY = VALUE [; VALUE ], …]] [ --properties =[ PREFIX : PROPERTY = VALUE , …]] [ --setup-workload-identity ] [ --staging-bucket = STAGING_BUCKET ] [ --history-server-cluster = HISTORY_SERVER_CLUSTER : --history-server-cluster-region = HISTORY_SERVER_CLUSTER_REGION ] [ --metastore-service = METASTORE_SERVICE : --metastore-service-location = METASTORE_SERVICE_LOCATION ] [ GCLOUD_WIDE_FLAG ]
DESCRIPTION
(ALPHA) Create a GKE-based virtual cluster.
EXAMPLES
Create a Dataproc on GKE cluster in us-central1 on a GKE cluster in the same project and region with default values:
 gcloud  
alpha  
dataproc  
clusters  
gke  
create  
my-cluster  
 --region 
 = 
us-central1  
 --gke-cluster 
 = 
my-gke-cluster  
 --spark-engine-version 
 = 
latest  
 --pools 
 = 
 'name=dp,roles=default' 
 

Create a Dataproc on GKE cluster in us-central1 on a GKE cluster in the same project and zone us-central1-f with default values:

 gcloud  
alpha  
dataproc  
clusters  
gke  
create  
my-cluster  
 --region 
 = 
us-central1  
 --gke-cluster 
 = 
my-gke-cluster  
 --gke-cluster-location 
 = 
us-central1-f  
 --spark-engine-version 
 = 
 3 
.1  
 --pools 
 = 
 'name=dp,roles=default' 
 

Create a Dataproc on GKE cluster in us-central1 with machine type 'e2-standard-4', autoscaling 5-15 nodes per zone.

 gcloud  
alpha  
dataproc  
clusters  
gke  
create  
my-cluster  
 --region 
 = 
 'us-central1' 
  
 --gke-cluster 
 = 
 'projects/my-project/locations/us-central1/clusters/my-gke-cluster' 
  
 --spark-engine-version 
 = 
dataproc-1.5  
 --pools 
 = 
 'name=dp-default,roles=default,machineType=e2-standard-4,min=5,max=15' 
 

Create a Dataproc on GKE cluster in us-central1 with two distinct node pools.

 gcloud  
alpha  
dataproc  
clusters  
gke  
create  
my-cluster  
 --region 
 = 
 'us-central1' 
  
 --gke-cluster 
 = 
 'my-gke-cluster' 
  
 --spark-engine-version 
 = 
 'dataproc-2.0' 
  
 --pools 
 = 
 'name=dp-default,roles=default,machineType=e2-standard-4' 
  
 --pools 
 = 
 ' 
 name 
 = 
workers,roles = 
spark-drivers ; 
spark-executors,machineType = 
n2-standard-8 
POSITIONAL ARGUMENTS
Cluster resource - The name of the cluster to create. The arguments in this group can be used to specify the attributes of this resource. (NOTE) Some attributes are not given arguments in this group but can be set in other ways.

To set the project attribute:

  • provide the argument cluster on the command line with a fully specified name;
  • provide the argument --project on the command line;
  • set the property core/project .

This must be specified.

CLUSTER
ID of the cluster or fully qualified identifier for the cluster.

To set the cluster attribute:

  • provide the argument cluster on the command line.

This positional argument must be specified if any of the other arguments in this group are specified.

--region = REGION
Dataproc region for the cluster. Each Dataproc region constitutes an independent resource namespace constrained to deploying instances into Compute Engine zones inside the region. Overrides the default dataproc/region property value for this command invocation.

To set the region attribute:

  • provide the argument cluster on the command line with a fully specified name;
  • provide the argument --region on the command line;
  • set the property dataproc/region .
REQUIRED FLAGS
--spark-engine-version = SPARK_ENGINE_VERSION
The version of the Spark engine to run on this cluster.
Gke cluster resource - The GKE cluster to install the Dataproc cluster on. The arguments in this group can be used to specify the attributes of this resource. (NOTE) Some attributes are not given arguments in this group but can be set in other ways.

To set the project attribute:

  • provide the argument --gke-cluster on the command line with a fully specified name;
  • provide the argument --project on the command line;
  • set the property core/project .

This must be specified.

--gke-cluster = GKE_CLUSTER
ID of the gke-cluster or fully qualified identifier for the gke-cluster.

To set the gke-cluster attribute:

  • provide the argument --gke-cluster on the command line.

This flag argument must be specified if any of the other arguments in this group are specified.

--gke-cluster-location = GKE_CLUSTER_LOCATION
GKE region for the gke-cluster.

To set the gke-cluster-location attribute:

  • provide the argument --gke-cluster on the command line with a fully specified name;
  • provide the argument --gke-cluster-location on the command line;
  • provide the argument --region on the command line;
  • set the property dataproc/region .
OPTIONAL FLAGS
--async
Return immediately, without waiting for the operation in progress to complete.
--namespace = NAMESPACE
The name of the Kubernetes namespace to deploy Dataproc system components in. This namespace does not need to exist.
--pools =[ KEY = VALUE [; VALUE ],…]
Each --pools flag represents a GKE node pool associated with the virtual cluster. It is a comma-separated list in the form KEY=VALUE[;VALUE] , where certain keys may have multiple values.

The following KEYs must be specified:

-----------------------------------------------------------------------------------------------------------
KEY  
Type  
Example  
Description
------  


---------------------------------------------------------- name string ` my-node-pool ` Name of the node pool.
roles  
repeated  
string  
 ` 
default ; 
spark-driver ` 
  
Roles  
that  
each  
node  
pool  
will  
perform.  
 [ 
One  
Pool  
must  
have  
DEFAULT  
role ] 
  
Valid  
values  
are  
 ` 
default ` 
,  
 ` 
controller ` 
,  
 ` 
spark-driver ` 
,  
 ` 
spark-executor ` 
.
-----------------------------------------------------------------------------------------------------------

The following KEYs may be specified:

----------------------------------------------------------------------------------------------------------------------------------------------------------------
KEY  
Type  
Example  
Description
---------------  


--------------------------------------------------------------------------------- machineType string ` n1-standard-8 ` Compute Engine machine type to use.
preemptible  
boolean  
 ` 
 false 
 ` 
  
If  
true,  
 then 
  
this  
node  
pool  
uses  
preemptible  
VMs.  
This  
Must  
be  
 ` 
 false 
 ` 
  
 for 
  
a  
node  
pool  
with  
the  
CONTROLLER  
role  
or  
 for 
  
a  
node  
pool  
with  
the  
DEFAULT  
role  
 in 
  
no  
node  
pool  
has  
the  
CONTROLLER  
role.
localSsdCount  
int  
 ` 
 2 
 ` 
  
The  
number  
of  
 local 
  
SSDs  
to  
attach  
to  
each  
node.
localNvmeSsdCount  
int  
 ` 
 2 
 ` 
  
The  
number  
of  
 local 
  
NVMe  
SSDs  
to  
attach  
to  
each  
node.
accelerator  
repeated  
string  
 ` 
nvidia-tesla-a100 = 
 1 
 ` 
  
Accelerators  
to  
attach  
to  
each  
node,  
 in 
  
 NODE 
 = 
COUNT  
format.
minCpuPlatform  
string  
 ` 
Intel  
Skylake ` 
  
Minimum  
CPU  
platform  
 for 
  
each  
node.
bootDiskKmsKey  
string  
 ` 
projects/project-id/locations/us-central1  
The  
Customer  
Managed  
Encryption  
Key  
 ( 
CMEK ) 
  
used  
to  
encrypt  
/keyRings/keyRing-name/cryptoKeys/key-name ` 
  
the  
boot  
disk  
attached  
to  
each  
node  
 in 
  
the  
node  
pool.
locations  
repeated  
string  
 ` 
us-west1-a ; 
us-west1-c ` 
  
Zones  
within  
the  
location  
of  
the  
GKE  
cluster.  
All  
 ` 
--pools ` 
  
flags  
 for 
  
a  
Dataproc  
cluster  
must  
have  
identical  
locations.
min  
int  
 ` 
 0 
 ` 
  
Minimum  
number  
of  
nodes  
per  
zone  
that  
this  
node  
pool  
can  
scale  
down  
to.
max  
int  
 ` 
 10 
 ` 
  
Maximum  
number  
of  
nodes  
per  
zone  
that  
this  
node  
pool  
can  
scale  
up  
to.
----------------------------------------------------------------------------------------------------------------------------------------------------------------
--properties =[ PREFIX : PROPERTY = VALUE ,…]
Specifies configuration properties for installed packages, such as Spark. Properties are mapped to configuration files by specifying a prefix, such as "core:io.serializations".
--setup-workload-identity
Sets up the GKE Workload Identity for your Dataproc on GKE cluster. Note that running this requires elevated permissions as it will manipulate IAM policies on the Google Service Accounts that will be used by your Dataproc on GKE cluster.
--staging-bucket = STAGING_BUCKET
The Cloud Storage bucket to use to stage job dependencies, miscellaneous config files, and job driver console output when using this cluster.
History server cluster resource - A Dataproc Cluster created as a History Server, see https://cloud.google.com/dataproc/docs/concepts/jobs/history-server The arguments in this group can be used to specify the attributes of this resource. (NOTE) Some attributes are not given arguments in this group but can be set in other ways.

To set the project attribute:

  • provide the argument --history-server-cluster on the command line with a fully specified name;
  • provide the argument --project on the command line;
  • set the property core/project .
--history-server-cluster = HISTORY_SERVER_CLUSTER
ID of the history-server-cluster or fully qualified identifier for the history-server-cluster.

To set the history-server-cluster attribute:

  • provide the argument --history-server-cluster on the command line.

This flag argument must be specified if any of the other arguments in this group are specified.

--history-server-cluster-region = HISTORY_SERVER_CLUSTER_REGION
Compute Engine region for the history-server-cluster. It must be the same region as the Dataproc cluster that is being created.

To set the history-server-cluster-region attribute:

  • provide the argument --history-server-cluster on the command line with a fully specified name;
  • provide the argument --history-server-cluster-region on the command line;
  • provide the argument --region on the command line;
  • set the property dataproc/region .
Metastore service resource - Dataproc Metastore Service to be used as an external metastore. The arguments in this group can be used to specify the attributes of this resource. (NOTE) Some attributes are not given arguments in this group but can be set in other ways.

To set the project attribute:

  • provide the argument --metastore-service on the command line with a fully specified name;
  • provide the argument --project on the command line;
  • set the property core/project .
--metastore-service = METASTORE_SERVICE
ID of the metastore-service or fully qualified identifier for the metastore-service.

To set the metastore-service attribute:

  • provide the argument --metastore-service on the command line.

This flag argument must be specified if any of the other arguments in this group are specified.

--metastore-service-location = METASTORE_SERVICE_LOCATION
Dataproc Metastore location for the metastore-service.

To set the metastore-service-location attribute:

  • provide the argument --metastore-service on the command line with a fully specified name;
  • provide the argument --metastore-service-location on the command line;
  • provide the argument --region on the command line;
  • set the property dataproc/region .
GCLOUD WIDE FLAGS
These flags are available to all commands: --access-token-file , --account , --billing-project , --configuration , --flags-file , --flatten , --format , --help , --impersonate-service-account , --log-http , --project , --quiet , --trace-token , --user-output-enabled , --verbosity .

Run $ gcloud help for details.

NOTES
This command is currently in alpha and might change without notice. If this command fails with API permission errors despite specifying the correct project, you might be trying to access an API with an invitation-only early access allowlist. These variants are also available:
  gcloud  
dataproc  
clusters  
gke  
create 
 
  gcloud  
beta  
dataproc  
clusters  
gke  
create 
 
Design a Mobile Site
View Site in Mobile | Classic
Share by: