gcloud dataproc jobs submit spark

NAME
gcloud dataproc jobs submit spark - submit a Spark job to a cluster
SYNOPSIS
gcloud dataproc jobs submit spark ( --class = MAIN_CLASS     | --jar = MAIN_JAR ) ( --cluster = CLUSTER     | --cluster-labels =[ KEY = VALUE , …]) [ --archives =[ ARCHIVE , …]] [ --async ] [ --bucket = BUCKET ] [ --driver-log-levels =[ PACKAGE = LEVEL , …]] [ --driver-required-memory-mb = DRIVER_REQUIRED_MEMORY_MB ] [ --driver-required-vcores = DRIVER_REQUIRED_VCORES ] [ --files =[ FILE , …]] [ --jars =[ JAR , …]] [ --labels =[ KEY = VALUE , …]] [ --max-failures-per-hour = MAX_FAILURES_PER_HOUR ] [ --max-failures-total = MAX_FAILURES_TOTAL ] [ --properties =[ PROPERTY = VALUE , …]] [ --properties-file = PROPERTIES_FILE ] [ --region = REGION ] [ GCLOUD_WIDE_FLAG ] [-- JOB_ARGS …]
DESCRIPTION
Submit a Spark job to a cluster.
EXAMPLES
To submit a Spark job that runs the main class of a jar, run:
 gcloud  
dataproc  
 jobs 
  
submit  
spark  
 --cluster 
 = 
my-cluster  
 --region 
 = 
us-central1  
 --jar 
 = 
my_jar.jar  
--  
arg1  
arg2 

To submit a Spark job that runs a specific class of a jar, run:

 gcloud  
dataproc  
 jobs 
  
submit  
spark  
 --cluster 
 = 
my-cluster  
 --region 
 = 
us-central1  
 --class 
 = 
org.my.main.Class  
 --jars 
 = 
my_jar1.jar,my_jar2.jar  
--  
arg1  
arg2 

To submit a Spark job that runs a jar that is already on the cluster, run:

 gcloud  
dataproc  
 jobs 
  
submit  
spark  
 --cluster 
 = 
my-cluster  
 --region 
 = 
us-central1  
 --class 
 = 
org.apache.spark.examples.SparkPi  
 --jars 
 = 
file:///usr/lib/spark/examples/jars/spark-examples.jar  
--  
 1000 
 
POSITIONAL ARGUMENTS
[-- JOB_ARGS …]
Arguments to pass to the driver.

The '--' argument must be specified between gcloud specific args on the left and JOB_ARGS on the right.

REQUIRED FLAGS
Exactly one of these must be specified:
--class = MAIN_CLASS
The class containing the main method of the driver. Must be in a provided jar or jar that is already on the classpath
--jar = MAIN_JAR
The HCFS URI of jar file containing the driver jar.
Exactly one of these must be specified:
--cluster = CLUSTER
The Dataproc cluster to submit the job to.
--cluster-labels =[ KEY = VALUE ,…]
List of label KEY=VALUE pairs to add.

Keys must start with a lowercase character and contain only hyphens ( - ), underscores ( _ ), lowercase characters, and numbers. Values must contain only hyphens ( - ), underscores ( _ ), lowercase characters, and numbers.

Labels of Dataproc cluster on which to place the job.

OPTIONAL FLAGS
--archives =[ ARCHIVE ,…]
Comma separated list of archives to be extracted into the working directory of each executor. Must be one of the following file formats: .zip, .tar, .tar.gz, or .tgz.
--async
Return immediately, without waiting for the operation in progress to complete.
--bucket = BUCKET
The Cloud Storage bucket to stage files in. Defaults to the cluster's configured bucket.
--driver-log-levels =[ PACKAGE = LEVEL ,…]
List of package to log4j log level pairs to configure driver logging. For example: root=FATAL,com.example=INFO
--driver-required-memory-mb = DRIVER_REQUIRED_MEMORY_MB
The memory allocation requested by the job driver in megabytes (MB) for execution on the driver node group (it is used only by clusters with a driver node group).
--driver-required-vcores = DRIVER_REQUIRED_VCORES
The vCPU allocation requested by the job driver for execution on the driver node group (it is used only by clusters with a driver node group).
--files =[ FILE ,…]
Comma separated list of files to be placed in the working directory of both the app driver and executors.
--jars =[ JAR ,…]
Comma separated list of jar files to be provided to the executor and driver classpaths.
--labels =[ KEY = VALUE ,…]
List of label KEY=VALUE pairs to add.

Keys must start with a lowercase character and contain only hyphens ( - ), underscores ( _ ), lowercase characters, and numbers. Values must contain only hyphens ( - ), underscores ( _ ), lowercase characters, and numbers.

--max-failures-per-hour = MAX_FAILURES_PER_HOUR
Specifies the maximum number of times a job can be restarted per hour in event of failure. Default is 0 (no retries after job failure).
--max-failures-total = MAX_FAILURES_TOTAL
Specifies the maximum total number of times a job can be restarted after the job fails. Default is 0 (no retries after job failure).
--properties =[ PROPERTY = VALUE ,…]
List of key value pairs to configure Spark. For a list of available properties, see: https://spark.apache.org/docs/latest/configuration.html#available-properties .
--properties-file = PROPERTIES_FILE
Path to a local file or a file in a Cloud Storage bucket containing configuration properties for the job. The client machine running this command must have read permission to the file.

Specify properties in the form of property=value in the text file. For example:

  
 # Properties to set for the job: 
  
 key1 
 = 
value1  
 key2 
 = 
value2  
 # Comment out properties not used. 
  
 # key3=value3 

If a property is set in both --properties and --properties-file , the value defined in --properties takes precedence.

--region = REGION
Dataproc region to use. Each Dataproc region constitutes an independent resource namespace constrained to deploying instances into Compute Engine zones inside the region. Overrides the default dataproc/region property value for this command invocation.
GCLOUD WIDE FLAGS
These flags are available to all commands: --access-token-file , --account , --billing-project , --configuration , --flags-file , --flatten , --format , --help , --impersonate-service-account , --log-http , --project , --quiet , --trace-token , --user-output-enabled , --verbosity .

Run $ gcloud help for details.

NOTES
These variants are also available:
  gcloud  
alpha  
dataproc  
 jobs 
  
submit  
spark 
 
  gcloud  
beta  
dataproc  
 jobs 
  
submit  
spark 
 
Create a Mobile Website
View Site in Mobile | Classic
Share by: