Cloud Profiler continuously gathers and reports application CPU usage and memory-allocation information.
Requirements:
-  Profiler supports only Dataproc Hadoop and Spark job types (Spark, PySpark, SparkSql, and SparkR). 
-  Jobs must run longer than 3 minutes to allow Profiler to collect and upload data to your project. 
Dataproc recognizes cloud.profiler.enable and the other cloud.profiler.* properties (see Profiler options), and then appends the relevant profiler JVM options to the following configurations:
- Spark: spark.driver.extraJavaOptions and spark.executor.extraJavaOptions
- MapReduce: mapreduce.task.profile and other mapreduce.task.profile.* properties
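For example, with profiling enabled a Spark driver could end up with an extraJavaOptions value along these lines (an illustrative sketch, not output copied from a cluster; the agent path, service name, version, and project ID shown are assumptions):

    spark.driver.extraJavaOptions=-agentpath:/opt/cprof/profiler_java_agent.so=-cprof_service=my_job-driver,-cprof_service_version=v1,-cprof_project_id=my-project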
Enable profiling
Complete the following steps to enable and use the Profiler on your Dataproc Spark and Hadoop jobs.
- Create a Dataproc cluster with service account scopes set to monitoring to allow the cluster to talk to the profiler service.
- If you are using a custom VM service account, grant the Cloud Profiler Agent role to the custom VM service account (see the role-granting example after the cluster creation command below). This role contains the required profiler service permissions.
gcloud
gcloud dataproc clusters create cluster-name \
    --scopes=cloud-platform \
    --region=region \
    other args ...
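If you are using a custom VM service account, the Cloud Profiler Agent role (roles/cloudprofiler.agent) can be granted with a command along these lines (a sketch; project-id and the service account address are placeholders to replace with your own values):

    # Grant the Cloud Profiler Agent role to the custom VM service account
    gcloud projects add-iam-policy-binding project-id \
        --member=serviceAccount:service-account-name@project-id.iam.gserviceaccount.com \
        --role=roles/cloudprofiler.agent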
Submit a Dataproc job with Profiler options
- Submit a Dataproc Spark or Hadoop job with one or more of the following Profiler options:

| Option | Description | Value | Required/Optional | Default | Notes |
|---|---|---|---|---|---|
| cloud.profiler.enable | Enable profiling of the job | true or false | Required | false | |
| cloud.profiler.name | Name used to create the profile on the Profiler service | profile-name | Optional | Dataproc job UUID | |
| cloud.profiler.service.version | A user-supplied string to identify and distinguish profiler results | Profiler Service Version | Optional | Dataproc job UUID | |
| mapreduce.task.profile.maps | Numeric range of map tasks to profile (example: for up to 100, specify "0-100") | number range | Optional | 0-10000 | Applies to Hadoop mapreduce jobs only |
| mapreduce.task.profile.reduces | Numeric range of reducer tasks to profile (example: for up to 100, specify "0-100") | number range | Optional | 0-10000 | Applies to Hadoop mapreduce jobs only |
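As an illustration, a Spark job packaged in a jar could be submitted with profiling enabled roughly as follows (a sketch; the SparkPi example class and the spark-examples.jar path on the cluster image are assumptions, as are the profiler name and version values):

    # Submit a Spark job with Cloud Profiler enabled
    gcloud dataproc jobs submit spark \
        --cluster=cluster-name \
        --region=region \
        --class=org.apache.spark.examples.SparkPi \
        --jars=file:///usr/lib/spark/examples/jars/spark-examples.jar \
        --properties=cloud.profiler.enable=true,cloud.profiler.name=spark_pi_job,cloud.profiler.service.version=v1 \
        -- 1000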
PySpark Example
Google Cloud CLI
Example of submitting a PySpark job with profiling:
gcloud dataproc jobs submit pyspark python-job-file \
    --cluster=cluster-name \
    --region=region \
    --properties=cloud.profiler.enable=true,cloud.profiler.name=profiler_name,cloud.profiler.service.version=version \
    -- job args
Two profiles will be created:
- profiler_name-driver to profile Spark driver tasks
- profiler_name-executor to profile Spark executor tasks
For example, if the profiler_name is "spark_word_count_job", spark_word_count_job-driver and spark_word_count_job-executor profiles are created.
Hadoop Example
gcloud CLI
Example of submitting a Hadoop (teragen mapreduce) job with profiling:
gcloud dataproc jobs submit hadoop \
    --cluster=cluster-name \
    --region=region \
    --jar=jar-file \
    --properties=cloud.profiler.enable=true,cloud.profiler.name=profiler_name,cloud.profiler.service.version=version \
    -- teragen 100000 gs://bucket-name
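To profile only a subset of tasks in a Hadoop mapreduce job, the mapreduce.task.profile.maps and mapreduce.task.profile.reduces options from the table above can be added to the same command (a sketch; the 0-2 ranges are arbitrary example values):

    # Profile only the first three map and reduce tasks
    gcloud dataproc jobs submit hadoop \
        --cluster=cluster-name \
        --region=region \
        --jar=jar-file \
        --properties=cloud.profiler.enable=true,cloud.profiler.name=profiler_name,mapreduce.task.profile.maps=0-2,mapreduce.task.profile.reduces=0-2 \
        -- teragen 100000 gs://bucket-name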
View profiles
View the collected profiles on the Profiler page in the Google Cloud console.
What's next
- See the Monitoring documentation
- See the Logging documentation
- Explore Google Cloud Observability

