Cloud Monitoring provides powerful logging and diagnostics. Dataflow
integration with Monitoring lets you access Dataflow
job metrics such as job status, element counts, system lag (for streaming jobs),
and user counters from the Monitoring dashboards. You can also use
Monitoring alerts to notify you of various
conditions, such as long streaming system lag or failed jobs.
For Dataflow metrics to appear in Metrics Explorer, the worker service account must have the roles/monitoring.metricWriter role.
Custom metrics
Any metric that you define in your Apache Beam pipeline is reported by
Dataflow to Monitoring as a custom metric.
Apache Beam has three types of pipeline metrics: Counter, Distribution, and Gauge.
Dataflow reports Counter and Distribution metrics to Monitoring.
Distribution is reported as four submetrics suffixed with _MAX, _MIN, _MEAN, and _COUNT.
Dataflow doesn't support creating a histogram from Distribution metrics.
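For example, with the Apache Beam Python SDK, a DoFn might define a Counter and a Distribution as in the following sketch; the namespace my_namespace and the metric names are placeholders:

```python
import apache_beam as beam
from apache_beam.metrics import Metrics


class CountAndMeasure(beam.DoFn):
    """Example DoFn that reports a Counter and a Distribution metric."""

    def __init__(self):
        # 'my_namespace' and the metric names are placeholder values.
        self.elements_processed = Metrics.counter('my_namespace', 'elements_processed')
        self.element_size = Metrics.distribution('my_namespace', 'element_size_bytes')

    def process(self, element):
        # Increment the counter once per element and record the element size.
        self.elements_processed.inc()
        self.element_size.update(len(element))
        yield element
```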
Dataflow reports incremental updates to Monitoring
approximately every 30 seconds.
To avoid conflicts, all Dataflow custom metrics are exported as a double data type.
For simplicity, all Dataflow custom metrics are exported as a GAUGE metric kind.
You can monitor the delta over a time window for a GAUGE metric by using Monitoring Query Language (MQL).
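The following query is a sketch only; the metric name my_counter and the one-minute window are illustrative choices:

```
# Sketch: per-minute delta of a Dataflow user counter named 'my_counter'.
fetch dataflow_job
| metric 'dataflow.googleapis.com/job/user_counter'
| filter metric.metric_name == 'my_counter'
| delta 1m
| every 1m
```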
The Dataflow custom metrics appear in Monitoring as dataflow.googleapis.com/job/user_counter with the labels metric_name: metric-name and ptransform: ptransform-name.
For backward compatibility, Dataflow also reports custom metrics
to Monitoring as custom.googleapis.com/dataflow/metric-name.
The Dataflow custom metrics are subject to the cardinality limitations in Monitoring.
Each project has a limit of 100 Dataflow custom metrics. These metrics are published as custom.googleapis.com/dataflow/metric-name.
Custom metrics that are reported to Monitoring incur charges based on Cloud Monitoring pricing.
Use Metrics Explorer
Use Monitoring to explore Dataflow metrics.
Follow the steps in this section to observe the standard metrics that are
provided for each of your Apache Beam pipelines. For more information
about using Metrics Explorer, see Create charts with Metrics Explorer.
In the Select a metric pane, enter Dataflow Job in the filter.
From the list that appears, select a metric to observe for one of
your jobs. The following Dataflow-related metrics are available:
Job status: Provides the job status (Failed, Successful) as an enum
every 30 seconds and on update. Note: enum values might not be
charted or used for alerts, but you can retrieve this value by using the
Cloud Monitoring web interface. For alerting, use the Failed metric, which is set
to 1 if a job fails.
Failed: Set to 1 if a job exits with a failure.
Use this metric to alert on and chart the number of failed pipelines.
Elapsed time:Job elapsed time (measured in seconds), reported
every 30 seconds.
System lag:Max lag across the entire pipeline, reported in
seconds.
Current vCPU count: Current number of virtual CPUs used by the job,
updated when the value changes.
Total vCPU usage: Total virtual CPU usage by the job,
updated when the value changes.
Total Persistent Disk usage: Cumulative total of Persistent Disk used by the job,
measured in GB-seconds and updated when the value changes. Note: There are two
types of persistent disk (SSD and HDD). Both are reported
under the same metric name and are differentiated by a metric label.
Total memory usage: Cumulative total memory allocated to the job,
measured in GB-seconds and updated when the value changes.
Element count:Number of elements per PCollection. Note: This
metric is per-PCollection, not job level, so it's not yet
available for alerting.
Estimated byte count:Number of bytes processed per PCollection.
Note: This metric is per-PCollection, not job level, so it's
not yet available for alerting.
Monitoring provides access to
Dataflow-related metrics. Create dashboards to chart the time
series of metrics, and create alerting policies that notify you when
metrics reach specified values.
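If you want to read these metrics programmatically instead of in the console, the following sketch uses the Cloud Monitoring Python client library (google-cloud-monitoring) to list recent points of the user_counter metric; the project ID is a placeholder, and the value accessor assumes the double data type described earlier:

```python
import time

from google.cloud import monitoring_v3

# Placeholder project ID.
project_name = "projects/my-project-id"

client = monitoring_v3.MetricServiceClient()

# Query the last hour of the custom metric time series.
now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {"start_time": {"seconds": now - 3600}, "end_time": {"seconds": now}}
)

results = client.list_time_series(
    request={
        "name": project_name,
        "filter": 'metric.type = "dataflow.googleapis.com/job/user_counter"',
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)

for series in results:
    # Custom metrics are exported as doubles, so read double_value.
    metric_name = series.metric.labels.get("metric_name", "")
    for point in series.points:
        print(metric_name, point.value.double_value)
```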
Create groups of resources
To make it easier to set alerts and build dashboards, create resource groups
that include multiple Apache Beam pipelines.
Add filter criteria that define the Dataflow resources included in the group.
For example, one of your filter criteria can be the name prefix of your pipelines.
After the group is created, you can see the basic metrics related
to resources in that group.
Create alerting policies for Dataflow metrics
Monitoring lets you create alerts and receive notifications when a
metric crosses a specified threshold. For example, you can receive a notification when system lag
of a streaming pipeline increases above a predefined value.
On the Create new alerting policy page, define the alerting
conditions and notification channels. For example, to set an alert on the system lag for the WindowedWordCount Apache Beam pipeline group, complete the following steps:
Click Select a metric.
In the Select a metric field, enter Dataflow Job.
For Metric Categories, select Job.
For Metrics, select System lag.
Every time an alert is triggered, an incident and a corresponding
event are created. If you specified a notification mechanism
in the alert, such as email or SMS, you also receive a notification.
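You can also create a comparable alerting policy programmatically. The following is a rough sketch with the Cloud Monitoring Python client library; the metric type dataflow.googleapis.com/job/system_lag, the five-minute threshold, and the project ID are assumptions to adapt to your environment:

```python
import datetime

from google.cloud import monitoring_v3

# Placeholder project ID.
project_name = "projects/my-project-id"

client = monitoring_v3.AlertPolicyServiceClient()

policy = monitoring_v3.AlertPolicy(
    display_name="Dataflow system lag too high",
    combiner=monitoring_v3.AlertPolicy.ConditionCombinerType.OR,
    conditions=[
        monitoring_v3.AlertPolicy.Condition(
            display_name="System lag above 5 minutes",
            condition_threshold=monitoring_v3.AlertPolicy.Condition.MetricThreshold(
                # Assumed metric type for system lag; verify the name in Metrics Explorer.
                filter=(
                    'metric.type = "dataflow.googleapis.com/job/system_lag" '
                    'AND resource.type = "dataflow_job"'
                ),
                comparison=monitoring_v3.ComparisonType.COMPARISON_GT,
                threshold_value=300,  # seconds of lag
                duration=datetime.timedelta(minutes=5),
                aggregations=[
                    monitoring_v3.Aggregation(
                        alignment_period=datetime.timedelta(minutes=1),
                        per_series_aligner=monitoring_v3.Aggregation.Aligner.ALIGN_MEAN,
                    )
                ],
            ),
        )
    ],
)

created = client.create_alert_policy(name=project_name, alert_policy=policy)
print("Created policy:", created.name)
```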
Build custom monitoring dashboards
You can build Monitoring dashboards with the most relevant
Dataflow-related charts. To add a chart to a dashboard, follow these
steps:
In the Select a metric pane, for Metric, enter Dataflow Job.
Select a metric category and a metric.
You can add as many charts to the dashboard as you like.
Receive worker VM metrics from the Monitoring agent
You can use the Monitoring to monitor persistent disk, CPU,
network, and process metrics. When you run your pipeline,
from your Dataflow worker VM instances, enable theMonitoring agent. See the list ofavailable Monitoring agent metrics.
To enable the Monitoring agent, use the --experiments=enable_stackdriver_agent_metrics option when running your pipeline. The worker service account must have the roles/monitoring.metricWriter role.
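For example, with the Beam Python SDK you might pass the flag through the pipeline options; the project, region, and bucket values in this sketch are placeholders:

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Placeholder project, region, and staging bucket values.
options = PipelineOptions([
    "--runner=DataflowRunner",
    "--project=my-project-id",
    "--region=us-central1",
    "--temp_location=gs://my-bucket/temp",
    "--experiments=enable_stackdriver_agent_metrics",
])

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "Create" >> beam.Create(["a", "b", "c"])
        | "Print" >> beam.Map(print)
    )
```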
To disable the Monitoring agent without stopping your pipeline,
update your pipeline by launching a replacement job without the --experiments=enable_stackdriver_agent_metrics parameter.
Storage and retention
Information about completed or cancelled Dataflow jobs
is stored for 30 days.
Operational logs are stored in the _Default log bucket.
The logging API service name is dataflow.googleapis.com. For more information about
the Google Cloud monitored resource types and services used in Cloud Logging,
see Monitored resources and services.