Cloud Monitoring provides powerful logging and diagnostics. Dataflow
integration with Monitoring lets you access Dataflow
job metrics such as job status, element counts, system lag (for streaming jobs),
and user counters from the Monitoring dashboards. You can also use
Monitoring alerts to notify you of various
conditions, such as long streaming system lag or failed jobs.
For Dataflow metrics to appear in Metrics Explorer, the worker service account must have the roles/monitoring.metricWriter role.
Custom metrics
Any metric that you define in your Apache Beam pipeline is reported by
Dataflow to Monitoring as a custom metric.
Apache Beam has three types of pipeline metrics: Counter, Distribution, and Gauge.
Dataflow reports Counter and Distribution metrics to Monitoring.
Distribution is reported as four submetrics suffixed with _MAX, _MIN, _MEAN, and _COUNT.
Dataflow doesn't support creating a histogram from Distribution metrics.
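For example, with the Apache Beam Python SDK, a DoFn might define a Counter and a Distribution as in the following sketch; the namespace my_namespace and the metric names are placeholders:

```python
import apache_beam as beam
from apache_beam.metrics import Metrics


class CountAndMeasure(beam.DoFn):
    """Example DoFn that reports a Counter and a Distribution metric."""

    def __init__(self):
        # 'my_namespace' and the metric names are placeholder values.
        self.elements_processed = Metrics.counter('my_namespace', 'elements_processed')
        self.element_size = Metrics.distribution('my_namespace', 'element_size_bytes')

    def process(self, element):
        # Increment the counter once per element and record the element size.
        self.elements_processed.inc()
        self.element_size.update(len(element))
        yield element
```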
Dataflow reports incremental updates to Monitoring
approximately every 30 seconds.
To avoid conflicts, all Dataflow custom metrics are exported as a double data type.
For simplicity, all Dataflow custom metrics are exported as a GAUGE metric kind.
You can monitor the delta over a time window for a GAUGE metric by using Monitoring Query Language (MQL).
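The following query is a sketch only; the metric name my_counter and the one-minute window are illustrative choices:

```
# Sketch: per-minute delta of a Dataflow user counter named 'my_counter'.
fetch dataflow_job
| metric 'dataflow.googleapis.com/job/user_counter'
| filter metric.metric_name == 'my_counter'
| delta 1m
| every 1m
```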
The Dataflow custom metrics appear in Monitoring as dataflow.googleapis.com/job/user_counter with the labels metric_name: metric-name and ptransform: ptransform-name.
For backward compatibility, Dataflow also reports custom metrics
to Monitoring as custom.googleapis.com/dataflow/metric-name.
The Dataflow custom metrics are subject to the cardinality limitations in Monitoring.
Each project has a limit of 100 Dataflow custom metrics. These metrics are published as custom.googleapis.com/dataflow/metric-name.
Custom metrics that are reported to Monitoring incur charges based on Cloud Monitoring pricing.
Use Metrics Explorer
Use Monitoring to explore Dataflow metrics.
Follow the steps in this section to observe the standard metrics that are
provided for each of your Apache Beam pipelines. For more information
about using Metrics Explorer, see Create charts with Metrics Explorer.
In the Select a metric pane, enter Dataflow Job in the filter.
From the list that appears, select a metric to observe for one of
your jobs. The following Dataflow-related metrics are available:
Job status: Provides the job status (Failed, Successful) as an enum
every 30 seconds and on update. Note: enum values might not be
charted or used for alerts, but you can retrieve this value by using the
Cloud Monitoring web interface. For alerting, use the Failed metric, which is set
to 1 if a job fails.
Failed: Set to 1 if a job exits with a failure.
Use this metric to alert on and chart the number of failed pipelines.
Elapsed time:Job elapsed time (measured in seconds), reported
every 30 seconds.
System lag:Max lag across the entire pipeline, reported in
seconds.
Current vCPU count: Current number of virtual CPUs used by the job,
updated when the value changes.
Total vCPU usage: Total virtual CPU usage by the job,
updated when the value changes.
Total Persistent Disk usage: Cumulative total of Persistent Disk used by the job,
measured in GB-seconds and updated when the value changes. Note: There are two
types of persistent disk (SSD and HDD). Both are reported
under the same metric name and are differentiated by a metric label.
Total memory usage: Cumulative total memory allocated to the job,
measured in GB-seconds and updated when the value changes.
Element count:Number of elements per PCollection. Note: This
metric is per-PCollection, not job level, so it's not yet
available for alerting.
Estimated byte count:Number of bytes processed per PCollection.
Note: This metric is per-PCollection, not job level, so it's
not yet available for alerting.
Monitoring provides access to
Dataflow-related metrics. Create dashboards to chart the time
series of metrics, and create alerting policies that notify you when
metrics reach specified values.
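If you want to read these metrics programmatically instead of in the console, the following sketch uses the Cloud Monitoring Python client library (google-cloud-monitoring) to list recent points of the user_counter metric; the project ID is a placeholder, and the value accessor assumes the double data type described earlier:

```python
import time

from google.cloud import monitoring_v3

# Placeholder project ID.
project_name = "projects/my-project-id"

client = monitoring_v3.MetricServiceClient()

# Query the last hour of the custom metric time series.
now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {"start_time": {"seconds": now - 3600}, "end_time": {"seconds": now}}
)

results = client.list_time_series(
    request={
        "name": project_name,
        "filter": 'metric.type = "dataflow.googleapis.com/job/user_counter"',
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)

for series in results:
    # Custom metrics are exported as doubles, so read double_value.
    metric_name = series.metric.labels.get("metric_name", "")
    for point in series.points:
        print(metric_name, point.value.double_value)
```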
Create groups of resources
To make it easier to set alerts and build dashboards, create resource groups
that include multiple Apache Beam pipelines.
Add filter criteria that define the Dataflow resources included in the group.
For example, one of your filter criteria can be the name prefix of your pipelines.
After the group is created, you can see the basic metrics related
to resources in that group.
Create alerting policies for Dataflow metrics
Monitoring lets you create alerts and receive notifications when a
metric crosses a specified threshold. For example, you can receive a notification when system lag
of a streaming pipeline increases above a predefined value.
On the Create new alerting policy page, define the alerting
conditions and notification channels. For example, to set an alert on the system lag for the WindowedWordCount Apache Beam pipeline group, complete the following steps:
Click Select a metric.
In the Select a metric field, enter Dataflow Job.
For Metric Categories, select Job.
For Metrics, select System lag.
Every time an alert is triggered, an incident and a corresponding
event are created. If you specified a notification mechanism
in the alert, such as email or SMS, you also receive a notification.
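You can also create a comparable alerting policy programmatically. The following is a rough sketch with the Cloud Monitoring Python client library; the metric type dataflow.googleapis.com/job/system_lag, the five-minute threshold, and the project ID are assumptions to adapt to your environment:

```python
import datetime

from google.cloud import monitoring_v3

# Placeholder project ID.
project_name = "projects/my-project-id"

client = monitoring_v3.AlertPolicyServiceClient()

policy = monitoring_v3.AlertPolicy(
    display_name="Dataflow system lag too high",
    combiner=monitoring_v3.AlertPolicy.ConditionCombinerType.OR,
    conditions=[
        monitoring_v3.AlertPolicy.Condition(
            display_name="System lag above 5 minutes",
            condition_threshold=monitoring_v3.AlertPolicy.Condition.MetricThreshold(
                # Assumed metric type for system lag; verify the name in Metrics Explorer.
                filter=(
                    'metric.type = "dataflow.googleapis.com/job/system_lag" '
                    'AND resource.type = "dataflow_job"'
                ),
                comparison=monitoring_v3.ComparisonType.COMPARISON_GT,
                threshold_value=300,  # seconds of lag
                duration=datetime.timedelta(minutes=5),
                aggregations=[
                    monitoring_v3.Aggregation(
                        alignment_period=datetime.timedelta(minutes=1),
                        per_series_aligner=monitoring_v3.Aggregation.Aligner.ALIGN_MEAN,
                    )
                ],
            ),
        )
    ],
)

created = client.create_alert_policy(name=project_name, alert_policy=policy)
print("Created policy:", created.name)
```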
Build custom monitoring dashboards
You can build Monitoring dashboards with the most relevant
Dataflow-related charts. To add a chart to a dashboard, follow these
steps:
In the Select a metric pane, for Metric, enter Dataflow Job.
Select a metric category and a metric.
You can add as many charts to the dashboard as you like.
Receive worker VM metrics from the Monitoring agent
You can use the Monitoring to monitor persistent disk, CPU,
network, and process metrics. When you run your pipeline,
from your Dataflow worker VM instances, enable theMonitoring agent. See the list ofavailable Monitoring agent metrics.
To enable the Monitoring agent, use the --experiments=enable_stackdriver_agent_metrics option when running your pipeline. The worker service account must have the roles/monitoring.metricWriter role.
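For example, with the Beam Python SDK you might pass the flag through the pipeline options; the project, region, and bucket values in this sketch are placeholders:

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Placeholder project, region, and staging bucket values.
options = PipelineOptions([
    "--runner=DataflowRunner",
    "--project=my-project-id",
    "--region=us-central1",
    "--temp_location=gs://my-bucket/temp",
    "--experiments=enable_stackdriver_agent_metrics",
])

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "Create" >> beam.Create(["a", "b", "c"])
        | "Print" >> beam.Map(print)
    )
```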
To disable the Monitoring agent without stopping your pipeline,
update your pipeline by launching a replacement job without the --experiments=enable_stackdriver_agent_metrics parameter.
Storage and retention
Information about completed or cancelled Dataflow jobs
is stored for 30 days.
Operational logs are stored in the _Default log bucket.
The logging API service name is dataflow.googleapis.com. For more information about
the Google Cloud monitored resource types and services used in Cloud Logging,
see Monitored resources and services.