Google Distributed Cloud includes multiple options for cluster logging
and monitoring, including cloud-based managed services, open source tools, and
validated compatibility with third-party commercial solutions. This document explains these options and provides some basic guidance on selecting the proper
solution for your environment.
Options for Google Distributed Cloud
You have several logging and monitoring options for Google Distributed Cloud:
Cloud Logging and Cloud Monitoring, enabled by in-cluster agents
deployed with Google Distributed Cloud.
Prometheus and Grafana, disabled by default.
Validated configurations with third-party solutions.
Cloud Logging and Cloud Monitoring
Google Cloud Observability (formerly Stackdriver) is the built-in observability solution for
Google Cloud. It offers a fully managed logging solution, metrics
collection, monitoring, dashboarding, and alerting. Cloud Monitoring monitors
Google Distributed Cloud clusters in much the same way that it monitors cloud-based
GKE clusters.
You can configure the in-cluster agents for the scope of monitoring and logging,
as well as the level of metrics collected:
Scope of logging and monitoring can be set to system components only (the
default) or to system components and applications.
Level of metrics collected can be set to an optimized set of metrics
or to full metrics.
Cloud Logging and Cloud Monitoring are a good fit
for customers who want a single, easy-to-configure, cloud-based
observability solution. We highly recommend Cloud Logging and
Cloud Monitoring when you run workloads only on
Google Distributed Cloud, or on both GKE and
Google Distributed Cloud. For applications with components running on
Google Distributed Cloud and traditional on-premises infrastructure, you might
consider other solutions for an end-to-end view of those applications.
Prometheus and Grafana
Prometheus and Grafana can be enabled on each admin cluster and user cluster.
Prometheus and Grafana are recommended for application teams with prior
experience with those products, or for operational teams who prefer to retain
application metrics within the cluster and to troubleshoot issues when
network connectivity is lost.
Third-party solutions
Google has worked with several third-party logging and monitoring solution
providers to help their products work well with Google Distributed Cloud.
These include Datadog, Elastic, and Splunk. Additional validated third parties
will be added in the future.
For more information about using third-party solutions with
Google Distributed Cloud, see the following:
How logging and monitoring for Google Distributed Cloud works
Logging and monitoring agents are installed and activated in each
cluster when you create a new admin or user cluster. The agents collect data about system components, and you can configure the scope of that collection.
To view the collected data on the Google Cloud console, you must configure the Google Cloud project that stores the logs and metrics you want to view.
The logging and monitoring agents on each cluster include:
GKE metrics agent (gke-metrics-agent). A DaemonSet that sends metrics to the Cloud Monitoring API.
Log forwarder (stackdriver-log-forwarder). A Fluent Bit DaemonSet that forwards logs
from each machine to Cloud Logging. The log forwarder buffers the log
entries on the node locally and resends them for up to four hours. If the
buffer gets full or if the log forwarder can't reach the Cloud Logging API
for more than four hours, then logs are dropped.
Global GKE metrics agent (gke-metrics-agent-global). A
Deployment that sends metrics to the Cloud Monitoring API.
Metadata agent (stackdriver-metadata-agent). A
Deployment that sends metadata for Kubernetes resources such as pods,
deployments, or nodes to the Stackdriver Resource Metadata API; this
data is used to enrich metric queries by enabling you to query by
deployment name, node name, or even Kubernetes service name.
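The log forwarder's buffering behavior described above can be illustrated with a small model. This is a sketch only, not the Fluent Bit implementation: the `LogBuffer` class, its entry-count `capacity`, and the choice to drop the incoming entry when the buffer is full are assumptions made for illustration. Only the four-hour resend window comes from the text.

```python
from collections import deque
from dataclasses import dataclass, field

MAX_RETRY_SECONDS = 4 * 60 * 60  # four-hour resend window from the text


@dataclass
class LogBuffer:
    """Toy model of a node-local log buffer (hypothetical, for illustration)."""

    capacity: int  # assumed limit on the number of buffered entries
    entries: deque = field(default_factory=deque)  # (timestamp, message) pairs
    dropped: int = 0

    def append(self, timestamp: float, message: str) -> None:
        # Assumption: when the buffer is full, the incoming entry is lost.
        # The real forwarder may evict entries differently.
        if len(self.entries) >= self.capacity:
            self.dropped += 1
            return
        self.entries.append((timestamp, message))

    def flush(self, now: float, api_reachable: bool) -> list:
        # Entries older than the resend window are dropped first.
        while self.entries and now - self.entries[0][0] > MAX_RETRY_SECONDS:
            self.entries.popleft()
            self.dropped += 1
        if not api_reachable:
            return []  # keep the remaining entries and retry later
        sent = [message for _, message in self.entries]
        self.entries.clear()
        return sent
```

In this model, a long outage loses logs in two ways: new entries are rejected once the buffer is full, and buffered entries expire once they are more than four hours old.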
You can see all the Deployment agents by listing the Deployments in your
cluster with kubectl.
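One way to locate the Deployment agents programmatically is to list every Deployment as JSON and filter by the agent names given above. The following is a minimal sketch: the helper names `find_agent_deployments` and `list_agent_deployments` are hypothetical, and the sketch searches all namespaces because this document does not say which namespace the agents run in.

```python
import json
import subprocess

# Deployment names taken from the agent list above.
AGENT_DEPLOYMENTS = {"gke-metrics-agent-global", "stackdriver-metadata-agent"}


def find_agent_deployments(deployment_list: dict) -> list:
    """Return (namespace, name, ready, desired) for each agent Deployment."""
    found = []
    for item in deployment_list.get("items", []):
        meta = item["metadata"]
        if meta["name"] in AGENT_DEPLOYMENTS:
            ready = item.get("status", {}).get("readyReplicas", 0)
            desired = item.get("spec", {}).get("replicas", 0)
            found.append((meta["namespace"], meta["name"], ready, desired))
    return found


def list_agent_deployments(kubeconfig: str) -> list:
    # Runs: kubectl --kubeconfig <path> get deployments --all-namespaces -o json
    output = subprocess.run(
        ["kubectl", "--kubeconfig", kubeconfig, "get", "deployments",
         "--all-namespaces", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return find_agent_deployments(json.loads(output))
```

Calling `list_agent_deployments` with the path to a cluster's kubeconfig file returns one tuple per agent Deployment found. A ready count lower than the desired count suggests that the agent is not fully running.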
Configuring logging and monitoring agents for Google Distributed Cloud
The agents installed with Google Distributed Cloud collect data
about system components, subject to your settings and configuration, for the
purposes of maintaining and troubleshooting issues with your clusters.
System components only (default scope)
Upon installation, agents collect logs and metrics, including
performance details (for example, CPU and memory utilization) and similar
metadata, for Google-provided system components. These include all workloads in
the admin cluster, and for user clusters, workloads in the kube-system,
gke-system, gke-connect, istio-system, and config-management-system namespaces.
You can configure or disable the agents as described in the
following sections.
By default, the metrics agents running in the cluster collect and report an
optimized set of container, kubelet and kube-state-metrics metrics to Google Cloud Observability (formerly Stackdriver).
Fewer resources are needed to collect this
optimized set of metrics, which improves overall performance and scalability. This is especially important for container-level and kube-level metrics, due to the large
quantity of objects to monitor.
Excluded container metrics
The following container metrics are excluded from the optimized metrics:
container_cpu_cfs_periods_total
container_cpu_cfs_throttled_periods_total
container_cpu_load_average_10s
container_cpu_system_seconds_total
container_cpu_user_seconds_total
container_fs_io_current
container_fs_io_time_seconds_total
container_fs_io_time_weighted_seconds_total
container_fs_read_seconds_total
container_fs_reads_bytes_total
container_fs_reads_merged_total
container_fs_reads_total
container_fs_sector_reads_total
container_fs_sector_writes_total
container_fs_write_seconds_total
container_fs_writes_bytes_total
container_fs_writes_merged_total
container_fs_writes_total
container_last_seen
container_memory_cache
container_memory_failcnt
container_memory_mapped_file
container_memory_max_usage_bytes
container_memory_swap
container_network_receive_packets_dropped_total
container_network_receive_packets_total
container_network_transmit_packets_dropped_total
container_network_transmit_packets_total
container_spec_cpu_period
container_spec_cpu_quota
container_spec_cpu_shares
container_spec_memory_limit_bytes
container_spec_memory_reservation_limit_bytes
container_spec_memory_swap_limit_bytes
container_start_time_seconds
container_tasks_state
The complete set of Google Distributed Cloud metrics is documented in GKE Enterprise metrics.
Excluded kubelet metrics
The following kubelet metrics are excluded from the optimized metrics:
The complete set of Google Distributed Cloud metrics is documented in GKE Enterprise metrics.
To disable optimized kube-state-metrics metrics (not recommended), set the optimizedMetrics field
to false in your Stackdriver custom resource. For more information on changing
your Stackdriver custom resource, see Configuring Stackdriver component resources.
All Google Distributed Cloud metrics, including those excluded by default, are
described in GKE Enterprise metrics.
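As an illustration of the optimizedMetrics setting, the sketch below builds a JSON merge patch body that turns the field on or off. The function name is hypothetical, and the placement of optimizedMetrics under a top-level `spec` key is an assumption. Check the schema of your Stackdriver custom resource before you apply anything like this.

```python
import json


def optimized_metrics_patch(enabled: bool) -> str:
    """Build a JSON merge patch body for the Stackdriver custom resource."""
    # Assumption: optimizedMetrics sits under the resource's top-level spec.
    return json.dumps({"spec": {"optimizedMetrics": enabled}})


# Request full metrics (not recommended, per the guidance above).
patch_body = optimized_metrics_patch(False)
```

A string like this could be supplied to `kubectl patch` with `--type merge`. The resource name and namespace are specific to your cluster and are not shown here.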
Before you disable the logging and monitoring agents, see the support
page for details about how
this affects Google Cloud Support's SLAs.
Logging and monitoring agents capture data stored locally, subject to your storage and
retention configuration. The data is replicated to the Google Cloud
project specified at installation by using a service account that is authorized to
write data to that project. You can disable these agents at any time, as
described earlier.
You can also manage and delete data that the logging and monitoring agents have sent to Cloud Logging and Cloud Monitoring. For more information, see the Cloud Monitoring documentation.
Configuration requirements for logging and monitoring
To view Cloud Logging and Cloud Monitoring data, you
must configure the Google Cloud project that stores the logs and
metrics you want to view. This Google Cloud project is called your logging-monitoring project.
Enable the following APIs in your logging-monitoring project:
In a Google Distributed Cloud cluster, there is no charge for GKE Enterprise system logs and
metrics, which include the following:
Logs and metrics from all components in an admin cluster.
Logs and metrics from components in these namespaces in a user cluster: kube-system, gke-system, gke-connect, knative-serving, istio-system, monitoring-system, config-management-system, gatekeeper-system, cnrm-system.
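The no-charge rule above can be expressed as a small lookup. This is a sketch for illustration: the function name and the `"admin"` and `"user"` labels are hypothetical, and the namespace list is copied from the text.

```python
# Namespaces whose logs and metrics are not charged in a user cluster.
SYSTEM_NAMESPACES = frozenset({
    "kube-system",
    "gke-system",
    "gke-connect",
    "knative-serving",
    "istio-system",
    "monitoring-system",
    "config-management-system",
    "gatekeeper-system",
    "cnrm-system",
})


def is_no_charge(cluster_type: str, namespace: str) -> bool:
    """Return True if logs and metrics from this source count as system data."""
    if cluster_type == "admin":
        return True  # all components in an admin cluster are covered
    if cluster_type == "user":
        return namespace in SYSTEM_NAMESPACES
    raise ValueError(f"unknown cluster type: {cluster_type!r}")
```

For example, `is_no_charge("user", "default")` returns `False`, which matches the rule that only the listed namespaces are covered in a user cluster.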
How Prometheus and Grafana for Google Distributed Cloud work
Each Google Distributed Cloud cluster is created with Prometheus and Grafana
disabled by default. You can follow the installation guide to enable them.
The Prometheus Server is set up in a highly available configuration with two
replicas running on two separate nodes. Resource requirements are adjusted to
support clusters running up to five nodes, with each handling up to 30 Pods that
serve custom metrics. Prometheus has a dedicated PersistentVolume with disk
space preallocated to fit data for a retention period of four days plus an added
safety buffer.
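To show how a retention period translates into disk space, the following back-of-the-envelope sketch applies the sizing rule of thumb from the Prometheus documentation: retention time multiplied by ingested samples per second multiplied by bytes per sample. The node count, Pods per node, and four-day retention come from the text. The series count per Pod, scrape interval, bytes per sample, and safety factor are placeholder assumptions, not the values that Google Distributed Cloud uses.

```python
def estimate_disk_bytes(
    nodes: int = 5,                   # from the text
    pods_per_node: int = 30,          # from the text
    retention_days: int = 4,          # from the text
    series_per_pod: int = 100,        # assumption: custom metric series per Pod
    scrape_interval_s: float = 60.0,  # assumption
    bytes_per_sample: float = 2.0,    # assumption: upper end of the usual 1-2 bytes
    safety_factor: float = 1.2,       # assumption: stands in for the safety buffer
) -> float:
    """Estimate the disk space needed for the given retention period."""
    samples_per_second = nodes * pods_per_node * series_per_pod / scrape_interval_s
    retention_seconds = retention_days * 24 * 60 * 60
    return retention_seconds * samples_per_second * bytes_per_sample * safety_factor


estimated_gib = estimate_disk_bytes() / 2**30
```

The estimate scales linearly with each input, so doubling the retention period or the number of series doubles the required disk space.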
The admin control plane, as well as each user cluster, has a dedicated
monitoring stack that you can configure independently. Each stack delivers a
full set of features: Prometheus Server for monitoring, Grafana for
observability, and Prometheus Alertmanager for alerting.
All monitoring endpoints, transferred metric data, and monitoring APIs are
secured with Istio components by using mTLS and RBAC rules. Access to monitoring
data is restricted to cluster administrators.
Metrics collected by Prometheus
Prometheus collects the following metrics and metadata from the admin control
plane and user clusters:
Resource usage, such as CPU utilization on Pods and nodes.
Kubernetes control plane metrics.
Metrics from add-ons and Kubernetes system components running on nodes, such
as kubelet.
Cluster state, such as health of Pods in a Deployment.
Application metrics.
Machine metrics, such as network, entropy, and inodes.
Multi-cluster monitoring
The Prometheus and Grafana instance installed on the admin cluster is specially
configured to provide insight across the entire Google Distributed Cloud instance,
including the admin cluster and each user cluster. This enables you to:
Use a Grafana dashboard to access metrics from all user clusters and
admin clusters.
View metrics from individual user clusters on Grafana dashboards; the
metrics are available for direct queries in full resolution.
Access user clusters' node-level and workload metrics for aggregated
queries, dashboards and alerting (workload metrics are limited to workloads
running in the kube-system namespace).
Last updated 2025-08-29 UTC.