Supported monitoring metrics

This page lists metrics available for Memorystore for Redis Cluster and describes what each metric measures.

Backup metrics

This section lists and describes backup and import metrics.

Cluster-level metrics

This section lists and describes cluster-level backup and import metrics.

Metric name Description
redis.googleapis.com/cluster/backup/last_backup_start_time This metric shows the start time of the last backup operation.
redis.googleapis.com/cluster/backup/last_backup_status This metric shows whether the most recent backup attempt completed successfully or failed. The statuses are 1 for Success and 0 for Failed.
redis.googleapis.com/cluster/backup/last_backup_duration This metric shows the duration of the last backup operation (in milliseconds).
redis.googleapis.com/cluster/backup/last_backup_size This metric shows the size of the last backup (in bytes). This metric is a key indicator for monitoring backup efficiency and storage capacity planning.
redis.googleapis.com/cluster/import/last_import_start_time This metric shows the start time of the last import operation.
redis.googleapis.com/cluster/import/last_import_duration This metric shows the duration of the last import operation (in milliseconds).

Certificate Authority (CA) metrics

This section lists metrics that are associated with customer-managed Certificate Authorities (CA).

Cluster-level metrics

These metrics provide a high-level overview of the certificates that are associated with machines in a cluster.

Metric name Description
redis.googleapis.com/cluster/security/rotate_tls_cert_count

This metric shows the status of rotating certificates that are associated with machines in a cluster.

The metric can have the following statuses:

  • SUCCESS: Memorystore for Redis Cluster rotated the certificate.
  • FAILED: Memorystore for Redis Cluster didn't rotate the certificate because the certificate isn't available, Memorystore for Redis Cluster doesn't have permissions to rotate the certificate, or there's an internal error.
  • SKIPPED: Memorystore for Redis Cluster skipped rotating the certificate because it doesn't have to be rotated.

Cloud Monitoring metrics

This section lists and describes Cloud Monitoring metrics that are available for Memorystore for Redis Cluster.

Cluster-level metrics

These metrics provide a high-level overview of the overall health and performance of a cluster. You can use the metrics to understand the overall capacity and utilization of a cluster as well as to identify potential bottlenecks or areas for improvement.

Metric name Description
redis.googleapis.com/cluster/clients/average_connected_clients This metric measures the average number of active client connections to a cluster over a specified time. You can use the metric to monitor connection scaling, identify application bottlenecks, and ensure that the cluster is stable.
redis.googleapis.com/cluster/clients/maximum_connected_clients This metric shows the maximum number of active client connections across all nodes of a cluster. You can use the metric to monitor the highest connection load on the cluster at any time. This is critical to ensure a high performance for the cluster because high connection counts can increase response times.
redis.googleapis.com/cluster/clients/total_connected_clients This metric tracks the current number of active client connections to a cluster. You can use the metric to monitor the load of your database and prevent connection limits.
redis.googleapis.com/cluster/stats/total_connections_received_count This metric shows the cumulative number of client connections that are created in a cluster in the last minute. You can use the metric to analyze traffic load, ensure that connection limits aren't exceeded, and determine whether you need to scale the cluster.
redis.googleapis.com/cluster/stats/total_rejected_connections_count This metric tracks the total number of connections to a cluster that are rejected because the maxclients limit is reached.
redis.googleapis.com/cluster/commandstats/total_usec_count This metric measures the total CPU time that each command consumes. The metric indicates the total microseconds used, which provides insight into a cluster's performance and latency.
redis.googleapis.com/cluster/commandstats/total_calls_count This metric measures the total number of calls that are associated with a specific command on a cluster node in one minute. To identify bottlenecks or high traffic on specific commands, you can use the metric to monitor command throughput (commands per minute) across primary and replica nodes.
redis.googleapis.com/cluster/cpu/average_utilization This metric shows the mean CPU utilization for a cluster (from 0.0 to 1.0). You can use the metric to identify overprovisioned or underutilized resources, manage auto scaling thresholds, and detect performance bottlenecks, with an ideal utilization of 40%-70%.
redis.googleapis.com/cluster/cpu/maximum_utilization

This metric shows the peak CPU usage across all nodes in a cluster (from 0.0 to 1.0).

The metric summarizes only the sys_main_thread and user_main_thread states. It doesn't include other CPU states (such as sys_children or user_children) that are available in the /cluster/node/cpu/utilization metric.

Make sure that CPU utilization doesn't exceed 0.8 seconds for the primary node and 0.5 seconds for each replica that's designated as a read replica. For more information, see CPU usage best practices.

redis.googleapis.com/cluster/stats/average_expired_keys This metric measures the mean number of key expiration events for all primary nodes of a cluster. You can use the metric to monitor the number of keys that are expiring.
redis.googleapis.com/cluster/stats/maximum_expired_keys This metric measures the maximum number of key expiration events that are occurring across all primary nodes of a cluster.
redis.googleapis.com/cluster/stats/total_expired_keys_count This metric tracks the total number of key expiration events that are occurring across all primary nodes of a cluster. You can use the metric to monitor the number of keys that are expiring.
redis.googleapis.com/cluster/stats/average_evicted_keys This metric tracks the mean number of keys that are evicted because of memory capacity constraints across the primary shards of a cluster.
redis.googleapis.com/cluster/stats/maximum_evicted_keys This metric shows the highest number of keys that are evicted from a node or shard of a primary cluster because of memory capacity.
redis.googleapis.com/cluster/stats/total_evicted_keys_count This metric shows the total number of keys that are evicted by a node of a primary cluster because of memory capacity.
redis.googleapis.com/cluster/keyspace/total_keys This metric shows the number of keys that are stored in a cluster.
redis.googleapis.com/cluster/stats/average_keyspace_hits This metric shows the mean number of successful lookups of keys across a cluster.
redis.googleapis.com/cluster/stats/maximum_keyspace_hits This metric shows the maximum number of successful lookups of keys in a cluster node.
redis.googleapis.com/cluster/stats/total_keyspace_hits_count This metric shows the number of successful lookups of keys across a cluster.
redis.googleapis.com/cluster/stats/average_keyspace_misses This metric shows the mean number of failed lookups of keys across a cluster.
redis.googleapis.com/cluster/stats/maximum_keyspace_misses This metric shows the maximum number of failed lookups of keys across a cluster node.
redis.googleapis.com/cluster/stats/total_keyspace_misses_count This metric shows the total number of failed lookups of keys across all cluster nodes.
redis.googleapis.com/cluster/memory/average_utilization This metric shows the mean memory utilization across a cluster (from 0.0 to 1.0).
redis.googleapis.com/cluster/memory/maximum_utilization This metric shows the maximum memory utilization across a cluster node (from 0.0 to 1.0).
redis.googleapis.com/cluster/memory/total_used_memory This metric shows the total memory usage of a cluster.
redis.googleapis.com/cluster/memory/size This metric shows the memory size of a cluster.
redis.googleapis.com/cluster/replication/average_ack_lag This metric shows the mean acknowledgment lag (in seconds) of replicas across a cluster.

Acknowledgment lag is a bottleneck on the primary node in a cluster. This bottleneck is caused by its replicas that can't keep up with the information that the primary node sends to them. When this happens, the primary node must wait for the acknowledgment that the replicas received the information. This might slow down transaction commits and cause a performance hit on the primary node.
redis.googleapis.com/cluster/replication/maximum_ack_lag This metric shows the maximum acknowledgment lag (in seconds) of replicas across a cluster.
redis.googleapis.com/cluster/replication/average_offset_diff This metric shows the mean replication acknowledge offset diff (in bytes) across a cluster.

Replication acknowledge offset diff means the number of bytes that aren't replicated between replicas and their primary clusters.
redis.googleapis.com/cluster/replication/maximum_offset_diff This metric shows the maximum replication offset diff (in bytes) across a cluster.

Replication offset diff means the number of bytes that aren't replicated between replicas and their primary clusters.
redis.googleapis.com/cluster/stats/total_net_input_bytes_count This metric shows the count of incoming network bytes that a cluster's endpoints receive.
redis.googleapis.com/cluster/stats/total_net_output_bytes_count This metric shows the count of outgoing network bytes that a cluster's endpoints send.
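
The CPU guidance given for cluster/cpu/maximum_utilization in this table can be expressed as a simple check. The following is a minimal Python sketch and not part of any Memorystore API: the function name, the role strings, and the sample values are hypothetical. It also assumes that the 0.8 (primary) and 0.5 (read replica) limits are compared against utilization values on the metric's 0.0 to 1.0 scale.

```python
# Limits taken from the maximum_utilization guidance on this page.
# Role names are hypothetical labels chosen for this sketch.
CPU_LIMITS = {"primary": 0.8, "read_replica": 0.5}


def exceeds_cpu_limit(role: str, utilization: float) -> bool:
    """Return True if CPU utilization (0.0 to 1.0) is above the limit for the role."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0.0 and 1.0")
    return utilization > CPU_LIMITS[role]


# Hypothetical samples, for illustration only.
samples = [("primary", 0.85), ("read_replica", 0.45), ("read_replica", 0.62)]
over_limit = [(role, value) for role, value in samples if exceeds_cpu_limit(role, value)]
print(over_limit)
```

In practice, the utilization values would come from whatever tool you use to read Cloud Monitoring time series. The check itself only encodes the two limits.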

Node-level metrics

These metrics offer detailed insights into the health and performance of individual nodes within a cluster. You can use the metrics to troubleshoot issues with the nodes and to optimize their performance.

Metric name Description
redis.googleapis.com/cluster/node/clients/connected_clients This metric shows the number of clients that are connected to a cluster node.
redis.googleapis.com/cluster/node/clients/blocked_clients This metric shows the number of client connections that a cluster node blocks.
redis.googleapis.com/cluster/node/server/uptime This metric measures the uptime of a cluster node.
redis.googleapis.com/cluster/node/stats/connections_received_count This metric tracks the total number of client connections that are created on a cluster node within a specified period.
redis.googleapis.com/cluster/node/stats/rejected_connections_count This metric shows the number of connections that are rejected because a cluster node reaches the maxclients limit.
redis.googleapis.com/cluster/node/commandstats/usec_count This metric shows the total time that each command consumes in a cluster node.
redis.googleapis.com/cluster/node/commandstats/calls_count This metric tracks the total number of calls for a specific Redis command on a cluster node per minute.
redis.googleapis.com/cluster/node/cpu/utilization This metric shows the CPU utilization for a cluster node (from 0.0 to 1.0).
redis.googleapis.com/cluster/node/stats/expired_keys_count This metric shows the total number of expiration events in a cluster node.
redis.googleapis.com/cluster/node/stats/evicted_keys_count This metric counts the total number of keys that a cluster node evicts because the cluster reaches its maximum memory limit. The metric can identify if a cluster is under memory pressure. High or rising counts of evicted keys indicate that a cluster is running out of space. As a result, the cluster removes keys to make room for new data.
redis.googleapis.com/cluster/node/keyspace/total_keys This metric measures the total number of keys that a cluster node stores. The metric provides visibility into data distribution and sharding across nodes.
redis.googleapis.com/cluster/node/stats/keyspace_hits_count This metric tracks the number of successful key lookups on a cluster node.
redis.googleapis.com/cluster/node/stats/keyspace_misses_count This metric tracks the number of failed key lookups on a cluster node.
redis.googleapis.com/cluster/node/memory/utilization This metric tracks the memory utilization in a cluster node (from 0.0 to 1.0). You can use the metric to prevent node failures and to ensure a cluster's stability.
redis.googleapis.com/cluster/node/memory/usage This metric measures the total memory usage of a cluster node.
redis.googleapis.com/cluster/node/stats/net_input_bytes_count This metric measures the total number of incoming network bytes that a cluster node receives.
redis.googleapis.com/cluster/node/stats/net_output_bytes_count This metric measures the total number of outgoing network bytes that a cluster node sends.
redis.googleapis.com/cluster/node/replication/offset This metric measures the replication offset bytes of a cluster node. Before you promote the replicas of a cluster to primary clusters, you can use the metric to check whether the replicas processed all data. This prevents data loss.
redis.googleapis.com/cluster/node/server/healthy This metric determines whether a cluster node is available and functioning correctly.

Cross-region replication metrics

This section lists and describes cross-region replication metrics.

Metric name Description
redis.googleapis.com/cluster/cross_cluster_replication/secondary_replication_links This metric shows the number of shard links between the primary and secondary clusters. Within a cross-region replication group, a primary cluster reports the number of CRR replication links that it has with the secondary clusters in the group. For each secondary cluster, this number is expected to be equal to the number of shards. If, unexpectedly, the number drops below the number of shards, this identifies the number of shards where replication between the replicator and follower has ceased. In an ideal state, this metric should have the same number as the primary cluster shard count.
redis.googleapis.com/cluster/cross_cluster_replication/secondary_maximum_replication_offset_diff This metric measures the maximum replication offset difference (in bytes) between the primary and secondary (replica) shards of a cluster across different regions.
redis.googleapis.com/cluster/cross_cluster_replication/secondary_average_replication_offset_diff This metric measures the average replication offset difference (in bytes) between the primary and replica shards of a cluster across different regions. High values for the metric indicate a replication lag, which you can resolve by pausing and then resuming the replication.
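
The description of secondary_replication_links says that, for each secondary cluster, the link count should equal the shard count, and that any shortfall is the number of shards where replication has ceased. As a minimal sketch of that comparison (the function name and the sample numbers are hypothetical):

```python
def shards_without_replication(shard_count: int, replication_links: int) -> int:
    """Return how many shards have no replication link to a secondary cluster.

    A result of 0 is the expected, healthy state described on this page.
    """
    if shard_count < 0 or replication_links < 0:
        raise ValueError("counts must be non-negative")
    return max(shard_count - replication_links, 0)


# Hypothetical reading: a 6-shard primary cluster that reports 4 links.
print(shards_without_replication(shard_count=6, replication_links=4))
```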

Persistence metrics

This section lists persistence metrics and provides sample use cases for them.

RDB persistence metrics

Cluster-level metrics

Metric name Description
redis.googleapis.com/cluster/persistence/rdb_saves_count This metric shows the cumulative number of times your cluster has taken an RDB snapshot (also known as a save). This metric has a status_code field. To check if a snapshot has failed, you can filter the status_code field for the following error: 3 - INTERNAL_ERROR.
redis.googleapis.com/cluster/persistence/rdb_save_ages This metric shows a distribution of snapshot ages for all nodes across the cluster. Ideally, the values in the distribution are less than or equal to the interval between snapshots (your snapshot frequency).

Node-level metrics

Metric name Description
redis.googleapis.com/cluster/node/persistence/rdb_bgsave_in_progress This metric shows whether an RDB BGSAVE is currently in progress on the cluster node. TRUE means that a BGSAVE is in progress.
redis.googleapis.com/cluster/node/persistence/rdb_last_bgsave_status This metric shows the success of the last BGSAVE on the cluster node. TRUE means success. If no BGSAVE has occurred, the value might default to TRUE.
redis.googleapis.com/cluster/node/persistence/rdb_saves_count This metric shows the cumulative number of RDB saves executed on the cluster node.
redis.googleapis.com/cluster/node/persistence/rdb_last_save_age This metric shows the time, in seconds, since the last successful snapshot.
redis.googleapis.com/cluster/node/persistence/rdb_next_save_time_until This metric shows the time, in seconds, remaining until the next snapshot.
redis.googleapis.com/cluster/node/persistence/current_save_keys_total This metric shows the number of keys in the current RDB save executing on the cluster node.
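
The guidance for the snapshot age metrics is that a snapshot's age should not exceed the time between snapshots. The following minimal sketch applies that rule to per-node rdb_last_save_age readings. The function name, the node IDs, the readings, and the one-hour interval are all hypothetical values chosen for illustration.

```python
def nodes_with_stale_snapshots(last_save_age_s: dict, snapshot_interval_s: int) -> list:
    """Return the nodes whose last successful snapshot is older than the snapshot interval."""
    return sorted(
        node for node, age in last_save_age_s.items() if age > snapshot_interval_s
    )


# Hypothetical rdb_last_save_age readings (in seconds), keyed by made-up node IDs.
ages = {"node-a": 1200, "node-b": 3500, "node-c": 4100}
stale = nodes_with_stale_snapshots(ages, snapshot_interval_s=3600)
print(stale)
```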

AOF persistence metrics

Cluster-level metrics

Metric name Description
redis.googleapis.com/cluster/persistence/aof_fsync_lags This metric shows a distribution of the lag (from data write to durable storage sync) for all nodes in the cluster. It's emitted only for clusters with appendfsync=everysec. Ideally, the values in the distribution are less than or equal to your AOF sync frequency.
redis.googleapis.com/cluster/persistence/aof_rewrite_count This metric shows the cumulative number of times that nodes in your cluster have triggered an AOF rewrite. This metric has a status_code field. To check if AOF rewrites are failing, you can filter the status_code field for the following error: 3 - INTERNAL_ERROR.

Node-level metrics

Metric name Description
redis.googleapis.com/cluster/node/persistence/aof_last_write_status This metric shows the success of the most recent AOF write on the cluster node. TRUE means success. If no write has occurred, the value might default to TRUE.
redis.googleapis.com/cluster/node/persistence/aof_last_bgrewrite_status This metric shows the success of the last AOF bgrewrite operation on the cluster node. TRUE means success. If no bgrewrite has occurred, the value might default to TRUE.
redis.googleapis.com/cluster/node/persistence/aof_fsync_lag This metric shows the AOF lag between memory and persistent store in the cluster node. It's applicable only for AOF-enabled clusters where appendfsync=EVERYSEC.
redis.googleapis.com/cluster/node/persistence/aof_rewrites_count This metric shows the count of AOF rewrites in the cluster node. To check if AOF rewrites are failing, you can filter the status_code field for the following error: 3 - INTERNAL_ERROR.
redis.googleapis.com/cluster/node/persistence/aof_fsync_errors_count This metric shows the count of AOF fsync() call errors. It's applicable only for AOF-enabled clusters where appendfsync=EVERYSEC|ALWAYS.

Common persistence metrics

This section lists metrics that are applicable to both the AOF and RDB persistence mechanisms.

Node-level metrics

Metric name Description
redis.googleapis.com/cluster/node/persistence/auto_restore_count This metric shows the count of restores from the dumpfile (AOF or RDB).

Sample use cases for persistence metrics

Check if AOF write operations cause latency and memory pressure

Suppose that you detect increased latency or memory usage on your cluster or on a node within the cluster. In this case, you might want to check whether the extra usage is related to AOF persistence.

Because AOF rewrite operations can trigger transient load spikes, you can inspect the aof_rewrites_count metric, which gives you the cumulative count of AOF rewrites over the lifetime of the cluster or of the node within the cluster. Suppose this metric shows you that increments in the rewrites count correspond to latency increases. In this circumstance, you could address the issue by reducing the write rate or by increasing the shard count to reduce the frequency of rewrites.
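
Checking whether counter increments line up with latency increases can be sketched as follows. This is a minimal illustration under stated assumptions: both series are sampled at the same timestamps, the latency series comes from your own application or client measurements (this page doesn't define a latency metric), and the function name, sample values, and 3 ms threshold are hypothetical.

```python
def rewrite_latency_overlap(rewrite_counts, latencies_ms, latency_threshold_ms):
    """Return the sample indexes where the cumulative rewrite count increased
    since the previous sample and latency was above the threshold."""
    if len(rewrite_counts) != len(latencies_ms):
        raise ValueError("both series must have the same number of samples")
    overlaps = []
    for i in range(1, len(rewrite_counts)):
        increment = rewrite_counts[i] - rewrite_counts[i - 1]
        if increment > 0 and latencies_ms[i] > latency_threshold_ms:
            overlaps.append(i)
    return overlaps


# Hypothetical samples of aof_rewrites_count and of observed latency.
rewrite_counts = [10, 10, 11, 11, 12]
latencies_ms = [1.0, 1.1, 4.2, 1.0, 3.9]
overlap = rewrite_latency_overlap(rewrite_counts, latencies_ms, latency_threshold_ms=3.0)
print(overlap)
```

If most latency spikes fall on indexes that this function returns, that supports the hypothesis that rewrites drive the spikes. The same comparison applies to rdb_saves_count for RDB persistence.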

Check if RDB save operations cause latency and memory pressure

Suppose that you detect increased latency or memory usage on your cluster or on a node within the cluster. In this case, you might want to check whether the extra usage is related to RDB persistence.

Because RDB save operations can trigger transient load spikes, you can inspect the rdb_saves_count metric, which gives the cumulative count of RDB saves over the lifetime of the cluster or of the node within the cluster. Suppose this metric shows you that increments in the RDB saves count correspond to latency increases. In this circumstance, you could increase the RDB snapshot interval to lower the frequency of saves. You could also scale out the cluster to reduce the baseline load levels.

Interpret metrics for Memorystore for Redis Cluster

As seen in the list above, many of the metrics share three categories: average, maximum, and total.

For Memorystore for Redis Cluster, we provide average and maximum variations of the same metric so you can use them both to identify hotspotting for that metric family.

The total value for the metric is independent, and provides separate insight unrelated to the hotspotting purpose of average and maximum.

Understand average and maximum metrics

Suppose you compare the average_keyspace_hits and maximum_keyspace_hits values for your cluster. The greater the difference between the two metrics, the more hotspotting of hits there is in your cluster. Ideally, average_keyspace_hits and maximum_keyspace_hits have close values, because this means that hits are evenly distributed across your cluster.

This principle applies to all metrics that have the average and maximum variations of the same metric.

Hot spot example

Comparing average_keyspace_hits and maximum_keyspace_hits for all of the shards in your cluster indicates whether hotspotting occurs. For example, suppose shards in a 6-shard cluster have the following number of hits:

  • Shard 1 – 2 hits
  • Shard 2 – 2 hits
  • Shard 3 – 2 hits
  • Shard 4 – 2 hits
  • Shard 5 – 2 hits
  • Shard 6 – 8 hits

In this example, average_keyspace_hits returns a value of 3 (18 hits across 6 shards) and maximum_keyspace_hits returns 8, which indicates that shard 6 is hot.
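
The arithmetic in this example can be reproduced with a short sketch. The per-shard hit counts are the ones listed above. The variable names and the max-to-average ratio used as a skew indicator are illustrative choices, not something that Memorystore for Redis Cluster reports.

```python
# Hits per shard, copied from the 6-shard example above.
hits_per_shard = {1: 2, 2: 2, 3: 2, 4: 2, 5: 2, 6: 8}

average_hits = sum(hits_per_shard.values()) / len(hits_per_shard)
maximum_hits = max(hits_per_shard.values())
hottest_shard = max(hits_per_shard, key=hits_per_shard.get)

# A ratio close to 1.0 means hits are evenly spread; larger values mean more skew.
skew_ratio = maximum_hits / average_hits

print(average_hits, maximum_hits, hottest_shard, round(skew_ratio, 2))
```

The cluster-level average and maximum metrics show only that skew exists. Finding which shard is hot requires per-shard values like the ones in the dictionary above.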

We provide node-level metrics that you can use to identify hotspots in the cluster.
