This page lists metrics available for Memorystore for Valkey and describes what each metric measures.
Backup metrics
This section lists and describes backup and import metrics.
Instance-level metrics
This section lists and describes instance-level backup and import metrics.
| Metric name | Description |
|---|---|
memorystore.googleapis.com/instance/backup/last_backup_start_time
|
This metric shows the start time of the last backup operation. |
memorystore.googleapis.com/instance/backup/last_backup_status
|
This metric shows whether the most recent backup attempt completed successfully or failed. The statuses are 1 for Success and 0 for Failed. |
memorystore.googleapis.com/instance/backup/last_backup_duration
|
This metric shows the duration of the last backup operation (in milliseconds). |
memorystore.googleapis.com/instance/backup/last_backup_size
|
This metric shows the size of the last backup (in bytes). This metric is a key indicator for monitoring backup efficiency and storage capacity planning. |
memorystore.googleapis.com/instance/import/last_import_start_time
|
This metric shows the start time of the last import operation. |
memorystore.googleapis.com/instance/import/last_import_duration
|
This metric shows the duration of the last import operation (in milliseconds). |
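As a minimal sketch of how the backup values above might be read together, the following Python function turns a status, a duration in milliseconds, and a size in bytes into a one-line summary. The function name and the sample values are hypothetical; only the status encoding (1 for Success, 0 for Failed) and the units come from the table.

```python
def summarize_backup(status: int, duration_ms: float, size_bytes: int) -> str:
    """Turn raw backup metric values into a one-line summary (illustrative helper)."""
    # Per the table: 1 means Success and 0 means Failed.
    outcome = {1: "Success", 0: "Failed"}.get(status, "Unknown")
    duration_s = duration_ms / 1000        # milliseconds -> seconds
    size_mib = size_bytes / (1024 * 1024)  # bytes -> MiB
    return f"{outcome}: {duration_s:.1f} s, {size_mib:.1f} MiB"


# Sample values are made up for illustration.
print(summarize_backup(status=1, duration_ms=42_500, size_bytes=268_435_456))
# prints "Success: 42.5 s, 256.0 MiB"
```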
Certificate Authority (CA) metrics
This section lists metrics that are associated with customer-managed Certificate Authorities (CA).
Instance-level metrics
These metrics provide a high-level overview of the certificates that are associated with machines in an instance.
memorystore.googleapis.com/instance/security/rotate_tls_cert_count
This metric shows the status of rotating certificates that are associated with machines in an instance.
The metric can have the following statuses:
- SUCCESS: Memorystore for Valkey rotated the certificate.
- FAILED: Memorystore for Valkey didn't rotate the certificate because the certificate isn't available, Memorystore for Valkey doesn't have permissions to rotate the certificate, or there's an internal error.
- SKIPPED: Memorystore for Valkey skipped rotating the certificate because it doesn't have to be rotated.
Cloud Monitoring metrics
This section lists and describes Cloud Monitoring metrics that are available for Memorystore for Valkey.
Instance-level metrics
These metrics provide a high-level overview of the overall health and performance of an instance. You can use the metrics to understand the overall capacity and utilization of an instance as well as to identify potential bottlenecks or areas for improvement.
| Metric name | Description |
|---|---|
memorystore.googleapis.com/instance/clients/average_connected_clients
|
This metric measures the average number of active client connections to an instance over a specified time. You can use the metric to monitor connection scaling, identify application bottlenecks, and ensure that the instance is stable. |
memorystore.googleapis.com/instance/clients/maximum_connected_clients
|
This metric shows the maximum number of active client connections across all nodes of an instance. You can use the metric to monitor the highest connection load on the instance at any time. This is important for instance performance because high connection counts can increase response times. |
memorystore.googleapis.com/instance/clients/maximum_connection_duration
|
This metric measures the maximum duration of a client connection for a single node in an instance. You can use this metric to manage resource exhaustion, ensure load balancing, and enforce security policies. |
memorystore.googleapis.com/instance/clients/total_connected_clients
|
This metric tracks the current number of active client connections to an instance. You can use the metric to monitor the load on your instance and to avoid reaching connection limits. |
memorystore.googleapis.com/instance/stats/total_connections_received_count
|
This metric shows the cumulative number of client connections that are created in an instance in the last minute. You can use the metric to analyze traffic load, ensure that connection limits aren't exceeded, and determine whether you need to scale the instance. |
memorystore.googleapis.com/instance/stats/total_rejected_connections_count
|
This metric tracks the total number of connections to an instance that are rejected because the maxclients limit is reached. |
memorystore.googleapis.com/instance/commandstats/total_usec_count
|
This metric measures the total CPU time that each command consumes. The metric indicates the total microseconds used, which provides insights into an instance's performance and latency. |
memorystore.googleapis.com/instance/commandstats/total_calls_count
|
This metric measures the total number of calls that are associated with a specific command on an instance node in one minute. To identify bottlenecks or high traffic on specific commands, you can use the metric to monitor command throughput (commands per minute) across primary and replica nodes. |
memorystore.googleapis.com/instance/cpu/average_utilization
|
This metric shows the mean CPU utilization for an instance (from 0.0 to 1.0). You can use the metric to identify overprovisioned or underutilized resources, manage auto scaling thresholds, and detect performance bottlenecks, with an ideal utilization of 40%-70%. |
memorystore.googleapis.com/instance/cpu/maximum_utilization
|
This metric shows the peak CPU usage across all nodes in an instance (from 0.0 to 1.0). Make sure that CPU utilization doesn't exceed 0.8 seconds for the primary node and 0.5 seconds for each replica that's designated as a read replica. For more information, see CPU usage best practices. |
memorystore.googleapis.com/instance/stats/average_expired_keys
|
This metric measures the mean number of key expiration events for all primary nodes of an instance. You can use the metric to monitor the number of keys that are expiring. |
memorystore.googleapis.com/instance/stats/maximum_expired_keys
|
This metric measures the maximum number of key expiration events that are occurring across all primary nodes of an instance. |
memorystore.googleapis.com/instance/stats/total_expired_keys_count
|
This metric tracks the total number of key expiration events that are occurring across all primary nodes of an instance. You can use the metric to monitor the number of keys that are expiring. |
memorystore.googleapis.com/instance/stats/average_evicted_keys
|
This metric tracks the mean number of keys that are evicted because of memory capacity constraints across the primary shards of an instance. |
memorystore.googleapis.com/instance/stats/maximum_evicted_keys
|
This metric shows the highest number of keys that are evicted from a node or shard of a primary instance because of memory capacity. |
memorystore.googleapis.com/instance/stats/total_evicted_keys_count
|
This metric shows the total number of keys that are evicted by a node of a primary instance because of memory capacity. |
memorystore.googleapis.com/instance/keyspace/total_keys
|
This metric shows the number of keys that are stored in an instance. |
memorystore.googleapis.com/instance/stats/average_keyspace_hits
|
This metric shows the mean number of successful lookups of keys across all nodes in an instance. |
memorystore.googleapis.com/instance/stats/maximum_keyspace_hits
|
This metric shows the maximum number of successful lookups of keys in an instance node. You can use the metric to monitor the instance's performance and to identify potential hotspots across the instance. |
memorystore.googleapis.com/instance/stats/total_keyspace_hits_count
|
This metric tracks the cumulative number of successful lookups of keys across all nodes in an instance. |
memorystore.googleapis.com/instance/stats/average_keyspace_misses
|
This metric shows the mean number of failed lookups of keys across an instance. You can use the metric to track how often keys are requested but aren't found in the cache. |
memorystore.googleapis.com/instance/stats/maximum_keyspace_misses
|
This metric shows the maximum number of failed lookups of keys across an instance node. |
memorystore.googleapis.com/instance/stats/total_keyspace_misses_count
|
This metric shows the total number of failed lookups of keys across all instance nodes. |
memorystore.googleapis.com/instance/memory/average_utilization
|
This metric shows the mean memory utilization across an instance (from 0.0 to 1.0). You can use the metric to monitor the instance's capacity and to set alert thresholds. For example, you can set an alert threshold to notify users when the average memory utilization exceeds a specific percentage (for example, 80%). |
memorystore.googleapis.com/instance/memory/maximum_utilization
|
This metric shows the maximum memory utilization across all instance nodes (from 0.0 to 1.0). You can use the metric to identify when to scale an instance. We recommend that you monitor usage to ensure that it stays under 100%. Under high write loads, performance might degrade if this metric reaches 65% to 85%. |
memorystore.googleapis.com/instance/memory/total_used_memory
|
This metric shows the total memory usage of an instance (in bytes). You can use the metric to monitor the instance's capacity. |
memorystore.googleapis.com/instance/memory/size
|
This metric measures the total, used, and available RAM across all nodes in an instance. You can use the metric to monitor the instance's capacity and to prevent node failures. |
memorystore.googleapis.com/instance/replication/average_ack_lag
|
This metric shows the mean acknowledgment lag (in seconds) of replicas across an instance. Acknowledgment lag is a bottleneck on the primary node in an instance. This bottleneck occurs when replicas can't keep up with the information that the primary node sends to them. When this happens, the primary node must wait for the acknowledgment that the replicas received the information. This might slow down transaction commits and cause a performance hit on the primary node. |
memorystore.googleapis.com/instance/replication/maximum_ack_lag
|
This metric shows the maximum acknowledgment lag (in seconds) of replicas across an instance. |
memorystore.googleapis.com/instance/replication/average_offset_diff
|
This metric shows the mean replication acknowledge offset diff (in bytes) across an instance. Replication acknowledge offset diff means the number of bytes that aren't replicated between replicas and their primary instances. |
memorystore.googleapis.com/instance/replication/maximum_offset_diff
|
This metric shows the maximum replication offset diff (in bytes) across an instance. Replication offset diff means the number of bytes that aren't replicated between replicas and their primary instances. |
memorystore.googleapis.com/instance/stats/total_net_input_bytes_count
|
This metric shows the count of incoming network bytes that an instance's endpoints receive. |
memorystore.googleapis.com/instance/stats/total_net_output_bytes_count
|
This metric shows the count of outgoing network bytes that an instance's endpoints send. |
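The memory utilization guidance in the table can be sketched as a simple classifier. The following Python example is illustrative only: the function name and the "ok", "watch", and "alert" labels are hypothetical, the 0.65 lower bound comes from the note that performance might degrade at 65% to 85% under high write loads, and the default 0.80 alert threshold mirrors the example threshold in the average_utilization description. Choose thresholds that fit your own workload.

```python
def classify_memory_utilization(max_utilization: float,
                                alert_threshold: float = 0.80) -> str:
    """Map a maximum_utilization sample (0.0 to 1.0) to a coarse state."""
    if not 0.0 <= max_utilization <= 1.0:
        raise ValueError("utilization must be between 0.0 and 1.0")
    if max_utilization >= alert_threshold:
        return "alert"  # at or above the example alert threshold
    if max_utilization >= 0.65:
        return "watch"  # range where high write loads might degrade performance
    return "ok"


# Sample values are made up for illustration.
for sample in (0.50, 0.70, 0.90):
    print(sample, classify_memory_utilization(sample))
```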
Node-level metrics
These metrics offer detailed insights into the health and performance of individual nodes within an instance. You can use the metrics to troubleshoot issues with the nodes and to optimize their performance.
| Metric name | Description |
|---|---|
memorystore.googleapis.com/instance/node/clients/connected_clients
|
This metric indicates the number of active client connections to an instance node, excluding replica connections. You can use the metric to monitor connection limits and to identify hotspots where a shard receives disproportionate traffic. |
memorystore.googleapis.com/instance/node/clients/blocked_clients
|
This metric shows the number of client connections that an instance node blocks. A high or rapidly increasing number of blocked client connections might indicate that many clients are waiting on operations. This can lead to increased latency. |
memorystore.googleapis.com/instance/node/server/uptime
|
This metric measures the uptime of an instance node. You can use the metric to track how long a server runs continuously without a reboot or failure. |
memorystore.googleapis.com/instance/node/stats/connections_received_count
|
This metric tracks the total number of client connections that are created on an instance node within a specified period. You can use the metric to monitor connection traffic to individual nodes within an instance. As a result, you can analyze load distribution and identify spikes in connection activity. |
memorystore.googleapis.com/instance/node/stats/rejected_connections_count
|
This metric shows the number of connections that are rejected because an instance node reaches the maxclients limit. You can use the metric to identify if a node is under high-connection pressure and is refusing new connections because it can't handle more connections. |
memorystore.googleapis.com/instance/node/commandstats/usec_count
|
This metric shows the total time that each command consumes in an instance node. You can use the metric to analyze the performance of commands, identify slow commands, and troubleshoot latency issues at the node level. |
memorystore.googleapis.com/instance/node/commandstats/calls_count
|
This metric tracks the total number of calls for a command on an instance node per minute. You can use the metric to monitor traffic distribution, identify heavily used commands, and troubleshoot bottlenecks on individual nodes. |
memorystore.googleapis.com/instance/node/cpu/utilization
|
This metric shows the CPU utilization for an instance node (from 0.0 to 1.0). |
memorystore.googleapis.com/instance/node/stats/expired_keys_count
|
This metric shows the total number of expiration events in an instance node. You can use the metric to monitor the rate at which keys are being removed from the instance because their time to live (TTL) reaches zero. |
memorystore.googleapis.com/instance/node/stats/evicted_keys_count
|
This metric counts the total number of keys that an instance node evicts because the instance reaches its maximum memory limit. The metric can identify if an instance is under memory pressure. High or rising counts of evicted keys indicate that an instance is running out of space. As a result, the instance removes keys to make room for new data. |
memorystore.googleapis.com/instance/node/keyspace/total_keys
|
This metric measures the total number of keys that an instance node stores. The metric provides visibility into data distribution and sharding across nodes. |
memorystore.googleapis.com/instance/node/stats/keyspace_hits_count
|
This metric tracks the number of successful key lookups on an instance node. You can use the metric to monitor how efficiently the node retrieves in-memory data. |
memorystore.googleapis.com/instance/node/stats/keyspace_misses_count
|
This metric tracks the number of failed key lookups on an instance node. |
memorystore.googleapis.com/instance/node/memory/utilization
|
This metric tracks the memory utilization in an instance node (from 0.0 to 1.0). You can use the metric to prevent node failures and to ensure an instance's stability. |
memorystore.googleapis.com/instance/node/memory/usage
|
This metric measures the total memory usage of an instance node. |
memorystore.googleapis.com/instance/node/stats/net_input_bytes_count
|
This metric measures the total number of incoming network bytes that an instance node receives. You can use the metric to monitor the network throughput, identify potential bottlenecks, and analyze traffic spikes on the node. |
memorystore.googleapis.com/instance/node/stats/net_output_bytes_count
|
This metric measures the total number of outgoing network bytes that an instance node sends. You can use the metric to monitor the network egress volume for the node for performance tuning and capacity planning purposes. |
memorystore.googleapis.com/instance/node/replication/offset
|
This metric measures the replication offset bytes of an instance node. Before you promote the replicas of an instance to primary instances, you can use the metric to check whether the replicas processed all data. This prevents data loss. |
memorystore.googleapis.com/instance/node/server/healthy
|
This metric determines whether an instance node is available and functioning correctly. |
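The pre-promotion check described for the replication offset metric can be sketched as follows. This Python example assumes that you have already read the offset value for the primary node and for each replica, and that a replica has processed all data when its offset equals the primary's offset. The function name, node names, and sample offsets are hypothetical.

```python
def replication_lag_bytes(primary_offset: int,
                          replica_offsets: dict[str, int]) -> dict[str, int]:
    """Return how many bytes each replica is behind the primary (never negative)."""
    return {
        name: max(primary_offset - offset, 0)
        for name, offset in replica_offsets.items()
    }


# Sample offsets are made up for illustration.
lag = replication_lag_bytes(
    1_000_000,
    {"replica-a": 1_000_000, "replica-b": 998_500},
)
# Replicas with zero lag have processed all data that the primary sent.
safe_to_promote = [name for name, behind in lag.items() if behind == 0]
print(lag)
print(safe_to_promote)
```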
Cross-region replication metrics
This section lists and describes cross-region replication metrics.
| Metric name | Description |
|---|---|
memorystore.googleapis.com/instance/cross_instance_replication/secondary_replication_links
|
This metric shows the number of shard links between the primary and secondary instances. Within a cross-region replication group, a primary instance reports the number of cross-region replication links that it has with the secondary instances in the group. For each secondary instance, this number is expected to be equal to the number of shards. If the number drops below the number of shards, then this metric identifies the number of shards when replication stopped between the replicator and the follower. In an ideal state, this metric has the same number as the shard count for the primary instance. |
memorystore.googleapis.com/instance/cross_instance_replication/secondary_maximum_replication_offset_diff
|
This metric shows the maximum replication offset difference between the primary and secondary shards. |
memorystore.googleapis.com/instance/cross_instance_replication/secondary_average_replication_offset_diff
|
This metric shows the average replication offset difference between the primary and secondary shards. |
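Because the expected number of replication links for each secondary instance equals the shard count, a shortfall can be computed directly. The following Python sketch assumes that you have the shard count of the primary instance and the reported link count for each secondary instance. The function name, instance names, and sample values are hypothetical.

```python
def missing_replication_links(shard_count: int,
                              links_per_secondary: dict[str, int]) -> dict[str, int]:
    """Return the number of missing links for each secondary that is below the shard count."""
    return {
        name: shard_count - links
        for name, links in links_per_secondary.items()
        if links < shard_count
    }


# Sample values are made up for illustration: a 6-shard primary with two secondaries.
shortfall = missing_replication_links(6, {"secondary-east": 6, "secondary-west": 4})
print(shortfall)  # prints {'secondary-west': 2}
```

An empty result corresponds to the ideal state, in which every secondary instance reports as many links as the primary instance has shards.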
Persistence metrics
This section lists and describes persistence metrics.
RDB persistence metrics
This section lists and describes Redis Database (RDB) persistence metrics.
Instance-level metrics
This section lists and describes instance-level RDB persistence metrics.
| Metric name | Description |
|---|---|
memorystore.googleapis.com/instance/persistence/rdb_saves_count
|
This metric tracks the cumulative number of times that an RDB persistence snapshot (also known as an RDB save) is taken on an instance node. You can use the metric to monitor the frequency and success of RDB snapshots on a per-node basis. The metric has a |
memorystore.googleapis.com/instance/persistence/rdb_last_success_ages
|
This metric shows a distribution of snapshot ages for all nodes across an instance. In the case of a recovery incident, you can use the metric to view the timeframe for data staleness. Ideally, the values in the distribution are less than or equal to your snapshot interval. |
Node-level metrics
| Metric name | Description |
|---|---|
memorystore.googleapis.com/instance/node/persistence/rdb_bgsave_in_progress
|
This metric shows whether an RDB BGSAVE is in progress on the instance node. TRUE means that the save is in progress. |
memorystore.googleapis.com/instance/node/persistence/rdb_last_bgsave_status
|
This metric shows the success of the last BGSAVE on the instance node. TRUE means that the last BGSAVE succeeded. If no BGSAVE has occurred, then the value might default to TRUE. |
memorystore.googleapis.com/instance/node/persistence/rdb_saves_count
|
The metric shows the cumulative number of RDB saves run on the instance node. |
memorystore.googleapis.com/instance/node/persistence/rdb_last_save_age
|
The time (in seconds) since the last successful snapshot. |
memorystore.googleapis.com/instance/node/persistence/rdb_next_save_time_until
|
The time remaining (in seconds) until the next snapshot. |
memorystore.googleapis.com/instance/node/persistence/current_save_keys_total
|
The number of keys in the RDB save that runs on the instance node. |
AOF persistence metrics
Instance-level metrics
| Metric name | Description |
|---|---|
memorystore.googleapis.com/instance/persistence/aof_fsync_lags
|
This metric shows a distribution of the lag (from data write to durable storage sync) for all nodes in the instance. It's emitted only for instances with appendfsync=everysec. Ideally, the values in the distribution are less than or equal to your AOF sync interval. |
memorystore.googleapis.com/instance/persistence/aof_rewrite_count
|
This metric shows the cumulative number of times that a node in your instance has triggered an AOF rewrite. This metric has a status_code field. To check if AOF rewrites are failing, you can filter the status_code field for the following error: 3 - INTERNAL_ERROR |
Node-level metrics
| Metric name | Description |
|---|---|
memorystore.googleapis.com/instance/node/persistence/aof_last_write_status
|
This metric shows the success of the most recent AOF write on the instance node. TRUE means success. If no write has occurred, then the value might default to TRUE. |
memorystore.googleapis.com/instance/node/persistence/aof_last_bgrewrite_status
|
This metric shows the success of the last AOF bgrewrite operation on the instance node. TRUE means success. If no bgrewrite has occurred, then the value might default to TRUE. |
memorystore.googleapis.com/instance/node/persistence/aof_fsync_lag
|
This metric shows the AOF lag between memory and persistent store in the instance node. It's applicable only for AOF-enabled instances where appendfsync=EVERYSEC. |
memorystore.googleapis.com/instance/node/persistence/aof_rewrites_count
|
This metric shows the count of AOF rewrites in the instance node. To check if AOF rewrites are failing, you can filter the status_code field for the following error: 3 - INTERNAL_ERROR |
memorystore.googleapis.com/instance/node/persistence/aof_fsync_errors_count
|
This metric shows the count of AOF fsync() call errors. It's applicable only for AOF-enabled instances where appendfsync=EVERYSEC|ALWAYS. |
Common persistence metrics
These metrics are applicable to both AOF and RDB persistence mechanisms.
Node-level metrics
| Metric name | Description |
|---|---|
memorystore.googleapis.com/instance/node/persistence/auto_restore_count
|
This metric shows the count of restores from the dumpfile (AOF or RDB). To check if restores are failing, you can filter the status_code field for the following error: 2 - INTERNAL_ERROR |
Sample use cases for persistence metrics
Check if AOF write operations cause latency and memory pressure
Suppose that you detect increased latency or memory usage on your instance or on a node within the instance. In this case, you might want to check if the extra usage is related to AOF persistence.
Because AOF rewrite operations can trigger transient load spikes, you can inspect the aof_rewrites_count metric, which gives you the cumulative count of AOF rewrites over the lifetime of the instance or the node within the instance. Suppose this metric shows you that increments in the rewrites count correspond to latency increases. In this circumstance, you could address the issue by reducing the write rate or increasing the shard count to reduce the frequency of rewrites.
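One way to check whether increments in a cumulative counter line up with latency increases is to split latency samples by whether the counter grew in the same interval. The following Python sketch assumes two equal-length series sampled at the same times: cumulative aof_rewrites_count values and a latency measurement that you collect separately. The function name and sample data are hypothetical, and the same approach might apply to rdb_saves_count.

```python
from statistics import mean


def latency_by_counter_activity(counter_samples, latencies_ms):
    """Return (mean latency when the counter grew, mean latency when it didn't)."""
    if len(counter_samples) != len(latencies_ms):
        raise ValueError("both series must have the same number of samples")
    active, idle = [], []
    # Start at 1 because each sample is compared with the previous one.
    for i in range(1, len(counter_samples)):
        if counter_samples[i] > counter_samples[i - 1]:
            active.append(latencies_ms[i])  # a rewrite happened in this interval
        else:
            idle.append(latencies_ms[i])
    return (mean(active) if active else None, mean(idle) if idle else None)


# Sample data is made up for illustration.
rewrites = [10, 10, 11, 11, 12, 12]
latency = [1.0, 1.0, 4.0, 1.0, 5.0, 1.0]
print(latency_by_counter_activity(rewrites, latency))  # prints (4.5, 1.0)
```

A large gap between the two means suggests, but doesn't prove, that rewrites contribute to the latency increases.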
Check if RDB save operations cause latency and memory pressure
Suppose that you detect increased latency or memory usage on your instance or on a node within the instance. In this case, you might want to check if the extra usage is related to RDB persistence.
Because RDB save operations can trigger transient load spikes, you can inspect the rdb_saves_count metric, which gives the cumulative count of RDB saves over the lifetime of the instance or the node within the instance. Suppose this metric shows you that increments in the RDB saves count correspond to latency increases. In this circumstance, you could lengthen the RDB snapshot interval to lower the frequency of saves. You could also scale out the instance to reduce the baseline load levels.
Interpret metrics for Memorystore for Valkey
Many of the metrics listed on this page come in three variations: average, maximum, and total.
Memorystore for Valkey provides average and maximum variations of the same metric so that you can use both to identify hotspots for that metric family.
The total variation is independent of the other two. It provides separate insights that aren't related to identifying hotspots.
Understand average and maximum metrics
Suppose you compare the average_keyspace_hits and maximum_keyspace_hits values for your instance. A growing difference between the two metrics indicates that there are more hotspots for hits in your instance. Close values for average_keyspace_hits and maximum_keyspace_hits indicate that hits are distributed more evenly across your instance.
This principle applies to all metrics that have the average and maximum variations of the same metric.
Hotspot example
Comparing average_keyspace_hits and maximum_keyspace_hits for all of the shards in your instance indicates whether hotspots occur. For example, suppose shards in a 6-shard instance have the following number of hits:
- Shard 1 – 2 hits
- Shard 2 – 2 hits
- Shard 3 – 2 hits
- Shard 4 – 2 hits
- Shard 5 – 2 hits
- Shard 6 – 8 hits
In this example, average_keyspace_hits returns a value of 3 and maximum_keyspace_hits returns 8, indicating that shard 6 is hot.
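The arithmetic in this example can be reproduced with a short Python sketch. The per-shard hit counts are the ones listed above. The function name, the shard keys, and the max-to-average ratio are illustrative additions, not part of the Memorystore for Valkey metrics.

```python
from statistics import mean


def find_hotspot(hits_per_shard: dict[str, int]):
    """Return (average hits, maximum hits, name of the shard with the most hits)."""
    values = list(hits_per_shard.values())
    hottest = max(hits_per_shard, key=hits_per_shard.get)
    return mean(values), max(values), hottest


shard_hits = {
    "shard-1": 2, "shard-2": 2, "shard-3": 2,
    "shard-4": 2, "shard-5": 2, "shard-6": 8,
}
average_hits, maximum_hits, hottest_shard = find_hotspot(shard_hits)
print(average_hits, maximum_hits, hottest_shard)  # prints 3 8 shard-6

# A ratio well above 1 suggests that load is concentrated on a few shards.
skew = maximum_hits / average_hits
```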
Memorystore for Valkey also provides node-level metrics, which can help you identify hotspots within the instance.

