This page describes how to troubleshoot Spanner components to find the source of latency. To learn more about possible latency points in a Spanner request, see Latency points in a Spanner request.
-
In the client application that affects your service, confirm that client round-trip latency has increased. Check the following dimensions in your client-side metrics:
- Client Application Name
- Client locality (for example, Compute Engine VM zones) and Host (that is, VM names)
- Spanner API method
- Spanner API status
Group by these dimensions to see if the issue is limited to a specific client, status, or method. For dual-region or multi-regional workloads, see if the issue is limited to a specific client or Spanner region.
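The grouping described above can be sketched in a few lines. This is a minimal illustration, assuming you have already exported latency samples from your metrics backend as dictionaries; the field names (`client`, `zone`, `method`, `status`, `latency_ms`) and the sample values are hypothetical, not a Spanner or OpenTelemetry schema.

```python
import math
from collections import defaultdict

# Hypothetical exported samples; real data would come from your metrics backend.
SAMPLES = [
    {"client": "checkout", "zone": "us-east1-b", "method": "ExecuteSql", "status": "OK", "latency_ms": 12.0},
    {"client": "checkout", "zone": "us-east1-b", "method": "ExecuteSql", "status": "OK", "latency_ms": 15.0},
    {"client": "checkout", "zone": "us-east1-b", "method": "Commit", "status": "OK", "latency_ms": 850.0},
    {"client": "reports", "zone": "us-central1-a", "method": "Commit", "status": "OK", "latency_ms": 900.0},
    {"client": "reports", "zone": "us-central1-a", "method": "Read", "status": "OK", "latency_ms": 9.0},
]


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def p99_by(samples, *dims):
    """Group samples by the given dimensions and return the p99 latency per group."""
    groups = defaultdict(list)
    for sample in samples:
        key = tuple(sample[d] for d in dims)
        groups[key].append(sample["latency_ms"])
    return {key: percentile(vals, 99) for key, vals in groups.items()}


by_method = p99_by(SAMPLES, "method")
worst_method = max(by_method, key=by_method.get)
```

Calling `p99_by` with different dimension names (for example, `"client", "zone"`) shows whether the increase follows one client, one zone, or one API method.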
-
Check your client application health, especially the computing infrastructure on the client side (for example, VM, CPU, or memory utilization, connections, and file descriptors).
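As a rough illustration of this kind of check, the following sketch collects a few process-level and host-level signals from inside a Python client using only the standard library. It assumes a Unix-like host (the `resource` module and `/proc/self/fd` are not available everywhere), and the function name `client_health_snapshot` is hypothetical. It does not replace your VM monitoring.

```python
import os
import resource


def client_health_snapshot():
    """Return a small dict of client-side resource signals (Unix-like hosts only)."""
    soft_fd_limit, _hard_fd_limit = resource.getrlimit(resource.RLIMIT_NOFILE)

    # /proc/self/fd exists on Linux; elsewhere the open descriptor count is unknown.
    try:
        open_fds = len(os.listdir("/proc/self/fd"))
    except OSError:
        open_fds = None

    # The 1-minute load average divided by CPU count approximates CPU saturation.
    try:
        load_1min = os.getloadavg()[0]
    except OSError:
        load_1min = None
    cpus = os.cpu_count() or 1

    return {
        "open_fds": open_fds,
        "fd_soft_limit": soft_fd_limit,
        "load_per_cpu": None if load_1min is None else load_1min / cpus,
        # Peak resident set size: kilobytes on Linux, bytes on macOS.
        "max_rss": resource.getrusage(resource.RUSAGE_SELF).ru_maxrss,
    }


snapshot = client_health_snapshot()
```

An `open_fds` value close to `fd_soft_limit`, or a `load_per_cpu` value persistently near or above 1, suggests that the client host, rather than Spanner, might be the bottleneck.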
-
Check latency in Spanner components:
a. Check client round-trip latency with OpenTelemetry or with OpenCensus.
b. Check Google Front End (GFE) latency with OpenTelemetry or with OpenCensus.
c. Check Spanner API request latency with OpenTelemetry or with OpenCensus.
If you have high client round-trip latency, but low GFE latency, and a low Spanner API request latency, the application code might have an issue. It could also indicate a networking issue between the client and regional GFE. If your application has a performance issue that causes some code paths to be slow, then the client round-trip latency for each API request might increase. There might also be an issue in the client computing infrastructure that was not detected in the previous step.
-
Check the following dimensions for Spanner metrics:
- Spanner Database Name
- Spanner API method
- Spanner API status
Group by these dimensions to see if the issue is limited to a specific database, status, or method. For dual-region or multi-regional workloads, check to see if the issue is limited to a specific region.
If you have a high GFE latency, but a low Spanner API request latency, it might have one of the following causes:
-
Accessing a database from another region. This action can lead to high GFE latency and low Spanner API request latency. For example, traffic from a client in the us-east1 region to an instance in the us-central1 region might have a high GFE latency but a lower Spanner API request latency.
-
There's an issue at the GFE layer. Check the Google Cloud Status Dashboard to see if there are any ongoing networking issues in your region. If there aren't any issues, then open a support case and include this information so that support engineers can help with troubleshooting the GFE.
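The comparison of the three latency measurements can be summarized as a small decision function. This is only a sketch of the reasoning on this page: the function name, the returned labels, and the single `high_ms` threshold are hypothetical, and what counts as "high" depends on your own baseline.

```python
def triage(client_rtt_ms, gfe_ms, api_ms, high_ms=100.0):
    """Map the three latency measurements to the component to investigate next.

    high_ms is a hypothetical threshold; replace it with a value derived
    from your own baseline for each metric.
    """
    if api_ms >= high_ms:
        # Spanner API request latency is high: continue with the
        # instance and database checks (CPU, hotspots, queries).
        return "spanner"
    if gfe_ms >= high_ms:
        # High GFE latency with low API latency: cross-region access
        # or an issue at the GFE layer.
        return "gfe-or-cross-region"
    if client_rtt_ms >= high_ms:
        # Only the round trip is slow: application code, client
        # infrastructure, or the network between client and GFE.
        return "client-or-network"
    return "no-latency-increase"
```

For example, `triage(500, 5, 3)` returns `"client-or-network"`, while `triage(500, 400, 3)` returns `"gfe-or-cross-region"`.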
-
Check the CPU utilization of the instance. If the CPU utilization of the instance is above the recommended level, manually add more nodes or set up autoscaling. For more information, see Autoscaling overview.
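For a first estimate of how many nodes to add manually, a proportional calculation is often enough. The sketch below assumes that CPU load spreads evenly across nodes, which is not guaranteed (hotspots break this assumption). The function name is hypothetical, and the target percentage used in the example is only a placeholder: look up the recommended maximum for your instance configuration.

```python
import math


def nodes_for_target(current_nodes, cpu_utilization_pct, target_pct):
    """Estimate the node count needed to bring CPU utilization down to target_pct.

    Assumes load is spread evenly across nodes; never suggests scaling down.
    """
    if target_pct <= 0:
        raise ValueError("target_pct must be positive")
    needed = math.ceil(current_nodes * cpu_utilization_pct / target_pct)
    return max(current_nodes, needed)


# Placeholder numbers: 3 nodes at 80% utilization with a 65% target.
suggested = nodes_for_target(3, 80.0, 65.0)
```

Treat the result as a starting point and confirm the effect in the CPU utilization metrics after scaling.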
-
Observe and troubleshoot potential hotspots or unbalanced access patterns using Key Visualizer and try to roll back any application code changes that strongly correlate with the issue timeframe.
-
Check any traffic pattern changes.
-
Check Query insights and Transaction insights to see if there might be any query or transaction performance bottlenecks.
-
Use the procedures in Oldest active queries to find any expensive queries that might cause a performance bottleneck, and cancel the queries as needed.
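As an illustration, the oldest active queries can be listed from a Python client. This sketch assumes a GoogleSQL-dialect database and a `database` object that behaves like the one in the `google-cloud-spanner` Python client (a `snapshot()` context manager whose snapshot has `execute_sql`). The helper name and the `LIMIT` value are hypothetical.

```python
# Assumes the GoogleSQL dialect; adjust the query for your database.
OLDEST_ACTIVE_QUERIES_SQL = """
SELECT START_TIME, TEXT_FINGERPRINT, SESSION_ID, TEXT
FROM SPANNER_SYS.OLDEST_ACTIVE_QUERIES
ORDER BY START_TIME ASC
LIMIT 10
"""


def oldest_active_queries(database):
    """Return the longest-running active queries as a list of tuples.

    `database` is assumed to expose snapshot() as a context manager,
    as the google-cloud-spanner Database class does.
    """
    with database.snapshot() as snapshot:
        # Materialize the rows before the snapshot is released.
        return [tuple(row) for row in snapshot.execute_sql(OLDEST_ACTIVE_QUERIES_SQL)]
```

With the `google-cloud-spanner` client, the `database` argument could come from `spanner.Client().instance("my-instance").database("my-database")`, where both IDs are placeholders.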
-
Use procedures in the troubleshooting sections in the following topics to troubleshoot the issue further using Spanner introspection tools:
What's next
- Now that you've identified the component that contains the latency, explore the problem further using OpenCensus. For more information, see Capture custom client-side metrics using OpenTelemetry or with OpenCensus.
- Learn how to use metrics to diagnose latency.
- Learn how to troubleshoot Spanner deadline exceeded errors.

