Google Workspace Status Dashboard
Incident affecting Google Docs
Incident began at 2024-08-12 13:20 and ended at 2024-08-12 15:32 (times are in Coordinated Universal Time, UTC).
Incident Report
Summary
On 12 August 2024 at 06:20 US/Pacific, multiple Google Cloud and Google Workspace products experienced connectivity issues in europe-west2 for 40 minutes. During this time, ingress traffic to europe-west2 and egress traffic from europe-west2 experienced elevated latencies, connection timeouts, and connection failures.
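The dashboard header gives the incident window in UTC, while the report body uses US/Pacific. The following Python sketch converts between the two. It assumes that US/Pacific was on Pacific Daylight Time (UTC-7) on 12 August 2024 and uses a fixed offset so it does not depend on a system time zone database; the helper name `utc_to_pacific` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: on 12 August 2024, US/Pacific observed daylight saving time
# (PDT), a fixed UTC-7 offset. A fixed offset avoids relying on tzdata.
PDT = timezone(timedelta(hours=-7), "PDT")

def utc_to_pacific(stamp: str) -> str:
    """Convert a 'YYYY-MM-DD HH:MM' UTC timestamp to US/Pacific (PDT)."""
    utc_time = datetime.strptime(stamp, "%Y-%m-%d %H:%M").replace(tzinfo=timezone.utc)
    return utc_time.astimezone(PDT).strftime("%Y-%m-%d %H:%M %Z")

# Start and end of the dashboard incident window, as given in UTC.
for label, stamp in [("began", "2024-08-12 13:20"), ("ended", "2024-08-12 15:32")]:
    print(label, utc_to_pacific(stamp))
```

Converted this way, the dashboard window runs from 06:20 to 08:32 US/Pacific. It therefore appears to cover both the roughly 40-minute network disruption and the later period in which, per the report, elevated latencies could have persisted until about 08:30 US/Pacific.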
Root Cause
On 12 August 2024 at 06:20 US/Pacific, both the primary and backup power feeds were lost in a Google Point of Presence (POP) due to a substation switchgear failure. The affected POP hosts about ⅓ of the in-service first-layer Google Front Ends (GFEs) located in europe-west2, as well as some distributed networking equipment for that region. The power loss impacted the following Google products and services that depend on GFEs in that region:
- Google Cloud APIs, Google Workspace, and other Google services such as YouTube.
- Customer-created global external application and proxy network load balancers, including Cloud CDN.
The power loss also impacted the following Google Cloud products, which depended on the affected networking equipment:
- Customer-created regional external application, proxy network, and passthrough network load balancers in the europe-west2 region.
- External protocol forwarding and VM external IP address connectivity for VMs in the europe-west2 region.
- Google Cloud Interconnect connections in some LHR colocation facilities.
Impact was limited to situations where either or both of the following were true:
- Inbound requests or connections were routed from the Internet into the europe-west2 region of Google’s network, and those requests or connections depended on networking equipment that was offline or unreachable pending reconvergence.
- Outbound responses were routed from the europe-west2 region of Google’s network to the Internet, and those responses depended on networking equipment that was without power.
The power outage caused Internet routes advertised by Google to be withdrawn in networks connected to Google’s network. The withdrawn routes were automatically replaced by other Google-advertised routes that didn’t depend on the affected networking equipment. Withdrawing and replacing routes relies on BGP and its timers, so convergence on the replacement routes is not instantaneous. In addition, overload on the first-layer GFEs reached through the automatically selected replacement routes extended the duration of the incident.
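To make route withdrawal and replacement concrete, here is a heavily simplified Python model; it is not a description of Google's network. Each hypothetical route for a documentation prefix (192.0.2.0/24) depends on one hypothetical POP, and when a POP loses power its routes drop out and the next-preferred route takes over. Real BGP best-path selection weighs many more attributes than AS-path length, and failures can be detected faster than the hold timer (for example, through interface-down events); the 90-second hold time below is an illustrative assumption.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    prefix: str
    pop: str          # hypothetical POP that the route's next hop depends on
    as_path_len: int  # shorter AS path is preferred in this simplified model

def best_route(routes, prefix, down_pops=frozenset()):
    """Pick the preferred route for a prefix, skipping routes through failed POPs.

    Simplification: compares AS-path length only (then POP name as a
    deterministic tie-breaker); real BGP has many more tie-breakers.
    """
    candidates = [r for r in routes if r.prefix == prefix and r.pop not in down_pops]
    if not candidates:
        return None  # prefix is unreachable until some route is advertised again
    return min(candidates, key=lambda r: (r.as_path_len, r.pop))

def withdrawal_seen_at(failure_s: float, hold_time_s: float = 90.0) -> float:
    """Latest time a peer drops the session if it never gets a NOTIFICATION.

    The hold timer is reset by each KEEPALIVE or UPDATE, so a silently failed
    peer is detected at most hold_time_s after its last message.
    """
    return failure_s + hold_time_s

# Hypothetical routes for a documentation prefix.
routes = [
    Route("192.0.2.0/24", "lhr-pop-a", 1),
    Route("192.0.2.0/24", "lhr-pop-b", 2),
    Route("192.0.2.0/24", "ams-pop-c", 3),
]
print(best_route(routes, "192.0.2.0/24").pop)                            # before failure
print(best_route(routes, "192.0.2.0/24", frozenset({"lhr-pop-a"})).pop)  # after failure
print(withdrawal_seen_at(0.0))
```

In this model, the replacement route shifts all of the affected traffic onto whatever sits behind it, which is one way the overload of replacement-route GFEs described in the report can arise.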
Detailed Description of Impact
- Google Workspace: Gmail, Google Calendar, Google Chat, Google Docs, Google Drive, Google Meet and Google Tasks users connecting to Workspace services from the UK region and surrounding areas experienced connectivity issues as described in the next point.
- GFE-based products and services: Customers on the Internet experienced a spike of broken connections, followed by elevated latencies or HTTP error responses, when communicating with GFE-powered Google APIs and services or with customer-created global external application and proxy network load balancers. At roughly 06:23 US/Pacific, Google automatically redirected connections to the nearest possible first-layer GFEs, with some latency penalty. However, some of those first-layer GFEs were overloaded until 06:48 US/Pacific, when Google engineers made adjustments to distribute incoming requests more efficiently among nearby first-layer GFEs. Depending on the Google API or service, or on the customer-created global external load balancer, elevated latencies could have persisted until about 08:30 US/Pacific. This also applied to customer-created global external load balancers with Cloud CDN enabled.
- Regional Google Cloud products and services: Until replacement routes were in effect, customers on the Internet experienced connection failures to the following GCP resources in the europe-west2 region:
  - Regional external application, proxy network, and passthrough network load balancers.
  - External protocol forwarding and VM external IP addresses.
- Google Cloud Interconnect: Google Cloud Interconnect connections in some LHR colocation facilities (lhr-zone1-47, lhr-zone1-832, lhr-zone1-2262, lhr-zone1-4885, lhr-zone1-99051 and lhr-zone2-47) remained offline from 06:20 US/Pacific until at least 06:57 US/Pacific, when the networking equipment became fully operational again after power was restored.
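The first-layer GFE overload can be illustrated with a toy comparison, using hypothetical site names, capacities, and demand, between sending all redirected traffic to the single nearest site and filling sites in proximity order up to their spare capacity (a simple spillover policy). This is not a description of the adjustments Google engineers made at 06:48 US/Pacific; it only shows why nearest-only redirection can overload a site that a capacity-aware policy would not.

```python
# Hypothetical first-layer GFE sites, sorted nearest-first, with spare
# capacity in arbitrary units (e.g. thousands of requests per second).
SITES = [("gfe-near-1", 400), ("gfe-near-2", 300), ("gfe-farther-3", 500)]
DEMAND = 600  # hypothetical redirected load, same units

def nearest_only(demand, sites):
    """Send all redirected demand to the single nearest site."""
    name, _capacity = sites[0]
    return {name: demand}

def spillover(demand, sites):
    """Fill sites in proximity order up to spare capacity; spill the rest onward."""
    assignment = {}
    remaining = demand
    for name, capacity in sites:
        if remaining <= 0:
            break
        take = min(remaining, capacity)
        assignment[name] = take
        remaining -= take
    if remaining > 0:
        assignment["unserved"] = remaining  # demand that no site had room for
    return assignment

def overloaded(assignment, sites):
    """List sites whose assigned load exceeds their spare capacity."""
    capacity = dict(sites)
    return [name for name, load in assignment.items()
            if name in capacity and load > capacity[name]]

print(nearest_only(DEMAND, SITES), overloaded(nearest_only(DEMAND, SITES), SITES))
print(spillover(DEMAND, SITES), overloaded(spillover(DEMAND, SITES), SITES))
```

The trade-off is latency: spillover sends some requests to a farther site, which matches the report's note that redirected connections carried some latency penalty.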
At 06:43 US/Pacific, power was restored to the impacted networking equipment. Google networking equipment was fully operational by 06:57 US/Pacific, and connectivity to GFE-based products and services, regional Google Cloud products and services, and Google Cloud Interconnect resumed shortly thereafter.
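The report gives enough timestamps to derive rough phase durations. The sketch below computes them in minutes from the US/Pacific times stated in the report; the event keys, phase names, and the `minutes_between` helper are illustrative, and the equipment/Interconnect figure is a lower bound because those connections were offline until at least 06:57.

```python
from datetime import datetime

# US/Pacific times on 2024-08-12, taken from the incident report.
EVENTS = {
    "power_lost": "06:20",
    "auto_redirect": "06:23",
    "power_restored": "06:43",
    "gfe_rebalanced": "06:48",
    "network_operational": "06:57",       # Interconnect offline until at least this time
    "latency_may_persist_until": "08:30",  # "about 08:30" in the report
}

def minutes_between(start: str, end: str) -> int:
    """Whole minutes between two same-day 'HH:MM' times."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)

PHASES = {
    "POP power outage": ("power_lost", "power_restored"),
    "First-layer GFE overload": ("auto_redirect", "gfe_rebalanced"),
    "Networking equipment / Interconnect (lower bound)": ("power_lost", "network_operational"),
    "Possible elevated latency": ("power_lost", "latency_may_persist_until"),
}

for name, (start, end) in PHASES.items():
    print(f"{name}: {minutes_between(EVENTS[start], EVENTS[end])} min")
```

The 37-minute equipment-recovery figure is broadly consistent with the 40 minutes given in the summary, since connectivity resumed shortly after 06:57.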
Remediation and Prevention
Multiple Google engineering teams were alerted, and automated recovery tooling was triggered as expected; however, manual adjustments were required to address the subsequent first-layer GFE overload. Google is reviewing how to automate the tasks that required manual intervention, to reduce the duration of impact from future power events. Similarly, Google is working to increase Cloud Interconnect control plane resilience and to reduce mitigation time through automated reaction to isolation events.
Additionally, Google's partner that maintains power for the affected facility in LHR (London) is conducting a full root cause analysis with the switchboard manufacturer and the substation owner(s) involved in supplying power, including a follow-up on why stored or generated on-site emergency power did not carry the load.
A mini incident report has been posted to https://status.cloud.google.com/incidents/ETJGhvY9Xaktw7tgi8dF
The issue with Gmail, Google Calendar, Google Chat, Google Docs, Google Drive and Google Tasks has been resolved for all affected users as of Monday, 2024-08-12 08:35 US/Pacific.
During the issue, users connecting to Workspace services from the UK region may have experienced connectivity issues.
We will publish an analysis of this incident once we have completed our internal investigation.
We thank you for your patience while we worked on resolving the issue.
SUMMARY
Multiple Workspace services experienced brief connectivity issues for users connecting from the UK region.
DESCRIPTION
We have experienced an issue with Gmail, Google Drive, Google Calendar, Google Docs, Google Chat and Google Tasks beginning Monday, 2024-08-12 06:28 US/Pacific. The issue is now mitigated, and our engineering teams are closely monitoring for any residual impact. We will provide an update by Monday, 2024-08-12 09:15 US/Pacific with current details. We apologize to all who are affected by the disruption.
DIAGNOSIS
Multiple Workspace services experienced brief connectivity issues for users connecting from the UK region.
WORKAROUND
None at this time.