Google Workspace Status Dashboard
Incident affecting Classroom
Incident began at 2023-12-07 18:00 and ended at 2023-12-07 20:32 (times are in Coordinated Universal Time (UTC)).
Incident Report
Summary
On 07 December 2023, various Google Workspace and Google Cloud customers experienced intermittent authentication issues when routed to one of our data centers in the US East region. As a result, a limited number of users could not sign up, log in, or re-authenticate for a duration of 1 hour and 32 minutes.
To our Google Workspace and Google Cloud customers who were impacted during this issue, we sincerely apologize – this is not the level of quality and reliability we strive to offer you.
Root Cause
Google’s authentication stack is broad and deep, running across every region and zone to provide functionality globally across all Google products. This particular issue was scoped to a single microservice's serving jobs within a single data center; this microservice serves some of the user-facing pages involved in sign-up, login, and re-authentication flows.
To improve resource usage efficiency, engineers had been implementing more advanced autoscaling algorithms across these serving jobs globally. These changes were rolled out gradually over a period of several weeks without any issues.
On 7 December, the rollout kicked off rolling restarts across the individual tasks within the serving jobs in a single data center in the US East region. Due to internal network routing at the time, this data center was serving traffic from the US East area as well as South America. These factors placed additional demand on each task's thread pool, both overloading some tasks and affecting those tasks' ability to report metrics back to load-balancers. This caused requests to be intermittently dropped between the load-balancer and the individual serving tasks; these dropped requests led to user-facing 500 error pages.
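The overload mechanism described above can be illustrated with a toy capacity model. This is a minimal sketch, not Google's load-balancing logic: the `ServingJob` class, the task counts, the thread-pool size, and the request rates are all hypothetical, chosen only to show how a rolling restart combined with extra routed traffic can push per-task demand past a fixed thread-pool limit. It also assumes requests are spread evenly across tasks, which the report does not state.

```python
from dataclasses import dataclass


@dataclass
class ServingJob:
    """Toy model of a serving job; every field is a hypothetical value."""
    total_tasks: int            # tasks in the job when fully healthy
    threads_per_task: int       # concurrent requests one task can handle
    restarting_tasks: int = 0   # tasks temporarily out of rotation

    def available_tasks(self) -> int:
        return self.total_tasks - self.restarting_tasks

    def demand_per_task(self, concurrent_requests: int) -> float:
        # Assumption: the load balancer spreads requests evenly.
        return concurrent_requests / self.available_tasks()

    def is_overloaded(self, concurrent_requests: int) -> bool:
        return self.demand_per_task(concurrent_requests) > self.threads_per_task


# Hypothetical figures, for illustration only.
job = ServingJob(total_tasks=10, threads_per_task=100)
local_traffic = 700   # concurrent requests from the nearby region
extra_traffic = 200   # concurrent requests routed in from another region

# Steady state: 700 / 10 = 70 requests per task, under the 100-thread limit.
print(job.is_overloaded(local_traffic))                  # False

# A rolling restart takes 2 tasks out while the extra traffic arrives:
# 900 / 8 = 112.5 requests per task, over the limit.
job.restarting_tasks = 2
print(job.is_overloaded(local_traffic + extra_traffic))  # True
```

In this simplified picture, neither factor alone exceeds capacity; the overload appears only when reduced task availability and additional traffic coincide.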
Remediation and Prevention
Google engineers were alerted by our internal monitoring system on 07 December at 11:16 US/Pacific and immediately started investigating the issue. Once the nature and scope of the issue became clear, Google engineers worked to confirm that traffic could be safely re-routed away from the impacted serving jobs as a mitigation strategy. Traffic was rerouted to other data centers, and the customer impact was mitigated by 12:32 US/Pacific. Our engineers then reviewed and tuned the stability and performance of both this and related serving jobs globally before declaring the incident fully resolved.
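For readers reconciling the US/Pacific times above with the UTC times shown elsewhere in this report, the conversion can be sketched as follows. This is an illustrative calculation only; it assumes "US/Pacific" corresponds to the IANA zone `America/Los_Angeles` and that the system has a time zone database available.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Assumption: "US/Pacific" maps to the IANA zone America/Los_Angeles.
PACIFIC = ZoneInfo("America/Los_Angeles")
UTC = timezone.utc

alerted = datetime(2023, 12, 7, 11, 16, tzinfo=PACIFIC)    # monitoring alert
mitigated = datetime(2023, 12, 7, 12, 32, tzinfo=PACIFIC)  # impact mitigated

print(alerted.astimezone(UTC).strftime("%Y-%m-%d %H:%M"))    # 2023-12-07 19:16
print(mitigated.astimezone(UTC).strftime("%Y-%m-%d %H:%M"))  # 2023-12-07 20:32
print(mitigated - alerted)                                   # 1:16:00
```

The mitigation time of 12:32 US/Pacific corresponds to the 20:32 UTC incident end time, and 1 hour and 16 minutes elapsed between the alert and mitigation.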
Google is committed to quickly and continually improving our technology and operations to prevent service disruptions. We appreciate your patience and apologize again for the impact to your organization. We thank you for your business. We are committed to preventing a repeat of this issue in the future and are completing the following actions:
- Enhance metrics and dashboards to make these complex load-balancing dynamics more legible to engineers, reducing the time to mitigate similar future issues.
- Improve performance testing of resource-allocation profiles for each individual serving job.
Detailed Description of Impact
Google Workspace:
- During the issue, a limited number of Google Workspace users served from this data center had issues with log-in and sign-up for part of unauthenticated traffic, including some users going through "Sign in with Google" interactive flows using their Google Workspace account.
- Multiple Google Workspace product log-in attempts, re-authentication attempts, and some new account creation requests from users routed to this data center failed with 502 errors.
- Users who were already logged in were not affected by the issue.
Google OAuth:
- Some users served from this data center intermittently experienced 502 errors when attempting to "Sign in with Google" and encountering interactive sign-in or re-authentication flows.
Google Cloud Console:
- A limited number of users served from this data center intermittently experienced 502 errors when attempting to log in to the Cloud Console.
Mini Incident Report
We apologize for the inconvenience this service disruption/outage may have caused. We would like to provide some information about this incident below. Please note, this information is based on our best knowledge at the time of posting and is subject to change as our investigation continues.
If you have experienced impact outside of what is listed below, please reach out to Google Cloud Support using https://cloud.google.com/support or to Google Workspace Support using help article https://support.google.com/a/answer/1047213.
(All Times US/Pacific)
Incident Start: 7 December 2023 11:00
Incident End: 7 December 2023 12:32
Duration: 1 hour, 32 minutes
Affected Services and Features:
Sign-up and log-in for parts of unauthenticated traffic on Google Workspace and Google Cloud products.
Regions/Zones: us-east (partial impact)
Description:
Starting on 7 December 2023 at 11:00, users trying to sign up or log in to Google Workspace products or Google Cloud Console services in the us-east region experienced intermittent issues for a duration of 1 hour and 32 minutes.
From our preliminary investigation, this was likely caused by performance issues on the sign-up and login frontend servers in one of the data centers in the affected region. To mitigate the impact, our engineering team rerouted traffic and optimized server distribution. Service was restored on 7 December 2023 at 12:32.
Google will complete a detailed Incident Report in the following days that will provide a full root cause.
Customer Impact:
Some unauthenticated users in parts of the us-east region intermittently experienced difficulties accessing the impacted products. These users saw 502 errors when trying to sign up, log in, or re-authenticate.
During the issue, some users were unable to access Classroom services.