May 2024 UniSuper incident: https://cloud.google.com/blog/products/infrastructure/detail...
https://www.unisuper.com.au/about-us/media-centre/2024/a-joi...
A joint statement from UniSuper CEO Peter Chun and Google Cloud CEO Thomas Kurian
8 May 2024
UniSuper and Google Cloud understand the disruption to services experienced by members has been extremely frustrating and disappointing. We extend our sincere apologies to all members.
While supporting UniSuper to bring its systems back online, Google Cloud has been conducting a root cause analysis.
Thomas Kurian has confirmed that the disruption arose from an unprecedented sequence of events, where an inadvertent misconfiguration during provisioning of UniSuper's Private Cloud services ultimately resulted in the deletion of UniSuper's Private Cloud subscription.
This is described as an isolated, "one-of-a-kind occurrence" that has never before occurred with any Google Cloud client globally. This should not have happened. Google Cloud has identified the sequence of events and taken measures to ensure it does not happen again.
Why did the outage last so long?
UniSuper had duplication across two geographies as protection against outages and data loss. However, the deletion of the Private Cloud subscription triggered deletion across both geographies.
Restoring the Private Cloud required significant coordination and effort between UniSuper and Google Cloud, including recovery of hundreds of virtual machines, databases, and applications.
dantiberian
today at 4:20 AM
I wrote about the UniSuper issue at the time: https://danielcompton.net/google-cloud-unisuper. It was a pretty nasty bug where their VMware environment was created with a one-year expiry date, but was one "resource" from the perspective of Google Cloud.
The instant cascading worldwide deletion upon closing or deleting a subscription sounds like a recipe for disaster. Why not mark it for deletion and delete say... a day or a week later?
From personal experience, as a customer who once did something stupid: Google Cloud does soft deletes.
But you need to reach out to support fast enough. And really, if you deleted something important and discovered it only the next day, and not within minutes, you're having a bigger issue that a soft delete won't solve.
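To make the "mark it for deletion and delete later" idea concrete, here is a minimal sketch of a soft delete with a grace period. It is purely illustrative and is not how Google Cloud implements anything: the `Subscription` class, its methods, and the 7-day `GRACE_PERIOD` are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional

# Assumed retention window; the thread suggests "a day or a week".
GRACE_PERIOD = timedelta(days=7)


@dataclass
class Subscription:
    """Hypothetical subscription that owns a set of resources."""
    name: str
    resources: list[str] = field(default_factory=list)
    deleted_at: Optional[datetime] = None  # None means "not marked for deletion"

    def mark_for_deletion(self, now: datetime) -> None:
        # Soft delete: record the time, but keep the resources intact.
        self.deleted_at = now

    def restore(self, now: datetime) -> bool:
        # Restoring only works while the grace period has not elapsed.
        if self.deleted_at is None:
            return True
        if now - self.deleted_at < GRACE_PERIOD:
            self.deleted_at = None
            return True
        return False

    def purge_if_expired(self, now: datetime) -> bool:
        # Hard delete: only happens once the grace period is over.
        if self.deleted_at is not None and now - self.deleted_at >= GRACE_PERIOD:
            self.resources.clear()
            return True
        return False


sub = Subscription("demo", ["vm-1", "db-1"])
t0 = datetime(2024, 5, 1)
sub.mark_for_deletion(t0)
sub.purge_if_expired(t0 + timedelta(days=1))  # too early, nothing is removed
sub.restore(t0 + timedelta(days=1))           # still inside the window, so this succeeds
```

Note that the sketch leaves resources running while they are marked, so nobody sees any impact until the purge. Whether a marked subscription should also be suspended right away is a separate design choice.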
manapause
today at 4:01 AM
It's a good question. That said, unless there are compliance or fallback concerns, I would prefer a service that burns my data on departure.
modernpacifist
today at 3:56 AM
Either mark-for-delete has the same impact as deleting, in that it shuts off all the cloud resources associated with the subscription, at which point the outage still happens but maybe the recovery is smoother; or it leaves everything running, and you've just delayed the inevitable by a week, because no one will look at it unless there is actual impact.