Lightning Strikes Cause Of Google Cloud Outage
In a very small number of cases – less than 0.000001 percent of total disk storage space – the data was unrecoverable and permanently lost.
Google must have done something to anger the gods, for they have blasted one of Google’s European data centers with lightning not once, not twice, not thrice, but four times.
Google’s confessional also says the company “has an ongoing program of upgrading to storage hardware that is less susceptible to the power failure mode that triggered this incident”.
“This outage is wholly Google’s responsibility”, the document continues, but then goes on “…to highlight an important reminder for our customers: GCE instances and Persistent Disks within a zone exist in a single Google datacenter and are therefore unavoidably vulnerable to datacenter-scale disasters”. Google’s statement did not list the clients who may have lost data, nor did it say how it would compensate them. In fact, Google stored copies in other data centers. At the height of the calamity, about 5% of the disks in the data center were experiencing I/O errors.
James Wilman, engineering sales director for the data center consultants Future-Tech, said that although such data centers are designed to withstand lightning strikes via a network of conductive lightning rods, it was not impossible for strikes to get through. The servers have battery backups, and the building itself has a full auxiliary power system. Some recently written data was located on storage systems that were more susceptible to power failure from extended or repeated battery drain. That’s why only a small fraction of GCE instances were affected.
“Since the incident began, Google engineers have conducted a wide-ranging review across all layers of the data centre technology stack, from electrical distribution systems through computing hardware to the software controlling the GCE persistent disk layer”, said Google. The point of this is that users can set up resilient infrastructure capable of failing over from one zone to another when problems like the ones seen this month occur. Some customers found that the GCE platform was inaccessible until Sunday evening as storage systems were gradually checked and brought back online. It is these drives that have encountered permanent data loss.