This was not supposed to happen. I believed that Amazon had very good, redundant systems.
I understood it when a hurricane hit them in 2012 and some of my servers in the US went down, and I had to check for data integrity (that time I had no noticeable losses). But this time it happened in Europe, specifically in the Ireland region, without any severe weather, and they totally lost my instance disk. I got no reasonable explanation, not a single line or reason that could give me some peace of mind and make me think that this will never happen again and that I can trust the platform.
I just received this email. They sent it without any action on my part. At first I thought it was a scam. I just could not believe that an email so cold and lacking in humanity, letting me down without offering any alternative or compensation, with no email address to reply to, no phone number to call for an explanation, and no one signing it, was sent by Amazon. It is unworthy of them.
When I logged in to the panel I saw it. The volume shows “error”.
Sadly, it is not possible to create a snapshot…
The volume was created on 13 January 2015, and it was the disk of the best instance available, the new c4.8xlarge, so what is the point of Amazon's email talking about durability? What about the SLA?
I really like the Amazon engineering teams, but as a company they really screwed up this time.
I wrote to one of my friends, a manager at Amazon whom I met when I was a finalist in a hiring process there, just to share this, because what happened and the automatic email are simply unacceptable. I know they care about quality and good service to clients.
Here is the text of the automatic email they sent me; I only replaced the IDs with the word “number”:
[Case number] [re: Your EBS volume]
X-Original-From: Amazon Web Services <firstname.lastname@example.org>
Reply-To: Amazon Web Services <email@example.com>

Hello -

Your EBS volume vol-number in eu-west-1 experienced a failure due to multiple failures of the underlying hardware components and we were unable to recover it. Although EBS volumes are designed for reliability, backed by multiple physical drives, we are still exposed to durability risks caused by concurrent hardware failures of multiple components, before our systems are able to restore the redundancy. We publish our durability expectations on the EBS detail page here (http://aws.amazon.com/ebs). We apologize for this data loss and the adverse impact for your business.
Imagine my face, as the CTO who decided in favor of Amazon despite having vouchers for 65,000 USD from Microsoft Azure and 10,000 USD from Google Cloud, having to explain to the CEO and to the investors that the cloud provider I chose, and that we are paying for (one c4.8xlarge costs ~1,000 USD/month), irreversibly killed one of our servers.
I keep my backups in the Cloud as snapshots for speed and autoscaling, for the obvious advantages, but I also always keep everything backed up outside, in house and in several geographic locations, just to guard against “impossible”, unbelievable things like this.
But in normal situations, for 99.99% of companies, it is impossible not to lose data if your cloud provider just wipes your disk. And the worst thing is the loss of trust.
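For anyone who wants to check their own setup against the backup rule I described (cloud snapshots for speed, plus copies outside the provider in several locations), here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `BackupCopy` record, the `backup_gaps` function, the location labels, the one-day freshness window and the minimum of two off-cloud locations are hypothetical choices, not anything Amazon provides or recommends. It does not talk to any cloud API; it only checks an inventory you fill in yourself.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class BackupCopy:
    # Hypothetical inventory record; the labels are made up for this example.
    location: str       # e.g. "aws:eu-west-1" or "office"
    in_cloud: bool      # True if the copy lives with the cloud provider
    taken_at: datetime  # when the copy was made


def backup_gaps(copies, now, max_age=timedelta(days=1), min_offsite_locations=2):
    """Return a list of problems; an empty list means the policy holds."""
    # Only copies newer than max_age count as usable.
    fresh = [c for c in copies if now - c.taken_at <= max_age]
    problems = []
    if not any(c.in_cloud for c in fresh):
        problems.append("no fresh cloud snapshot")
    # Distinct locations that do not depend on the cloud provider.
    offsite = {c.location for c in fresh if not c.in_cloud}
    if len(offsite) < min_offsite_locations:
        problems.append(
            f"only {len(offsite)} fresh off-cloud location(s), "
            f"want {min_offsite_locations}"
        )
    return problems
```

A setup with only a cloud snapshot would be flagged as having no off-cloud copies, which is exactly the situation in which a lost volume becomes lost data.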