Xen Bug Drives Cloud Reboot: Survey Shows Users Undeterred

The Xen vulnerability announced October 1 caused AWS, Rackspace, and SoftLayer to patch and reboot many cloud instances in the preceding week. We wanted to go beyond the anecdotes, tweets, and blogs to survey cloud users and find out how they actually fared during the reboot: Did their applications experience downtime? What strategies did they use to minimize the impact? What lessons did they learn?

Our survey was conducted on October 2-3, 2014 (after all the reboots had completed), with 449 respondents. We got responses from 349 AWS users, 66 Rackspace Public Cloud users, and 42 SoftLayer Virtual Server users. We also received 74 responses from organizations that used Xen in their internal data centers. Note that some respondents used multiple clouds, so the total adds up to more than 449.


AWS Users Had Less Downtime

The most critical measure of the reboot was its impact on customer applications. AWS came out on top, with 51 percent of respondents who use AWS reporting no application downtime as a result of the reboot and another 21 percent reporting less than 5 minutes of downtime. Fewer Rackspace and SoftLayer users escaped with no downtime: 27 percent and 26 percent, respectively.

On the other end of the spectrum, the percentage of users of each cloud that reported more than an hour of application downtime was 5 percent for AWS, 13 percent for Rackspace, and 17 percent for SoftLayer.

The lower downtime for the AWS reboot could be partially explained by the fact that, unlike the Rackspace and SoftLayer reboots, it did not affect all instances. AWS reported that it affected less than 10 percent of its entire fleet. The percentage of instances slated to reboot varied for each individual AWS user depending on the instance types used and luck of the draw. Thirty-nine percent of survey respondents using AWS saw less than 10 percent of their instances slated to reboot while 51 percent saw more.

AWS Users Leveraged Additional Strategies to Avoid Downtime

The lower application downtime for AWS users was certainly also driven by some of the strategies they employed to avoid downtime. Because the AWS reboot did not affect all instance types or all instances, AWS users were able to relaunch instances ahead of the reboot (29 percent) and move resources to unaffected instance types (12 percent). In addition, a large number of AWS users (43 percent) were using multiple availability zones (AZs).

For Rackspace and SoftLayer users, unless they had already architected across regions or leveraged automation, they were limited to moving resources between regions, a strategy employed by 15 percent of Rackspace users and 18 percent of SoftLayer users. As a result, 48 percent of Rackspace users and 39 percent of SoftLayer users took no preventative action, as compared to 20 percent of AWS users.

Most Cloud Users Weathered Reboots Well

Cloud users were given relatively short notice by cloud providers. AWS and Rackspace users started receiving maintenance notices roughly 36 hours in advance of the first reboot, although the Rackspace notices went out late Friday evening to warn of reboots starting Sunday morning. SoftLayer users got even less warning — 12 hours in some cases — resulting in only 42 percent reporting that they had enough time to prepare.

Despite the short notice, more than half of cloud users felt their companies weathered the reboots well, with AWS users feeling most confident (76 percent).

In contrast, because they had control over the timing of the process, companies using Xen in internal data centers were more likely to think they had enough time to prepare. A large majority of these Xen users (81 percent) reported that they weathered the reboots well. However, 41 percent of Xen users reported that the patching process created a lot of extra work, a higher percentage than for users of any of the public clouds.

Cloud Users Remain Undaunted

The broad majority of cloud users are likely to continue using their provider despite the reboots. Only 10 percent of AWS users, 20 percent of Rackspace users, and 29 percent of SoftLayer users said they are less likely to use their particular cloud provider. Similarly, only 16 percent of Xen users reported they are less likely to use Xen.

Cloud Reboots Were a Wake-Up Call

Cloud users plan to continue using cloud despite these reboots, with only 5 percent expecting to use cloud less. But their recent experience is pushing many cloud users to implement additional cloud and IT best practices. Cloud users intend to avoid future downtime by making such changes as creating a plan (37 percent), implementing redundant architecture across AZs (35 percent) or regions (27 percent), employing more automation (30 percent), and leveraging multiple cloud providers (26 percent).

Overall, the recent cloud reboot may have revealed that cloud is actually growing up. Users seem to understand that there will be occasional issues and that they need to prepare for those issues — just as they do in their on-premises data centers.

To access the deck with the complete survey results, download the 2014 Cloud Reboot Survey.

All charts and data from this blog can be used under a Creative Commons Attribution 4.0 International License.