Why Amazon’s Elastic Block Store Matters

On the technical side, Amazon’s EBS service may look like just another great new feature of the Elastic Compute Cloud, but on the business side it enables a whole slew of new customers. I won’t pretend that I understand all the new uses, but I can talk about those we see and are supporting.

First, a few words about what EBS is. In short, it’s a SAN (storage area network) in the cloud. You can allocate a disk volume of 1GB to 1TB in size from what is effectively an endless SAN and attach it to an instance of yours running in EC2. The volume is stored on redundant disks (with some form of RAID) and has a lifetime independent of any instance to which it is attached, so you can detach it and later attach it to another instance. You can also take a snapshot backup of a volume to S3, where it is stored with the redundancy and durability of all objects on S3. Moreover, successive snapshots are incremental, which gives volumes a powerful and efficient incremental backup capability.
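To make "successive snapshots are incremental" concrete, here is a toy model in Python. It is only a sketch of the general idea, not a description of how Amazon implements snapshots: the names Volume, take_snapshot, and restore are hypothetical, and a volume is modeled as a simple mapping from block index to block contents.

```python
from typing import Dict, List

Block = bytes
Volume = Dict[int, Block]  # hypothetical model: block index -> block contents


def restore(snapshots: List[Dict[int, Block]]) -> Volume:
    """Rebuild a full volume by replaying snapshot deltas, oldest first."""
    volume: Volume = {}
    for delta in snapshots:
        volume.update(delta)
    return volume


def take_snapshot(volume: Volume, previous: List[Dict[int, Block]]) -> Dict[int, Block]:
    """Return only the blocks that changed since the earlier snapshots."""
    known = restore(previous)
    return {i: data for i, data in volume.items() if known.get(i) != data}


volume = {0: b"boot", 1: b"data-v1", 2: b"logs"}
chain: List[Dict[int, Block]] = []
chain.append(take_snapshot(volume, chain))  # first snapshot stores every block

volume[1] = b"data-v2"                      # a single block changes
chain.append(take_snapshot(volume, chain))  # second snapshot stores only that block

print([len(s) for s in chain])  # -> [3, 1]
```

The second snapshot holds one block instead of three, yet replaying the chain with restore(chain) still yields the full current volume. That is the property that makes frequent snapshots cheap.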

All this and much more is explained in detail in another post, and there’s yet more detailed EBS information on our support site. The official EBS announcement is on the EC2 detail page, Werner Vogels provides some background, and Jeff Barr’s blog entry has links to many other related announcements.

The RightScale dashboard supports all the features of EBS and offers a number of additional goodies, such as configuring volumes to be attached automatically to servers when they launch and keeping track of the ancestry of a volume or snapshot.

What does EBS enable? In short, traditional processing on large datasets and reliable storage for many servers. Let’s look at these areas one by one.

Large Datasets

Amazon Web Services are designed for scale. EC2, S3, SQS, and SimpleDB (SDB) are ideally suited for building large systems that process huge data volumes. The catch has been that they are geared toward modern service-oriented systems that can use storage accessed via HTTP PUTs and GETs (Amazon S3), can work with a non-relational database like Amazon SDB, and thrive on large numbers of simple servers (EC2). Users with more traditional applications, such as relational databases, that require large datasets stored in a filesystem with a POSIX interface have had difficulty meeting all their requirements in AWS. While an EC2 X-large instance comes with about 1.4TB of local disk, it is rather difficult to actually use this disk space in a production system. Populating the disk with data at boot time can take hours, and backups, replication, and restoring the data after an instance failure are all sore points. For up to 100GB the timescales are all workable, but beyond that it gets difficult.

With EBS, processing large datasets contained within a filesystem becomes straightforward. Volumes can be up to 1TB in size; beyond that, it is possible to attach multiple volumes to the same instance, so filesystems of 10TB are practical. The volumes can further be backed up to S3 using snapshots, and they can be replicated by creating new volumes from the snapshots. What is particularly nice is that a volume can be created from a snapshot in any availability zone (think data center) of a region, so copying a large volume across data centers can be offloaded to EBS efficiently.
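The sizing arithmetic behind a multi-volume filesystem can be sketched in a few lines of Python. This is an illustration only: plan_volumes is a hypothetical helper, 1TB is assumed to mean 1024GB, and how the volumes are then combined into one filesystem (striping, a volume manager, and so on) is outside what this post covers.

```python
from typing import List

MAX_VOLUME_GB = 1024  # per-volume ceiling from the post, assuming 1TB = 1024GB


def plan_volumes(target_gb: int, max_volume_gb: int = MAX_VOLUME_GB) -> List[int]:
    """Split a target capacity into the fewest near-equal volumes under the size limit."""
    if target_gb <= 0 or max_volume_gb <= 0:
        raise ValueError("sizes must be positive")
    count = -(-target_gb // max_volume_gb)  # ceiling division
    base, extra = divmod(target_gb, count)
    # The first `extra` volumes take one additional GB so the sizes sum to the target.
    return [base + 1] * extra + [base] * (count - extra)


print(plan_volumes(10 * 1024))  # ten volumes of 1024GB each
print(plan_volumes(2500))       # -> [834, 833, 833]
```

Near-equal sizes are a design choice made here because equally sized members tend to be easier to stripe or mirror; a plan of two full volumes plus a small remainder would satisfy the size limit just as well.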

Many Virtual Appliance Servers

EBS also enables SaaS vendors that use a single-tenant “virtual appliance” model. Many software vendors have approached us with use cases where they would like to run individual servers on behalf of their customers. Often these servers are co-managed by the customer and the software vendor or have other properties that make the service inappropriate for a multitenant SaaS implementation. In these use cases the end customer is storing important data on these servers and requires a robust data safeguarding architecture, in particular for database storage. While we have a very effective MySQL replication and backup solution today, it is really geared toward multiserver setups and doesn’t fit the price and complexity budget of cookie-cutter single-server virtual appliances. For those use cases, EBS brings the desired performance and reliability while reducing complexity and price.

With EBS, the canonical reliable single-server virtual appliance can be implemented with the following architecture: an EC2 instance whose type is chosen for the CPU and memory required, an EBS volume sized appropriately for the dataset, a revolving set of frequent snapshots providing disaster recovery backups, and periodic application-level export of backups to S3 for archiving and off-cloud backups. In case of a total failure of the EC2 instance and the EBS volume (as might happen with, for example, a data center fire), a new instance and volume can be allocated in another availability zone from the last revolving snapshot.
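The "revolving set of frequent snapshots" boils down to a simple retention rule, sketched below in Python. Everything here is illustrative: the Snapshot record, the rotate and latest helpers, the snapshot IDs, and the timestamps are all hypothetical, and no AWS API is called. A real implementation would create and delete snapshots through EC2 and pass the chosen snapshot to a volume-creation call in the target availability zone.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple, Tuple


class Snapshot(NamedTuple):
    snapshot_id: str   # hypothetical identifier, not a real EBS snapshot ID
    taken_at: datetime


def rotate(snapshots: List[Snapshot], keep: int) -> Tuple[List[Snapshot], List[Snapshot]]:
    """Split snapshots into (kept, expired), retaining the `keep` most recent."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    ordered = sorted(snapshots, key=lambda s: s.taken_at, reverse=True)
    return ordered[:keep], ordered[keep:]


def latest(snapshots: List[Snapshot]) -> Snapshot:
    """Pick the snapshot a replacement volume would be created from."""
    return max(snapshots, key=lambda s: s.taken_at)


start = datetime(2008, 8, 21)  # arbitrary example date
snaps = [Snapshot(f"snap-{i}", start + timedelta(hours=i)) for i in range(6)]

kept, expired = rotate(snaps, keep=4)
print([s.snapshot_id for s in kept])     # -> ['snap-5', 'snap-4', 'snap-3', 'snap-2']
print([s.snapshot_id for s in expired])  # -> ['snap-1', 'snap-0']
print(latest(kept).snapshot_id)          # -> snap-5
```

In the disaster scenario described above, the recovery point is whatever latest(kept) returns, so the snapshot interval sets the upper bound on how much data can be lost.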

When it’s time to upgrade the virtual appliance to a new software version, it becomes relatively easy for the software vendor to spin up a second instance and volume with the upgraded software for important customers, so they can test-drive the new version on their data and train their internal users before committing to the upgrade.

Try It Out for Yourself!

We’ve been busy integrating support for this new storage system for months so that you can start using it immediately. Our RightScale Dashboard support for EBS is available as part of our free Developer Edition. To learn more about EBS and RightScale’s support for it, check out my detailed technical review, read our EBS tutorials at wiki.rightscale.com, register for our upcoming RightScale EBS Webinar, or just drop us a line at sales@rightscale.com.