Architecting Scalable Applications in the Cloud


As an architect in the Professional Services group, I get the opportunity to talk to a lot of RightScale customers. And I enjoy it very much since I get to learn about all their cool new ideas and the technologies and applications they are bringing to the cloud.

Over the course of the last several years and hundreds of customer conversations, a few common threads of discussion always come up. New topics also arise (for example, since the public cloud outages over the last 18 months or so, a lot more discussion has focused on disaster recovery architectures), but there are a few key items on which I consistently advise our customers. This post is the first in a series on these common conversations, which revolve around techniques for building scalable and highly available applications in the cloud.

In customer scoping calls, I like to start at the top of the architecture and work my way down, and I am going to follow that same process in this blog series. So this first post is around the considerations that come into play with the load balancing tier.

The diagram below illustrates a very typical three-tier (well, four counting the optional caching tier) architecture that many cloud-based applications use as shown, or in some modified form. I have included it here to use as a reference.

Classic Three-Tier Architecture for Cloud-Based Applications


Architecting the Load Balancing Tier

The first tier shown in the reference architecture above is composed of two load balancers, typically running HAProxy. For our customers running in Amazon Web Services’ Elastic Compute Cloud (EC2), these load balancers are commonly run on m1.large instance types, which provide 2 virtual cores, 7.5GB of memory, and a 64-bit platform. Extensive testing on this configuration has shown the capacity to handle approximately 5,000 requests per second on each load balancer, thus giving a combined total of about 10,000 requests per second for the example shown.
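To make the tier concrete, a minimal HAProxy configuration for one of these front ends might look like the following sketch. The backend addresses, ports, and health-check path are all hypothetical; a production configuration would be tuned for the specific application.

```
# Illustrative HAProxy configuration for one front-end load balancer.
# Server names, IPs, and the health-check path are hypothetical.
global
    maxconn 20000

defaults
    mode http
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend www
    bind *:80
    default_backend app_servers

backend app_servers
    balance roundrobin
    option httpchk GET /healthcheck    # assumes the app exposes /healthcheck
    server app1 10.0.0.11:8000 check
    server app2 10.0.0.12:8000 check
```

Each front end carries an identical configuration, so either can serve all traffic if the other fails.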

In March of this year AWS released support for the m1.medium instance type, which has 1 virtual core and 3.75GB memory available. While I have not had the opportunity to run my full-length tests on this new instance size, some initial smaller-scale tests I have run have shown these instances to have comparable performance characteristics to an m1.large for the load balancing function. This was surprising given that the I/O performance rating AWS assigns to these instance types is “Moderate” as opposed to “High” for the m1.large. Each use case is different, so your mileage may vary, but for cost savings an m1.medium may be a viable option.

Estimating a site's peak traffic rate in requests per second and dividing that value by 5,000 approximates the number of load balancers required to handle the load. For detailed descriptions and analysis of the load balancing tests I performed, see my whitepaper, Load Balancing in the Cloud: Tools, Tips, and Techniques.
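The arithmetic is simple, but remember to round up, since you can't run a fraction of a load balancer. A quick sketch (the peak traffic figure is a made-up example):

```python
import math

PEAK_RPS = 23_000          # hypothetical peak traffic for your site
RPS_PER_BALANCER = 5_000   # observed capacity of one m1.large running HAProxy

# Round up: a partial load balancer isn't an option.
balancers_needed = math.ceil(PEAK_RPS / RPS_PER_BALANCER)
print(balancers_needed)  # 5
```

Then compare the result against the redundancy minimum of two discussed below and take the larger number.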

Regardless of estimated load, I recommend using two front-end load balancers to provide redundancy in the case of a server failure. Additionally, I recommend that these load balancers be placed in different availability zones (for those clouds that provide segregated zones) to increase the reliability and availability of the application.

For the vast majority of new deployments, two load balancers are sufficient in an application’s early lifecycle phases. A RightScale customer in the social gaming space has handled over 30 million daily active users, which would not be possible with just two load balancers, but their deployment started in a similar configuration, and additional load balancers were added as traffic to the site increased.

As cost management should always be a consideration, the load balancers can initially run on m1.small instances (1 virtual core, 1.7GB memory, 32-bit platform) as a cost-cutting measure and can be migrated to larger instances as demand increases. Keep in mind, however, that m1.small performance was shown to be approximately 56 percent of an m1.large's (roughly 2,800 requests per second).

While auto-scaling the front-end tier is theoretically possible, it is not a recommended best practice due to the complexity of configuring DNS programmatically through a DNS API. However, if needed, RightScale has tools in place that facilitate adding front-end load balancers when necessary: the new load balancer is launched manually, DNS is configured, and all application servers in the deployment can be triggered to automatically register themselves with the new front end.
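The registration step can be sketched roughly as follows. The config path, backend name, and addresses are hypothetical, and in a RightScale deployment this is handled by the platform's own scripts rather than hand-rolled shell:

```shell
# Hypothetical sketch: registering an app server with a newly launched
# HAProxy front end by appending to its backend section.
CFG=haproxy.cfg            # illustrative config path
APP_IP=10.0.1.12           # hypothetical app server address

# Start from a minimal backend section (illustration only).
cat > "$CFG" <<'EOF'
backend app_servers
    balance roundrobin
EOF

# Append the new app server with health checking enabled.
echo "    server app-${APP_IP} ${APP_IP}:8000 check" >> "$CFG"

# In production, HAProxy would then be reloaded gracefully, e.g.:
#   haproxy -f "$CFG" -sf $(cat /var/run/haproxy.pid)
cat "$CFG"
```

The graceful reload (`-sf`) lets the old HAProxy process finish in-flight connections while the new process takes over, so registration does not drop traffic.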

When to Use Elastic Load Balancer (ELB)

An additional option for implementing load balancing functionality is the Elastic Load Balancer (ELB). ELB is available only within the Amazon Web Services (AWS) infrastructure, whereas HAProxy is open source software that can run on any Linux platform.

An ELB is in essence a load balancing appliance: its configuration options are limited, and it does not offer the same visibility into its inner workings and performance characteristics that instance-based load balancers do. However, an ELB is an intrinsically scalable solution in that it will automatically scale to accommodate increased load.

The ramp-up time for this additional capacity may be a limiting factor for applications that experience rapid, large traffic spikes. In such environments, it is always advisable to run load tests against any chosen architectural component to ensure it meets the needs of the application.

Other Options for Load Balancing

Other options exist for the load balancing tier as well. Some of these are software solutions that are cross-cloud portable, while others are cloud specific. In the former category are technologies such as Riverbed’s Stingray Traffic Manager (formerly Zeus Traffic Manager) and aiCache, which is not technically a load balancer but performs quite adequately in that role.

Options for cloud-specific solutions include the Rackspace Cloud Load Balancer (CLB) and even hardware solutions such as using an F5 in a hybrid Rackspace environment in which cloud-based application servers can connect to the F5 using the RackConnect infrastructure.

A nice feature of these cloud-specific offerings (ELB, CLB, F5 in RackConnect) is that a single IP is presented to the end user, and as such only a single DNS A record (or CNAME record in the case of ELB) is required.
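To illustrate the difference, here is what the zone-file entries might look like in each case. The hostnames and addresses are hypothetical, and the two stanzas are alternatives, not meant to coexist for the same name:

```
; Instance-based load balancers: one A record per balancer (round-robin DNS)
www.example.com.  300  IN  A      203.0.113.10
www.example.com.  300  IN  A      203.0.113.11

; ELB: a single CNAME pointing at the AWS-assigned hostname
www.example.com.  300  IN  CNAME  my-elb-1234567890.us-east-1.elb.amazonaws.com.
```

With the instance-based approach, a failed balancer's record must be removed (or health-checked DNS used) to keep clients off the dead IP, which is part of the operational overhead the single-endpoint offerings avoid.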

I could go on for days about load balancing, so if you want all the details, check out my two white papers that cover load balancing, Load Balancing in the Cloud: Tools, Tips, and Techniques and Building Scalable Applications In the Cloud. They have all the info I covered here and more.

In my next post in the series on architecting scalable applications in the cloud, I will traverse one level down in the diagram and discuss the scalability and availability considerations at the web/application server tier.