This week’s post is the second in a series on Architecting Scalable Applications in the Cloud, which summarizes some of the typical discussions I have with RightScale customers as they bring their new (or existing) applications into the cloud. Last week I discussed the issues to consider when implementing a robust, resilient, and highly available load balancing tier. This week, I will drop one level down in the architecture and talk a bit about the application tier, where a lot of the “scalability” of scalable applications comes into play.
The application tier shown in the figure below includes two standalone application servers and an auto-scaling array. In an environment in which all application servers are homogeneous, I typically recommend including all application servers in an array to simplify relaunching servers after an instance failure. I also recommend configuring the array to auto-scale across all availability zones of the selected cloud or region (for those clouds, like AWS, that use the concept of zones) to increase the reliability and availability of the application. If you’re not familiar with auto-scaling arrays, check out RightScale’s cloud automation features.
Classic Three-Tier Architecture for Cloud-Based Applications
This automatic scaling (both up and down) of the array is implemented with a tier-wide voting process based on instance-specific metrics, the most common of which are CPU idle, free memory, and system load. However, virtually any system metric can be used as an auto-scaling trigger, including application-specific metrics added via custom plugins to the RightScale monitoring and alert system. When the threshold specified for one of these metrics is crossed, the alert associated with that metric fires, which can result in the launch of additional application servers when demand increases or the decommissioning of active servers when load decreases.
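The voting process described above can be sketched in a few lines. This is an illustrative model only, not the actual RightScale implementation: the metric thresholds and the majority quorum are assumptions chosen for the example.

```python
# Hypothetical sketch of tier-wide voting for auto-scaling. Each server
# casts a vote based on its own local metrics; the array acts only when
# a majority of servers agree on the same action.

def server_vote(cpu_idle_pct, free_mem_mb,
                idle_low=15, idle_high=70, mem_low=256):
    """Return this server's scaling vote based on local metrics."""
    if cpu_idle_pct < idle_low or free_mem_mb < mem_low:
        return "grow"      # server is busy or memory-starved
    if cpu_idle_pct > idle_high:
        return "shrink"    # server is mostly idle
    return "none"

def array_decision(votes, quorum=0.51):
    """Scale the array only when a quorum of servers agrees."""
    for action in ("grow", "shrink"):
        if votes.count(action) / len(votes) >= quorum:
            return action
    return "none"
```

Per-server voting avoids scaling the whole tier on the word of one outlier instance, which matters when load is unevenly distributed across servers.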
As a best practice, I recommend that application server arrays scale up aggressively and scale down conservatively. Thus, additional instances should be launched before they are needed, as soon as an upward trend in activity is detected. It is important to determine the time required for a server to become operational after launch and to factor this lead time into the scaling metrics. Conversely, instances should be decommissioned only after they have been lightly utilized for a pre-determined period of time.
Scaling up aggressively helps ensure that resources are continually available to your application, while scaling down conservatively prevents terminating application server instances prematurely and degrading the user experience. The only disadvantage is that an overly cautious approach incurs additional charges for server time. But given the billing model of utility computing, a server launched unnecessarily will cost you at most one hour (the smallest billing granularity), because the scale-down metrics will terminate it before the next billable hour begins.
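The conservative scale-down rules can be expressed as a simple check. The specific numbers here — the 15-minute idle cool-down and terminating in the last five minutes of the billed hour — are illustrative assumptions, not RightScale defaults; only the whole-hour billing granularity comes from the discussion above.

```python
# Sketch of conservative scale-down: decommission a server only after a
# sustained idle period, and time the termination against the hourly
# billing boundary so an unneeded launch costs at most one hour.
import time

COOL_DOWN_SECS = 15 * 60      # assumed: how long a server must stay lightly loaded
BILLING_HOUR_SECS = 60 * 60   # EC2 bills in whole-hour increments

def should_decommission(idle_since, launched_at, now=None,
                        min_instances=2, current_instances=3):
    now = now if now is not None else time.time()
    if current_instances <= min_instances:
        return False                      # never drop below the array minimum
    if now - idle_since < COOL_DOWN_SECS:
        return False                      # not lightly utilized long enough yet
    # Terminate near the end of the billed hour, just before the next
    # billable hour begins.
    secs_into_hour = (now - launched_at) % BILLING_HOUR_SECS
    return secs_into_hour > 55 * 60
```

The `min_instances` floor reflects the recommendation below to keep at least two servers running so the tier has no single point of failure.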
Arrays can be configured to have both a minimum and a maximum number of instances. By setting the minimum to two or higher, you eliminate the potential for a single point of failure in the application tier while under minimal load. The maximum allows an upper bound to be placed on the total number of instances running, and thus places a limit on infrastructure costs.

Determining Optimal Instance Size

The optimal instance size for an application server in a scalable array should be determined via load testing and performance benchmarking. The majority of RightScale customers running in AWS EC2 begin with m1.large instances for the application server role and let load testing dictate whether a move up or down in instance size is appropriate.
When cost and user experience allow, I advise launching application servers in multiple availability zones (within the same region/cloud) to increase reliability and availability. Cross-region deployments are not recommended due to the cost and latency involved, but the EC2 architecture provides reasonably fast connectivity between zones within the same region, at reasonable bandwidth cost. As such, dispersing server instances across all availability zones in a selected region is a recommended best practice for application availability and reliability.
RightScale provides a free cloud cost calculator for modeling deployments and calculating costs on AWS, Google Compute Engine, Rackspace, and Windows Azure.
Ensuring High Availability
Other, less obvious considerations come into play with regard to the availability of the application server tier. For example, the application code these servers install is typically pulled from a repository (it can be bundled into the image or attached to a boot script, but this is not a recommended best practice, as it limits version flexibility and configuration options). If that repository is unavailable during a scaling event, the newly launched application servers will never become operational and therefore cannot help absorb the increased load.
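One way to reduce this dependency is to give the boot-time fetch a fallback source, such as an internal mirror of the repository. The sketch below is a hypothetical example of that pattern; the repository names and the `git_clone` helper are placeholders, not part of any real deployment.

```python
# Boot-time code fetch with fallback: try each repository in order so a
# scaling event can still succeed when the primary repository is down.
import subprocess

def git_clone(url, dest="/srv/myapp"):
    """Shallow-clone the application code; returns True on success."""
    result = subprocess.run(["git", "clone", "--depth", "1", url, dest])
    return result.returncode == 0

def fetch_application_code(repos, clone=git_clone):
    """Try each repository URL in order until one succeeds."""
    for url in repos:
        if clone(url):
            return url                    # report which source was used
    raise RuntimeError("all code repositories unavailable; server cannot boot")
```

Passing the clone operation as a parameter keeps the fallback logic testable; in a real boot script the list would contain the primary repository first and one or more mirrors after it.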
One of the basic tenets of high availability is no single point of failure. While this may be obvious in terms of things like the number of servers in a tier, and the distribution of servers across availability zones, you must also consider the dependencies your application and its architectural components may have, and devise high availability implementations for these components as well.
In this article I’ve attempted to capture some of the more common and generally applicable concepts with regard to the application tier. For a more detailed analysis of the application tier as well as the other tiers of a scalable application, please see my white paper, Building Scalable Applications In the Cloud.
In my next post in this series, I will drop down yet another level in the architecture diagram to address the caching tier, including where and when it is useful, how it can be implemented, and the pros and cons of these various implementations.