Architecting Scalable Applications in the Cloud: The Caching Tier

Last week I discussed the application tier and the implementation considerations that come into play with regard to promoting the availability and resiliency of that tier of your architecture. Here, in the third article in my series on Architecting Scalable Applications in the Cloud, I will cover a part of the architecture that is often forgotten when implementing an application – the caching tier.

Many RightScale customers I speak with have some variation of the classic three-tier architecture (load balancers, web/application servers, and database servers), but the caching tier (think of it as the second-and-a-half tier) is often ignored. In most classic web-based applications, the database is the eventual bottleneck, so any time you can alleviate some of its load, your application’s performance will benefit.

You can scale a load-balancing tier horizontally or scale the application tier, but it is much more difficult to scale the database tier, which I will address in next week’s post. However, the best way to prolong the effectiveness of the database tier you have is…to not need it. If you can eliminate some percentage of the queries directed to your database servers, then you have extended the viability of your current database configuration.

The third tier shown in the figure below is the data caching tier, which is typically implemented with memcached. Not all application architectures will benefit from a caching solution, but the majority of scalable applications will realize increased performance via the use of a distributed cache.

Classic Three-Tier Architecture for Cloud-Based Applications

For applications that are read-intensive, a caching implementation can provide large performance gains because it reduces application processing time and database access, sometimes dramatically. Write-intensive applications typically do not see as great a benefit, but most applications, even if they are write-intensive, have a read/write ratio greater than 1, which implies that read caching will still be beneficial.
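
To make the read path concrete, here is a minimal sketch of the check-the-cache-first pattern in Python. It is an illustration under stated assumptions, not a production client: the class name CacheAside and the load_from_db callable are hypothetical, and a plain dictionary stands in for memcached.

```python
from typing import Callable, Dict


class CacheAside:
    """Check the cache first; fall back to the database only on a miss."""

    def __init__(self, load_from_db: Callable[[str], str]) -> None:
        self._load_from_db = load_from_db    # hypothetical database lookup
        self._store: Dict[str, str] = {}     # stands in for memcached
        self.db_reads = 0                    # counts queries that reach the database

    def get(self, key: str) -> str:
        if key in self._store:               # cache hit: no database access
            return self._store[key]
        self.db_reads += 1                   # cache miss: query the database once
        value = self._load_from_db(key)
        self._store[key] = value             # populate the cache for later reads
        return value


cache = CacheAside(load_from_db=lambda key: key.upper())
for _ in range(10):
    cache.get("user:1")
print(cache.db_reads)  # 1: ten reads, one database query
```

In this toy run, nine of the ten reads never touch the database, which is the effect a read-heavy application is after.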

Using Memcached and Multiple Instances

Memcached is fairly lightweight in terms of CPU utilization, but heavy (as heavy as the user will allow) on memory usage, so it is advisable to use larger instance sizes (in terms of memory) for servers in this tier. In the early phases of an application’s lifecycle, the caching footprint (the total size of cached objects) tends to be small, and your instinct may be to use a single instance to provide the cache for the entire application server tier. Although this is effective in light to moderate usage situations, it is not advisable for production applications as traffic increases.

A single caching server represents a potential single point of failure for the application cache, and a loss of this instance can result in a severe performance hit to the application and backing database. As such, I recommend that multiple instances (distributed across availability zones within the selected region/cloud where possible) be utilized in the implementation of the caching tier.

Another approach I recommend is to approximate the required caching footprint and then to overprovision the required memory by some percentage across multiple instances. This overprovisioning provides a buffer of extra caching capacity as the application’s usage increases, and the distribution across multiple instances in multiple availability zones provides reliability and availability while eliminating single points of failure.
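
The sizing arithmetic can be sketched as follows. Every number here (the footprint, the headroom percentage, the usable memory per instance) is a made-up example, and instances_needed is a hypothetical helper; the minimum of two instances reflects the advice above to avoid a single caching server.

```python
import math


def instances_needed(footprint_gb: float,
                     headroom: float,
                     usable_gb_per_instance: float,
                     min_instances: int = 2) -> int:
    """Estimate how many cache instances cover the footprint plus headroom.

    headroom is a fraction, e.g. 0.5 means overprovision by 50%.
    """
    required_gb = footprint_gb * (1 + headroom)
    count = math.ceil(required_gb / usable_gb_per_instance)
    # Never drop below min_instances, so one failure cannot empty the whole cache.
    return max(min_instances, count)


print(instances_needed(20, 0.5, 7))   # 5: 30 GB needed, 7 GB usable per instance
print(instances_needed(2, 0.25, 7))   # 2: small footprint, but still two instances
```

Spreading the resulting instances across availability zones is a placement decision that this calculation does not capture.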

Auto-scaling a memcached tier is not a recommended best practice because adding a server typically requires a configuration file update and a restart of the application. Scaling the tier manually, however, is a fairly straightforward process. When additional memcached servers are added, the hashing algorithm that the application servers use to locate the correct memcached server may map existing keys to different caching servers, so there will most likely be an increase in database load as the previously cached objects (and newly requested objects) are distributed across the new, larger caching pool.

I typically compare this situation to a card game in which the database is the dealer, and the caching servers are the players. In this game, every time a new player joins, the dealer needs to re-deal all the cards to the players. So during this re-deal, the database sees high levels of activity as it finds the requested objects and “deals” them out to the caching servers.
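
The size of the re-deal can be estimated with a small sketch. It assumes the client picks a server by hashing the key and taking the result modulo the number of servers; that is an assumption for illustration, since real memcached clients differ and some use consistent hashing to reduce the effect. The server names and key format are hypothetical.

```python
import hashlib
from typing import List, Sequence


def server_for(key: str, servers: Sequence[str]) -> str:
    """Pick a server by hashing the key modulo the pool size (naive scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


def remapped_fraction(keys: Sequence[str],
                      old_pool: Sequence[str],
                      new_pool: Sequence[str]) -> float:
    """Fraction of keys whose server changes when the pool changes."""
    moved = sum(1 for k in keys if server_for(k, old_pool) != server_for(k, new_pool))
    return moved / len(keys)


keys: List[str] = [f"object:{i}" for i in range(10_000)]
old_pool = ["cache-a", "cache-b", "cache-c"]
new_pool = old_pool + ["cache-d"]

fraction = remapped_fraction(keys, old_pool, new_pool)
print(f"{fraction:.0%} of keys now map to a different server")
```

Under this naive scheme, growing the pool from three servers to four moves roughly three quarters of the keys, and each moved key is a cache miss that the database has to serve until the cache warms up again.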

How to Set Time-to-Live (TTL) Values

It is this ability to add caching servers that drives a recommended best practice regarding time-to-live (TTL) values on cached objects. Although TTLs can be set to indefinite values (that is, no timeout), I don’t recommend this. TTLs should always be set to expire, because the addition of a new caching server may remap objects that are cached on an existing server to a different server. If a previously stored object never expires from the cache, the copy left behind on the original server will reside in memory indefinitely, unnecessarily consuming resources. A restart of the memcached process will clear the cache of any non-expiring objects, but this will also affect all other objects in the cache, and application and database performance will be negatively affected as the cache is repopulated over time.
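
A small sketch shows why a finite TTL lets a server reclaim memory from objects that are no longer requested. This is an in-memory illustration only, not how memcached is implemented: the TTLCache class, its purge_expired method, and the injectable clock are all hypothetical, and the sketch deliberately refuses entries without an expiry.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Toy cache in which every entry must carry a finite time-to-live."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._items: Dict[str, Tuple[str, float]] = {}  # key -> (value, expires_at)

    def set(self, key: str, value: str, ttl_seconds: float) -> None:
        if ttl_seconds <= 0:
            raise ValueError("this sketch rejects non-expiring entries")
        self._items[key] = (value, self._clock() + ttl_seconds)

    def get(self, key: str) -> Optional[str]:
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:      # expired: treat as a miss
            del self._items[key]
            return None
        return value

    def purge_expired(self) -> int:
        """Drop expired entries, including ones nobody asks for any more."""
        now = self._clock()
        stale = [k for k, (_, expires_at) in self._items.items() if now >= expires_at]
        for k in stale:
            del self._items[k]
        return len(stale)


now = [0.0]                                  # fake clock so the demo is deterministic
ttl_cache = TTLCache(clock=lambda: now[0])
ttl_cache.set("object:42", "cached row", ttl_seconds=300)
now[0] = 301.0                               # five minutes later, key remapped elsewhere
print(ttl_cache.purge_expired())             # 1: the orphaned entry is reclaimed
```

An entry stored without an expiry would never qualify for the sweep, which is the situation the paragraph above warns against.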

For applications with a small caching footprint, memcached can be installed co-resident with the application on the individual application server instance. Although this is not a recommended best practice in a production environment, it can be used as a cost-savings measure since a separate tier of caching servers is not required.

A disadvantage of this implementation is that cached information is most likely duplicated across application server nodes, so multiple database accesses are required to populate the same data object in each of the caches. If a caching solution is used on each application server, each cache should be used only for local cache purposes and not as part of a distributed cache. If these caching servers were consolidated into a distributed cache, then as application servers launch and terminate in the scalable application tier to handle the dynamic traffic load, there would be constant turnover in the distributed cache, which would degrade application and database performance as the cached objects were constantly redistributed.

Other Options for Caching

Although I have focused on memcached because it is fast, cheap, and open source, it does have its limitations (first and foremost the inability to dynamically scale effectively). An additional caching tier option is Couchbase, which allows for the dynamic scalability of the caching tier and the automatic redistribution of cached objects (Couchbase handles the role of the dealer in my analogy above, so the database is not impacted).

While my focus here has been on implementing a cache between the application and database tiers, you also have the option in some cases to enhance overall system performance by adding a caching level further up the architecture. If your application tends to serve up the same pages and/or objects to many users, then a caching tier in front of your application servers may prove beneficial.

In an implementation of this type, the user’s request hits the caching tier first, and if the requested page or object exists in the cache, it is served directly, with no access to the application tier required. There are several options for caching at this level, and the solutions are often referred to as “website accelerators.” One that RightScale has experience with is Varnish. The vast majority of customer applications that we encounter here at RightScale benefit from caching of some sort – either between the application and database tiers, in front of the application tier, or (in many instances) both.

Your caching options are diverse, and there are pros and cons to each, so do your homework and determine the best fit (or fits) for your individual use case. Many of the concepts that I touch on here are covered more completely in my white paper on Building Scalable Applications In the Cloud.

In the last post of this series I will drop down to the final level of the architecture diagram and discuss some of the considerations that come into play at the database tier. While horizontally scaling this tier can be challenging (at least in the world of RDBMS), there are some tricks you can employ. And with the proliferation of NoSQL solutions, truly scalable solutions are starting to emerge and mature.