Updated March 16, 2016
A regular tune-up can help any enterprise improve its cloud operations and potentially save money. Why does even the most well-architected deployment need a tune-up? Because over time deployments grow organically in ways that you never planned. Sometimes you have to do things quickly to solve an immediate issue and you never go back and do it by the book, and when things are working, there’s a tendency not to mess with them.
A tune-up of your cloud environment can potentially save you money and improve your performance. And if you’re not already following best practices for high availability (HA) and disaster recovery (DR), a tune-up can put processes into place that will make your life easier when you get that panicked phone call at four in the morning, or help you avoid it altogether.
Here are 20 simple tips that can help you save money, improve server utilization, improve cloud security and availability, and otherwise optimize your cloud infrastructure.
Cost Optimization
The first things to look for are ways to reduce costs by eliminating unused and unneeded resources. These resources typically run without anyone noticing, and when you don’t see them, you tend to forget about them — but if they’re running, they’re costing you money. RightScale provides a free trial of Cloud Analytics to help you visualize, analyze, and optimize cloud costs across public and private clouds as well as virtualized environments such as VMware.
1. Examine Block Volumes and Snapshots
Use the RightScale dashboard to look at block storage volumes and data backups and snapshots. Sorting the display by status, you can see whether you have volumes that you haven’t attached in a long time. Those are resources you probably don’t need.
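The same check can be sketched in a few lines of Python. This is a minimal illustration, not a RightScale API: the volume records, the "available" status value, the `last_attached` field, and the 30-day threshold are all assumptions chosen for the example.

```python
from datetime import datetime, timedelta

def stale_volumes(volumes, now, max_idle_days=30):
    """Return names of unattached volumes last attached before the cutoff."""
    cutoff = now - timedelta(days=max_idle_days)
    return [
        v["name"]
        for v in volumes
        if v["status"] == "available" and v["last_attached"] < cutoff
    ]

# Hypothetical inventory, as it might be exported from a dashboard or cloud API.
now = datetime(2016, 3, 16)
volumes = [
    {"name": "db-data", "status": "in-use", "last_attached": datetime(2016, 3, 1)},
    {"name": "old-test", "status": "available", "last_attached": datetime(2015, 11, 2)},
    {"name": "scratch", "status": "available", "last_attached": datetime(2016, 3, 10)},
]
print(stale_volumes(volumes, now))  # ['old-test']
```

Treat the result as a list of candidates to review, not a list to delete automatically.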
2. Eliminate Unused Object Storage
Storage objects, such as those in AWS S3, OpenStack Cloud Files, or Google Cloud Storage, can be difficult to get a handle on. Object storage can become a bit disorganized due to its unstructured nature, so it is wise to revisit your “buckets” every so often to try to maintain control of the chaos.
3. Increase Capacity Only When Necessary
For storage resources, a good rule is not to over-allocate initially, but rather increase capacity when you need to. RightScale has tools that let you grow your starter volume to a larger size, so you don’t pay for more capacity until you need it.
4. Optimize Cross-Region Traffic
One area where you may not want to skimp is cross-zone or cross-data center bandwidth. It’s pretty cheap, and you need it for HA configurations. However, you may be able to save on your cross-region and cross-cloud costs, which can be significant, by compressing that traffic or using a WAN optimizer.
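To get a rough feel for what compression might save, you can measure it on a representative payload. The sketch below uses Python's standard `zlib` module. The sample payload is invented for illustration, and real savings depend heavily on how compressible your traffic is.

```python
import zlib

def compress_payload(payload: bytes, level: int = 6) -> bytes:
    """Compress a payload before sending it across regions."""
    return zlib.compress(payload, level)

def savings_ratio(payload: bytes) -> float:
    """Fraction of bytes saved by compression (0.0 means no savings)."""
    return 1 - len(compress_payload(payload)) / len(payload)

# Hypothetical log-style payload; repetitive text like this compresses well.
sample = b'{"host": "app-01", "status": "ok", "latency_ms": 12}\n' * 1000

compressed = compress_payload(sample)
assert zlib.decompress(compressed) == sample  # round trip is lossless
print(f"original={len(sample)} bytes, compressed={len(compressed)} bytes")
print(f"savings={savings_ratio(sample):.1%}")
```

Already-compressed data such as images or encrypted streams will show little or no gain, so measure before adding CPU overhead to the transfer path.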
5. Find and Eliminate Abandoned Instances
Many organizations create development and test servers to use for a short time for a specific test or proof-of-concept. Unfortunately, it is easier to spin these servers up than it is to remember to terminate them when they are no longer needed. You should occasionally review your servers and see what is still relevant to your operations. If the number of servers you manage is too large to make this a quick process, you can create automated alerts to notify you of potentially abandoned instances. You may want to monitor for very low CPU utilization for an extended period of time, or for low interface traffic, and fire an alert when such an instance is found. You can then inspect each identified server manually and decide whether it is needed.
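The alert condition described above might look something like the following sketch. The 5 percent threshold, the 24-sample window, and the shape of the input (a list of hourly average CPU percentages) are assumptions for illustration; tune them to your own workloads.

```python
def is_possibly_abandoned(cpu_samples, cpu_threshold=5.0, min_samples=24):
    """Flag an instance whose CPU stayed below the threshold for the whole window.

    cpu_samples: hourly average CPU utilization in percent, oldest first.
    Returns False when there is not yet enough history to judge.
    """
    if len(cpu_samples) < min_samples:
        return False
    recent = cpu_samples[-min_samples:]
    return max(recent) < cpu_threshold

# Hypothetical metrics for two instances.
idle_server = [1.2, 0.8, 1.5] * 8            # 24 hours, never above 1.5%
busy_server = [1.0] * 20 + [45.0, 60.0, 30.0, 2.0]

print(is_possibly_abandoned(idle_server))  # True
print(is_possibly_abandoned(busy_server))  # False
```

A flagged instance is only a candidate: a warm standby or a batch server between runs can look idle too, which is why the manual inspection step still matters.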
6. Use Reserved Instances and Discounts for Cost Savings
You may be able to save money if you know you’re going to need a particular instance size in a particular zone for a given period of time. AWS provides attractive pricing for its Reserved Instances and Spot Instances; Microsoft Azure offers 6-month and 12-month discount plans; and Google Cloud Platform offers Sustained Use Discounts and Preemptible VMs. The RightScale Pricing Service provides API access to an up-to-date repository of more than 100,000 current and historical public cloud prices.
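Whether a commitment pays off depends on how many hours the instance actually runs. The sketch below works through a break-even calculation under a simplified pricing model (an upfront fee plus a lower hourly rate). The prices are hypothetical, not taken from any provider's price list.

```python
def break_even_hours(upfront, reserved_hourly, on_demand_hourly):
    """Hours of use after which the reserved option becomes cheaper.

    Returns None when the reserved hourly rate is not lower than on-demand,
    because the upfront fee is then never recovered.
    """
    saving_per_hour = on_demand_hourly - reserved_hourly
    if saving_per_hour <= 0:
        return None
    return upfront / saving_per_hour

# Hypothetical prices: $300 upfront, $0.05/hour reserved, $0.10/hour on demand.
hours = break_even_hours(upfront=300.0, reserved_hourly=0.05, on_demand_hourly=0.10)
print(f"break-even after about {hours:.0f} hours")
```

With these made-up numbers the break-even point is roughly 6,000 hours, or a little over eight months of continuous use, so the reservation would only make sense for an instance you expect to keep running most of the year.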
Server Utilization
Server utilization is where you’re likely to find the greatest amount of inefficiency. Overutilized resources may generate alerts that require your attention, but you can also save money by reconfiguring underutilized resources that keep running even when they’re not necessary.
7. Size Instances Appropriately
Choose a correctly sized instance for the task at hand, perhaps via load testing. You don’t want servers to be overused, and you don’t want them to be underutilized. Put alerts in place that tell you when your instance is struggling so you can adjust your instance configuration if you need to.
8. Spread the CPU Load
If your instances have multi-core CPUs, specify CPU affinity or use irqbalance and spread the load across the cores when your applications lend themselves to that, or when you’re running multiple processes on the same instance. You have paid for those cores, so you might as well use them.
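As one way to picture CPU affinity, the sketch below assigns worker processes to cores round-robin and shows how a process could pin itself. The helper names `assign_cores` and `pin_current_process` are made up for this example. `os.sched_setaffinity` is a real standard-library call, but it is only available on Linux, so the code checks for it first.

```python
import os

def assign_cores(num_workers, cores):
    """Map worker index -> core id, round-robin over the given cores."""
    ordered = sorted(cores)
    if not ordered:
        raise ValueError("at least one core is required")
    return {worker: ordered[worker % len(ordered)] for worker in range(num_workers)}

def pin_current_process(core):
    """Pin the calling process to one core; returns False where unsupported."""
    if hasattr(os, "sched_setaffinity"):  # Linux-only API
        os.sched_setaffinity(0, {core})   # pid 0 means the calling process
        return True
    return False

# Six hypothetical workers spread over a four-core instance.
print(assign_cores(6, [0, 1, 2, 3]))  # {0: 0, 1: 1, 2: 2, 3: 3, 4: 0, 5: 1}
```

Each worker would call `pin_current_process` with its assigned core at startup. Pinning helps most when processes are CPU-bound; for mixed workloads the kernel scheduler often does a reasonable job on its own.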
9. Check Your Memory and Load Average
Ideally, memory consumption should sit at 70 to 80 percent. Load average should run slightly under 1.0 times the number of cores in your instance, so a four-core instance should show a load average just below 4.0.
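The rule of thumb above translates directly into a small check. This is a sketch only: the function name and the wording of the findings are invented, and the inputs are assumed to come from whatever monitoring you already have.

```python
def check_utilization(mem_used_pct, load_avg, cores):
    """Compare an instance against the 70-80% memory and load < cores guidelines."""
    findings = []
    if mem_used_pct < 70:
        findings.append("memory underutilized")
    elif mem_used_pct > 80:
        findings.append("memory pressure")
    if load_avg >= cores:
        findings.append("load average at or above core count")
    return findings

# Hypothetical readings for a four-core instance.
print(check_utilization(mem_used_pct=75, load_avg=3.6, cores=4))  # []
print(check_utilization(mem_used_pct=40, load_avg=5.2, cores=4))
```

On Unix-like systems, `os.getloadavg()` and `os.cpu_count()` from the Python standard library can supply the load average and core count for the local machine.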
10. Use Monitoring and Alerts
Cloud monitoring and alerts can help you discover small problems before they become big ones. Look for trends and act on them, rather than waiting for spikes or anomalies. To save money, look for underutilization as well as overutilization.
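One simple way to look for a trend rather than a spike is to fit a straight line through recent samples and watch the slope. The sketch below uses an ordinary least-squares slope over evenly spaced samples. It is one possible technique, not the method any particular monitoring product uses, and the sample data is made up.

```python
def trend_slope(samples):
    """Least-squares slope of evenly spaced samples (units per sample interval)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to estimate a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    numerator = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator

# Hypothetical daily disk usage in percent: steady growth, no single spike.
disk_usage = [50, 52, 54, 56, 58]
print(trend_slope(disk_usage))  # 2.0 (percentage points per day)
```

A steady positive slope on disk or memory usage is worth acting on before it crosses an alert threshold, and a persistently flat, low line on CPU is a hint that an instance may be oversized.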
HA/DR
11. Avoid Single Points of Failure
Place at least one of each component, such as load balancers, application servers, and databases, in each of at least two zones, so that losing a single zone never takes out every copy of a component. Replicate data across zones for HA, and back up or replicate across regions to enable failover for DR. You can use RightScale to alert you to problems and automate resolution or failover processes.
Security
12. Take Advantage of Security Groups
If the cloud you’re on provides security groups, use them. Security groups let you specify a range of IP addresses and a range of ports, and state whether traffic matching each entry is allowed in or not. You can nest security groups to set up an easy-to-manage hierarchy.
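To make the idea concrete, the sketch below models a security group as a list of allow rules, each with a CIDR block and a port range, and denies anything that no rule matches. The rule format and the default-deny behavior are assumptions for illustration and do not mirror any specific cloud's API. The `ipaddress` module is part of the Python standard library.

```python
import ipaddress

def is_allowed(rules, source_ip, port):
    """Return True if any allow rule covers the source address and port."""
    addr = ipaddress.ip_address(source_ip)
    for rule in rules:
        network = ipaddress.ip_network(rule["cidr"])
        if addr in network and rule["from_port"] <= port <= rule["to_port"]:
            return True
    return False  # assumed default: deny anything not explicitly allowed

def world_open_rules(rules):
    """Rules whose CIDR has prefix length 0, meaning any source address."""
    return [r for r in rules if ipaddress.ip_network(r["cidr"]).prefixlen == 0]

# Hypothetical group: HTTPS from anywhere, SSH only from an office range.
rules = [
    {"cidr": "0.0.0.0/0", "from_port": 443, "to_port": 443},
    {"cidr": "203.0.113.0/24", "from_port": 22, "to_port": 22},
]
print(is_allowed(rules, "198.51.100.7", 443))  # True
print(is_allowed(rules, "198.51.100.7", 22))   # False
print(world_open_rules(rules))
```

Listing the world-open rules is a quick audit step: each one should correspond to a port you genuinely intend to expose publicly.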
13. Check Your Firewalls
Look at your iptables rules and make sure that they’ve been enabled. Run the iptables recipe and check the output for ports that should not be open to the world and other potential problems. If you have open ports, make sure they need to be open, and make sure you know who they’re open to.
14. Deploy Security Updates
Just like in your data center, you need to make sure your cloud servers are up-to-date with the latest security fixes. For RightScale users, the latest version 13.5 of RightScale ServerTemplates™ has recipes to let you unfreeze security repositories for Ubuntu and upstream repos for CentOS and perform security updates, and we have similar functionality for Windows Updates as well.
Best Practices
15. Re-Examine Image Bundling
Image bundling is a practice that involves installing software on a base image and then creating a new image based on the bundled software. While this is a common approach in a traditional virtualized environment, it also can cause maintenance headaches, lock you into whatever versions you’ve bundled, and lead to image sprawl.
Many companies are moving to dynamic configuration, where servers are dynamically provisioned based on scripts and dynamic parameters, often leveraging tools such as Puppet or Chef.
RightScale supports these modern dynamic configuration approaches by providing a base RightImage™ containing the operating system and script-based ServerTemplates to install all the software you need at boot time. If you need to boot faster and you are running on AWS EC2, use EBS-backed images, but be aware that you have to pay for EBS storage.
Having said that, we see two rare cases where we do recommend bundling. Both tend to be with Windows-based images. If an application requires manual software installation, where you have to keep clicking Next and have no way to automate the process, that’s an argument for installing the application and then bundling it. The other situation is when an image’s boot time is unacceptably long, and therefore it can’t start in a timely manner in response to a dynamic event.
16. Think “Application”
Focusing on individual cloud instances can make it challenging to understand the interdependencies between servers as they form an application. RightScale provides deployments as application containers, so you can have a single view of all the servers that are participating in a given application, rather than spreading an application’s servers across multiple deployments.
17. Version-Control Your Configuration Scripts
Don’t use the editable HEAD version of a ServerTemplate in production. If you do, someone may make a change to a ServerTemplate, and you can wind up with multiple servers in an array running different software, making any problems that come up almost impossible to debug.
18. Keep Secrets Secret
Think about how you are managing your cloud credentials. With RightScale, you can use named credentials as placeholders for all sensitive inputs. Don’t expose things like passwords as plain text, and thereby make them visible to people with only observer privileges.
19. Start Small, Scale Up
Choose correctly sized instances for your application and auto-scale to meet the task at hand. Suppose you have an application running on servers in an array that maintains a minimum of two servers for redundancy. Find the smallest server configuration building block that meets your needs, save yourself some money, and scale up the number of servers when your application demands it.
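The sizing logic for such an array can be sketched as a one-line calculation: divide the load by what one small server handles, round up, and never drop below the redundancy floor. The request rates and per-server capacity below are hypothetical and would normally come from load testing.

```python
import math

def servers_needed(requests_per_sec, capacity_per_server, min_servers=2):
    """Servers required for the load, never fewer than the redundancy minimum."""
    if capacity_per_server <= 0:
        raise ValueError("capacity_per_server must be positive")
    return max(min_servers, math.ceil(requests_per_sec / capacity_per_server))

# Hypothetical: each small server handles 200 requests per second.
print(servers_needed(100, 200))  # 2 (the redundancy floor applies)
print(servers_needed(950, 200))  # 5
```

Smaller building blocks let the array track demand more closely: with a 200 requests-per-second server, capacity grows in steps of 200 rather than in the larger steps a bigger instance type would force.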
20. Automate Operations
Don’t SSH into production instances and modify config files – if you do, your changes disappear when the server is relaunched, and if you don’t document what you did, you may not be able to do it again. Cloud automation, in the form of scripts, acts as a kind of documentation.
Takeaways
In summary, clean up deployment sprawl and seek out opportunities for cost optimization, find the right resources for the job, design in HA from the start, and use DR as insurance against something you hope never happens. For security, open up only what you have to. And use best practices to promote operational efficiencies.
To see how RightScale can help you deploy and manage your multi-cloud environments more efficiently, sign up for a free trial of RightScale Cloud Management.