Saturday, August 14, 2010

Hidden Costs in the Cloud, Part 1: Driving the Gremlins Out of Your Windows Azure Billing

grem•lin (ˈgrɛm lɪn) –noun
1. a mischievous invisible being, said by airplane pilots in World War II to cause engine trouble and mechanical difficulties.
2. any cause of trouble, difficulties, etc.

Cloud computing has real business benefits that can help the bottom line of most organizations. However, you may have heard about (or directly experienced) cases of sticker shock where actual costs were higher than expectations. These episodes are usually attributed to “hidden costs” in the cloud which are sometimes viewed as gremlins you can neither see nor control. Some people are spooked enough by the prospect of hidden costs to question the entire premise of cloud computing. Is using the cloud like using a slot machine, where you don’t know what will come up and you’re usually on the losing end?

These costs aren’t really hidden, of course: it’s more that they’re overlooked, misunderstood, or underestimated. In this article series we’re going to identify these so-called “hidden costs” and shed light on them so that they’re neither hidden nor something you have to fear. We’ll be doing this specifically from a Windows Azure perspective. In Part 1 we’ll review “hidden costs” at a high level, and in subsequent articles we’ll explore some of them in detail.

Hidden Cost #1: Dimensions of Pricing
In my opinion the #1 hidden cost of cloud computing is simply the number of dimensions there are to the pricing model. In effect, everything in the cloud is cheap but every kind of service represents an additional level of charge. To make it worse, as new features and services are added to the platform the number of billing considerations continues to increase.

As an example, let’s consider that you are storing some images for your web site in Windows Azure blob storage. What does this cost you? The answer is, it doesn’t cost you very much—but you might be charged in as many as 4 ways:

• Storage fees: Storage used is charged at $0.15/GB per month
• Transaction fees: Accessing storage costs $0.01 per 10,000 transactions
• Bandwidth fees: Sending or receiving data in and out of the data center costs $0.10/GB in, $0.15/GB out
• Content Delivery Network: If you are using this optional edge caching service, you are also paying an additional $0.15/GB and an additional $0.01 per 10,000 transactions

You might still conclude these costs are reasonable after taking everything into account, but this example should serve to illustrate how easily you can inadvertently leave something out in your estimating.
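To make the four dimensions concrete, here is a minimal sketch that adds them up for one month. The rates are the ones quoted above and will change over time; the function name and the sample workload are illustrative assumptions, not part of any Windows Azure tool.

```python
# Rates quoted in this article (2010). Check current pricing before relying on them.
STORAGE_PER_GB_MONTH = 0.15
PER_10K_TRANSACTIONS = 0.01
BANDWIDTH_IN_PER_GB = 0.10
BANDWIDTH_OUT_PER_GB = 0.15
CDN_PER_GB = 0.15
CDN_PER_10K_TRANSACTIONS = 0.01


def blob_monthly_cost(stored_gb, transactions, gb_in, gb_out,
                      cdn_gb=0.0, cdn_transactions=0):
    """Return a per-dimension breakdown of one month's blob storage charges."""
    costs = {
        "storage": stored_gb * STORAGE_PER_GB_MONTH,
        "transactions": transactions / 10_000 * PER_10K_TRANSACTIONS,
        "bandwidth": gb_in * BANDWIDTH_IN_PER_GB + gb_out * BANDWIDTH_OUT_PER_GB,
        "cdn": (cdn_gb * CDN_PER_GB
                + cdn_transactions / 10_000 * CDN_PER_10K_TRANSACTIONS),
    }
    costs["total"] = sum(costs.values())
    return costs


# Hypothetical workload: 10 GB stored, 1 million transactions,
# 1 GB uploaded, 50 GB downloaded, no CDN.
for dimension, dollars in blob_monthly_cost(10, 1_000_000, 1, 50).items():
    print(f"{dimension:>12}: ${dollars:.2f}")
```

In this made-up workload, bandwidth rather than storage is the largest line item, which is exactly the kind of thing that is easy to leave out of an estimate.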

What to do about it: You can guard against leaving something out in your calculations by using tools and resources that account for all the dimensions of pricing, such as the Windows Azure TCO Calculator, Neudesic’s Azure ROI Calculator, or the official Windows Azure pricing information. With a thorough checklist in front of you, you won’t fail to consider all of the billing categories. Also make sure that anything you look at is current and up to date as pricing formulas and rates can change over time. You also need to be as accurate as you can in predicting your usage in each of these categories.

Hidden Cost #2: Bandwidth
In addition to hosting and storage costs, your web applications are also subject to bandwidth charges (also called data transfer charges). When someone accesses your cloud-hosted web site in a web browser, the requests and responses incur data transfer charges. When an external client accesses your cloud-hosted web service, the requests and responses incur data transfer charges.

Bandwidth is often overlooked or underappreciated in estimating cloud computing charges. There are several reasons for this. First, it’s not something we’re used to having to measure. Second, it’s less tangible than other measurements that tend to get our attention, such as number of servers and amount of storage. Third, it’s usually down near the bottom of the pricing list, so not everyone notices it or pays attention to it. Finally, it’s nebulous: many have no idea what their bandwidth use is or how they would estimate it.

What to do about it: You can model and estimate your bandwidth using tools like Fiddler, and once running in the cloud you can measure actual bandwidth using mechanisms such as IIS logs. With a proper analysis of bandwidth size and breakdown, you can optimize your application to reduce bandwidth.

You can also exercise control over bandwidth charges through your solution architecture: you aren’t charged for bandwidth when it doesn’t cross in or out of the data center. For example, a web application in the cloud calling a web service in the same data center doesn’t incur bandwidth charges.
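As a rough illustration of measuring bandwidth from IIS logs, the sketch below totals the sc-bytes (bytes sent) and cs-bytes (bytes received) fields of a W3C extended log. It assumes those two fields have been enabled in the logging configuration, that every logged request crossed the data center boundary, and that a billable GB is 10^9 bytes; the function names are my own.

```python
def _to_int(value):
    """W3C logs use '-' for missing values; treat anything non-numeric as 0."""
    return int(value) if value.isdigit() else 0


def bandwidth_from_w3c_log(lines):
    """Return (bytes_in, bytes_out) summed over W3C extended log lines."""
    fields = []
    bytes_in = bytes_out = 0
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The directive names the columns of the entries that follow.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue
        row = dict(zip(fields, line.split()))
        bytes_in += _to_int(row.get("cs-bytes", "0"))
        bytes_out += _to_int(row.get("sc-bytes", "0"))
    return bytes_in, bytes_out


def bandwidth_cost(bytes_in, bytes_out, rate_in=0.10, rate_out=0.15):
    """Estimate data transfer charges using the rates quoted in this article."""
    gb = 10 ** 9  # assumption: billing GB treated as 10^9 bytes
    return bytes_in / gb * rate_in + bytes_out / gb * rate_out


sample = [
    "#Fields: date time cs-uri-stem sc-status sc-bytes cs-bytes",
    "2010-08-14 12:00:00 /index.html 200 5000 300",
    "2010-08-14 12:00:01 /logo.png 200 20000 250",
]
received, sent = bandwidth_from_w3c_log(sample)
print(received, sent, bandwidth_cost(received, sent))
```

Traffic that stays inside the data center would need to be filtered out before costing, since it isn't billed.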

Hidden Cost #3: Leaving the Faucet Running
As a father I’m constantly reminding my children to turn off the lights and not leave the faucets running: it costs money! Leaving an application deployed that you forgot about is a surefire way to get a surprising bill. Once you put applications or data into the cloud, they continue to cost you money, month after month, until such time as you remove them. It’s very easy to put something in the cloud and forget about it.

What to do about it: First and foremost, review your bill regularly. You don’t have to wait until the end of the month and be surprised: your Windows Azure bill can be viewed online anytime to see how your charges for the month are accruing. Secondly, make it someone’s job to regularly confirm that what’s in the cloud still needs to be there and that costs are in line with expectations. Set expiration dates or renewal review dates for your cloud applications and data. Be proactive in recognizing the faucet has been left running before the problem reaches flood levels.

Hidden Cost #4: Compute Charges Are Not Based on Usage
If you put an application in the cloud and no one uses it, does it cost you money? Well, if a tree falls in the forest and no one is around to hear, does it make a sound? The answer to both questions is yes. Since the general message of cloud computing is consumption-based pricing, some people assume their hourly compute charges are based on how much their application is used. It’s not the case: hourly charges for compute time do not work that way in Windows Azure. Rather, you are reserving machines and your charges are based on wall clock time per core. Whether those servers are very busy, lightly used, or not used at all doesn’t affect this aspect of your bill. Where consumption-based pricing does enter the picture is in the number of servers you need to support your users, which you can increase or decrease at will. There are other aspects of your bill that are charged based on direct consumption such as bandwidth.

What to do about it: Understand what your usage-based and non-usage-based charges will be, and estimate costs accurately. Don’t make the mistake of thinking an unused application left in the cloud is free—it isn’t.
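A small sketch of the wall-clock model may help. Notice that utilization is not a parameter at all: only the number of cores and the hours deployed matter. The hourly rate used in the example is an assumed placeholder, not a figure from this article, so substitute the current published rate.

```python
def monthly_compute_cost(cores, rate_per_core_hour, hours_deployed=24 * 30):
    """Compute charge for reserved cores: wall-clock hours times cores times rate.

    How busy the servers are plays no part in the calculation.
    """
    return cores * rate_per_core_hour * hours_deployed


ASSUMED_RATE = 0.12  # hypothetical dollars per core-hour, for illustration only

# Two single-core servers left deployed for a 30-day month, used or not.
print(f"${monthly_compute_cost(2, ASSUMED_RATE):.2f}")
```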

Hidden Cost #5: Staging Costs the Same as Production
If you deploy an application to Windows Azure, it can go in one of two places: your project’s Production slot or its Staging slot. Many have mistakenly concluded that only Production is billed for when in fact Production and Staging are both charged for, and at the same rates.

What to do about it: Use Staging as a temporary area and set policies that anything deployed there must be promoted to Production or shut down within a certain amount of time. Give someone the job of checking for forgotten Staging deployments and deleting them—or even better, automate this process.
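The policy check itself is simple to automate. The sketch below only flags Staging deployments older than a cutoff; the Deployment record is a hypothetical structure of my own, and fetching the real list of deployments (and deleting them) through the Service Management API is deliberately left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Deployment:
    """Hypothetical record; populate it from whatever inventory you keep."""
    service: str
    slot: str            # "Production" or "Staging"
    deployed_at: datetime


def stale_staging(deployments, now, max_age=timedelta(days=3)):
    """Return Staging deployments that have been up longer than max_age."""
    return [d for d in deployments
            if d.slot.lower() == "staging" and now - d.deployed_at > max_age]


now = datetime(2010, 8, 14)
inventory = [
    Deployment("storefront", "Production", datetime(2010, 6, 1)),
    Deployment("storefront", "Staging", datetime(2010, 8, 13)),
    Deployment("reports", "Staging", datetime(2010, 7, 20)),
]
for d in stale_staging(inventory, now):
    print(f"Review or delete: {d.service} ({d.slot}), up since {d.deployed_at:%Y-%m-%d}")
```

The three-day default is arbitrary; pick whatever window matches your own promotion policy.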

Hidden Cost #6: A Suspended Application is a Billable Application
Applications deployed to Windows Azure Production or Staging can be in a running state or a suspended state. Only in the running state will an application be active and respond to traffic. Does this mean a suspended application does not accrue charges? Not at all—the wall clock-based billing charges accrue in exactly the same way regardless of whether your application is suspended or not.

What to do about it: Only suspend an application if you have good reason to do so, and this should always be followed by a more definitive action such as deleting the deployment or upgrading it and starting it up. It doesn’t make any sense to suspend a deployment and leave it in the cloud: no one can use it and you’re still being charged for it.

Hidden Cost #7: Seeing Double
Your cloud application will have one or more software tiers, which means it is going to need one or more server farms. How many servers will you have in each farm? You might think a good answer is 1, at least when you’re first starting out. In fact, you need a minimum of 2 servers per farm if you want the Windows Azure SLA to be upheld, which boils down to 3 9’s of availability. If you’re not aware of this, your estimates of hosting costs could be off by 100%!

The reason for this 2-server minimum is how patches and upgrades are applied to cloud-hosted applications in Windows Azure. The Fabric that controls the data center has an upgrade domain system where updates to servers are sequenced to protect the availability of your application. It’s a wonderful system, but it doesn’t do you any good if you only have 1 server.

What to do about it: If you need the SLA, be sure to plan on at least 2 servers per farm. If you can live without the SLA, it’s fine to run a single server assuming it can handle your user load.

Hidden Cost #8: Polling
Polling data in the cloud is a costly activity. If you poll a queue in the enterprise and the queue is empty, this does not explicitly cost you money. In the cloud it does, because simply attempting to access storage (even if the storage is empty) is a transaction that costs you something. While an individual poll doesn’t cost you much—only $0.01 per 10,000 transactions—it will add up to big numbers if you’re doing it repeatedly.
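To see how quickly it adds up, here is a back-of-the-envelope sketch using the transaction rate quoted above. The function name and the sample polling intervals are illustrative assumptions.

```python
def monthly_polling_cost(poll_interval_seconds, pollers=1, days=30,
                         rate_per_10k=0.01):
    """Transaction charges for polling at a fixed interval, empty queue or not."""
    polls = pollers * days * 24 * 3600 / poll_interval_seconds
    return polls / 10_000 * rate_per_10k


# One worker polling once per second for a 30-day month.
print(f"${monthly_polling_cost(1):.2f}")
# Ten workers each polling ten times per second.
print(f"${monthly_polling_cost(0.1, pollers=10):.2f}")
```

A single once-per-second poller makes 2,592,000 transactions in 30 days, so the charge scales directly with the number of pollers and how aggressively each one polls.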

What to do about it: Either find an alternative to polling, or do your polling in a way that is cost-efficient. There is an efficient way to implement polling using an algorithm that varies the sleep time between polls based on whether any data has been seen recently. When a queue is seen to be empty the sleep time increases; when a message is found in the queue, the sleep time is reduced so that message(s) in the queue can be quickly serviced.
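One way to sketch such an algorithm is shown below. It doubles the sleep time after each empty poll, up to a ceiling, and resets it to the minimum as soon as a message is found. The doubling factor, the limits, and the get_message and handle callables are assumptions for illustration; get_message stands in for whatever call reads your queue and should return None when the queue is empty.

```python
import time


def poll_with_backoff(get_message, handle, min_sleep=1.0, max_sleep=60.0,
                      factor=2.0, sleep=time.sleep, max_polls=None):
    """Poll a queue, sleeping longer while it stays empty.

    Runs forever unless max_polls is given. Returns the number of polls made.
    """
    delay = min_sleep
    polls = 0
    while max_polls is None or polls < max_polls:
        message = get_message()        # each call is one billable transaction
        polls += 1
        if message is not None:
            handle(message)
            delay = min_sleep          # data seen: go back to fast polling
        else:
            sleep(delay)
            delay = min(delay * factor, max_sleep)  # empty: back off
    return polls


# Demonstration with a fake queue and a recorded (not real) sleep.
fake_queue = iter([None, None, None, "order-42", None])
sleeps = []
poll_with_backoff(lambda: next(fake_queue), print,
                  sleep=sleeps.append, max_polls=5)
print(sleeps)
```

With a 60-second ceiling, an idle queue settles at one transaction per minute instead of one per second, at the price of up to a minute of latency for the first message after a quiet spell.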

Hidden Cost #9: Unwanted Traffic and Denial of Service Attacks
If your application is hosted in the cloud, you may find it is being accessed by more than your intended user base. That can include curious or accidental web users, search engine spiders, and openly hostile denial of service attacks by hackers or competitors. What happens to your bandwidth charges if your web site or storage assets are being constantly accessed by a bot?

Windows Azure does have some hardening to guard against DoS attacks, but you cannot completely count on this to ward off all attacks, especially those of a new nature. Windows Azure’s automatic application of security patches will help protect you. If you enable the feature to allow Windows Azure to upgrade your Guest OS VM image, you’ll gain further protections over time automatically. The firewall in SQL Azure Database will help protect your data. You’ll want to run at least 2 servers per farm so that rapidly-issued security patching does not disrupt your application’s availability.

What to do about it: To defend against such attacks, first put the same defenses in place that you would for a web site in your perimeter network, including reliable security, use of mechanisms to defeat automation like CAPTCHA, and coding defensively against abuses such as cross-site scripting attacks. Second, learn what defenses are already built into the Windows Azure platform that you can count on. Third, perform a threat-modeling exercise to identify the possible attack vectors for your solution—then plan and build defenses. Diligent review of your accruing charges will alert you early on should you find yourself under attack and you can alert Microsoft.

Hidden Cost #10: Management
Cloud computing reduces management requirements and labor costs because data centers handle so much for you automatically, including provisioning servers and applying patches. It’s also true, though often overlooked, that the cloud thrusts some new management responsibilities upon you. Ignore them and you risk billing surprises.

What are these responsibilities? Regularly monitor the health of your applications. Regularly monitor your billing. Regularly review whether what’s in the cloud still needs to be in the cloud. Regularly monitor the amount of load on your applications. Adjust the size of your deployments to match load.

The cloud’s marvelous IT dollar efficiency is based on adjusting deployment larger or smaller to fit demand. This only works if you regularly perform monitoring and adjustment. Failure to do so can undermine the value you’re supposed to be getting.

What to do about it: Treat your cloud application and cloud data like any resource in need of regular, ongoing management.

• Monitor the state of your cloud applications as you would anything in your own data center.
• Review your billing charges regularly as you would any operational expense.
• Measure the load on your applications and adjust the size of your cloud deployments to match.

Some of this monitoring and adjustment can be automated using the Windows Azure Diagnostic and Service Management APIs. Regardless of how much of it is done by programs or people, it needs to be done.
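As a sketch of the adjustment step, the function below turns a measured load figure into a suggested instance count. It is a simple proportional rule of my own, not a Windows Azure feature: the 60% CPU target and the upper limit are arbitrary assumptions, and the floor of 2 reflects the SLA minimum discussed under Hidden Cost #7. Collecting the CPU figure and applying the new count through the Diagnostic and Service Management APIs is not shown.

```python
import math


def target_instance_count(current, avg_cpu_percent, target_cpu_percent=60.0,
                          min_instances=2, max_instances=10):
    """Suggest how many instances would bring average CPU near the target."""
    if current < 1:
        raise ValueError("current must be at least 1")
    needed = math.ceil(current * avg_cpu_percent / target_cpu_percent)
    # Never drop below the SLA floor or exceed the budget ceiling.
    return max(min_instances, min(needed, max_instances))


print(target_instance_count(4, 90))   # busy farm: suggests growing
print(target_instance_count(4, 15))   # idle farm: suggests shrinking to the floor
```

Whether a person acts on the suggestion or a program applies it automatically, the ceiling keeps a traffic spike or an attack from scaling your bill without limit.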

Managing Hidden Costs Effectively
We’ve exposed many kinds of hidden costs and discussed what to do about them. How can you manage these concerns correctly and effectively without it being a lot of trouble?

1. Team up with experts. Work with a Microsoft partner who is experienced in giving cloud assessments, delivering cloud migrations, and supporting and managing cloud applications operationally. You’ll get the benefits of sound architecture, best practices, and prior experience.

2. Get an assessment. A cloud computing assessment will help you scope cloud charges and migration costs accurately. It will also get you started on formulating cloud computing strategy and policies that will guard against putting applications in the cloud that don’t make sense there.

3. Take advantage of automation. Buy or build cloud governance software to monitor the health, cost, and usage of your applications, and to either notify your operations personnel when deployment sizes need adjusting or make the adjustments automatically.

4. Get your IT group involved in cloud management. IT departments are sometimes concerned that cloud computing will mean they will have fewer responsibilities and will be needed less. Here’s an opportunity to give IT new responsibilities to manage your company’s responsible use of cloud computing.

5. Give yourself permission to experiment. You probably won’t get it exactly right the first time you try something in the cloud. That’s okay: you’ll learn some valuable things from some early experimentation, and if you stay on top of monitoring and management any surprises will be small ones.

I trust this has shed light on and demystified “hidden costs” of cloud computing and given you a fuller picture of what can affect your Windows Azure billing. In subsequent articles we’ll explore some of these issues more deeply. It is possible to confidently predict and manage your cloud charges. Cloud computing is too valuable to pass by and too important to remain a diamond in the rough.


Kevni said...

I totally agree David! In fact I experienced some of this recently when working with Azure.

As you rightly state, the costs are not really hidden. It is important, though, to do your homework up front, monitor your applications while deployed and review your statement regularly.

Good article!

David Pallmann said...

Check out "Bart Simpson's Guide to Windows Azure" for similar tips, hilariously presented.

Paras said...

hey "you are an original - you know that rite" ;) - my twisted version of cold luke hand dialogue for you and bart simpson guide!

i have been working on Azure for past 70 days now - it's good to know there are guys like you out there who voluntarily 'show a warning post' saying - stop rite there!

Thanks a ton! you saved my dollar's :p

Ido Flatow said...

Regarding polling - indeed this is a problem. Microsoft has a product called Windows Server HPC 2008 R2, which is used for running computational algorithms in clusters. The new version uses Windows Azure nodes, and they have an internal heartbeat between the on-premise head-node and the Azure nodes - this check creates about a million storage transactions a day.
You can read more about it here: