Saturday, August 28, 2010

Threat Modeling the Cloud

If there’s one issue in cloud computing you have to revisit regularly, it’s security. Security concerns, real or imagined, must be squarely addressed in order to convince an organization to use cloud computing. One highly useful technique for analyzing security issues and designing defenses is threat modeling, a security analysis technique long used at Microsoft. Threat modeling is useful in any software context, but is particularly valuable in cloud computing due to the widespread preoccupation with security. It’s also useful because technical and non-technical people alike can follow the diagrams easily. Michael Howard provides a very good walk-through of threat modeling here. At some level this modeling is useful for general cloud scenarios, but as you start to get specific you will need to have your cloud platform in view, which in my case is Windows Azure.

To illustrate how threat modeling works in a cloud computing context, let’s address a specific threat. A common concern is that the use of shared resources in the cloud might compromise the security of your data by allowing it to fall into the wrong hands—a condition we’ll call Data Isolation Failure, one of the primary risks organizations considering cloud computing worry about.

To create our threat model, we’ll start with the end result we’re trying to avoid: data in the wrong hands.

Next we need to think about what can lead to this end result that we don’t want. How could data of yours in the cloud end up in the wrong hands? It seems this could happen deliberately or by accident. We can draw two nodes, one for deliberate compromise and one for accidental compromise; we number the nodes so that we can reference them in discussions. Either one of these conditions is sufficient to cause data to be in the wrong hands, so this is an OR condition. We’ll see later on how to show an AND condition.

Let’s identify the causes of accidental data compromise (1.1). One would be human failure to set the proper restrictions in the first place: for example, leaving a commonly used or easily-guessed database password in place. Another might be a failure on the part of the cloud infrastructure to enforce security properly. Yet another cause might be hardware failure, where a failed drive is taken out of the data center for repair. These and other causes are added to the tree, which now looks like this:

We can now do the same for the deliberately compromised branch (1.2). Some causes include an inside job, which could happen within your business but could also happen at the cloud provider. Another deliberate compromise would be a hacker observing data in transmission. These and other causes could be developed further, but we’ll stop here for now.

If we consider these causes sufficiently developed, we can explore mitigations to the root causes, the bottom leaves of the tree. These mitigations are shown in circles in the diagram below (no mitigation is shown for the “data in transmission observed” node because it needs to be developed further). For cloud threat modeling I like to color code my mitigations to show the responsible party: green for the business, yellow for the cloud provider, red for a third party.

You should not start to identify mitigations until your threat tree is fully developed, or you’ll go down rabbit trails thinking about mitigations rather than threats. Stay focused on the threats. I have deliberately violated this rule just now in order to show why it’s important. At the start of this article we identified the threat we were trying to model as “data in the wrong hands”. That was an insufficiently described threat, and we left out an important consideration: is the data intelligible to the party that obtains it? While we don’t want data falling into the wrong hands under any circumstances, we certainly feel better off if the data is unintelligible to the recipient. The threat tree we have just developed, then, is really a subtree of a threat we can state more completely as: Other parties obtain intelligible data in cloud. The top of our tree now looks like this, with 2 conditions that must both be true. The arc connecting the branches indicates an AND relationship.

The addition of this second condition is crucial for two reasons. First, failing to consider every aspect of a threat can give you a false sense of security. More importantly, though, this second condition is something we can easily do something about by having our application encrypt the data it stores and transmits. In contrast, we didn't have direct control over all of the first branch's mitigations. Let’s develop the data intelligible side of the tree a bit more. For brevity we’ll just go to one more level, then stop and add mitigations.

Mitigation is much easier in this subtree because data encryption is in the control of the business. The business merely needs to decide to encrypt, do it well, and protect and rotate its keys. Whenever you can directly mitigate rather than depending on another party to do the right thing you’re in a much better position. The full tree that we've developed so far now looks like this.

Since the data intelligible and data in the wrong hands conditions must both be true for this threat to be material, mitigating just one of the branches mitigates the entire threat. That doesn’t mean you should ignore the other branch, but it does mean one of the branches is likely superior in terms of your ability to defend against it. This may enable you to identify a branch and its mitigation(s) as the critical mitigation path to focus on.
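
The AND/OR logic of a threat tree is mechanical enough to sketch in code. The following Python sketch (the node names are illustrative, not taken verbatim from the diagrams) evaluates whether a threat is still live given a set of mitigated leaf causes. It demonstrates the point above: mitigating either branch of an AND node neutralizes the whole threat.

```python
# Minimal attack-tree evaluator. A threat is "live" only if the tree
# still evaluates True after removing mitigated leaf causes.

class Node:
    def __init__(self, name, op=None, children=None):
        self.name = name          # e.g. "1.1 accidental compromise"
        self.op = op              # "AND", "OR", or None for a leaf
        self.children = children or []

def is_live(node, mitigated):
    """Return True if this (sub)threat can still occur."""
    if not node.children:                      # leaf cause
        return node.name not in mitigated
    results = [is_live(c, mitigated) for c in node.children]
    return all(results) if node.op == "AND" else any(results)

# Illustrative tree: both branches must hold for the threat to matter.
tree = Node("other parties obtain intelligible data", "AND", [
    Node("data in wrong hands", "OR", [
        Node("accidental compromise"),
        Node("deliberate compromise"),
    ]),
    Node("data intelligible", "OR", [
        Node("data stored unencrypted"),
        Node("data transmitted unencrypted"),
    ]),
])

# Mitigating just the "intelligible" branch neutralizes the threat,
# even though "data in wrong hands" remains unmitigated.
print(is_live(tree, set()))                                      # True
print(is_live(tree, {"data stored unencrypted",
                     "data transmitted unencrypted"}))           # False
```

The same evaluator works for trees of any depth, which makes it easy to experiment with which combination of mitigations constitutes a complete critical mitigation path.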

While this example is not completely developed I hope it illustrates the spirit of the technique and you can find plenty of reference materials for threat modeling on MSDN. Cloud security will continue to be a hot topic, and the best way to make some headway is to get specific about concerns and defenses. Threat modeling is a good way to do exactly that.

Saturday, August 21, 2010

Hidden Costs in the Cloud, Part 2: Windows Azure Bandwidth Charges

In Part 1 of this series we identified several categories of “hidden costs” of cloud computing—that is, factors you might overlook or underestimate that can affect your Windows Azure bill. Here in Part 2 we’re going to take a detailed look at one of them: bandwidth. We’ll first discuss bandwidth generally, then zoom in on hosting bandwidth and how your solution architecture affects your charges. Lastly, we’ll look at how you can estimate or measure bandwidth using IIS logs and Fiddler.

Bandwidth (or Data Transfer) charges are a tricky part of the cloud billing equation because they’re harder to intuit. Some cloud estimating questions are relatively easy to answer: How many users do you have? How many servers will you need? How much database storage do you need? Bandwidth, on the other hand, is something you may not be used to calculating. Since bandwidth is something you’re charged for in the cloud, and could potentially outweigh other billing factors, you need to care about it when estimating your costs and tracking actual costs.

Bandwidth charges apply any time data is transferred into the data center or out of the data center. These charges apply to every service in the Windows Azure platform: hosting, storage, database, security, and communications. You therefore need to be doubly aware: not only are you charged for bandwidth, but many different activities can result in bandwidth charges.

In the case of Windows Azure hosting, bandwidth charges apply when your cloud-hosted web applications or web services are accessed. We’re going to focus specifically on hosting bandwidth for the remainder of this article.

One example of Windows Azure Storage incurring bandwidth charges is a web page containing image tags that reference images in blob storage. Another is any external program that writes to, reads from, or polls blob, queue, or table storage. There is a very complete article on the Windows Azure Storage Team Blog I encourage you to read that discusses bandwidth and other storage billing considerations: Understanding your Windows Azure Storage Billing.

Bandwidth charges don’t discriminate between people and programs: they apply equally to both human and programmatic usage of your cloud assets. If a user visits your cloud-hosted web site in a browser, you pay bandwidth charges for the requests and responses. If a web client (program) invokes your cloud-hosted web service, you pay bandwidth charges for the requests and responses. If a program interacts with your cloud-hosted database, you pay bandwidth charges for the T-SQL queries and results.

The good news about bandwidth is that you are not charged for it in every situation. Data transfer charges apply only when you cross the data center boundary: that is, when something external to the data center communicates with something in the data center. There are plenty of scenarios where your software components are all in the cloud; in those cases, communication between them costs you nothing in bandwidth charges.

It’s worth taking a look at how this works out in practice depending on the technologies you are using and the architecture of your solutions.

Let’s consider a typical ASP.NET solution, where you have an ASP.NET web site in the cloud whose web services and database are also in the cloud. When a user interacts with your web site, you’re incurring bandwidth charges as their browser sends and receives data to and from the web site. If your site has image or media tags that reference blobs in Windows Azure Storage, you’re also incurring bandwidth charges for accessing them. Fortunately, browser image caching will keep that from getting out of hand. The web site in turn talks to its web services, and those web services in turn interact with a database. There are no bandwidth charges for the web service communication or the database communication because all of the parties are in the data center. In the diagram below the thicker green arrows show interactions that cross the data center boundary and incur bandwidth charges. In contrast, the thinner black arrows show interactions that never leave the data center and incur no bandwidth charges.

Bandwidth Profile of an ASP.NET Solution in the Cloud

Now let’s consider nearly the same scenario but this time the front end is a Silverlight application. It’s largely the same composition as the previously described ASP.NET solution, except that when a user browses to the site a Silverlight application is downloaded that then runs locally. As the front end is now running locally on the user’s computer, the bandwidth picture changes. First off, there is less bandwidth consumption from the UI because we are no longer hitting the web server repeatedly to go to different pages: instead, the interaction is all local to the Silverlight application (except for image or media tags that reference blobs in Windows Azure storage). However, there is also more bandwidth consumption from web services because the client, the Silverlight application, is now outside the data center. Whereas web service calls registered no bandwidth charge in the prior scenario, now it’s something you’re paying for. Database interaction continues to incur no bandwidth charges because that’s coming from the web services which are still in the cloud.

Bandwidth Profile of a Silverlight Solution in the Cloud

In a batch application where there is no outside interaction, your solution could incur no bandwidth charges whatsoever on a regular basis. Presumably you need to insert data and retrieve results from time to time, which is where bandwidth charges enter the picture.

Bandwidth Profile of a Batch Application in the Cloud

In a hybrid application where parts of the solution are in the cloud and parts of the solution are on-premise, you’ll incur bandwidth charges for wherever it is you cross the data center boundary. That might mean bandwidth charges for web service calls, database access, or storage access depending on where the on-premise/cloud connection(s) are. In the case where web services and backing database are in the cloud and the consuming clients are on-premise, it’s just web service interaction that will have bandwidth charges.

Bandwidth Profile of a Hybrid Application

Below is a copy of the Windows Azure price sheet for standard monthly consumption-based use of the cloud (note: you should always check the official pricing, in case the pricing model or rates change over time). Looking at the Data Transfer pricing at the bottom, we see that in North America and Europe data transfers cost $0.10/GB into the data center and $0.15/GB out of the data center. In Asia the prices are higher at $0.30/GB in and $0.45/GB out.

Windows Azure Standard Monthly Pricing (as of summer 2010)

The rates we looked at in the previous section are for consumption-based pricing which is month-to-month with no term commitment. However, Windows Azure pricing comes in a few different flavors. For a time commitment, there is subscription pricing where a certain amount of bandwidth may be included; in these cases, you start paying for bandwidth once your usage exceeds that amount.

In addition, there are special offers. For example, the Windows Azure Platform Introductory Special that is being offered at the time of this writing gives you 500MB of in/out data transfer each month at no charge. It’s only when you exceed that usage that you pay bandwidth charges.

Windows Azure Introductory Special (as of summer 2010)

Now that we’ve established the importance of taking bandwidth into account, how can you estimate what your usage will be? This is definitely easier when you are migrating an existing application because you have something you can measure and extrapolate from. Here are two approaches you can use to estimate bandwidth charges in the cloud:

1. Measuring server-side bandwidth. If you are going to be migrating an existing application, you can measure current overall bandwidth usage at the server and extrapolate from there.
2. Estimating client-side bandwidth. If you can measure or estimate the bandwidth of various kinds of client interactions with your application, you can multiply that by expected load to arrive at expected overall bandwidth usage.

If your application is web-oriented and uses Microsoft technologies, there’s a good chance it is IIS hosted. For IIS-hosted applications you can use IIS logs to measure overall application bandwidth. For other kinds of applications you’ll need to see if a similar facility is available or investigate using a server-side network monitoring tool.

You can control logging from IIS Configuration Manager. In the IIS / Logging area you can set up and configure logging. There are various formats, schedules, and fields you can select for logging. If you’re using IIS defaults, you’re probably set up for W3C format logs and the output fields don’t include bandwidth counts. To change that, click Select Fields and ensure Bytes Sent (sc-bytes) and Bytes Received (cs-bytes) are selected. While you’re setting up logging the way you want, also note the location where the log files are written to.

With logging set up to capture bytes sent and bytes received, you’ll be collecting the raw data from which you can measure bandwidth. Once you have some log data, take a look at a log file and verify that bytes sent and received are being tracked. With the W3C format, you’ll see a text file similar to the listing below, with a line of values for each web request/response. Depending on the fields you’ve selected, the lines may be very long. The line beginning with #Fields gives you the legend to the data on each line. In the example shown, the sc-bytes and cs-bytes fields are the two values just before the final time-taken value.

#Software: Microsoft Internet Information Services 7.5
#Version: 1.0
#Date: 2010-08-21 14:23:47
#Fields: date time cs-method cs-uri-stem cs-uri-query s-port sc-status sc-substatus sc-win32-status sc-bytes cs-bytes time-taken
2010-08-21 14:23:47 GET / - 80 200 0 0 936 565 250
2010-08-21 14:23:49 GET /welcome.png - 80 200 0 0 185196 386 1531
2010-08-21 14:23:49 GET /favicon.ico - 80 404 0 64 0 333 31
2010-08-21 14:23:54 GET /myapp.aspx - 80 404 0 0 1749 591 375
2010-08-21 14:24:00 GET /myapp - 80 401 2 5 1509 586 140
2010-08-21 14:24:00 GET /myapp - 80 301 0 0 691 3269 78
2010-08-21 14:24:00 GET /myapp/ - 80 200 0 0 3570 3270 312
2010-08-21 14:24:00 GET /myapp/Silverlight.js - 80 200 0 0 8236 3116 281
2010-08-21 14:24:01 GET /favicon.ico - 80 404 0 2 1699 3016 62
2010-08-21 14:24:04 GET /myapp/ClientBin/myapp.xap - 80 200 0 0 846344 3062 2625
2010-08-21 14:24:13 POST /myapp/mysvc.svc/mysvc.svc - 80 200 0 0 752 3461 359
2010-08-21 14:24:13 POST /myapp/mysvc.svc/mysvc.svc - 80 200 0 0 3262 3513 62
2010-08-21 14:24:13 POST /myapp/mysvc.svc/mysvc.svc - 80 200 0 0 6314 3453 250
2010-08-21 14:24:15 POST /myapp/mysvc.svc/mysvc.svc - 80 200 0 0 846 3564 1281
2010-08-21 14:24:15 POST /myapp/mysvc.svc/mysvc.svc - 80 200 0 0 847 3565 718
2010-08-21 14:24:30 POST /myapp/mysvc.svc/mysvc.svc - 80 200 0 0 834 3647 14609
2010-08-21 14:24:31 POST /myapp/mysvc.svc/mysvc.svc - 80 200 0 0 8028 3461 218

Next we need to sum the sc-bytes and cs-bytes values to get overall bandwidth for the period of the log. We can do this with a utility that parses IIS logs; the one I use is called Log Parser and is a free download from Microsoft.

LogParser.exe "SELECT SUM(cs-bytes),SUM(sc-bytes) FROM u_ex10082114.log"
SUM(ALL cs-bytes) SUM(ALL sc-bytes)
----------------- -----------------
683352 1265207

Elements processed: 202
Elements output: 1
Execution time: 0.00 seconds

Be mindful of the solution architecture discussion earlier in this article: it’s possible some of the bandwidth you’re measuring will be charged for in the cloud and some will not. If that’s the case, you’re going to need to parse your log files with selection filters to find the subset of bandwidth you would be charged for.
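
If Log Parser isn’t handy, summing the same fields is straightforward to script. Here is a Python sketch (the log filename and the "/myapp" filter below are placeholders for your own logs and chargeable paths) that reads a W3C-format log, locates sc-bytes and cs-bytes from the #Fields line, and totals them for matching requests.

```python
def _num(s):
    """Parse a numeric log field; W3C logs use '-' for missing values."""
    return int(s) if s.isdigit() else 0

def sum_w3c_bandwidth(path, uri_prefix=""):
    """Total sc-bytes (sent) and cs-bytes (received) in a W3C IIS log,
    optionally restricted to requests whose cs-uri-stem matches a prefix."""
    fields, sent, received = [], 0, 0
    with open(path) as log:
        for line in log:
            line = line.strip()
            if line.startswith("#Fields:"):
                fields = line.split()[1:]      # field names, in order
            elif line and not line.startswith("#"):
                values = dict(zip(fields, line.split()))
                if values.get("cs-uri-stem", "").startswith(uri_prefix):
                    sent += _num(values.get("sc-bytes", "-"))
                    received += _num(values.get("cs-bytes", "-"))
    return sent, received

# Example: count only the traffic under /myapp (a placeholder path).
# sent, received = sum_w3c_bandwidth("u_ex10082114.log", "/myapp")
```

Because the field positions are read from the #Fields line rather than hard-coded, the same script works whatever combination of fields you enabled in IIS.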

IIS logs are equally useful for monitoring bandwidth once your applications have been deployed to the cloud. Karsten Januszewski’s article Downloading and Parsing IIS Logs from Windows Azure explains how to do this.

Sometimes it’s easier to look at bandwidth from the client-side. If you can measure or estimate the bandwidth of an individual client session, you can multiply that by the expected load to arrive at overall bandwidth. If your application already exists and is web-based, you can measure the bandwidth of client-side interactions using the popular Fiddler tool (described in detail in the following section).

If you do this too coarsely, the information won’t be valuable. Whether you are measuring client bandwidth or estimating it, you need to consider the major usage patterns for your application and their frequency. Ask yourself what the different kinds of user are, what tasks they perform, and what interaction that entails. Once you have bandwidth figures for the various usage patterns, multiply them by the expected number of users per month for each pattern.

In the example analysis below, Fiddler was used to measure the number of requests, bytes sent, and bytes received for various tasks for a Silverlight-based training portal. Next the number of user sessions per month for each task was estimated. Multiplying session bandwidth by session count gives us total expected in and out bandwidth. Although the numbers are looking large at this point, we’re only charged pennies per gigabyte. When we round up the number of in and out gigabytes and multiply by $0.10/GB in, $0.15/GB out, we have our final figure—a mere $9.70/month. Although the bandwidth charge is low in this particular example, you can’t assume that will always be the case.
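
The arithmetic behind such an estimate is simple enough to script. This Python sketch uses the 2010 North America/Europe rates quoted earlier ($0.10/GB in, $0.15/GB out) and entirely made-up per-task figures; substitute your own Fiddler measurements and session counts.

```python
# Rough monthly bandwidth cost: per-session bytes (from Fiddler) times
# expected sessions per month, priced at the 2010 NA/Europe rates.
RATE_IN_PER_GB, RATE_OUT_PER_GB = 0.10, 0.15   # dollars; check current pricing
GB = 1024 ** 3

# task: (bytes sent to cloud, bytes received from cloud, sessions/month)
# All figures below are illustrative, not measured.
tasks = {
    "browse catalog": (50_000, 2_000_000, 10_000),
    "watch lesson":   (20_000, 30_000_000, 1_500),
    "take quiz":      (150_000, 500_000, 4_000),
}

bytes_in = sum(s * n for s, _, n in tasks.values())
bytes_out = sum(r * n for _, r, n in tasks.values())
cost = (bytes_in / GB) * RATE_IN_PER_GB + (bytes_out / GB) * RATE_OUT_PER_GB
print(f"in: {bytes_in / GB:.2f} GB, out: {bytes_out / GB:.2f} GB, "
      f"cost: ${cost:.2f}/month")
```

Note how the outbound side dominates: responses (pages, media, service results) are usually far larger than requests, so the out rate is the one to watch.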

You will want to use similar techniques to estimate bandwidth for other uses of the cloud, including storage, database, security, and communication services. Once deployed, you would want to inspect your monthly bill and see how well actual bandwidth aligns with predicted bandwidth. If you see a large difference, something wasn’t taken into account.

The Fiddler tool can be used to measure the bandwidth of client interactions with a web application. To measure bandwidth with Fiddler, follow these steps:

1. Identify a Concrete Measurement Objective.

Have a clear idea of what it is you are going to measure:

• Page or Session? Are you measuring a single web page access, or a complete session where the user will be navigating to a site and then interacting with it?
• User Role and Intended Tasks. If you are measuring a session, what roles is the user in and what are they intending to accomplish? You’ll want to log that information along with the measurements you make. It may be helpful to create a script, the list of steps you will perform on the web site.
• First-time or Repeat Visit? This is important because the first-time visit scenario doesn’t benefit from browser caching of temporary files.

2. Ensure Proper Starting Conditions.

We don’t want to taint our results or our conclusions, so it’s important to have the right starting conditions.

a. Browser Caching. If you are measuring the repeat visit (cached) scenario, you need to have previously visited the site in the same way you will be using it now. If on the other hand you are measuring the first-time (non-cached) scenario, clear out your browser cache. In Internet Explorer 8, you do this by selecting Tools > Internet Options, clicking the Delete… button, specifying what to delete (just Temporary Internet Files), and clicking Delete.

b. Close Browser Windows and Web Applications. Close down any existing browser windows or Web-active applications as we don’t want to include unintended Internet traffic in our measurements.

c. New Browser Instance. Launch a new browser instance.

3. Prepare Fiddler.

a. Launch Fiddler. Bring up Fiddler. You’ll see a large Web Sessions window on the left side of the display and a tabbed detail area to the right.

b. Clear Previous Sessions. If the Web Sessions window isn’t empty, select everything in it (you can use Ctrl+A) and press the Delete button. If you see ongoing activity and the window doesn’t stay empty, something is wrong and you have something still running that is performing web traffic. Hunt it down, shut it down, and return to this step.

c. View Statistics. From the menu, select View > Statistics (or use the shortcut key, F7).

4. Perform Web Activity.

Perform the web activity you want to measure. Do this carefully, so that you are including everything you want to but nothing superfluous. For a measurement of a web page access, simply navigate to the page and let it load in your browser. For a session, navigate to the web site and then start interacting as planned. Note: If you are trying to gauge activity for a web site that doesn’t actually exist yet, find a web property you feel is similar in terms of content density and navigation and use that as a rough gauge.

5. Capture Results.

a. Return to Fiddler. You should see data in the Web Sessions window reflecting your web activity (if you don’t, check that File > Capture Traffic is enabled in the menu).

b. Select All Activity. In the Web Sessions window, select all (Ctrl+A). This will give you summarized statistics for all of your web activity in the Statistics tab at right.

c. Capture Bandwidth Results. Select the entire Statistics window content, copy to the clipboard, and paste into Word or Excel where you can save it. Right at the top is key bandwidth information: the number of requests, the number of bytes sent, and the number of bytes received.

d. Capture Bandwidth Breakdown by Content Type. While it’s useful to know the bandwidth in terms of size, it’s also important to understand how that bandwidth usage breaks down. Click on the Show Chart link at the bottom of the Statistics page and Fiddler will show you the breakdown by content type along with a chart. As in the previous step you can select, copy, and paste the textual information. To copy the chart, click the Copy this Chart link on the bottom of the window. This can be very revealing: in the example below, we can see that images and JavaScript are taking up the lion’s share of the bandwidth. You may be able to optimize the bandwidth consumption of your application based on this information—for example, reducing your images to a smaller format and resolution.

e. Get Additional Information from Fiddler. Taking the time to learn more about Fiddler will allow you to gain deeper insights into not only the size of your bandwidth but its nature.

Bandwidth may or may not be a large factor in your Windows Azure billing. The magnitude of bandwidth charges depends on just one thing—how much data you pass in and out of the data center—but there are many factors that determine that: the number of cloud services you use, the architecture of your solution, the efficiency and chattiness of your interactions, usage patterns, and load.

Tuesday, August 17, 2010

The Enigma of Private Cloud

If you swim in cloud computing circles you cannot escape hearing the term private cloud. Private cloud is surely the feature most in demand by the cloud computing market—yet perhaps the longest in coming, as cloud computing vendors have gone from initial resistance to the idea to coming to terms with the need for it and figuring out how to deliver it. The concept is something of a paradox, made worse by the fact that private cloud means different things to different people. There are at least five meanings of private cloud in use out there, and they differ widely. Despite all this, the market pressure for private cloud is so great that cloud computing vendors are finding ways to deliver it anyway. Let’s take a deeper look at what’s going on here.

What’s Behind The Demand For Private Cloud?
The desire for private cloud is easy enough to appreciate. Organizations are enamored with the benefits of cloud computing but don’t like certain aspects of it, such as the loss of direct control over their assets or sharing resources with other tenants in the cloud. This is where the paradox comes in, because management by cloud data centers and shared resources are core to what cloud computing is and why its costs are low. The market isn’t required to be logical or think through the details, however, and when there’s sufficient demand vendors find ways to innovate. Thus, while private cloud may seem at odds with the general premise of cloud computing, it turns out we need it and will have it.

There are some other drivers behind the need for private cloud that are hard to get around. Governments may have requirements for physical control of data that simply cannot be circumvented. In some countries there are regulations that business data must be kept in the country of origin. Another influence is the future dream of things working the same way in both the cloud and the enterprise. When that day comes, solutions won’t have to be designed differently for one place or the other and enterprises will be able to move assets between on-premise and cloud effortlessly.

Defining Private Cloud
How then is private cloud to be brought about? This is where we get into many different ideas about what private cloud actually is. My pet peeve is people who use the term private cloud without bothering to define what they mean by it. Let’s take a look at understandings that are in widespread use.

1. LAN Private Cloud
Some people use private cloud to simply mean their local network, similar to how the Internet can be referred to as the cloud without any specific reference to cloud computing proper. This use of the term is rather non-specific so we can’t do much with it. Let’s move on.

2. Gateway Private Cloud
This use of private cloud centers on the idea of securely connecting your local network to your assets in the cloud. Amazon’s Virtual Private Cloud is described as “a secure and seamless bridge between a company’s existing IT infrastructure and the AWS cloud” which “connects existing infrastructure to isolated resources in the cloud through a VPN connection.” In the Windows Azure world, Microsoft is working on something in this category called Project Sydney. Sydney was mentioned at PDC 2009 last year but until it debuts we won’t know how similar or different it will be to the Amazon VPC approach. Stay tuned.

This type of private cloud is valuable for several reasons. It potentially lets you use your own network security and operations monitoring infrastructure on your assets in the cloud. It also potentially lets your cloud assets access something on your local network they need, such as a server that you can’t or won’t put in the cloud.

3. Dedicated Private Cloud
In this flavor of private cloud you are using a cloud computing data center where an area of it is dedicated for just your use. From this you get the benefits you’re used to in the cloud such as automated provisioning and management and elasticity, but the comfort of isolation from other tenants.

Microsoft Online Services has offered this kind of private cloud with a dedicated version of the Business Productivity Online Suite (“BPOS-D”) for customers with a large enough footprint to qualify.

It seems axiomatic that dedicated private cloud will always be more expensive than shared use of the cloud.

4. Hardware Private Cloud
In hardware private cloud, cutting edge infrastructure like that used in cloud computing data centers is made available for you to use on-premise. Of course there’s not only hardware but software as well. Microsoft’s recent announcement of the Windows Azure Appliance is in this category.

The nature of hardware private cloud makes it expensive and therefore not for everybody, but it is important that this kind of offering exist. First, it should allow ISPs to offer alternative hosting locations for the Windows Azure technology in the marketplace. Second, it allows organizations that must keep data on their premises, such as some government bodies, to still enjoy cloud computing. Third, it solves the “data must stay in the country of origin” problem, which is a significant issue in Europe.

Is there something like the hardware private cloud that’s a bit more affordable? There is, our next category.

5. Software Private Cloud
Software private cloud emulates cloud computing capabilities on-premise such as storage and hosting using standard hardware. While this can’t match all of the functionality of a true cloud computing data center, it does give enterprises a way to host applications and store data that is the same as in the cloud.

An enterprise gets some strong benefits from software private cloud. They can write applications one way and run them on-premise or in the cloud. They can move assets between on-premise and cloud locations easily and reversibly. They can change their split between on-premise and cloud capacity smoothly. Lock-in concerns vanish. One other benefit of a software private cloud offering is that it can function as a QA environment—something missing right now in Windows Azure.

We don’t have software private cloud in Windows Azure today but there’s reason to believe it can be done. Windows Azure developers already have a cloud simulator called the Dev Fabric; if the cloud can be simulated on a single developer machine, why not on a server with multi-user access? There’s also a lot of work going on with robust hosting in Windows Server AppFabric and perhaps the time will come when the enterprise and cloud editions of AppFabric will do things the same way. Again, we’ll have to stay tuned and see.

Should I Wait for Private Cloud?
You may be wondering if it’s too soon to get involved with cloud computing if private cloud is only now emerging and not fully here yet. In my view private cloud is something you want to take into consideration—especially if you have a scenario that requires it—but is not a reason to mothball your plans for evaluating cloud computing. The cloud vendors are innovating at an amazing pace and you’ll have plenty of private cloud options before you know it. There are many reasons to get involved with the cloud early: an assessment and proof-of-concept now will bring insights from which you can plan your strategy and roadmap for years to come. If the cloud can bring you significant savings, the sooner you start the more you will gain. Cloud computing is one of those technologies you really should get out in front of: by doing so you will maximize your benefits and avoid improper use.

There you have it. Private cloud is important, both for substantive reasons and because the market is demanding it. The notion of private cloud has many interpretations which vary widely in nature and what they enable you to do. Vendors are starting to bring out solutions, such as the Windows Azure Appliance. We’ll have many more choices a year from now, and then the question will turn from “when do I get private cloud” to “which kind of private cloud should we be using?”

And if you have private cloud fever, please explain which kind you mean!

Upcoming Cloud Computing for Public Sector Webcast

I'll be giving a webcast on Microsoft Cloud Computing and why it makes business sense for Public Sector on Wednesday August 18 from 10-11a PT.

10 Reasons to use Microsoft's Cloud Computing Strategy in Public Sector

Neudesic’s David Pallmann discusses why cloud computing is compelling from a business perspective and how it can be a high-value platform in the Public Sector. We examine why cloud computing on the Microsoft platform is fiscally responsible, keeps costs under control, and allows you to spend your I.T. dollars more efficiently. The discussion will include how to compute your monthly charges and how to determine the ROI of migrating existing applications to the cloud.

Upcoming Radio Talk on Windows Azure

On August 18th 5-6p PT I'll be joining David Lynn and Ed Walters of Microsoft on the radio for the Computer Outlook program to discuss Microsoft Cloud Computing. We'll be focusing on Windows Azure and a customer example.

Saturday, August 14, 2010

Hidden Costs in the Cloud, Part 1: Driving the Gremlins Out of Your Windows Azure Billing

grem•lin (ˈgrɛm lɪn) –noun
1. a mischievous invisible being, said by airplane pilots in World War II to cause engine trouble and mechanical difficulties.
2. any cause of trouble, difficulties, etc.

Cloud computing has real business benefits that can help the bottom line of most organizations. However, you may have heard about (or directly experienced) cases of sticker shock where actual costs were higher than expectations. These episodes are usually attributed to “hidden costs” in the cloud which are sometimes viewed as gremlins you can neither see nor control. Some people are spooked enough by the prospect of hidden costs to question the entire premise of cloud computing. Is using the cloud like using a slot machine, where you don’t know what will come up and you’re usually on the losing end?

These costs aren’t really hidden, of course: it’s more that they’re overlooked, misunderstood, or underestimated. In this article series we’re going to identify these so-called “hidden costs” and shed light on them so that they’re neither hidden nor something you have to fear. We’ll be doing this specifically from a Windows Azure perspective. In Part 1 we’ll review “hidden costs” at a high level, and in subsequent articles we’ll explore some of them in detail.

Hidden Cost #1: Dimensions of Pricing
In my opinion the #1 hidden cost of cloud computing is simply the number of dimensions there are to the pricing model. In effect, everything in the cloud is cheap but every kind of service represents an additional level of charge. To make it worse, as new features and services are added to the platform the number of billing considerations continues to increase.

As an example, let’s consider that you are storing some images for your web site in Windows Azure blob storage. What does this cost you? The answer is, it doesn’t cost you very much—but you might be charged in as many as 4 ways:

• Storage fees: Storage used is charged at $0.15/GB per month
• Transaction fees: Accessing storage costs $0.01 per 10,000 transactions
• Bandwidth fees: Transferring data in and out of the data center costs $0.10/GB in, $0.15/GB out
• Content Delivery Network: If you are using this optional edge caching service, you are also paying an additional $0.15/GB and an additional $0.01 per 10,000 transactions

You might still conclude these costs are reasonable after taking everything into account, but this example should serve to illustrate how easily you can inadvertently leave something out in your estimating.
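As a back-of-the-envelope check, the four charge categories above can be tallied in a few lines. This sketch (in Python for brevity) uses the 2010 list prices quoted in the bullets; the usage figures are hypothetical, so treat it as an estimating aid, not a billing tool.

```python
# 2010 list prices from the bullets above -- verify against current pricing.
STORAGE_PER_GB = 0.15        # $/GB stored per month
TRANSACTIONS_PER_10K = 0.01  # $ per 10,000 storage transactions
BANDWIDTH_OUT_PER_GB = 0.15  # $/GB transferred out of the data center
CDN_PER_GB = 0.15            # $/GB served from the optional CDN

def blob_monthly_cost(stored_gb, transactions, gb_out, cdn_gb=0.0, cdn_transactions=0):
    """Tally all four charge categories for blob storage in one month."""
    cost = stored_gb * STORAGE_PER_GB
    cost += transactions / 10_000 * TRANSACTIONS_PER_10K
    cost += gb_out * BANDWIDTH_OUT_PER_GB
    cost += cdn_gb * CDN_PER_GB + cdn_transactions / 10_000 * TRANSACTIONS_PER_10K
    return round(cost, 2)

# Hypothetical month: 50 GB of images, 2 million reads, 100 GB served out, no CDN.
print(blob_monthly_cost(50, 2_000_000, 100))  # -> 24.5
```

Each line item is small on its own; the point is that omitting any one of them skews the estimate.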

What to do about it: You can guard against leaving something out of your calculations by using tools and resources that account for all the dimensions of pricing, such as the Windows Azure TCO Calculator, Neudesic’s Azure ROI Calculator, or the official Windows Azure pricing information. With a thorough checklist in front of you, you won’t overlook any of the billing categories. Also make sure that anything you consult is up to date, as pricing formulas and rates can change over time. Finally, be as accurate as you can in predicting your usage in each of these categories.

Hidden Cost #2: Bandwidth
In addition to hosting and storage costs, your web applications are also subject to bandwidth charges (also called data transfer charges). When someone accesses your cloud-hosted web site in a web browser, the requests and responses incur data transfer charges. When an external client accesses your cloud-hosted web service, the requests and responses incur data transfer charges.

Bandwidth is often overlooked or underappreciated in estimating cloud computing charges. There are several reasons for this. First, it’s not something we’re used to having to measure. Secondly, it’s less tangible than other measurements that tend to get our attention, such as number of servers and amount of storage. Thirdly, it’s usually down near the bottom of the pricing list so not everyone may notice it or pay attention to it. Lastly, it’s nebulous: many have no idea what their bandwidth use is or how they would estimate it.

What to do about it: You can model and estimate your bandwidth using tools like Fiddler, and once running in the cloud you can measure actual bandwidth using mechanisms such as IIS logs. With a proper analysis of bandwidth size and breakdown, you can optimize your application to reduce bandwidth.

You can also exercise control over bandwidth charges through your solution architecture: you aren’t charged for bandwidth when it doesn’t cross in or out of the data center. For example, a web application in the cloud calling a web service in the same data center doesn’t incur bandwidth charges.
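One rough way to size bandwidth charges before you have real traffic is to multiply request counts by average payload sizes at the data transfer rates quoted earlier. The traffic figures below are invented for illustration; substitute numbers measured with Fiddler or taken from IIS logs.

```python
# Rough bandwidth-cost model at the 2010 rates quoted above:
# $0.10/GB into the data center, $0.15/GB out. Traffic figures are hypothetical.
IN_PER_GB, OUT_PER_GB = 0.10, 0.15
GB = 1024 ** 3  # bytes per gigabyte

def monthly_bandwidth_cost(requests, avg_request_bytes, avg_response_bytes):
    """Estimate a month of data transfer charges from request/response sizes."""
    gb_in = requests * avg_request_bytes / GB
    gb_out = requests * avg_response_bytes / GB
    return round(gb_in * IN_PER_GB + gb_out * OUT_PER_GB, 2)

# 1,000,000 page views per month, 2 KB requests, 250 KB responses:
print(monthly_bandwidth_cost(1_000_000, 2 * 1024, 250 * 1024))
```

Notice that responses dominate: trimming page weight is usually the biggest lever on this part of the bill.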

Hidden Cost #3: Leaving the Faucet Running
As a father I’m constantly reminding my children to turn off the lights and not leave the faucets running: it costs money! Leaving an application deployed that you forgot about is a surefire way to get a surprising bill. Once you put applications or data into the cloud, they continue to cost you money, month after month, until such time as you remove them. It’s very easy to put something in the cloud and forget about it.

What to do about it: First and foremost, review your bill regularly. You don’t have to wait until end of month and be surprised: your Windows Azure bill can be viewed online anytime to see how your charges for the month are accruing. Secondly, make it someone’s job to regularly review that what’s in the cloud still needs to be there and that costs are in line with expectations. Set expiration dates or renewal review dates for your cloud applications and data. Be proactive in recognizing the faucet has been left running before the problem reaches flood levels.

Hidden Cost #4: Compute Charges Are Not Based on Usage
If you put an application in the cloud and no one uses it, does it cost you money? Well, if a tree falls in the forest and no one is around to hear, does it make a sound? The answer to both questions is yes. Since the general message of cloud computing is consumption-based pricing, some people assume their hourly compute charges are based on how much their application is used. It’s not the case: hourly charges for compute time do not work that way in Windows Azure. Rather, you are reserving machines and your charges are based on wall clock time per core. Whether those servers are very busy, lightly used, or not used at all doesn’t affect this aspect of your bill. Where consumption-based pricing does enter the picture is in the number of servers you need to support your users, which you can increase or decrease at will. There are other aspects of your bill that are charged based on direct consumption such as bandwidth.

What to do about it: Understand what your usage-based and non-usage-based charges will be, and estimate costs accurately. Don’t make the mistake of thinking an unused application left in the cloud is free—it isn’t.
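The wall-clock billing model is easy to sketch, and the sketch makes the point: usage never appears in the formula, only instance count and hours. The $0.12/hour small-instance rate below is the 2010 list price and is an assumption you should verify against current pricing.

```python
# Wall-clock compute billing: instances x hours x rate. Nothing about traffic
# or CPU load appears here -- an idle deployment costs the same as a busy one.
HOURLY_RATE = 0.12  # $/hour per small instance (assumed 2010 list price)

def monthly_compute_cost(instances, hours=720):
    """Cost of reserving `instances` machines for `hours` of wall-clock time."""
    return round(instances * hours * HOURLY_RATE, 2)

# Two small instances (the SLA minimum) running 24x7 for a 720-hour month:
print(monthly_compute_cost(2))  # -> 172.8, whether they serve traffic or not
```

Consumption enters only through the `instances` parameter, which you can dial up or down as load changes.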

Hidden Cost #5: Staging Costs the Same as Production
If you deploy an application to Windows Azure, it can go in one of two places: your project’s Production slot or its Staging slot. Many have mistakenly concluded that only Production is billed for when in fact Production and Staging are both charged for, and at the same rates.

What to do about it: Use Staging as a temporary area and set policies that anything deployed there must be promoted to Production or shut down within a certain amount of time. Give someone the job of checking for forgotten Staging deployments and deleting them—or even better, automate this process.

Hidden Cost #6: A Suspended Application is a Billable Application
Applications deployed to Windows Azure Production or Staging can be in a running state or a suspended state. Only in the running state will an application be active and respond to traffic. Does this mean a suspended application does not accrue charges? Not at all—the wall clock-based billing charges accrue in exactly the same way regardless of whether your application is suspended or not.

What to do about it: Only suspend an application if you have good reason to do so, and this should always be followed by a more definitive action such as deleting the deployment or upgrading it and starting it up. It doesn’t make any sense to suspend a deployment and leave it in the cloud: no one can use it and you’re still being charged for it.

Hidden Cost #7: Seeing Double
Your cloud application will have one or more software tiers, which means it is going to need one or more server farms. How many servers will you have in each farm? You might think a good answer is 1, at least when you’re first starting out. In fact, you need a minimum of 2 servers per farm if you want the Windows Azure SLA to be upheld, which boils down to 3 9’s of availability. If you’re not aware of this, your estimates of hosting costs could be off by 100%!

The reason for this 2-server minimum is how patches and upgrades are applied to cloud-hosted applications in Windows Azure. The Fabric that controls the data center has an upgrade domain system where updates to servers are sequenced to protect the availability of your application. It’s a wonderful system, but it doesn’t do you any good if you only have 1 server.

What to do about it: If you need the SLA, be sure to plan on at least 2 servers per farm. If you can live without the SLA, it’s fine to run a single server assuming it can handle your user load.

Hidden Cost #8: Polling
Polling data in the cloud is a costly activity. If you poll a queue in the enterprise and the queue is empty, it doesn’t directly cost you money. In the cloud it does, because simply attempting to access storage (even if the storage is empty) is a transaction that costs you something. While an individual poll doesn’t cost much—only $0.01 per 10,000 transactions—it adds up to big numbers if you’re doing it repeatedly.

What to do about it: Either find an alternative to polling, or do your polling in a way that is cost-efficient. There is an efficient way to implement polling using an algorithm that varies the sleep time between polls based on whether any data has been seen recently. When a queue is seen to be empty the sleep time increases; when a message is found in the queue, the sleep time is reduced so that message(s) in the queue can be quickly serviced.
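Here is one way such an adaptive polling loop might look, sketched in Python for brevity (a real worker role would be .NET code calling the queue storage API). The `get_message` callback and the tuning constants are placeholders, not platform recommendations.

```python
import time

# Adaptive polling: back off exponentially while the queue stays empty,
# and snap back to fast polling the moment a message appears.
MIN_SLEEP, MAX_SLEEP = 1.0, 60.0  # seconds; tuning values are assumptions

def next_interval(current, message_found):
    """Reset to fast polling when traffic is seen; otherwise double up to a cap."""
    return MIN_SLEEP if message_found else min(current * 2, MAX_SLEEP)

def poll(get_message, handle, polls, sleep=time.sleep):
    """Poll `polls` times with adaptive delays; return the final delay."""
    interval = MIN_SLEEP
    for _ in range(polls):
        message = get_message()  # every attempt is a billable storage transaction
        if message is not None:
            handle(message)
        else:
            sleep(interval)
        interval = next_interval(interval, message is not None)
    return interval

# With no real queue and no real sleeping, seven empty polls walk the delay
# from 1 second up to the 60-second cap:
print(poll(lambda: None, lambda m: None, polls=7, sleep=lambda s: None))  # -> 60.0
```

An idle queue ends up costing a fraction of what fixed-interval polling would, while a busy queue is still serviced promptly.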

Hidden Cost #9: Unwanted Traffic and Denial of Service Attacks
If your application is hosted in the cloud, you may find it is being accessed by more than your intended user base. That can include curious or accidental web users, search engine spiders, and openly hostile denial of service attacks by hackers or competitors. What happens to your bandwidth charges if your web site or storage assets are being constantly accessed by a bot?

Windows Azure does have some hardening to guard against DOS attacks, but you cannot completely count on it to ward off all attacks, especially novel ones. Windows Azure’s automatic application of security patches will help protect you. If you enable the feature that allows Windows Azure to upgrade your Guest OS VM image, you’ll gain further protections over time automatically. The firewall in SQL Azure Database will help protect your data. You’ll want to run at least 2 servers per farm so that rapidly-issued security patching does not disrupt your application’s availability.

What to do about it: To defend against such attacks, first put the same defenses in place that you would for a web site in your perimeter network, including reliable security, mechanisms to defeat automation such as CAPTCHA, and defensive coding against abuses such as cross-site scripting attacks. Second, learn what defenses are already built into the Windows Azure platform that you can count on. Third, perform a threat-modeling exercise to identify the possible attack vectors for your solution—then plan and build defenses. Diligent review of your accruing charges will alert you early should you come under attack, at which point you can notify Microsoft.

Hidden Cost #10: Management
Cloud computing reduces management requirements and labor costs because data centers handle so much for you automatically, including provisioning servers and applying patches. It’s also true—but often overlooked—that the cloud thrusts some new management responsibilities upon you, responsibilities you dare not ignore at the risk of billing surprises.

What are these responsibilities? Regularly monitor the health of your applications. Regularly monitor your billing. Regularly review whether what’s in the cloud still needs to be in the cloud. Regularly monitor the amount of load on your applications. Adjust the size of your deployments to match load.

The cloud’s marvelous IT dollar efficiency is based on adjusting deployment larger or smaller to fit demand. This only works if you regularly perform monitoring and adjustment. Failure to do so can undermine the value you’re supposed to be getting.

What to do about it: Treat your cloud application and cloud data like any resource in need of regular, ongoing management.

• Monitor the state of your cloud applications as you would anything in your own data center.
• Review your billing charges regularly as you would any operational expense.
• Measure the load on your applications and adjust the size of your cloud deployments to match.

Some of this monitoring and adjustment can be automated using the Windows Azure Diagnostic and Service Management APIs. Regardless of how much of it is done by programs or people, it needs to be done.
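The adjust-to-load step can be reduced to a small decision function. In practice the measurement would come from the Diagnostics API and the resize from the Service Management API; the CPU thresholds and the 2-instance floor below are illustrative assumptions.

```python
# Sketch of an automated sizing decision. The load metric (average CPU %),
# the 25/75 thresholds, and the bounds are assumptions to be tuned; a real
# system would read metrics via the Diagnostics API and resize via the
# Service Management API.

def target_instances(avg_cpu_percent, current, minimum=2, maximum=20):
    """Scale out when the farm runs hot, scale in when it idles."""
    if avg_cpu_percent > 75 and current < maximum:
        return current + 1
    if avg_cpu_percent < 25 and current > minimum:
        return current - 1
    return current

print(target_instances(80, 2))  # hot: grow to 3
print(target_instances(10, 3))  # idle: shrink back to 2
print(target_instances(10, 2))  # never below the 2-instance SLA minimum
```

The floor of 2 keeps the decision from scaling you out of the SLA discussed under Hidden Cost #7.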

Managing Hidden Costs Effectively
We’ve exposed many kinds of hidden costs and discussed what to do about them. How can you manage these concerns correctly and effectively without it being a lot of trouble?

1. Team up with experts. Work with a Microsoft partner who is experienced in giving cloud assessments, delivering cloud migrations, and supporting and managing cloud applications operationally. You’ll get the benefits of sound architecture, best practices, and prior experience.

2. Get an assessment. A cloud computing assessment will help you scope cloud charges and migration costs accurately. It will also get you started on formulating cloud computing strategy and policies that will guard against putting applications in the cloud that don’t make sense there.

3. Take advantage of automation. Buy or build cloud governance software to monitor health and cost and usage of applications, and to notify your operations personnel about deployment size adjustment needs or make the adjustments automatically.

4. Get your IT group involved in cloud management. IT departments are sometimes concerned that cloud computing will mean they will have fewer responsibilities and will be needed less. Here’s an opportunity to give IT new responsibilities to manage your company’s responsible use of cloud computing.

5. Give yourself permission to experiment. You probably won't get it exactly right the first time you try something in the cloud. That's okay--you'll learn some valuable things from some early experimentation, and if you stay on top of monitoring and management any surprises will be small ones.

I trust this has shed light on and demystified “hidden costs” of cloud computing and given you a fuller picture of what can affect your Windows Azure billing. In subsequent articles we’ll explore some of these issues more deeply. It is possible to confidently predict and manage your cloud charges. Cloud computing is too valuable to pass by and too important to remain a diamond in the rough.

Thursday, August 12, 2010

Every Penny Counts in the Cloud

There are many potential benefits of cloud computing that an organization can evaluate, but my favorite one to talk about is the efficient use of your I.T. dollars that is made possible by the elasticity of the cloud. This is easily explained through 2 simple diagrams that you can jot down on a nearby whiteboard or napkin whenever the opportunity arises. There are more elaborate versions of these diagrams out there in slide decks, but you can tell the story effectively just by drawing a couple of lines and narrating them.

The first diagram has a wavy line showing changing application load over time. To this we add a staircase which depicts cycles of hardware purchases (the vertical parts of the staircase) and time passing (the horizontal parts of the staircase). What this shows is that companies are forced to buy more capacity than they really need just to be on the safe side—in effect, spending more money than is necessary or spending it earlier than necessary. Even worse, if there is an unexpected change in load an undersupply situation can result where there is insufficient capacity. This arrangement is what we’re used to in the enterprise—but it is hardly ideal.

The second diagram has an identical wavy line showing changing application load over time. Instead of being joined by a staircase, the wavy line is paralleled by another, nearly identical wavy line. This second line is the size of your footprint in the cloud: it is dialed larger or smaller in accordance with the load you measure. Anyone can see quite readily how much more efficient this is.
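The two diagrams can also be put on a napkin numerically. This sketch (Python for brevity) compares the total capacity paid for under a purchase staircase against an elastic footprint that tracks the same load curve with modest headroom; the load numbers, purchase increment, and headroom are all invented for illustration.

```python
# Invented load curve: servers' worth of demand in successive periods.
load = [30, 45, 60, 80, 65, 50, 70, 90]

# The staircase: buy capacity in increments of 25, well ahead of need (50% margin).
staircase = []
cap = 0
for demand in load:
    while cap < demand * 1.5:
        cap += 25
    staircase.append(cap)

# The elastic footprint: resize every period to demand plus ~10% headroom.
elastic = [demand * 11 // 10 for demand in load]

# Total capacity-periods paid for under each model:
print(sum(staircase), sum(elastic))  # prints: 875 538
```

Even on this toy curve the staircase pays for over 60% more capacity than the elastic footprint, which is the whole story of the two diagrams in two numbers.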

Does this superior arrangement the cloud makes possible come at a price? Yes it does: you need to monitor your application regularly and expand or reduce your deployment in the cloud accordingly. Failure to do this will undermine the financial premise of using the cloud. Today this monitoring and adjusting is not something Windows Azure does for you as an out-of-box feature. However, you can achieve the monitoring and adjusting programmatically via the Windows Azure Diagnostic and Service Management APIs.

Tuesday, August 10, 2010

Dynamically Computing Web Service Addresses for Azure-Silverlight Applications

If you work with Silverlight or with Windows Azure you can experience some pain when it comes to getting your WCF web services to work at first. If you happen to be using both Silverlight and Azure together you might feel this even more.

One troublesome area is the Silverlight client knowing the address of its Azure web service. Let’s assume for this discussion that your web service is a WCF service that you’ve defined in a .svc file (“MyService.svc”) that resides in the same web role that hosts your Silverlight application. What’s the address of this service? There are multiple answers depending on the context you’re running from:
  1. If you’re running locally, your application is running under the Dev Fabric (local cloud simulator), and likely has an address similar to
  2. If you run the web project outside of Windows Azure, it will run under the Visual Studio web server and have a different address of the form http://localhost:<port>/MyService.svc.
  3. If you deploy your application to a Windows Azure project’s Staging slot, it will have an address of the form http://<guid> Moreover, the generated GUID part of the address changes each time you re-deploy.
  4. If you deploy your application to a Windows Azure project’s Production slot, it will have the chosen production address for your cloud service of the form http://<project>
One additional consideration is that when you perform an Add Service Reference to your Silverlight project the generated client configuration will have a specific address. All of this adds up to a lot of trouble each time you deploy to a different environment. Fun this is not.

Is there anything you can do to make life easier? There is. With a small amount of code you can dynamically determine the address your Silverlight client was invoked at, and from there you can derive the address your service lives at.

You can use code similar to what’s shown below to dynamically compute the address of your service. To do so, you’ll need to make some replacements to the MyService = … statement:
1. Replace “MyService.MyServiceClient” with the namespace and class name of your generated service proxy client class.
2. Replace “CustomBinding_MyService” with the binding name used in the generated client configuration file for your service endpoint (ServiceReferences.ClientConfig).
3. Replace “MyService.svc” with the name of your service .svc file.

// Derive the service address from the page address the Silverlight client was
// loaded from: take the page URI, strip off the page name, append the .svc name.
string hostUri = System.Windows.Browser.HtmlPage.Document.DocumentUri.AbsoluteUri;
int pos = hostUri.LastIndexOf('/');
if (pos != -1)
    hostUri = hostUri.Substring(0, pos);  // keep the base path only
MyService.MyServiceClient MyService =
    new MyService.MyServiceClient("CustomBinding_MyService", hostUri + "/MyService.svc");

With this in place you can effortlessly move your application between environments without having to change addresses in configuration files. Note this is specific to “traditional” WCF-based web services only. If you’re working with RIA Services, the client seems to know its service address innately and you should not need to worry about computing web service addresses.