Wednesday, April 29, 2009

Azure Best Practice #4: REST is In, SOAP is Out

Best Practice #4 is to favor REST over SOAP, except for those scenarios that demand SOAP.

When it comes to cloud computing platforms, it's a RESTful world and Azure is no exception. The vast majority of the services in the platform can only be accessed via REST (even if you're using .NET library code, REST is what's happening under the hood). SOAP isn't completely out of the picture, however. For functionality that depends on WS-* protocols (such as federated security) Azure does use SOAP.

Which should you use for your own Azure-hosted web services, REST or SOAP? It makes sense to emulate what the platform itself does and favor REST, except where you must use SOAP. Using SOAP makes sense for compatibility reasons or when you need WS-* functionality. When these conditions don't apply, a RESTful interface is recommended. It's simpler and more in line with "how we do things in the cloud."
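For illustration, here's a minimal sketch of what a RESTful endpoint could look like using WCF's web programming model (the service name and URI template are hypothetical, not from any Azure sample):

```csharp
using System.ServiceModel;
using System.ServiceModel.Web;

// A hypothetical RESTful contract using WCF's web programming model.
// WebGet maps an HTTP GET on /profiles/{id} to this operation --
// plain HTTP and a resource-style URI, no SOAP envelope involved.
[ServiceContract]
public interface IProfileService
{
    [OperationContract]
    [WebGet(UriTemplate = "profiles/{id}", ResponseFormat = WebMessageFormat.Xml)]
    Profile GetProfile(string id);
}
```

Hosting this with WebServiceHost (rather than the standard ServiceHost) gives you the REST dispatch behavior; switching the same logic to a SOAP binding is what you'd do only when WS-* features are required.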

The popularity of REST is due to its simplicity and its resource-oriented nature, while SOAP has come to be seen as cumbersome in the opinion of many. There is a good discussion of whether SOAP and REST are complements or competitors on David Chappell's blog.

Azure Best Practice #3: What's Good for SOA is Good for the Cloud

Azure best practice #3 is to apply SOA principles to your cloud applications. The core ideas of SOA apply equally strongly to the cloud:

  • Software components are loosely coupled.
  • Message-based programs do a lot of the work.
  • Boundaries of communication/reliability/security/transactions are a key consideration in solution architecture.
  • Use of standard protocols and message formats provides broad interoperability.
  • Stateless service development is encouraged which facilitates easy load balancing.
  • Communication is presumed to be expensive.
  • Coarse-grained interfaces are preferred over chatty interfaces.
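To make the last two points concrete, here's a hedged sketch contrasting a chatty interface (many small calls, each paying a network round trip) with a coarse-grained alternative (one call carrying a complete message). The names are illustrative only:

```csharp
// Chatty: three round trips to assemble one logical result.
public interface ICustomerServiceChatty
{
    string GetName(int customerId);
    string GetAddress(int customerId);
    decimal GetBalance(int customerId);
}

// Coarse-grained: one round trip returning a complete, self-describing message.
public interface ICustomerService
{
    CustomerSummary GetCustomerSummary(int customerId);
}

public class CustomerSummary
{
    public string Name { get; set; }
    public string Address { get; set; }
    public decimal Balance { get; set; }
}
```

When communication is presumed expensive, the second shape wins even though it moves more data per call.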

This isn't a recommendation to use SOAP, however. See Best Practice #4 for a discussion of SOAP vs. REST.

Azure Best Practice #2: Keep Code and Data Close Together

Azure best practice #2 is to keep code and the data it needs close to each other. If the code lives in the enterprise, so should the data it frequently accesses. If the code lives in a cloud data center, that's where its data should be also--and in the same geo-location.

This best practice is simple common sense: going between the enterprise and the cloud over the Internet is not terribly fast, so you want to keep it to a minimum.

This is not an absolute rule, since you may have a perfectly legitimate reason to interconnect enterprise code with cloud data or vice-versa--but you should avoid this kind of cross-traffic when it isn't necessary.

You can use the affinity group setting in the Azure portal to ensure related hosting and storage projects are running in the same geo-location.

Example: you have a cloud-hosted web site and you need to store user profile data. The best location for that user profile data is cloud storage.

Tuesday, April 28, 2009

Azure Best Practice #1: Always Run at Least 2 Instances of Any Role

Some best practices for the Azure cloud computing platform are starting to emerge, and I'll be blogging them as I discover them. I expect the sources for these will be a combination of Microsoft guidance, personal experience, and experiences shared by others in the community.

Best Practice #1 is to always run at least 2 instances of any role (this applies to both web roles and worker roles). There's a very important reason for this: your application may be highly unavailable if you fail to do so. Why? Because Azure is constantly upgrading your instances with the latest OS patches. From what I understand talking to people on the product teams it's doing this very, very frequently. When you have 2 or more instances of a role running, Azure is careful to keep some of them running through the concept of upgrade domains; but when you're running a single instance you're going to have periods where you can't access your application.
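The instance count lives in your service configuration, so applying this best practice is a one-line change. A fragment along these lines (assuming the usual ServiceConfiguration.cscfg layout) is all it takes:

```xml
<!-- Run at least 2 instances so upgrade domains can keep one up
     while the other is being patched -->
<Role name="WebRole">
  <Instances count="2"/>
</Role>
```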

I first encountered this issue when I wrote the Lifetracks demo application last year around Thanksgiving, though I didn't recognize the problem for what it was at first. Lifetracks would work after deploying it, but I noticed after a few days it would be "down" and I'd have to re-deploy it. In February I was at a Microsoft event and mentioned this behavior, which I had chalked up to instability in the platform. When I received the advice to run more instances I was skeptical this would make any difference, but I'm happy to report Lifetracks has stayed up from the moment I did so, over 10 weeks ago. So I can attest this is a best practice from direct personal experience.

Some additional evidence is the 22-hour outage the Azure platform experienced in March 2009. If you read Microsoft's analysis of the problem, you'll note that single-instance applications were primarily affected while multiple-instance applications continued to be available. The guidance to run at least 2 instances is stated there.

Monday, April 27, 2009


Is it too soon to be talking about design patterns for Azure--a platform that hasn't been released yet? I don't think so: we already have oodles of functionality in the platform, and patterns help us think about it and how it can be combined. It's especially important during this pre-release preview period that we in the community confirm the platform gives us good, well-thought-out patterns.

And so, the site is born.

Posted on the site currently is my initial enumeration of 14 foundation patterns for hosting, data, communication/sync, and security--with more to come. Over time I'll be providing a detail page on each pattern.

Coming soon: composite application patterns.

Neudesic Grid Computing Framework released

I'm pleased to announce the release of Azure Grid, the community edition of the Neudesic Grid Computing Framework. Azure Grid is available on CodePlex and includes source code.

Azure Grid provides a solution template and base classes for easily creating grid computing applications that execute on the Azure platform. It also includes a GUI for starting and monitoring job runs.

I've already blogged extensively about grid computing and Azure Grid in my 3-part article series on Azure grid computing:

Part 1: A Design Pattern for Grid Computing on Azure

Part 2: Coding an Azure Grid Application

Part 3: Running an Azure Grid Application

Saturday, April 25, 2009

Grid Computing on the Azure Cloud Computing Platform, Part 3: Running a Grid Application

In Part 1 of this series we introduced a design pattern for grid computing on Azure and in Part 2 we wrote the code for a fraud scoring grid computing application named Fraud Check. Here in Part 3 we'll run the application, first locally and then in the cloud.

The framework we're using is Azure Grid, the community edition of the Neudesic Grid Computing Framework.

We'll need to take care of a few things before we can run our grid application--such as setting up a tracking database and specifying configuration settings--but here's a preview of what we'll see once the application is running in the cloud:

Let's get our setup out of the way so we can start running.

The Grid Application Solution Structure
Azure Grid provides you with a base solution template to which you add application-specific code. Last time, we added the code for the Fraud Check grid application but didn't cover the solution structure itself. Let's do so now.

Our grid computing solution has both cloud-side software and enterprise-side software. The Azure Grid framework holds all of this in a single solution, shown below. So what's where in this solution?

• The GridApplication project contains the application-specific code we wrote in Part 2. As you'll recall, there were 3 pieces of code to write: task code, a loader, and an aggregator. The other projects in the solution host your application code: some of that code will execute cloud-side and some will execute on-premise.
• The Azure-based grid worker code is in the AzureGrid and AzureGrid_WorkerRole projects. This is a standard Azure hosted application with a worker role that can be deployed to an Azure hosting project using the Azure portal. Your task application code will be executed here.
• The enterprise-side code is in the GridManager project. This is the desktop application used for launching and monitoring grid jobs. It also runs your loader code and aggregator code in the background.
• The StorageClient project is a library for accessing cloud storage, which derives from the Azure SDK StorageClient sample. Both the cloud-side grid worker and on-premise grid manager software make use of this library.

Running the cloud-side and on-premise side parts of the solution is simple:

• To run the Grid Manager, right-click the Grid Manager project and select Debug > Start.
• To run the Grid Worker on your local machine using the local Developer Fabric, right-click the AzureGrid project and select Debug > Start.
• To run the Grid Worker in the Azure cloud, follow the usual steps to publish a hosted application to Azure for the AzureGrid project using the Azure portal.

For testing everything locally, you may find it useful to set your startup projects to both AzureGrid and Grid Manager so that a single F5 launches both sides of the application.

Setting up a Local Database
Azure Grid tracks job runs, tasks, parameters, and results in a local SQL Server or SQL Server Express database (2005 or 2008). The project download on CodePlex includes the SQL script for creating this database.

Configuring the Solution
Both the cloud-side and on-premise parts of the solution have a small number of configuration settings to be attended to, the most important of which relate to cloud storage.

Cloud-side Grid Worker settings are specified in the AzureGrid project's ServiceConfiguration.cscfg file as shown below.

• ProjectName: the name of your application
• TaskQueueName: the name of the queue in cloud storage used to send tasks and parameters to grid workers.
• ResultsQueueName: the name of the queue in cloud storage used to receive results from grid workers.
• QueueTimeout: how long (in seconds) a grid worker has to execute a task before the task is re-queued for another worker.
• SleepInterval: how long (in seconds) a grid worker should sleep between checking for new tasks to execute.
• QueueStorageEndpoint: the queue storage endpoint for your cloud storage project or your local developer storage.
• AccountName: the name of your cloud storage project.
• AccountSharedKey: the storage key for your cloud storage project.
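The QueueTimeout and SleepInterval settings drive the grid worker's polling loop. As a rough sketch (the queue client method names and helpers here are hypothetical; the actual Azure Grid code differs):

```csharp
// Illustrative only: taskQueue/resultsQueue methods and the helpers
// are assumptions, not the real Azure Grid or StorageClient API.
while (true)
{
    // Message becomes invisible to other workers for QueueTimeout seconds.
    QueueMessage msg = taskQueue.GetMessage(queueTimeoutSeconds);
    if (msg == null)
    {
        // Nothing to do: back off for SleepInterval before polling again.
        Thread.Sleep(sleepIntervalSeconds * 1000);
        continue;
    }

    Task task = DeserializeTask(msg);                 // hypothetical helper
    ExecuteTask(task);                                // runs your task code
    resultsQueue.PutMessage(SerializeResults(task));  // hand results to the aggregator
    taskQueue.DeleteMessage(msg);                     // delete only after success
}
```

The delete-after-success ordering is what makes QueueTimeout matter: if a worker crashes mid-task, its message simply reappears on the queue and another worker picks it up.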

<ServiceConfiguration serviceName="AzureGrid" xmlns="">
  <Role name="WorkerRole">
    <Instances count="10"/>
    <ConfigurationSettings>
      <Setting name="ProjectName" value="FraudCheck"/>
      <Setting name="TaskQueueName" value="grid-tasks"/>
      <Setting name="ResultsQueueName" value="grid-results"/>
      <Setting name="QueueTimeout" value="60"/>
      <Setting name="SleepInterval" value="10"/>
      <Setting name="QueueStorageEndpoint" value=""/>
      <Setting name="AccountName" value="mystorage"/>
      <Setting name="AccountSharedKey" value="Z86+YKIJwKqIwnnS2uVw3mvlkVKMjfQcXawiN1g83JTRycaRwwSSKwhnaNsAw3W9zNW7LxGHy2MCJ1qQMX+J4g=="/>
    </ConfigurationSettings>
  </Role>
</ServiceConfiguration>

Configuration settings for the on-premise Grid Manager are specified in the Grid Manager project's App.Config file. Most of the settings have the same names and meaning as just described for the cloud-side configuration file and need to specify the same values. There's also a connection string setting for the local SQL Server database used for grid application tracking.
<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <appSettings>
    <add key="ProjectName" value="FraudCheck"/>
    <add key="TaskQueueName" value="grid-tasks"/>
    <add key="ResultsQueueName" value="grid-results"/>
    <add key="QueueTimeout" value="60"/>
    <add key="SleepInterval" value="10"/>
    <add key="QueueStorageEndpoint" value=""/>
    <add key="AccountName" value="mystorage"/>
    <add key="AccountSharedKey" value="Z86+YKIJwKqIwnnS2uVw3mvlkVKMjfQcXawiN1g83JTRycaRwwSSKwhnaNsAw3W9zNW7LxGHy2MCJ1qQMX+J4g=="/>
    <add key="GridDatabaseConnectionString" value="Data Source=.\SQLEXPRESS;Initial Catalog=AzureGrid;Integrated Security=SSPI"/>
  </appSettings>
</configuration>

Testing and Promoting the Grid Application Across Environments
Since grid applications can be tremendous in scale, you certainly want to test them carefully. In an Azure-based grid computing scenario, I recommend the following sequence of testing for brand new grid applications:

1. Developer Test.

Test the grid application on the local developer machine with a small number of tasks. Here your focus is verifying that the application does what it should and the right data is moving between the grid application and on-premise storage.

• For cloud computation, the Grid Application executes on the local Developer Fabric.
• For cloud storage, the local Developer Fabric is used.

The Grid Manager, of course, always executes on the local machine.

2. QA Test.

Test the application with a larger number of tasks using multiple worker machines, still local. The goal is to verify that what worked in a single-machine environment also works in a multiple machine environment using cloud storage.

• For cloud computation, the Grid Application executes on the local Developer Fabric, but on multiple local machines.
• For cloud storage, an Azure Storage Project is used.

Note that while this is still primarily a local test, we're now using Azure cloud storage. This is necessary as we wouldn't be able to coordinate work across multiple machines otherwise.

3. Staging. Deploy the grid application to Azure hosting in the Staging environment and run the application with a small number of tasks. In this phase you are verifying that what worked locally also works with cloud-hosting of the application.

• For cloud computation, the grid application is hosted in an Azure hosting project - Staging.
• For cloud storage, an Azure Storage Project is used.

4. Production. Now you're ready to promote the application to the Azure hosting Production environment and run the application with a full workload.

• For cloud computation, the grid application is hosted in an Azure hosting project - Production.
• For cloud storage, an Azure Storage Project is used.

You can use the Grid Manager to monitor that the grid application is running as it should.

A Local Test Run

Now we're all set to try a local test on our developer machine. As per our test plan, we'll initially test with a small workload on the local developer machine.

The input data is the CSV worksheet shown below, which was created in Excel and saved as a CSV file. The FraudCheck application expects this input file to reside in the directory GridManager.exe is launched from. We'll use 25 records in this test.

Ensure your database is set up and that your cloud-side and enterprise-side configuration settings reflect local developer storage.

Build the application and launch both the AzureGrid project and the GridManager project.

The AzureGrid project won't display anything visible, but you can check it is operating via the Developer Fabric UI if you wish.

The GridManager application is a WPF-based console that looks like this after launch:

To launch a job run of your application, select Job > New Job from the menu. In the dialog that appears, give your job a name and description.

Click OK and you will see the job has been created but is not yet running. The job panel is red to indicate the job has not been started.

Now we're ready to kick off the grid application. Click the Start button, and shortly you'll see the job panel turn yellow and a grid of tasks will be displayed.

The display will update every 10 seconds. Soon you'll see some tasks are red, some are yellow, and some are green. Tasks are red while pending, yellow while being executed, and green when completed.

When all tasks are completed, the job panel will turn green.

With the grid application completed, we can examine the results. Fraud Check was written to pull its input parameters from an input spreadsheet and write its results to an output spreadsheet. Opening the newly created output spreadsheet, we can see the grid application has created a score, explanatory text, and an approval decision for each of the input records.

Viewing the Grid
In the test run we just performed, you saw Grid View, where you can get an overall sense of how the grid application is executing. Each box represents a grid task and shows its task Id. Below you can see how Grid View looks with a larger number of tasks.

• Tasks in red are pending, meaning they haven't been executed yet. These tasks are still sitting in the task queue in cloud storage where the loader put them. They haven't been picked up by a grid worker yet.
• Tasks in yellow are currently executing. You should generally see the same number of yellow boxes as worker role instances.
• Tasks in green are completed. These tasks have been executed and their results have been queued for the aggregator to pick up.

In addition to the grid view, you can also look at tasks and results. The Grid View, Task View, and Results View buttons take you to different views of the grid application.

Viewing Tasks
During or after a job run, you can view tasks by clicking the Task View button. The Task View shows a row for each task displaying its Task Id, Task Type, Status, and Worker machine name. When you're running locally you'll always see your own local machine name listed, but when running in the cloud you'll see the names of the specific machines in the cloud executing each task.

Viewing Results
Result View is similar to Task View--one row per task--but shows you the input parameters and results for each task, in XML form. You may want to expand the window to see the information fully.

Running the Grid Application in the Cloud
We've seen the grid application work on a local machine, now let's run it in the cloud. That requires us to change our configuration settings to use a cloud storage project rather than local developer storage; and also to deploy the AzureGrid project to a cloud-hosted application project.

The steps to deploy to the cloud are the same as for any hosted Azure application:

1. Right-click the AzureGrid project and select Publish.
2. On the Azure portal, go to your hosted Azure project and click the Deploy button under Staging to upload your application package and configuration files.
3. Click Run to run your instances.

Next, prepare your input. In our local test run the input to the application was a spreadsheet containing 25 rows of applicant information to be processed. This time we'll use a larger workload of 1,000 rows.

We only need to launch the Grid Manager application locally this time since the cloud-side is already running on the Azure platform.

Again we kick off a job by selecting Job > New Job from the Grid Manager menu. As before, we click OK to complete the dialog and then click Start to begin the job. The loader generates 1,000 tasks which you can watch execute in Grid View.

Switching to Task View, you can see the machine names of the cloud workers that are executing each task.

Even before the grid application completes you can start examining results through Results View (shown earlier).

With the job complete, cloud storage has already drained to empty. We also suspend the deployment running in the cloud, since it has no work to do, to avoid accruing additional compute charges.

Once the application completes, the expected output (an output CSV file in the case of FraudCheck) is present and has the expected number of rows and results for each.

We can see 1,000 rows were processed, but the records aren't in the same order. That's normal: we can't control the order of task execution, nor are we concerned with it, as each task is atomic in nature. That's one reason you'll typically copy some of the input parameters to your output: to allow correlation of the results.

Performance and Tuning
It's difficult to assess the performance of grid computing applications on Azure today because we are in a preview period where the number of instances a developer can run in the cloud is limited to 2 per project.

There is one way to run more than 2 instances of your grid computing application today, and that's to host your grid workers in more than one place. You can use multiple hosted accounts, perhaps teaming up with another Azure account holder. You can also run some workers locally. This works because the queues in cloud storage are the communication mechanism between grid workers and the enterprise loader/aggregator. As long as you use a common storage project, you can diversify where your grid workers reside.
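The only requirement is that every deployment points at the same storage project. For example, each worker deployment's ServiceConfiguration.cscfg (and the Grid Manager's App.config) would carry identical storage settings, along the lines of:

```xml
<!-- Identical across all worker deployments sharing the grid -->
<Setting name="AccountName" value="mystorage"/>
<Setting name="TaskQueueName" value="grid-tasks"/>
<Setting name="ResultsQueueName" value="grid-results"/>
```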

We can also expect some things that are generally true of grid computing applications to be true on Azure: for example, compute-intensive applications are likely to bring the greatest return on a grid computing approach.

As we move to an expected Azure release by year's end, we should be able to collect a lot of meaningful data about grid computing performance and how to tune it.

Grid Computing on the Azure Cloud Computing Platform, Part 2: Developing a Grid Application

In Part 1 of this series we introduced a design pattern for grid computing on Azure. In this article we'll implement the pattern by developing a grid application in C# and in Part 3 we'll run the application, first locally and then in the cloud. In order to do that, we'll need some help from a grid computing framework.

The Role of a Grid Framework
Unless you're prepared to write a great deal of infrastructure software, you'll want to use a framework for your grid application that does the heavy lifting for you and lets you focus on writing your application code. While Azure performs many of the services you would want in a grid computing infrastructure, it's still necessary to add some grid-specific functionality between Azure and your grid application. A good grid computing framework should do these things for you:

• Provide a means of scheduling and controlling job runs
• Retrieve input data from on-premise storage
• Generate tasks for grid workers to execute
• Distribute tasks to available workers
• Track the status of tasks as the grid executes the application
• Aggregate results from workers
• Store results in on-premise storage

The diagram below shows how the framework brings the grid application and the Azure platform together. The application developer only has to write application-specific code to load input data, generate tasks, execute tasks, and save result data. The framework provides all of the necessary plumbing and tooling in a way that strongly leverages the Azure platform.

In this article we'll be using Azure Grid, the community edition of the Neudesic Grid Computing Framework. Azure Grid performs all of the functions listed above by providing 4 software components:

• A Loader, to which you add your own code to draw input data from on-premise resources and generate tasks.
• A Worker Role, to which you add your own code to execute application tasks.
• An Aggregator, to which you add your own code to store results back to on-premise resources.
• A Grid Manager, which allows you to start job runs and monitor their execution.

Azure Grid minimizes expense by only using cloud resources during the execution of your grid application. On-premise storage is where input data, results, and Azure Grid's tracking database reside. Cloud storage is used for communication with workers to pass parameters and gather results, and drains to empty as your grid application executes. If you also suspend your grid worker deployment when idle you won't be accruing ongoing charges for storage or compute time once your grid application completes.

The Application: Fraud Check
The application we'll be coding is a fictional fraud check application that uses rules to compute a fraud likelihood score against applicant data. Each applicant record to be processed will become a grid task. The applicant records have this structure:

By applying business rules to an applicant record, the Fraud Check application computes a numeric fraud likelihood score between 0 and 1000, where zero is the worst possible score. An application will be rejected if it scores below 500.

Designing the Grid Application
When you design a grid application you need to determine the best way to divide up the work to be done into individual tasks that can be performed in parallel. You start by considering 2 key questions:

• On what basis will you divide the work into tasks?
• How many different kinds of tasks are there?

In the case of Fraud Check, it makes sense to create a separate task for each applicant record: the fraud scoring for each record is an atomic operation, and it doesn't matter what order the records are processed in as long as they all get processed.

Only one task type is needed for Fraud Check, which we'll name "FraudScore". The FraudScore task simply renders a fraud score for an applicant record.

Tasks need to operate on input data and produce results data. The input data for FraudScore will be an applicant record and its results data will be a fraud score plus a text field explaining reasons for the score. FraudScore will expect parameters and return results with the names shown below.

In some grid computing applications, tasks might also need access to additional resources to do their work, such as databases or web services. FraudScore does not have this requirement, but if it did, some of the input parameters would supply the necessary information, such as web service addresses and database connection strings.

Developing the Grid Application
Now that our grid application's input parameters, tasks, and result fields are defined we can proceed to write the application. Azure Grid only asks us to write code for the Loader, the application's tasks, and the Aggregator.

Writing the Loader Code
The Loader code is responsible for reading in input data and generating tasks with parameters. Most of the time that data will come from a database, but Fraud Check is written to read its input from a spreadsheet.

Azure Grid gives you the following starting point for your Loader in a class named AppLoader. The method GenerateTasks needs to be implemented to pull your input data and generate tasks with your task type names and your parameters. Your code will create Task objects and return them as an array. The base class, GridLoader, takes care of queuing your tasks into cloud storage where grid workers can pick them up and execute them.
#region Application-Specific Loader Code

/// <summary>
/// Your application's Loader code goes here.
/// Responsibilities:
/// 1. Read input data necessary to create tasks with parameters from local resources.
/// 2. For each task generated, create a Task object with input parameters.
/// 3. Return an array of Task objects.
/// </summary>
/// <param name="jobId">Job id</param>
/// <returns>Task[] array</returns>

public Task[] GenerateTasks(string jobId)
{
    List<Task> tasks = new List<Task>();
    Dictionary<string, string> parameters = new Dictionary<string, string>();

    // TODO: implement Loader

    // Example task creation:

    parameters["Param1"] = "Value1";
    parameters["Param2"] = "Value2";
    parameters["Param3"] = "Value3";
    tasks.Add(new Task(ProjectName, jobId, 1, "GridTask1", Task.Status.Pending, parameters, null));

    return tasks.ToArray();
}

#endregion

To implement the Loader for Fraud Check, we replace the sample task creation code with this code that reads input records from a spreadsheet CSV file and creates a task for each record.

using (TextReader reader = File.OpenText("FraudInput.csv"))
{
    string[] names = null;
    string[] values = null;

    string line;
    int lineCount = 0;
    int nextTaskId = 1;

    // Read lines from CSV file until empty. Line 1 contains parameter names.

    while ((line = reader.ReadLine()) != null)
    {
        lineCount++;

        if (lineCount == 1)
        {
            // Read header row of parameter names.
            names = line.Split(',');
        }
        else if (!String.IsNullOrEmpty(line))
        {
            // Load the values for this row and generate a task.
            values = line.Split(',');

            parameters = new Dictionary<string, string>();

            for (int i = 0; i < names.Length; i++)
                parameters[names[i]] = String.Empty;

            for (int i = 0; i < values.Length; i++)
                parameters[names[i]] = values[i];

            tasks.Add(new Task(ProjectName, jobId, nextTaskId++, "FraudScore", Task.Status.Pending, parameters, null));
        }
    }
}

The top row of the input spreadsheet should contain parameter names and subsequent rows should contain values, just as shown earlier. Creating a task is simply a matter of instantiating a Task object and giving it the following information in the constructor:

• Project Name: Your application's project name. This comes from a configuration file setting.
• Job ID: The Id of this job run, a string. This value is provided to the GenerateTasks method.
• Task ID: A unique identifier for this task, an integer.
• Task Type: The name of the task to run.
• Task Status: Should be set to Task.Status.Pending which indicates a not-yet-run task.
• Parameters: A dictionary of parameter names and values.
• Results: NULL - results will be set by the grid worker that executes the task.

Adding the Task to a List completes the work. Once all of the tasks have been generated, returning List.ToArray() hands the tasks back to the base GridLoader class, where they are queued to cloud storage.

Writing the Aggregator Code
The bookend to the Loader is the Aggregator, which processes task results and stores them locally.

Azure Grid gives you the following as a starting point for your aggregator in a class named AppAggregator. There are 3 methods to be implemented:

• OpenStorage is called when the first result is ready to be processed to give you an opportunity to open storage.
• StoreResult is called for each result set that needs to be stored. Both the input parameters and results are passed in as XML.
• CloseStorage is called after the final result has been stored to give you an opportunity to close storage.

The base class, GridAggregator, takes care of processing results from cloud storage and calling your methods to store results.

#region Application-Specific Aggregator Code

/// Your application's Aggregator code goes here.
/// Responsibilities:
/// 1. OpenStorage - open local storage.
/// 2. StoreResult - store a result.
/// 3. CloseStorage - close local storage.

protected override void OpenStorage()
{
    // TODO: open storage
}

protected override void StoreResult(string parametersXml, string resultsXml)
{
    // TODO: store result
}

protected override void CloseStorage()
{
    // TODO: close storage
}

#endregion

In StoreResult, both the parameters and results for the current task are passed in as XML in this format:
<Parameter name="LastName" value="Bach"/>
<Parameter name="FirstName" value="J.S."/>
<Result name="Score" value="700"/>
<Result name="Approved" value="1"/>
<Result name="Notes" value=" "/>
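Assuming the framework wraps these elements in root nodes (say, <Parameters> and <Results>) so each string parses cleanly, values can be pulled out with LINQ to XML. A small sketch (the GetValue helper is mine, not part of Azure Grid):

```csharp
using System.Linq;
using System.Xml.Linq;

// Extract a named value from XML of the form
// <Parameters><Parameter name="LastName" value="Bach"/>...</Parameters>
static string GetValue(XElement root, string elementName, string name)
{
    return root.Elements(elementName)
               .Where(e => (string)e.Attribute("name") == name)
               .Select(e => (string)e.Attribute("value"))
               .FirstOrDefault();
}

// Usage (inside StoreResult):
// XElement parameters = XElement.Parse(parametersXml);
// string lastName = GetValue(parameters, "Parameter", "LastName");
```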

To implement the aggregator for Fraud Check, we'll reverse what the Loader did and append each result to a spreadsheet CSV file.

• In OpenStorage, a .csv file is opened for output and the column row of the spreadsheet CSV file is written out.
• In StoreResult, results (and also the first and last name input parameters to provide context) are extracted from XML and written out.
• In CloseStorage, the file is closed.

#region Application-Specific Aggregator Code

/// Your application's Aggregator code goes here.
/// Responsibilities:
/// 1. OpenStorage - open local storage.
/// 2. StoreResult - store a result.
/// 3. CloseStorage - close local storage.

TextWriter tw = null;

protected override void OpenStorage()
{
    tw = File.CreateText("FraudOutput.csv");
    tw.WriteLine("Last Name,First Name,Score,Accepted,Notes");
}

protected override void StoreResult(string parametersXml, string resultsXml)
{
    XElement parameters = XElement.Parse(parametersXml);
    XElement results = XElement.Parse(resultsXml);

    // Write values. The original post elides this step; Value is a
    // hypothetical helper that reads the value attribute of the named
    // Parameter/Result element.
    tw.WriteLine("{0},{1},{2},{3},{4}",
        Value(parameters, "LastName"), Value(parameters, "FirstName"),
        Value(results, "Score"), Value(results, "Approved"), Value(results, "Notes"));
}

protected override void CloseStorage()
{
    if (tw != null)
        tw.Close();
}

#endregion

Writing the Application Task Code
With the loader and aggregator written, there's just one more piece to write: the application code itself. The AppWorker class contains the application task code. The current task is passed to a method named Execute, which examines the task's TaskType to determine which task code to execute.
#region Application Code

/// <summary>
/// Application code to execute a task. The switch statement uses the task's TaskType string to determine the appropriate code to execute.
/// </summary>
/// <param name="task">The task to execute.</param>

public override void Execute(Task task)
{
    switch (task.TaskType)
    {
        case "Task1": Task1(task); break;
        case "Task2": Task2(task); break;
        default: break; // Ignore unknown task.
    }
}

private void Task1(Task task)
{
    // TODO: implement task 1
}

private void Task2(Task task)
{
    // TODO: implement task 2
}

#endregion End Application Code

For Fraud Check, the switch statement checks for the one task type in our application, FraudScore, and executes the code to compute a fraud likelihood score based on the applicant data in the input parameters.
public override void Execute(Task task)
{
    switch (task.TaskType)
    {
        case "FraudScore": FraudScore(task); break;
        default: break; // Ignore unknown task.
    }
}

The first order of business for the FraudScore code is to extract the input parameters, which are accessible through a dictionary of names and string values in the Task object.
private void FraudScore(Task task)
{
    StringBuilder notes = new StringBuilder();

    bool rejected = false;
    int score = 1000;

    string firstName = task.Parameters["FirstName"];
    string lastName = task.Parameters["LastName"];
    string state = task.Parameters["State"];
    string country = task.Parameters["Country"];
    int age = Convert.ToInt32(task.Parameters["Age"]);
    string ssn = task.Parameters["SSN"];
    string relation = task.Parameters["Relation"];
    int monthsEmployed = Convert.ToInt32(task.Parameters["MonthsEmployed"]);

Next, a series of business rules execute that compute the score. Here's an excerpt:
    // Rule: if age < 18 or age > 100, automatic rejection.

    if (age < 18 || age > 100)
    {
        score = 0;
        notes.Append("Age out of range, automatic rejection. ");
        rejected = true;
    }

    // Rule: if SSN is missing, reduce score by 300.

    if (string.IsNullOrEmpty(ssn))
    {
        score -= 300;
        notes.Append("SSN missing. ");
    }


    // Check score. If below 500, reject the application.

    if (score < 0) score = 0;

    if (!rejected && score < 500)
    {
        rejected = true;
        notes.Append("Score below 500, rejection. ");
    }

Lastly, FraudScore updates the task with results. This is simply a matter of setting names and string values in a dictionary.
    // Store task results.

    task.Results["Score"] = score.ToString();
    task.Results["Notes"] = notes.ToString();

    if (rejected)
        task.Results["Accepted"] = "0";
    else
        task.Results["Accepted"] = "1";
}
The base GridWorker class and WorkerRole implementation take care of queuing the results to cloud storage where they will be retrieved by the Aggregator.

Ready to Run
We've developed our grid application and are just about ready to run it. A quick review of what we've accomplished: using a framework, we implemented a loader, an aggregator, and task code, writing only the code specific to our application.

All that remains is to run the application. With a grid application, you should always test carefully, initially by running locally with a small number of tasks. Once you're confident in your application design and code integrity, you can move on to large scale execution in the cloud. We'll be doing just that in the next article in this series, Part 3.

Wednesday, April 8, 2009

Upcoming Orange County Azure User Group April Meeting: "What's New in Azure"

The next meeting of the Orange County Azure User Group is Thursday, April 23rd at QuickStart Intelligence.

Our topic this month is, "What's New in Azure." Since announcing the Azure platform last October, Microsoft has issued several software updates with new features. There have also been some important announcements and events recently. We'll review what's new with SQL Data Services, Windows Azure, and Live Services. We'll demo new Windows Azure features including full trust, native code support, and FastCGI support. We'll also demo a Live Mesh web application that can be synchronized across devices.

Time: April 23, 2009 from 6pm to 8pm

Location: QuickStart Intelligence

Street: 16815 Von Karman Ave, Suite 100

City/Town: Irvine, CA

RSVP link:

If attending, please RSVP so we have the right head count for pizza and beverages. Hope to see you there!

Sunday, April 5, 2009

Grid Computing on the Azure Cloud Computing Platform, Part 1: A Design Pattern

In this series of articles we're going to look at grid computing using the Azure cloud computing platform. In Part 1, we'll look at this from a design pattern and benefits perspective.

Not everyone is clear on the distinctions between grid computing and cloud computing, so let's begin with a brief explanation of each. While grid computing and cloud computing are not the same thing, there are many synergies between them and using them together makes a lot of sense.

Grid Computing
Grid computing is about tackling a computing problem with an army of computers working in parallel rather than a single computer. This approach has many benefits:

  • Time savings: a month of processing work for a single computer could be achieved in a single day if you had 30 computers dedicated to the problem. The largest grid computing project in history, the Search for Extraterrestrial Intelligence (SETI@home) project, has logged 2 million years of aggregate computer processing time in only 10 years of chronological time by leveraging hundreds of thousands of volunteer computers.
  • Less expensive resources: You can use less expensive resources to get work done instead of buying large servers with maximum grade processors and memory. Granted, you have to buy more of them, but the smaller, cheaper machines are more easily repurposed for other uses.
  • Reliability: A grid computing system has to anticipate the failures or changing availability of individual computers and not let that prevent successful completion of the work.

Not all types of work lend themselves to grid computing. The work to be done is divided into smaller tasks, and a loosely-coupled network of computers work on the tasks in parallel. Smart infrastructure is needed to distribute the tasks, gather the results, and manage the system.

Not surprisingly, the early adopters of grid computing have been those who needed to solve mammoth computing problems. Thus you see grid computing applied to such things as genetics, actuarial calculations, astronomical analysis, and film animation rendering. But that's changing: grid computing is getting more and more scrutiny for general business problems, and the onset of cloud computing is going to accelerate that. Computing tasks do not have to be gargantuan to benefit from a grid computing approach, nor are compute-intensive tasks the only kind of work eligible for grid computing. Any work that has a repetitive nature to it is a good candidate for grid computing. Whether you're a Fortune 500 corporation that needs to process 4 million invoices a month or a medium-sized business with 1,000 credit applications to approve, grid computing may well make sense for you.

Grid computing is a decade older than cloud computing, so much of today's grid computing naturally doesn't use a cloud approach. The most common approaches are:

  • Dedicated machines: purchase a large number of computers and dedicate them to grid work.
  • Network cycle stealing: repurpose other machines in your organization for grid work when they are idle, such as overnight. A business workstation by day can be a grid worker at night.
  • Global cycle stealing: apply the cycle stealing concept at worldwide scale over the Internet. This is how the SETI@home project works, with over 300,000 active computers.

Cloud computing allows for an alternative approach to grid computing that has many attractive characteristics, offers a flexible scale-up/scale-down business model, and already provides much of the supporting infrastructure that traditionally had to be custom-developed.

Cloud Computing and Microsoft's Azure Platform
Cloud computing is about leveraging massive data centers with smart infrastructure for your computing needs. Cloud computing spans application hosting and storage, as well as services for communication, workflow, security, and synchronization. Benefits of cloud computing include the following:

  • On-demand scale: you can have as much capacity as you need, virtually without limit.
  • Usage-based billing: a pay-as-you-go business model where you only pay for what you use. There is no long-term commitment and you are not penalized if your level of usage changes.
  • No up-front costs: no need to buy hardware or keep it maintained or patch operating systems. Capital expenditures are converted into operating expenditures.
  • No capacity planning needed: you don't need to predict your capacity, as you have the ability to adjust how much resource you are using at will.
  • Smaller IT footprint and less IT headache: capabilities such as high availability, scalability, storage redundancy, and failover are built into the platform.

Microsoft's cloud computing platform is called Azure, and currently it consists of 4 primary service areas:

  • Windows Azure provides application hosting and storage services. Application hosting means running software such as web applications, web services, or background worker processes "in the cloud"; that is, in a cloud computing data center. Applications are load-balanced, and run as many instances as you wish with the ability to change the number of instances at a moment's notice. Cloud storage can provide file system-like Blob storage, queues, and data tables.
  • SQL Data Services provides a full relational database in the cloud, with largely the same capabilities as the SQL Server enterprise product.
  • .NET Services provides enterprise readiness for the cloud. Service Bus interconnects multiple locations or organizations with publish-subscribe messaging over the Internet. Access Control Service provides enterprise and federated security for applications. Workflow Service can execute workflows in the cloud.
  • Live Services provides a virtual desktop, data and web application synchronization across computers and devices, and a variety of communication and collaboration facilities whose common theme is social networking.

Azure is new; at the time of this writing, it is in a preview period with a commercial release expected by end of year 2009.

Putting Grid Computing and Azure Cloud Computing Together
Azure is designed to support many different kinds of applications and has no specific features for grid computing. However, Azure does provide much of the functionality needed in a grid computing system. To make Azure a great grid computing platform only requires using the right design pattern and a framework that provides grid-specific functionality. We'll look at the design pattern now and in Part 2 we will explore a framework that supports this pattern.

The first thing you'll notice about this pattern is that there is some software/data in the Azure cloud and some on-premise in the enterprise. What goes where, and why?

  • The cloud is used to perform the grid computing work itself. The use of cloud resources is geared to be temporary and minimize cost. When you're not running a grid computing solution, you shouldn't be accruing charges.
  • The enterprise is the permanent location for data. It is the source of the input data needed for the grid to do its work and the destination for the results of that work.

The software actors in this pattern are:

  • Grid Worker: The grid worker is cloud-side software that can perform the task(s) needed for the grid application. This software will be run in the cloud as a Worker Role in multiple instances. The framework uses a switch statement arrangement so that any grid worker can perform any task requested of it. Grid workers run in a loop, reading the next task to perform from a task queue, executing the task, and writing results to a results queue. When a grid worker has no more queue tasks to run, it requests to be shut down.
  • Grid Manager: the grid manager is enterprise-side software that manages the job runs of grid computing work. There are 3 components to the grid manager:

    o Loader: The loader's job is to kick off a grid application job run by generating the tasks for the grid workers to perform. The loader runs in the enterprise in order to access on-premise resources such as databases for the input data that needs to be provided for each task. When the loader runs, the tasks it generates are written to a Task Queue in the cloud.

    o Aggregator: the aggregator reads results from the results queue and stores them in a permanent location on-premise. The Aggregator also realizes when a grid application's execution is complete.

    o Console: the console is a management facility for configuring projects, starting job runs, and viewing the status of the grid as it executes. It can provide a view similar to a flight status display in an airport, showing tasks pending and tasks completed.
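The grid worker's read-execute-report loop described above can be sketched as follows. The queue interfaces and method names here are hypothetical placeholders for illustration, not the framework's actual API:

```csharp
// Illustrative grid worker loop. ITaskQueue/IResultsQueue and their methods
// are hypothetical placeholders, not the framework's actual API.
public void RunWorkerLoop(ITaskQueue taskQueue, IResultsQueue resultsQueue)
{
    while (true)
    {
        Task task = taskQueue.GetNextTask();
        if (task == null)
        {
            RequestShutdown();  // no more work: stop accruing compute charges
            break;
        }

        Execute(task);                    // dispatches on task.TaskType
        resultsQueue.WriteResults(task);  // task.Results now holds the output
        taskQueue.DeleteTask(task);       // delete only after results are queued
    }
}
```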

The data actors in this pattern are:

  • Task Queue: this is a queue in cloud storage that holds tasks. The Loader in the enterprise writes its generated tasks to this queue. Grid workers in the cloud read tasks from this queue and execute them.
  • Results Queue: this is a queue in cloud storage that holds results. Grid workers output the results of each task to this queue. The Aggregator running in the enterprise reads results from this queue and stores them durably in the enterprise.
  • Tracking Table: this is an enterprise-side database table that tracks tasks and their status. Records are written to the tracking table by the Loader and updated by the Aggregator as results are received. The tracking table enables the console to show grid status and allows the system to realize when a grid application has completed.
  • Enterprise Data: the enterprise furnishes data stores or services that supply input data for tasks or receive the results of tasks. This is organization and project-specific; the code written in the Loader and the Aggregator integrates with these data stores.

Walk-through: Creating and Executing a Grid Computing Application on Azure
Let's put all of this together and walk through how you would develop and run a grid computing application from start to finish using this pattern and a suitable framework:

1. A need for a grid computing application is established. The tasks that will be needed, input data, and results destinations are identified.

2. Using a framework, developers add the custom pieces unique to their project:

  • A Grid Worker (Azure Worker Role) is created from a template and code is added to implement each of the tasks.
  • A Loader is created from a template and code is added to implement reading input data from local resources, generating tasks, and queuing them to the Task Queue.
  • An Aggregator is created from a template and code is added to implement receiving results from the Result Queue and storing them on-premise.

3. Azure projects for application hosting and storage are configured using the Azure portal. The Grid Worker package is deployed to cloud hosting, tested, and promoted to Production.

4. Using the Grid Console, the grid job run is defined and started. This starts the Loader running.

5. The Loader reads local enterprise data and generates tasks, writing each to the Task Queue.

6. The Grid Worker project in the Azure portal is started, which spawns multiple instances of Grid Workers.

7. Each Grid Worker continually receives a new task from the Task Queue, determines the task type, executes the appropriate code, and sends the task results to the Results Queue. The way Azure queues work is very useful here: if a worker has a failure and crashes in the middle of performing a task, the task will reappear in the queue after a timeout period and will get picked up by another Grid Worker.

8. The Aggregator reads results from the Results Queue and writes them to local enterprise storage.

9. While the grid is executing, administrators can use the Console to watch status in near real-time as Grid Workers execute tasks.

10. When the Aggregator realizes all scheduled tasks have been completed, it provides notification of this condition via the Console. At this point, the grid has completed its work and its results are safely stored in the enterprise.

11. The Grid Workers are suspended via the Azure portal to avoid incurring any additional compute-time charges. Cloud storage is already empty as all queues have been fully read and no additional storage charges will accrue.
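The reliability behavior in step 7 follows from Azure queue semantics: a dequeued message becomes invisible for a timeout period and is only deleted once the work completes. A sketch of the idea, with illustrative client types and method names rather than the actual StorageClient API:

```csharp
// Illustrative only: how a visibility timeout makes task processing
// crash-tolerant. Queue client names are hypothetical.
QueueMessage msg = taskQueue.GetMessage(TimeSpan.FromSeconds(120));
if (msg != null)
{
    Task task = Task.FromXml(msg.AsString);

    Execute(task);   // if the worker crashes here, the message reappears in
                     // the queue after 120 seconds and another grid worker
                     // picks the task up

    resultsQueue.PutMessage(task.ResultsToXml());
    taskQueue.DeleteMessage(msg);  // acknowledge only after results are queued
}
```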

Value-Add of Azure for Grid Computing
The Azure platform does good things for grid computing, both technically and financially:

  • Cost Conscious: the use of cloud-hosted applications avoids the need to purchase computers for grid computing. Instead, you pay monthly for the Grid Worker compute time and queue storage you use. The design eliminates ongoing costs for compute time or storage time once a grid application has completed processing.
  • Scalability and Flexibility: You can have as much capacity as you want or as little as you want. Your grid computing application can run on as small a footprint as a single Grid Worker instance.
  • Reliability: The reliability mechanism built into Azure Queues ensures all tasks get performed even if a Grid Worker crashes. If a Grid Worker does crash, the Azure Fabric will start up a replacement instance.
  • Coordination: The Worker Role-queue mechanism is simple, load balanced, and works well. Using it avoids the need to write complex coordination software.
  • Simplicity: this pattern for grid computing on Azure has simplicity at its core. Roles are well-defined, no element of the software is overly complex, and the number of moving parts is kept to a minimum.

In Part 2, we'll see how this pattern is implemented in code using a grid computing framework developed for Azure.