Saturday, March 26, 2011

Amazon Web Services and Windows Azure Architectural Comparison, Part 1

At the recent Cloud Connect show in the Bay Area I attended evangelist Jinesh Varia’s talk on Amazon Web Services design patterns, “Design Patterns in the Cloud: A Love Story.” I wanted to learn more about how Windows Azure is similar and dissimilar to other cloud platforms, and I found Jinesh’s talk insightful, well delivered, and entertaining. I therefore thought it would be useful to present the same progressive scenario as it would be done in Windows Azure.

The scenario in the presentation is Thursdate, a dating website that only runs 3 hours a week on Thursday evenings. Lonely geek Andy creates this site, initially in a very modest form, and it progressively grows and scales.

In writing this, I’ve labored to avoid spin or mischaracterization. However, I must admit my knowledge of AWS is elementary compared to Windows Azure—so please do bring any errors to my attention. Nor am I claiming that everything here is an exact equivalent; this is not the case since no two cloud platforms are identical. I’m merely showing how the same scenario with the same progression would be achieved in Windows Azure.

1. Local Deployment
The first incarnation of Thursdate is hosted locally by Andy. He uses Apache as his application server, develops in PHP, and uses a MySQL database. He backs up to tapes. His deployment looks like this:

Local Deployment


2. Initial Cloud Deployment
When Andy gets the bright idea to move things into the cloud, he minimally needs a server in the cloud, a public access point for his web site, and a way to do backups. A single server in the cloud isn’t going to provide high availability or any guarantee of data persistence--but Andy isn’t concerned about that yet.

AWS: In AWS this means an Amazon EC2 Instance, an Elastic IP, and backups to the Amazon S3 storage service.

Windows Azure: In Windows Azure, the counterpart to EC2 is Windows Azure Compute. Andy must specify a role (hosting container) and number of VM instances. Here he chooses a worker role (the right container for running Apache) and one VM instance. He uploads metadata and an application package, from which Windows Azure Compute creates a Windows Server VM instance. An input endpoint is defined which provides accessibility to the web site. The input endpoint is nominally accessible as <production-name>.cloudapp.net; for a friendlier URL, a domain or subdomain can forward to this address. Backups are made to the Windows Azure Storage service in the form of blobs or data tables.
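
To make the moving parts concrete, here is a minimal Python sketch of what Andy specifies for this first deployment: a role type, an instance count, and a service name that determines the cloudapp.net address. The Deployment class and its field names are hypothetical illustrations, not part of any Windows Azure API; the real settings live in the service definition and configuration files that accompany the application package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    """Hypothetical description of a hosted service; not a real Azure API."""
    service_name: str    # DNS prefix chosen for the hosted service (assumed name)
    role_type: str       # "worker" or "web"
    instance_count: int  # number of VM instances to run

    def public_url(self) -> str:
        # The input endpoint is reachable under the cloudapp.net domain.
        return f"http://{self.service_name}.cloudapp.net/"


# Andy's initial deployment: one worker role instance hosting Apache.
thursdate = Deployment(service_name="thursdate", role_type="worker", instance_count=1)
print(thursdate.public_url())
```

A friendlier domain would simply forward to the address this prints.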

Initial Cloud Deployment


3. Designing for Failure

Pattern #1: Design for failure and nothing will fail

Andy soon realizes that failures can and will happen in a cloud computing environment and he’d better give that some attention. VM server state is not guaranteed to be persistent in a cloud computing environment. He starts keeping application logs and static data outside of the VM server by using a cloud storage service. He also makes use of database snapshots, which can be mapped to look like drive volumes.

AWS: The logs and static data are kept in the Amazon S3 storage service. Root and data snapshot drive volumes are made available to the VM server using the Amazon Elastic Block Store (EBS).

Windows Azure: Logs and static data are written to the Windows Azure Storage service in the form of blobs or tables. For snapshots, a blob can be mapped as a drive volume using the Windows Azure Drive service. As for the root volume of the VM, this is created from the Windows Azure Compute deployment just as in the previous configuration.
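
The idea of keeping logs off the VM can be sketched in a few lines of Python. This is only an illustration under stated assumptions: InMemoryBlobStore is a hypothetical stand-in for a cloud blob service (a real one would make network calls), and BlobLogHandler is an invented name. The point is that the application logs through a handler that writes to external storage rather than to local disk.

```python
import logging


class InMemoryBlobStore:
    """Hypothetical stand-in for a cloud blob service."""

    def __init__(self):
        self.blobs = {}

    def append(self, name, text):
        self.blobs[name] = self.blobs.get(name, "") + text


class BlobLogHandler(logging.Handler):
    """Ships each log record to the store instead of the VM's local disk."""

    def __init__(self, store, blob_name):
        super().__init__()
        self.store = store
        self.blob_name = blob_name

    def emit(self, record):
        self.store.append(self.blob_name, self.format(record) + "\n")


store = InMemoryBlobStore()
logger = logging.getLogger("thursdate")
logger.setLevel(logging.INFO)
logger.propagate = False  # keep records out of the root logger
logger.addHandler(BlobLogHandler(store, "logs/app.log"))

logger.info("user signed up")
# The record now lives in the store, so it survives loss of the VM.
print(store.blobs["logs/app.log"])
```

Because the rest of the application only talks to the standard logging interface, swapping the in-memory store for a real storage client would not touch the calling code.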

Updated Deployment - Designing for Failure


4. Content Caching

Pattern #2: Edge cache static content

Andy is starting to hit it big, and there is now significant usage of Thursdate in different parts of the world. He wants to take advantage of edge caching of static content. He uses a content distribution network to serve content such as images and video quickly from a location near each user.

AWS: Amazon CloudFront is the content distribution network.

Windows Azure: The Windows Azure Content Delivery Network (CDN) can serve up blob content using a network of 24+ edge servers around the world.
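
Edge caching itself is simple to model. The sketch below is a toy simulation, not how either CDN is implemented: the EdgeNode class, the three region names, and the dictionary standing in for blob storage are all assumptions made for illustration. It shows the essential behavior, which is that the first request in a region goes back to the origin and later requests are served from the edge.

```python
class EdgeNode:
    """One edge location: serves cached copies, falls back to the origin on a miss."""

    def __init__(self, region, origin):
        self.region = region
        self.origin = origin  # dict standing in for blob storage
        self.cache = {}
        self.origin_fetches = 0

    def get(self, path):
        if path not in self.cache:
            # Cache miss: fetch once from the origin and keep a local copy.
            self.cache[path] = self.origin[path]
            self.origin_fetches += 1
        return self.cache[path]


origin = {"/img/logo.png": b"PNG-bytes"}
edges = {region: EdgeNode(region, origin) for region in ("us", "eu", "asia")}


def serve(user_region, path):
    # Route the request to the edge node in the user's own region.
    return edges[user_region].get(path)


serve("eu", "/img/logo.png")
serve("eu", "/img/logo.png")
print(edges["eu"].origin_fetches)
```

A real CDN also expires cached items after a time-to-live, which this sketch leaves out.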

Updated Deployment - Caching Static Content


5. Scaling the Database
In preparing to scale, Andy must move beyond a self-hosted database on a single VM server instance. By using a database service outside of the compute VM, he will free himself to start using multiple compute VMs without regard for data loss.

AWS: The Amazon Relational Database Service (RDS) provides a managed database. Andy can continue to use MySQL.

Windows Azure: Andy must switch over to SQL Azure, Microsoft’s managed database service. This provides him with a powerful database available in sizes of 1-50GB. Data is automatically replicated so that there are always 3 copies of the database. In addition, Andy can make logical backups if he chooses, either to another SQL Azure database in the cloud or to an on-premises SQL Server database.

Updated Deployment - Using a Database Service


6. Scaling Compute

Pattern #3: Implement Elasticity

With a scalable data tier Andy is now free to scale the compute tier, which is accomplished by running multiple instances.

AWS: Andy runs multiple instances of EC2 through the use of an Auto-Scaling Group. He load balances web traffic to his instances by adding an Elastic Load Balancer.

Windows Azure: Andy has had the equivalents of a scaling group and an elastic load balancer all along. We just haven’t bothered to show them in the diagram until now, and with a single compute instance he hasn’t been taking advantage of them. The input endpoint comes with a load balancer. The worker role is a scale group: its instances can be expanded or reduced, interactively or programmatically. The only change Andy needs to make is to increase his worker role’s instance count, which he can do in the Windows Azure management portal.
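
A small Python model shows why raising the instance count is the only change needed. The Role class, the instance naming, and the round-robin rotation are assumptions for illustration; the actual load balancer's distribution policy is not described here. The endpoint spreads requests over however many instances the role currently has, so scaling is a matter of changing one number.

```python
from itertools import cycle


class Role:
    """Hypothetical model: a group of identical instances behind one endpoint."""

    def __init__(self, name, instance_count):
        self.name = name
        self.set_instance_count(instance_count)

    def set_instance_count(self, count):
        if count < 1:
            raise ValueError("a role needs at least one instance")
        self.instances = [f"{self.name}_IN_{i}" for i in range(count)]
        self._next = cycle(self.instances)  # restart rotation over the new set

    def route(self):
        # Round-robin (assumed policy): each request goes to the next instance.
        return next(self._next)


role = Role("WorkerRole", 1)
print([role.route() for _ in range(3)])  # every request hits the single instance

role.set_instance_count(3)
print([role.route() for _ in range(4)])  # requests now rotate over three instances
```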

Updated Deployment - Compute Elasticity


7. High Availability and Failover

Pattern #4: Leverage Multiple Availability Zones

Andy wants to keep his service up and running even in the face of failures. He’s already taken the first important step of redundant resources for compute and data. Now he also wants to take advantage of failover infrastructure, so that a catastrophic failure (such as a server or switch failure) doesn’t take out all of his solution.

AWS: Andy sets up a second Availability Zone. His Amazon RDS database in the first zone has a standby counterpart in the second zone. The solution can survive a failure of either Availability Zone.

Windows Azure: The Windows Azure infrastructure has been providing fault domains all along. Storage, database, and compute are spread across the data center to prevent any single failure from taking out all of an application’s resources. At the storage and database level, replication, failover, and synchronization are automatic. The one area where fault domains could not help Andy at first was Compute, because he was running only one instance. A best practice in Windows Azure is to run at least 2 instances in every role.
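
The reasoning behind the two-instance rule can be checked with a short Python sketch. It assumes a simple placement policy in which instances are assigned to fault domains in turn, and it assumes two fault domains; the function names are hypothetical and the real placement logic is internal to the platform.

```python
def assign_fault_domains(instance_count, domain_count=2):
    """Spread instances across fault domains in turn (an assumed placement policy)."""
    return {f"instance_{i}": i % domain_count for i in range(instance_count)}


def survivors(placement, failed_domain):
    """Instances still running after every instance in one fault domain is lost."""
    return sorted(name for name, domain in placement.items() if domain != failed_domain)


for count in (1, 2):
    placement = assign_fault_domains(count)
    # Worst case: the failure hits whichever occupied domain hurts the most.
    worst_case = min(len(survivors(placement, d)) for d in set(placement.values()))
    print(count, "instance(s): worst case leaves", worst_case, "running")
```

With one instance, the loss of its fault domain leaves nothing running; with two, any single fault domain failure still leaves one instance serving traffic.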

Updated Deployment - Fault-tolerant


Summary – Part 1
We’re halfway through the progression, and you can see that Amazon Web Services and Windows Azure have interesting similarities and differences. In both cases we have arrived at a solution that is scalable, elastic, reliable, and highly available.

My first observation is that an informed Windows Azure developer would not need to jump through all of these hoops individually (I suspect that’s also true of an informed AWS developer). Second, some of the discrete steps in this progression are provided automatically by the Windows Azure environment and don’t require any specific action to enable. That includes compute elasticity, load balancing of public endpoints, and fault domains: they’re always there. The Windows Azure edition of the scenario does require Andy to change database providers in order to realize the benefits of database as a service; aside from this, all of the other steps are easy.

In Part 2 we’ll complete the comparison.
