AWS Summit NYC, the rest of Day 2

- April 21, 2013

In the afternoon of day 2, there were breakout sessions. Below are summaries from the ones that I found useful. There was also an expo on day 2, and my notes on the expo are towards the end of this post.

Building web-scale applications with AWS

This presentation opened with a customer anectode. As you may know, Reddit runs on AWS. When President Obama did an "Ask Me Anything" on Reddit recently, Reddit called AWS a day in advanced to give heads up. There were 3 million pageviews on Reddit that day, 500K of which was for the President's AMA. Reddit has added 60 dedicated instances and handled the extra traffic smoothly.

The presenter gave the takehome message in advance for his talk: While you scale, architect for failure, and architect with security. The rest of the presentation talked about these (mostly the first point) in more detail.

Architecting for failure. AWS has 9 regions around the world, and supports within each region different availability zones (AZs). An AZ is basically an independent datacenter. Use different AZs and regions to architect for failure.

There are various storage options for architecting for failure. There is S3, which provides durable (everything replicated to 3 nodes by default), scalable, highly-available storage. S3 also has good CloudFront (and also Route53) integration for low-latency access worldwide. Glacier provides a very cheap solution for longterm infrequently-accessed object storage, with 1 cent per gig per month. The tradeoff is that if you want to retrieve an object, this may take up to 5 hours for preparation (since it is based on magnetic-tape based storage). Finally, there is elastic block storage (EBS) which provides support up to 1 TB, is snapshotable, and is a very good choice for random IO.

Then there are database options, they provide a readily queryable system with consistency/performance options. A common use case for the database is for managing session-state for web applications. The session state store should be performant, scalable, reliable. Managing session state with the AWS database options achieve these, as well as divorcing the state from the applications (which is a great tool for architecting for failure). There are several options AWS provides for databases: DynamoDB, RDS (Relational Database Service), and NoSQL high I/O datastore solutions. AWS ElastiCache (protocol compliant with MemCached) helps provide speed support for these databases.

Data-tier scaling is the bane of the architect. It is still a hard problem, but AWS services can help make this a bit easier. There is no silver bullet, you just have to be aware of your options. First you can try vertical scaling, go for a bigger VM instance. For horizontal scaling, the simplest thing you can try is to have a master node and add read-only copies as slaves. (Using RDS and Oracle can create the read-slaves with click of a button.) If you need to go with the high-scalability version, this makes things more complex. This is the sharding and using hash-rings idea. You can shard by function or key space, in RDBMS or NoSQL solutions. DynamoDB provides a provisioned throughput NoSQL database with /fully-managed horizontal scaling/. Leveraging on its SSD-backed infrastructure, DynamoDB can pull this trick. DynamoDB is also well-integrated to AWS CloudWatch, which provided system-wide visibility into resource utilization, application performance, and operational health. Finally, AWS Redshift provides a petabyte-scale data warehousing solution optimized for high query performance on large-scale datasets.

Loose coupling. When architecting for failure, another important strategy is to use/adopt loose coupling in your applications. The looser coupled your application is the better it scales and tolerates failures. AWS SQS (Simple Queue Service) is your best friend for this. SQS offers a reliable, highly scalable, hosted queue for storing messages as they travel between computers. You use SQS as buffers between system components to helps make the system more loosely-coupled. SQS allows for parallel processing (through fan outs) as well as tolerating failure. A relevant service here is the AWS SNS (Simple Notification Service) which can be used to fanout messages to different SQS based on detection of failure or more traffic. Also you can then add CloudWatch for auto-scaling: if one queue size gets big, you can set the appropriate response to acquiring additional AWS spot instances. (AWS spot is the name-your-own price option for VMs, sort of like PriceLine but for VMs.)

Finally, when architecting for scalability and fault-tolerance, spread the load by using AWS ELB (Elastic Load Balancing) at the frontend. ELB creates highly scalable applications, by distributing load across EC2 instances, and also supports distributing load to multiple AZs.

Architecting for security. AWS provides Identity and Access Management (IAM) helps you to securely control access to AWS services and resources for your users. IAM supports temporary security credentials, and a common use case for IAMs is to set up identity federations to AWS APIs.

Some soundbites. A couple of nice soundbites from this presentation are "scalability: so your application is not a victim of its success" and "scaling is the ability to move the bottlenecks around to the least expensive part of the architecture".

After the talk, I approached the presenter and asked him: What about tightly-synchronized apps (for example transactional processing applications), how does AWS support them? His first answer is to scale vertically by using a bigger instance. When I asked about a scale-out solution, he said that there is no direct support for that. But a possible solution could be to use DynamoDB to record transaction state. DynamoDB provides consistency over its replicas, so it should be possible to leverage on that to build a scale-out transactional application.

Big Data (and Elastic MapReduce)

The first part of the talk introduced big data and the second part talked about Elastic MapReduce (EMR) for big data analytics.

With the accelerated rate of data generation the data volume grows rapidly, and storing and analyzing this data for actionable information becomes a challenge. Big data analysis have a lot of benefits. Razorfish agency analyzed credit card transaction data to improve advertising effectivenes, and achieved they achieved 500% return on ad spend by targeted advertisements. Yelp, an AWS customer, analyzed their usage logs and identified early on a bias for mobile usage, and invested heavily in the mobile development. In Jan 2013, 10 million unique mobile devices used Yelp. Social network data analysis keeps a tab on the pulse of the planet, and a lot of companies are analyzing social network data today for market analysis, product placement, etc.

EMR is AWS's managed Hadoop service. (Read here about MapReduce.) Hadoop has a robust ecosystem with several database and machine learning tools, and EMR benefits from that. EMR providesagility for experimentation, and also cost optimizations (due to its integration with AWS spot, name-your-own-price supercomputing).

AWS datapipeline helps manage and orchestrate data-intensive workloads via its pipeline automation service, and execution and rety logic. A basic pipeline is of the form "input-datanode" -> activity -> "output-datanode", but it is easily expandable with additional checks (precondition) and notifications (of faults), and arbitrary complex fanning out and fanning-in.

Partner & Solution Expo

There was a partner and solution expo as part of day 2 of the summit. The expo was more fun than I expected, there was a lot of energy in the room. You can see a list of companies in the expo under the event sponsors category at the AWS summit webpage.

Most of these companies offer some sort of cloud management solutions as-a-service (AAS) to cloud computing customers: security AAS, management AAS, scalability AAS, analytics AAS, database AAS. When I asked some of these companies "what prevents AWS from doing this cloud management support solutions themselves as part of AWS, and whether they feel threatened by that", they said "this is not AWS's business model". This seemed like a weak answer to me, because AWS has been continuously improving the management of its cloud computing services offerings to make it easier to use for the customers. Later when I asked the same question to an AWS guy, he told me that "These third parties should innovate faster than AWS otherwise they become irrelevant", which seemed like a more logical answer. Maybe the third parties can also differentiate by providing more customized service and support to companies, where AWS provides more generic services. After his forthcoming answer to this question, I asked the AWS guy, whether AWS is starting to eat IBM's lunch. He didn't comment on that. My take on this is that by democratizing IT services, AWS is enabling (will enable) many third parties to start digging into IBM's solutions/consulting market.

I talked with the AWS training people, they are in the process of preparing more training material for AWS. This will be very useful for developers, and also could be a good resource to include in the undergrad/grad distributed systems courses with project components.

I also talked with Amazon Mechanical Turk (AMT) people. They have customers from news and media companies, and were trying to make new connections in the expo. Maybe Amazon should look into integrating AMT with AWS and provide "crowdsourcing as a service" well-integrated to other AWS services.

Finally, I also chatted with the Eucalyptus people. Eucalyptus offers a private cloud solution compatible with AWS. Eucalyptus is opensource, with a business model similar to RedHat (providing support and training). They told me that their use-case is to enable you to test your applications in your private cloud in small-scale and then deploy at scale in AWS. But, that doesn't look like an essential service to me; I don't understand why one couldn't have done that initial testing on the AWS already. Their main competitor is OpenStack, which is backed by RackSpace and a large coalition. OpenStack has been gaining a lot of momentum in recent months.

Wrapping up

AWS is by far the top cloud computing provider, and it is slowly but surely building a datacenter OS, setting the standards and APIs for this along the way. AWS has currently 33 services (S3, RDS, DynamoDB, OpWorks, CloudWatch, CloudFront, ElastiCache, ELB, EMR, SQS, SNS, etc.) to support building cloud computing applications. I had the impression that AWS is about to see an explosion in its customer base. Most of the people I met during the summit have been using AWS in a limited way, and were currently investigating if or how they can use it more intensively.

During the expo session, I had a chance to approach Werner and make acquaintance. I am a tall guy, but he is really tall and big. He is very down-to-earth person; he stood in the AWS booth and met with AWS customers to listen and help for at least 2 hours.

PS: 17 presentations from the #AWS Summit in NYC are made available here.