Graviton2 and Graviton3

What do modern cloud workloads look like? And what does that have to do with new chip designs? I found these gems in Peter DeSantis's ReInvent20 and ReInvent21 talks. These talks are very informative and educational. Me likey! The speakers at ReInvent are not just introducing new products/services, but they are also explaining the thought processes behind them. To come up with this summary, I edited the YouTube video transcripts slightly (mostly shortening it). The presentation narratives have been really well planned, so this makes a good read I think. Graviton2 This part is from the ReInvent2020 talk from Peter DeSantis.   Graviton2 is the best performing general purpose processor in our cloud by a wide margin. It also offers significantly lower cost. And it's also the most power efficient processor we've ever deployed. Our plan was to build a processor that was optimized for AWS and modern cloud workloads. But, what do modern cloud workloads look like? Let's start by


World chess championship is on. Magnus is the clear favorite, but Nepomniachtchi is coming strong. It is a lot of fun. I haven't been posting for sometime. And this seems like a good time to talk about chess. My journey picking up on chess, again As a kid I wasn't much interested in chess. My younger brother was into chess, he would buy books to review grandmasters' games. I didn't go further than playing casual games with him. Maybe it was the quarantine that triggered this, but in the last year I started playing some chess on the smartphone. I had an Android phone for the last 3 years, and I used a random chess app downloaded at Play Store. I thought the app was very neat because it let me play against the computer at different levels and it allowed me to go back and try different things. Like Git, you know. The app also suggested me hints. I thought this was a dope way to improve one's chess skills. Little did I know, I was just scratching the surface. Three yea

What’s Really New with NewSQL?

This paper is by Andy Pavlo and Matthew Aslett, and it appeared in Sigmod 2016.   NoSQL managed to scale horizontally, but this came at the expense of losing transaction and rich querying capability. NewSQL followed NoSQL to amend things and restore balance to the force. NewSQL is a class of modern relational DBMSs that seek to provide on-par scalability to NoSQL for OLTP read-write workloads while still maintaining ACID guarantees for transactions. Let's dissect this definition. The biggest benefit of NewSQL is that developers do not have to write code to deal with eventually consistent updates as they would in a NoSQL system, because they will be able to use ACID transactions and SQL-like rich querying capabilities. NewSQL is about OLTP (online transaction processing), not OLAP (online data analysis like in data warehouse systems). Well, with the caveat that it can also be about HTAP (hybrid transactional-analytical processing), as the paper mentions under the future trends secti

Rabia: Simplifying State-Machine Replication Through Randomization (SOSP'21)

This paper appeared in SOSP'21 . I took notes and screen-snapshots during the presentation of this paper, and decided to put together a summary of what I understood from it. The paper has a simple idea and a somewhat unexpected result. It will be interesting to dive deep and explore to the extent this idea can be applied in practice.  Here is the idea. State-machine replication through Paxos , more accurately MultiPaxos, is commonly used in practice. (Yes, that includes state-machine replication through Raft, if I have to spell this out to the Rafters among us.) The MultiPaxos leader drives the protocol. It is basically one round, phase-2 execution of "accept this!" "yes, boss" between the leader and followers, where phase3 commit can be piggybacked to phase-2 message. This is as simple and efficient as it gets. You can't beat it! Or can you? The paper argues that, although the happy path of MultiPaxos based state-machine replication is simple and efficient

Log structured protocols in Delos (SOSP21)

This paper picks up from where the "Virtual Consensus in Delos" paper finishes , and  talks about using Delos to build control plane databases at Facebook. These are my notes from Mahesh's excellent conference presentation. These control plane databases at Facebook were required to support multiple APIs: some SQL, some key-value pairs, and some ZooKeeper namespaces. Each such API typically requires a separate database to be built. But implementing and operating even a single zero dependency control plane system is difficult. To cut this dilemma, they observe that these databases have a similar structure: a consensus protocol at the bottom, and a replicated state machine on top. A lot of that state machine uses generic logic that can be reused across different APIs. This implies that there is an hourglass architecture at work here, where Delos platform is the narrow waist. This paper focuses on protocols which allow us to layer multiple apis on this common Delos platform.

David G. Andersen AMA (SOSP Day 3)

Dave has broad research interests in computer systems in the networked environment. What do you think systems community should be working on but isn't getting enough attention? With the stuttering of Moore's law as we get into nanoscales, there is more need to extract performance from systems through integrated co-design of hardware and software. We need integrated work through the entire stack to make systems faster and more reliable. How do you pick research ideas/projects? I stumble on interesting questions. There is a lot of cross-fertilization going on between different areas I am working on. If everything else fails, once a year or so, I take a notebook, go for a walk, and write down my ideas. My two sabbaticals were also very fruitful for finding research ideas and projects. At Google, I worked with AI/ML people, which opened new horizons for me. What do you think about the future prospects of blockchain systems? Will there be a killer application, like ever? There are v

Jon Howell AMA at SOSP21 (Day 2)

 There were several AMAs at SOSP21, where the attendees can ask questions. I really like Jon Howell 's AMA session. I asked two questions I was interested about learning.   Why do you think verification is at a breakaway point now? What are the factors that make this the go-big moment? We are now figuring out how to make common problems cheap. Historically Coq proving was about being clever: do something not done before. Now we are focusing on how do we make the common stuff cheap. The Ironclad project began when Rustan Leino came to our team, and said I have a tool, let's use this tool. The Dafny demos were interesting to me. A red squiggly line showed a bug. When you give hint, the bug went away. It was a lightbulb moment. It took me extra line of code/explanation for me to get it. The verifier already got it. We had verification runouts in slicon valley,  with tlc, tlaps and others. The verifier send back question mark, and we try to decode it, what we realize was, almost

SOSP21 conference (Day 1)

SOSP, along with OSDI, is the premiere conference in computer systems. SOSP was held biannually; I had attended SOSP19 in person, and shared notes and paper summaries from that.  This year SOSP is virtual, which made it a lot easier to travel to. It is nice attending the conference from the convenience of your home. The experience is not too inferior to physical conference attending if you take the convenience factor into account.  Here are some notes from SOSP21. If you find and interesting paper, you can dig in, because: All SOSP papers are available as open access All the presentations are also available on YouTube Opening The conference opened with announcements from program committee (PC) chairs. SOSP21 had 348 submissions from 2078 authors, resulting on avg 6 authors per paper. 54 out of 348 papers are accepted. Reviews were conducted by 64 PC members, who produced 1500+ reviews. The trend is clear, the number of submissions are growing steeply.  Last year OSDI had announced i


Distributed/networked systems employ data replication to achieve availability and scalability. Consistency is concerned with the question of what should happen if a client modifies some data items and concurrently another client reads or modifies the same items possibly at a different replica. Linearizability is a strong form of consistency. (That is why it is also called as strong-consistency.) For a system to satisfy linearizability,  each operation must appear (from client perspective) to occur at an instantaneous point between its start time (when the client submits it) and finish time (when the client receives the response), and  execution at these instantaneous points should form a valid sequential execution (i.e., it should be as if operations are executed one at a time ---without concurrency, like they are being executed by a single node)  Let's simplify things further. In practice, consistency is often defined for systems that have two very specific operations: read and w

Using Lightweight Formal Methods to Validate a Key-Value Storage Node in Amazon S3 (SOSP21)

This paper comes from my colleagues at AWS S3 Automated Reasoning Group, detailing their experience applying lightweight formal methods to a new class of storage node developed for S3 storage backend. Lightweight formal methods emphasize automation and usability. In this case, the approach involves three prongs: developing executable reference models as specifications, checking implementation conformance to those models, and building infrastructure to ensure the models remain accurate in the future. ShardStore ShardStore is a new append-only key-value storage node developed for AWS S3 backend. It is over 40K lines of Rust code. Shardstore is a log-structured merge tree (LSM tree) but with shard data stored outside the tree to reduce write amplification. ShardStore employs soft updates for avoiding the cost of redirecting writes through a write-ahead log while still being crash consistent. A soft updates implementation ensures that only writes whose dependencies are persisted are sent

Popular posts from this blog

Foundational distributed systems papers

Your attitude determines your success

Progress beats perfect

Cores that don't count

Silent data corruptions at scale

Learning about distributed systems: where to start?

Read papers, Not too much, Mostly foundational ones

Sundial: Fault-tolerant Clock Synchronization for Datacenters


Using Lightweight Formal Methods to Validate a Key-Value Storage Node in Amazon S3 (SOSP21)