Friday, May 24, 2019

Paper summary. Scalable Consistency in Scatter

Here is the pdf for the paper. It is by Lisa Glendenning, Ivan Beschastnikh, Arvind Krishnamurthy, and Thomas Anderson of the Department of Computer Science & Engineering, University of Washington.

This paper is about peer-to-peer (P2P) systems. But the paper is from 2011, way after the P2P hype had died. This makes the paper more interesting, because it had the opportunity to consider things in hindsight. The P2P corpse was cold, and Dynamo had looted the distributed hash tables (DHT) idea from P2P and applied it in the context of datacenter computing. In return, this work liberates the Paxos coordination idea from the datacenter world and employs it in the P2P world. It replaces each node (or virtual node) in a P2P overlay ring with a Paxos group that consists of a number of nodes.

OK, what problem do Paxos groups solve in P2P systems? In the presence of high churn, DHTs in P2P systems suffer from inconsistent routing state and inconsistent name-space partitioning (see Figure 1). By using the Paxos group abstraction as a stable base on which to build its coordination operations (split, merge, migrate, repartition), Scatter achieves linearizable consistency even under adverse circumstances.

Group coordination 

Scatter supports the following multi-group operations:

  • split: partition the state of an existing group into two groups
  • merge: create a new group from the union of the state of two neighboring groups
  • migrate: move members from one group to a different group
  • repartition: change the key-space partitioning between two adjacent groups

Each multi-group operation in Scatter is structured as a distributed transaction. The paper calls this design pattern nested consensus, and says: "We believe that this general idea of structuring protocols as communication between replicated participants, rather than between individual nodes, can be applied more generally to the construction of scalable, consistent distributed systems."

Nested consensus uses a two-tiered approach. At the top tier, groups execute a two-phase commit protocol (2PC), while within each group Paxos is used for agreeing on the actions that the group takes. Provided that a majority of nodes in each group remain alive and connected, the 2PC protocol will be non-blocking and terminate. (This is the same argument Spanner uses as it employs 2PC over Paxos groups.) For individual links in the overlay to remain highly available, Scatter maintains an additional invariant: a group can always reach its adjacent groups. To maintain this connectivity, Scatter enforces that every adjacent group of a group A has up-to-date knowledge of the membership of A.

Multi-group operations are coordinated by whichever group decides to initiate the transaction as a result of some local policy. The group initiating a transaction is called the coordinator group and the other groups involved are called the participant groups. This is the overall structure of nested consensus:

  1. The coordinator group replicates the decision to initiate the transaction.
  2. The coordinator group broadcasts a transaction prepare message to the nodes of the participant groups.
  3. Upon receiving the prepare message, a participant group decides whether or not to commit the proposed transaction and replicates its vote.
  4. A participant group broadcasts a commit or abort message to the nodes of the coordinator group.
  5. When the votes of all participant groups are known, the coordinator group replicates whether or not the transaction was committed.
  6. The coordinator group broadcasts the outcome of the transaction to all participant groups.
  7. Participant groups replicate the transaction outcome.
  8. When a group learns that a transaction has been committed then it executes the steps of the proposed transaction, the particulars of which depend on the multi-group operation.
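
The eight steps above can be sketched in a few lines of Python. This is only a toy model under loud assumptions: the `Group` class, its `will_commit` flag, and `run_transaction` are hypothetical names of mine, and "replication" is just an append to an in-memory list standing in for a Paxos round inside the group. It is meant to show the shape of 2PC over replicated participants, not the paper's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """Hypothetical stand-in for a Paxos group; `log` models its replicated state."""
    name: str
    will_commit: bool = True  # assumed local voting policy, for illustration only
    log: list = field(default_factory=list)

    def replicate(self, entry):
        # In Scatter this would be a Paxos round among the group's nodes.
        # Here it is simply an append to an in-memory list.
        self.log.append(entry)


def run_transaction(coordinator, participants, txn):
    """Run one multi-group transaction and return 'committed' or 'aborted'."""
    coordinator.replicate(("init", txn))                  # step 1
    votes = []
    for group in participants:                            # steps 2-4
        vote = "commit" if group.will_commit else "abort"
        group.replicate(("vote", txn, vote))
        votes.append(vote)
    committed = all(v == "commit" for v in votes)
    outcome = "committed" if committed else "aborted"
    coordinator.replicate(("outcome", txn, outcome))      # step 5
    for group in participants:                            # steps 6-7
        group.replicate(("outcome", txn, outcome))
    return outcome                                        # step 8 is left to the caller


g1, g2, g3 = Group("G1"), Group("G2"), Group("G3", will_commit=False)
first = run_transaction(g2, [g1], "split")
second = run_transaction(g2, [g1, g3], "merge")
```

Because every decision is appended to a group's log before the next message is sent, any surviving majority of a group could resume the transaction from the log; that is the property the real system gets from Paxos.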

Figure 5 shows an example of this template for the group-split operation. After each group has learned and replicated the outcome (committed) of the split operation at time t3, the following updates are executed by the respective group: (1) G1 updates its successor pointer to G2a, (2) G3 updates its predecessor pointer to G2b, and (3) G2 executes a replicated state machine reconfiguration to instantiate the two new groups which partition between them G2's original key-range and set of member nodes.

The storage service (discussed next) continues to process client requests during the execution of group transactions except for a brief period of unavailability for any reconfiguration required by a committed transaction. Also, groups continue to serve lookup requests during transactions provided that the lookups are serialized with respect to the transaction commit.

Storage service

To improve throughput for put and get operations on keys, Scatter divides the key range assigned to the Paxos group into sub-ranges and assigns these sub-ranges to nodes within the Paxos group. Each key is only assigned to one primary and is serialized by that primary. The group leader replicates information regarding the assignment of keys to primaries using Paxos, as it does with the state for multi-group operations. Once an operation is routed to the correct group for a given key, then any node in the group will forward the operation to the appropriate primary. The primaries can run Paxos on the keys assigned to themselves concurrently with each other because this does not result in a conflict: it is OK to have different keys updated at the same time, since linearizability is a per key property.
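
To make the forwarding step concrete, here is a minimal sketch of how any node in a group could map a key to its primary. The class name `GroupKeyMap`, the integer keys, and the node names are my assumptions for illustration; the paper does not prescribe this data structure.

```python
import bisect


class GroupKeyMap:
    """Hypothetical map from a group's key range [lo, hi) to per-sub-range primaries."""

    def __init__(self, lo, hi, starts, primaries):
        # starts[i] is the first key of sub-range i; sub-range i ends where i+1 begins.
        if not starts or starts[0] != lo or starts != sorted(starts):
            raise ValueError("starts must be sorted and begin at lo")
        if len(starts) != len(primaries):
            raise ValueError("need exactly one primary per sub-range")
        self.lo, self.hi = lo, hi
        self.starts, self.primaries = starts, primaries

    def primary_for(self, key):
        """Return the single primary that serializes operations on `key`."""
        if not (self.lo <= key < self.hi):
            raise KeyError(f"key {key} is not owned by this group")
        # bisect_right finds the first start > key; the sub-range is the one before it.
        return self.primaries[bisect.bisect_right(self.starts, key) - 1]


key_map = GroupKeyMap(0, 300, [0, 100, 200], ["n1", "n2", "n3"])
owner = key_map.primary_for(150)
```

In the real system this table would itself be replicated through the group leader via Paxos, so every member forwards to the same primary.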

Scatter provides linearizable storage within a given key and does not attempt to linearize multi-key application transactions. A read is served by the primary within the Paxos group that is responsible for that key. The primary uses a leader lease with the rest of the nodes. It is possible to provide weaker consistency reads, as is the default in ZooKeeper, by reading from one node in the group.

Figure 7 plots the probability of group failure for different group sizes for two node churn rates with node lifetimes drawn from heavy-tailed Pareto distributions observed in typical peer-to-peer systems. The plot indicates that a modest group size of 8-12 prevents group failure with high probability. The prototype implementation in the paper demonstrates that even with these very short node lifetimes, it is possible to build a scalable and consistent system with practical performance. This was surprising to me.


They evaluate Scatter in a variety of configurations, for both micro-benchmarks and for a Twitter-style application. Compared to OpenDHT, Scatter provides equivalent performance with much better availability, consistency (i.e. linearizability), and adaptability even in very challenging environments. For example, if average node lifetimes are as short as 180 seconds, therefore triggering very frequent reconfigurations to maintain data durability, Scatter is able to maintain overall consistency and data availability, serving its reads in an average of 1.3 seconds in a typical wide area setting.

This is good performance, but to put things in the context of datacenter computing, the evaluation is done with "small data". When you have many gigabytes (if not terabytes) of data assigned to each node, just copying that data at line speed may take longer than the typical lifetime of a node in a P2P environment.
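
A quick back-of-the-envelope check illustrates the concern. The 180-second node lifetime comes from the evaluation described above; the 100 GB per node and the 1 Gbps link are purely my assumptions, picked to make the arithmetic easy.

```python
def transfer_seconds(data_bytes, link_bits_per_second):
    """Time to copy data_bytes over a link, ignoring protocol overhead."""
    return data_bytes * 8 / link_bits_per_second


NODE_LIFETIME_S = 180        # short average node lifetime from the evaluation
DATA_BYTES = 100 * 10**9     # assumption: 100 GB stored per node
LINK_BPS = 10**9             # assumption: 1 Gbps line speed

copy_time_s = transfer_seconds(DATA_BYTES, LINK_BPS)
print(f"copy takes {copy_time_s:.0f} s vs. node lifetime {NODE_LIFETIME_S} s")
```

Under these assumed numbers the copy takes 800 seconds, more than four times the node lifetime, so a replacement node would likely leave before it finished catching up.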

The paper also compares Scatter against statically partitioned ZooKeeper groups. Here, the key-space partitioning was derived based on historical workload characteristics, but the inability to adapt to dynamic hotspots in the access pattern limits the scalability of the ZooKeeper-based groups deployment. Further, the variability in the throughput also increases with the number of ZooKeeper instances used in the experiment.

In contrast, Scatter's throughput scales linearly with the number of nodes, with only a small amount of variability due to uneven group sizes and temporary load skews. This is because Scatter uses ring and group operations to adapt to change in access patterns. Based on the load balancing policy in Scatter, the groups repartition their keyspaces proportionally to their respective loads whenever a group's load is a factor of 1.6 or above that of its neighboring group. As this check is performed locally between adjacent groups, it does not require global load monitoring, but it might require multiple iterations of the load-balancing operation to disperse hotspots.
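
The local load-balancing check could look something like the sketch below. The 1.6 trigger factor is from the paper; everything else is an assumption of mine. In particular, I assume load is spread uniformly over each group's key range and that "proportional" repartitioning means moving the shared boundary until the two groups carry equal load. The function names are hypothetical.

```python
TRIGGER_FACTOR = 1.6  # from the paper's load-balancing policy


def should_repartition(load_a, load_b, factor=TRIGGER_FACTOR):
    """True when one group's load is at least `factor` times its neighbor's."""
    low, high = sorted((load_a, load_b))
    return high > 0 and high >= factor * low


def new_boundary(a, b, c, load_a, load_b):
    """Groups A = [a, b) and B = [b, c) share boundary b; return the adjusted b.

    Assumes load is uniform within each range, so shedding a fraction of the
    load means shedding the same fraction of the key range.
    """
    if not should_repartition(load_a, load_b):
        return b
    excess = abs(load_a - load_b) / 2  # load the hotter group must hand over
    if load_a > load_b:
        return b - (excess / load_a) * (b - a)  # A shrinks, B grows
    return b + (excess / load_b) * (c - b)      # B shrinks, A grows


shifted = new_boundary(0, 100, 200, load_a=400, load_b=100)
```

With real, non-uniform hotspots a single adjustment will not equalize the load, which matches the paper's remark that several iterations may be needed.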

Hat tip to @DharmaShukla for recommending the paper to me. The paper has inspired some design decisions in Cosmos DB.

MAD questions

1. What could be some alternative designs to solve this problem?
Instead of arranging the Paxos groups in a ring, why not have a vertical-Paxos group overseeing the Paxos groups? The vPaxos box would be assigning key ranges to Paxos groups, coordinating the group operations (split, merge, load-balance) and maintaining the configuration information of the Paxos groups. This would allow adapting to changes in workload and reconfiguring in reaction to node availability in a much faster manner than that of the P2P ring, where load-balancing is done by adjacent groups dispersing load to each other in multiple iterations.

Another problem with Scatter is that it lacks WAN locality optimization. A client may need to go across the globe to contact a Paxos group responsible for keys that it interacts with the most. WPaxos can learn and adapt to these patterns. So, while we are at it, why not replace the vanilla Paxos in the Paxos group with WPaxos to achieve client access locality adaptation in an orthogonal way? Then the final setup becomes VPaxos overseeing groups of WPaxos deployments.

2. Would it ever be possible to replace datacenters with P2P technologies?
The paper in the introduction seems fairly optimistic: "Our interest is in building a storage layer for a very large scale P2P system we are designing for hosting planetary scale social networking applications. Purchasing, installing, powering up, and maintaining a very large scale set of nodes across many geographically distributed data centers is an expensive proposition; it is only feasible on an ongoing basis for those applications that can generate revenue. In much the same way that Linux offers a free alternative to commercial operating systems for researchers and developers interested in tinkering, we ask: what is the Linux analogue with respect to cloud computing?"

I am not very optimistic...

3. Why don't we invest in better visualizations/figures for writing papers?
This paper had beautiful figures for explaining concepts. Check Figure 4 below: it shows two groups considering different operations concurrently, visualized with thought bubbles. These figures go a long way. It is a shame we don't invest any effort in standardizing and teaching good illustration techniques to support exposition. Using colors is even discouraged because they look faded/blended when printed in black and white. For God's sake, it is 2019, and we should level up our illustration game.

What are some other examples of papers with beautiful figures illustrating concepts? Please let me know. They are a treat to read.

Tuesday, May 14, 2019

Book Notes. Steal Like an Artist: 10 Things Nobody Told You About Being Creative

This book is by Austin Kleon, 2012. I had also written about his other book "Show Your Work! 10 Ways to Share Your Creativity and Get Discovered."

Here are the 10 things nobody told you about being creative:
  1. Steal like an artist.
  2. Don’t wait until you know who you are to get started.
  3. Write the book you want to read.
  4. Use your hands.
  5. Side projects and hobbies are important.
  6. The secret: do good work and share it with people.
  7. Geography is no longer our master.
  8. Be nice. (The world is a small town.)
  9. Be boring. (It’s the only way to get work done.)
  10. Creativity is subtraction.
Kleon gave a short TEDx talk about the idea behind this book.

The title is an homage to a quote attributed to Picasso: “Good artists borrow, great artists steal.” Picasso also said: "Art is theft." It’s not just where you take things from, it's where you take them to. Here are some parts I highlighted under Section 1: "steal like an artist."

Every artist gets asked the question, "Where do you get your ideas?" The honest artist answers, "I steal them."
Every new idea is just a mashup or a remix of one or more previous ideas.

You have a mother and you have a father. You possess features from both of them, but the sum of you is bigger than their parts.
You are, in fact, a mashup of what you choose to let into your life. You are the sum of your influences. The German writer Goethe said, "We are shaped and fashioned by what we love."

Your job is to collect good ideas. The more good ideas you collect, the more you can choose from to be influenced by.
Carry a notebook and a pen with you wherever you go. Get used to pulling it out and jotting down your thoughts and observations. Copy your favorite passages out of books. Record overheard conversations. Doodle when you're on the phone.

You might be scared to start. That's natural. There's this very real thing that runs rampant in educated people. It’s called "impostor syndrome."
Ask anybody doing truly creative work, and they'll tell you the truth: They don't know where the good stuff comes from. They just show up to do their thing. Every day.
Don't just steal the style, steal the thinking behind the style. You don't want to look like your heroes, you want to see like your heroes.

As with Kleon's other books, the book has beautiful artwork.

Wednesday, May 8, 2019

Book Notes. Creativity, Inc.: Overcoming the Unseen Forces That Stand in the Way of True Inspiration

This book is by Ed Catmull, cofounder of Pixar, with Amy Wallace, 2014. The book is about the cultivation and management of creativity:
If Pixar is ever successful, will we do something stupid, too? Can paying careful attention to the missteps of others help us be more alert to our own? Or is there something about becoming a leader that makes you blind to the things that threaten the well-being of your enterprise? 
I would devote myself to learning how to build not just a successful company but a sustainable creative culture. As I turned my attention from solving technical problems to engaging with the philosophy of sound management, I was excited once again.
While reading the book, I was impressed by how many questions Ed kept asking. I thought I was asking a lot of questions, but Ed is really really into asking questions and using them to achieve focus.

Here are some parts I highlighted from the book.

From childhood to PhD

Growing up in the 1950s, I had yearned to be a Disney animator but had no idea how to go about it.

In graduate school, I’d quietly set a goal of making the first computer-animated feature film.
Walt Disney was one of my two boyhood idols. The other was Albert Einstein.

Disney’s animators were at the forefront of applied technology; instead of merely using existing methods, they were inventing ones of their own.

Every time some technological breakthrough occurred, Walt Disney incorporated it and then talked about it on his show in a way that highlighted the relationship between technology and art.

That night’s episode was called “Where Do the Stories Come From?” and Disney kicked it off by praising his animators’ knack for turning everyday occurrences into cartoons.
An artist was drawing Donald Duck, giving him a jaunty costume and a bouquet of flowers and a box of candy with which to woo Daisy. Then, as the artist’s pencil moved around the page, Donald came to life, putting up his dukes to square off with the pencil lead, then raising his chin to allow the artist to give him a bow tie.

Whether it’s a T-Rex or a slinky dog or a desk lamp, if viewers sense not just movement but intention--or, put another way, emotion--then the animator has done his or her job.

I remember the optimistic energy--an eagerness to move forward that was enabled and supported by a wealth of emerging technologies. It was boom time in America, with manufacturing and home construction at an all-time high.

The first organ transplants were performed in 1954; the first polio vaccine came a year later; in 1956, the term artificial intelligence entered the lexicon.

Then, when I was twelve, the Soviets launched the first artificial satellite--Sputnik 1--into earth’s orbit.

The United States government’s response to being bested was to create something called ARPA,

Looking back, I still admire that enlightened reaction to a serious threat: We’ll just have to get smarter.
ARPA would have a profound effect on America, leading directly to the computer revolution and the Internet, among countless other innovations.
I was a quiet, focused student in high school. An art teacher once told my parents I would often become so lost in my work that I wouldn’t hear the bell ring at the end of class;
Throughout my life, people have always smiled when I told them I switched from art to physics because it seems, to them, like such an incongruous leap. But my decision to pursue physics, and not art, would lead me, indirectly, to my true calling.
Four years later, in 1969, I graduated from the University of Utah with two degrees, one in physics and the other in the emerging field of computer science.
But soon after I matriculated, also at the U of U, I met a man who would encourage me to change course: one of the pioneers of interactive computer graphics, Ivan Sutherland.
Sutherland and Dave Evans, who was chair of the university’s computer science department, were magnets for bright students with diverse interests, and they led us with a light touch.
The result was a collaborative, supportive community so inspiring that I would later seek to replicate it at Pixar.

One of my classmates, Jim Clark, would go on to found Silicon Graphics and Netscape. Another, John Warnock, would co-found Adobe, known for Photoshop and the PDF file format, among other things. Still another, Alan Kay, would lead on a number of fronts, from object-oriented programming to “windowing” graphical user interfaces.
Not only did I often sleep on the floor of the computer rooms to maximize time on the computer, but so did many of my fellow graduate students.

Making pictures with a computer spoke to both sides of my brain.

In the spring of 1972, I spent ten weeks making my first short animated film—a digitized model of my left hand.

Professor Sutherland used to say that he loved his graduate students at Utah because we didn’t know what was impossible.

My dissertation, “A Subdivision Algorithm for Computer Display of Curved Surfaces,” offered a solution to that problem.

“Texture mapping,” as I called it, was like having stretchable wrapping paper that you could apply to a curved surface so that it fit snugly.

At the U of U, we were inventing a new language. One of us would contribute a verb, another a noun, then a third person would figure out ways to string the elements together to actually say something.
Today, there is a Z-buffer in every game and PC chip manufactured on earth.

After college      

In the next decade, I would learn much about what managers should and shouldn’t do, about vision and delusion, about confidence and arrogance, about what encourages creativity and what snuffs it out.

I’ve made a policy of trying to hire people who are smarter than I am.

Alvy and I decided to do the opposite--to share our work with the outside world.

It’s hard to imagine now, but in 1976, the idea of incorporating high technology into Hollywood filmmaking wasn’t just a low priority; it wasn’t even on the radar. But one man was about to change that, with a movie called Star Wars.

In the intervening years, George has said that he hired me because of my honesty, my “clarity of vision,” and my steadfast belief in what computers could do.

A research lab is not a university, and the structure didn’t scale well. At Lucasfilm, then, I decided to hire managers to run the graphics, video, and audio groups; they would then report to me.

For all the care you put into artistry, visual polish frequently doesn’t matter if you are getting the story right.

To this day, I am thankful that the deal went south. Because it paved the way for Steve Jobs.

Alan [Kay] had been at the U of U with me and at Xerox PARC with Alvy, and he told Steve that he should visit us if he wanted to see the cutting edge in computer graphics.

I remember his assertiveness. There was no small talk. Instead, there were questions. Lots of questions. What do you want? Steve asked. Where are you heading? What are your long-term goals? He used the phrase “insanely great products” to explain what he believed in. Clearly, he was the sort of person who didn’t let presentations happen to him, and it wasn’t long before he was talking about making a deal.

As he spoke, it became clear to us that his goal was not to build an animation studio; his goal was to build the next generation of home computers to compete with Apple. This wasn’t merely a deviation from our vision, it was the total abandonment of it, so we politely declined. We returned to the task of trying to find a buyer.

At one point in this period, I met with Steve and gently asked him how things got resolved when people disagree with him. He seemed unaware that what I was really asking him was how things would get resolved if we worked together and I disagreed with him, for he gave a more general answer. He said, “When I don’t see eye to eye with somebody, I just take the time to explain it better, so they understand the way it should be.”

In the end, Steve paid $5 million to spin Pixar off of Lucasfilm—and then, after the sale, he agreed to pay another $5 million to fund the company, with 70 percent of the stock going to Steve and 30 percent to the employees.
His method for taking the measure of a room was saying something definitive and outrageous—“These charts are bullshit!” or “This deal is crap!”—and watching people react. If you were brave enough to come back at him, he often respected it--poking at you, then registering your response, was his way of deducing what you thought and whether you had the guts to champion it.

Every few weeks, I’d head down to Steve’s office in Redwood City to brief him on our progress. I didn’t relish the meetings, to be honest, because they were often frustrating.

At Pixar’s lowest point, as we floundered and failed to make a profit, Steve had sunk $54 million of his own money into the company—a significant chunk of his net worth, and more money than any venture capital firm would have considered investing, given the sorry state of our balance sheet.
After trying everything we could to sell our Pixar Image Computer, we were finally facing the fact that hardware could not keep us going.

There is nothing quite like ignorance combined with a driving need to succeed to force rapid learning.

We began to focus our energies on the creative side. We started making animated commercials for Trident gum and Tropicana orange juice and almost immediately won awards for the creative content while continuing to hone our technical and storytelling skills.

In 1991, we laid off more than a third of our employees.

Three times between 1987 and 1991, a fed-up Steve Jobs tried to sell Pixar. And yet, despite his frustrations, he could never quite bring himself to part with us. When Microsoft offered $90 million for us, he walked away. Steve wanted $120 million, and felt their offer was not just insulting but proof that they weren’t worthy of us.
How would we resolve conflicts? And his answer, which I found comically egotistical at the time, was that he simply would continue to explain why he was right until I understood. The irony was that this soon became the technique I used with Steve. When we disagreed, I would state my case, but since Steve could think much faster than I could, he would often shoot down my arguments. So I’d wait a week, marshal my thoughts, and then come back and explain it again. He might dismiss my points again, but I would keep coming back until one of three things happened: (1) He would say “Oh, okay, I get it” and give me what I needed; (2) I’d see that he was right and stop lobbying; or (3) our debate would be inconclusive, in which case I’d just go ahead and do what I had proposed in the first place. Each outcome was equally likely, but when this third option occurred, Steve never questioned me. For all his insistence, he respected passion. If I believed in something that strongly, he seemed to feel, it couldn’t be all wrong.
Katzenberg wanted Pixar to make a feature film, and he wanted Disney to own and distribute it.
Steve took the reins, rejecting Jeffrey’s logic that since Disney was investing in Pixar’s first movie, it deserved to own our technology as well. “You’re giving us money to make the film,” Steve said, “not to buy our trade secrets.” What Disney brought to the table was its marketing and distribution muscle; what we brought were our technical innovations, and they were not for sale. Steve made this a deal breaker and stuck to his guns until, ultimately, Jeffrey agreed.

Given the millions of dollars at stake and the realization that we’d never get another chance if we blew it, we had to figure it out fast. Luckily, John already had an idea. Toy Story would be about a group of toys and a boy—Andy—who loves them. The twist was that it would be told from the toys’ point of view.

On November 19, 1993, we went to Disney to unveil the new, edgier Woody in a series of story reels—a mock-up of the film, like a comic book version with temporary voices, music, and drawings of the story. That day will forever be known at Pixar as “Black Friday” because Disney’s completely reasonable reaction was to shut down the production until an acceptable script was written.

With our first feature film suddenly on life support, John quickly summoned Andrew, Pete, and Joe. For the next several months, they spent almost every waking minute together, working to rediscover the heart of the movie, the thing that John had first envisioned: a toy cowboy who wanted to be loved. They also learned an important lesson--to trust their own storytelling instincts.

1991, two of the year’s biggest blockbusters—Beauty and the Beast and Terminator 2—had relied heavily on technology that had been developed at Pixar, and people in Hollywood were starting to pay attention. By 1993, when Jurassic Park was released, computer-generated special effects would no longer be considered some nerdy sideline experiment;

And a few months later, as if on cue, Eisner called, saying that he wanted to renegotiate the deal and keep us as a partner. He accepted Steve’s offer of a 50/50 split. I was amazed; Steve had called this exactly right. His clarity and execution were stunning.
For the first time since our founding, our jobs were safe.

Pixar as a company

The point is, we value self-expression.
What makes Pixar special is that we acknowledge we will always have problems, many of them hidden from our view; that we work hard to uncover these problems, even if doing so means making ourselves uncomfortable; and that, when we come across a problem, we marshal all of our energies to solve it.

In the coming pages, I will discuss many of the steps we follow at Pixar, but the most compelling mechanisms to me are those that deal with uncertainty, instability, lack of candor, and the things we cannot see. I believe the best managers acknowledge and make room for what they do not know—not just because humility is a virtue but because until one adopts that mindset, the most striking breakthroughs cannot occur. I believe that managers must loosen the controls, not tighten them. They must accept risk; they must trust the people they work with and strive to clear the path for them; and always, they must pay attention to and engage with anything that creates fear.
Only when we admit what we don’t know can we ever hope to learn it.
When it comes to creative inspiration, job titles and hierarchy are meaningless.

Every person there, no matter their job title, felt free to speak up. This was not only what we wanted, it was a fundamental Pixar belief: Unhindered communication was key, no matter what your position. At our long, skinny table, comfortable in our middle seats, we had utterly failed to recognize that we were behaving contrary to that basic tenet.
I discovered we’d completely missed a serious, ongoing rift between our creative and production departments. In short, production managers told me that working on Toy Story had been a nightmare. They felt disrespected and marginalized—like second-class citizens. And while they were gratified by Toy Story’s success, they were very reluctant to sign on to work on another film at Pixar. I was floored. How had we missed this?
For me, this discovery was bracing. Being on the lookout for problems, I realized, was not the same as seeing problems. This would be the idea—the challenge—around which I would build my new sense of purpose.
Because making a movie involves hundreds of people, a chain of command is essential. But in this case, we had made the mistake of confusing the communication structure with the organizational structure.

Going forward, anyone should be able to talk to anyone else, at any level, at any time, without fear of reprimand. Communication would no longer have to go through hierarchical channels.
The first principle was “Story Is King,” by which we meant that we would let nothing--not the technology, not the merchandising possibilities--get in the way of our story.

The other principle we depended on was “Trust the Process.”
While Woody would choose Andy in the end, he would make that choice with the awareness that doing so guaranteed future sadness.
For the next six months, our employees rarely saw their families. We worked deep into the night, seven days a week. Despite two hit movies, we were conscious of the need to prove ourselves, and everyone gave everything they had. With several months still to go, the staff was exhausted and starting to fray.

I had expected the road to be rough, but I had to admit that we were coming apart. By the time the film was complete, a full third of the staff would have some kind of repetitive stress injury.
Critics raved that Toy Story 2 was one of the only sequels ever to outshine the original.

Though I was immensely proud of what we had accomplished, I vowed that we would never make a film that way again. It was management’s job to take the long view, to intervene and protect our people from their willingness to pursue excellence at all costs. Not to do so would be irresponsible.

Good idea or Good team?                

If you give a good idea to a mediocre team, they will screw it up. If you give a mediocre idea to a brilliant team, they will either fix it or throw it away and come up with something better.

Getting the team right is the necessary precursor to getting the ideas right.
Getting the right people and the right chemistry is more important than getting the right idea.
Ideas come from people. Therefore, people are more important than ideas.
Why are we confused about this? Because too many of us think of ideas as being singular, as if they float in the ether, fully formed and independent of the people who wrestle with them.
Find, develop, and support good people, and they in turn will find, develop, and own good ideas.
We should trust in people, I told them, not processes. The error we’d made was forgetting that “the process” has no agenda and doesn’t have taste.

Once you’re aware of the suitcase/handle problem, you’ll see it everywhere. People glom onto words and stories that are often just stand-ins for real action and meaning.

Around this time, John coined a new phrase: “Quality is the best business plan.”
That didn’t mean that we wouldn’t make mistakes. Mistakes are part of creativity. But when we did, we would strive to face them without defensiveness and with a willingness to change.


What is the nature of honesty? If everyone agrees about its importance, why do we find it hard to be frank? How do we think about our own failures and fears? Is there a way to make our managers more comfortable with unexpected results—the inevitable surprises that arise, no matter how well you’ve planned? How can we address the imperative many managers feel to overcontrol the process? With what we have learned so far, can we finally get the process right? Where are we still deluded?
Candor is forthrightness or frankness--not so different from honesty, really. And yet, in common usage, the word communicates not just truth-telling but a lack of reserve.

A hallmark of a healthy creative culture is that its people feel free to share ideas, opinions, and criticisms. Lack of candor, if unchecked, ultimately leads to dysfunctional environments.
The Braintrust, which meets every few months or so to assess each movie we’re making, is our primary delivery system for straight talk.
Its premise is simple: Put smart, passionate people in a room together, charge them with identifying and solving problems, and encourage them to be candid with one another.
The Braintrust is one of the most important traditions at Pixar.
The passion expressed in a Braintrust meeting was never taken personally because everyone knew it was directed at solving problems.
And largely because of that trust and mutual respect, its problem-solving powers were immense.
Candor could not be more crucial to our creative process. Why? Because early on, all of our movies suck. That’s a blunt assessment, I know, but I make a point of repeating it often, and I choose that phrasing because saying it in a softer way fails to convey how bad the first versions of our films really are. I’m not trying to be modest or self-effacing by saying this. Pixar films are not good at first, and our job is to make them so—to go, as I say, “from suck to not-suck.” This idea—that all the movies we now think of as brilliant were, at one time, terrible—is a hard concept for many to grasp. But think about how easy it would be for a movie about talking toys to feel derivative, sappy, or overtly merchandise-driven. Think about how off-putting a movie about rats preparing food could be, or how risky it must’ve seemed to start WALL-E with 39 dialogue-free minutes. We dare to attempt these stories, but we don’t get them right on the first pass. And this is as it should be. Creativity has to start somewhere, and we are true believers in the power of bracing, candid feedback and the iterative process—reworking, reworking, and reworking again, until a flawed story finds its throughline or a hollow character finds its soul.
(It takes about twelve thousand storyboard drawings to make one 90-minute reel, and because of the iterative nature of the process I’m describing, story teams commonly create ten times that number by the time their work is done.)

People who take on complicated creative projects become lost at some point in the process. It is the nature of things—in order to create, you must internalize and almost become the project for a while, and that near-fusing with the project is an essential part of its emergence. But it is also confusing. Where once a movie’s writer/director had perspective, he or she loses it. Where once he or she could see a forest, now there are only trees.
You may be thinking, How is the Braintrust different from any other feedback mechanism?
The first is that the Braintrust is made up of people with a deep understanding of storytelling and, usually, people who have been through the process themselves.

The second difference is that the Braintrust has no authority. This is crucial: The director does not have to follow any of the specific suggestions given. After a Braintrust meeting, it is up to him or her to figure out how to address the feedback.
By removing from the Braintrust the power to mandate solutions, we affect the dynamics of the group in ways I believe are essential.
While problems in a film are fairly easy to identify, the sources of those problems are often extraordinarily difficult to assess.
The Braintrust’s notes, then, are intended to bring the true causes of problems to the surface—not to demand a specific remedy.
I like to think of the Braintrust as Pixar’s version of peer review, a forum that ensures we raise our game—not by being prescriptive but by offering candor and deep analysis.

The film itself—not the filmmaker—is under the microscope.
The feedback usually begins with John. While everyone has an equal voice in a Braintrust meeting, John sets the tone, calling out the sequences he liked best, identifying some themes and ideas he thinks need to be improved. That’s all it takes to launch the back-and-forth. Everybody jumps in with observations about the film’s strengths and weaknesses.
Andrew felt there was a similarly impactful opportunity here that was being missed--and, thus, was keeping the film from working--and he said so candidly. “Pete, this movie is about the inevitability of change,” he said. “And of growing up.” [Inside Out]

And it was Brad Bird who pointed that out to Andrew in a Braintrust meeting. “You’ve denied your audience the moment they’ve been waiting for,” he said, “the moment where EVE throws away all her programming and goes all out to save WALL-E. Give it to them. The audience wants it.” As soon as Brad said that, it was like: Bing! After the meeting, Andrew went off and wrote an entirely new ending in which EVE saves WALL-E, and at the next screening, there wasn’t a dry eye in the house.

“Sometimes the Braintrust will know something’s wrong, but they will identify the wrong symptom,” he told me.

Instead of saying, ‘The writing in this scene isn’t good enough,’ you say, ‘Don’t you want people to walk out of the theater and be quoting those lines?’ It’s more of a challenge. ‘Isn’t this what you want? I want that too!’

Fail early, Fail fast

Left to their own devices, most people don’t want to fail. But Andrew Stanton isn’t most people. As I’ve mentioned, he’s known around Pixar for repeating the phrases “fail early and fail fast” and “be wrong as fast as you can.” He thinks of failure like learning to ride a bike; it isn’t conceivable that you would learn to do this without making mistakes—without toppling over a few times. “Get a bike that’s as low to the ground as you can find, put on elbow and knee pads so you’re not afraid of falling, and go,” he says.

In a fear-based, failure-averse culture, people will consciously or unconsciously avoid risk.
Their work will be derivative, not innovative. But if you can foster a positive understanding of failure, the opposite will happen.
I have found that people who pour their energy into thinking about an approach and insisting that it is too early to act are wrong just as often as people who dive in and work quickly.

The overplanners just take longer to be wrong (and, when things inevitably go awry, are more crushed by the feeling that they have failed). There’s a corollary to this, as well: The more time you spend mapping out an approach, the more likely you are to get attached to it. The nonworking idea gets worn into your brain, like a rut in the mud. It can be difficult to get free of it and head in a different direction. Which, more often than not, is exactly what you must do.
To be a truly creative company, you must start things that might fail.

Fear can be created quickly; trust can’t. Leaders must demonstrate their trustworthiness, over time, through their actions—and the best way to do that is by responding well to failure. The Braintrust and various groups within Pixar have gone through difficult times together, solved problems together, and that is how they’ve built up trust in each other. Be patient. Be authentic. And be consistent. The trust will come.

Your employees are smart; that’s why you hired them. So treat them that way. They know when you deliver a message that has been heavily massaged. When managers explain what their plan is without giving the reasons for it, people wonder what the “real” agenda is. There may be no hidden agenda, but you’ve succeeded in implying that there is one. Discussing the thought processes behind solutions aims the focus on the solutions, not on second-guessing. When we are honest, people know it.
Management’s job is not to prevent risk but to build the ability to recover.

Protecting the new, the original

Originality is fragile. And, in its first moments, it’s often far from pretty. This is why I call early mock-ups of our films “ugly babies.” They are not beautiful, miniature versions of the adults they will grow up to be. They are truly ugly: awkward and unformed, vulnerable and incomplete. They need nurturing—in the form of time and patience—in order to grow.

(This reminds me of what I wrote here.)

The Ugly Baby idea is not easy to accept. Having seen and enjoyed Pixar movies, many people assume that they popped into the world already striking, resonant, and meaningful—fully grown, if you will. In fact, getting them to that point involved months, if not years, of work.
When Andrew finished his pitch, those of us in attendance were silent for a moment. Then, John Lasseter spoke for all of us when he said, “You had me at the word fish.”

To view lack of conflict as optimum is like saying a sunny day is optimum. A sunny day is when the sun wins out over the rain. There’s no conflict. You have a clear winner. But if every day is sunny and it doesn’t rain, things don’t grow. And if it’s sunny all the time—if, in fact, we don’t ever even have night—all kinds of things don’t happen and the planet dries up. The key is to view conflict as essential, because that’s how we know the best ideas will be tested and survive. You know, it can’t only be sunlight.”
For many years, I was on a committee that read and selected papers to be published at SIGGRAPH, the annual computer graphics conference I mentioned in chapter 2. These papers were supposed to present ideas that advanced the field. The committee was composed of many of the field’s most prominent players, all of whom I knew; it was a group that took the task of selecting papers very seriously. At each of the meetings, I was struck that there seemed to be two kinds of reviewers: some who would look for flaws in the papers, and then pounce to kill them; and others who started from a place of seeking and promoting good ideas. When the “idea protectors” saw flaws, they pointed them out gently, in the spirit of improving the paper—not eviscerating it. Interestingly, the “paper killers” were not aware that they were serving some other agenda (which was often, in my estimation, to show their colleagues how high their standards were). Both groups thought they were protecting the proceedings, but only one group understood that by looking for something new and surprising, they were offering the most valuable kind of protection. Negative feedback may be fun, but it is far less brave than endorsing something unproven and providing room for it to grow.
I suppose I could simply have mandated that our production managers add the cost of adding interns to their budgets. But that would have made this new idea the enemy—something to resent. Instead, I decided to make the interns a corporate expense—they would essentially be available, at no extra cost, to any department who wanted to take them on. The first year, Pixar hired eight interns who were placed in the animation and technical departments. They were so eager and hard-working and they learned so fast that every one of them, by the end, was doing real production work. Seven of them ultimately returned, after graduation, to work for us in a full-time capacity. Every year since then, the program has grown a little more, and every year more managers have found themselves won over by their young charges. It wasn’t just that the interns lightened the workload by taking on projects. Teaching them Pixar’s ways made our people examine how they did things, which led to improvements for all. A few years in, it became clear that we didn’t need to fund interns out of the corporate coffers anymore; as the program proved its worth, people became willing to absorb the costs into their budgets. In other words, the intern program needed protection to establish itself at first, but then grew out of that need. Last year, we had ten thousand applications for a hundred spots.

Whether it’s the kernel of a movie idea or a fledgling internship program, the new needs protection. Business-as-usual does not. Managers do not need to work hard to protect established ideas or ways of doing business. The system is tilted to favor the incumbent. The challenger needs support to find its footing. And protection of the new—of the future, not the past—must be a conscious effort.

“In many ways, the work of a critic is easy,” Ego [from Ratatouille] says. “We risk very little yet enjoy a position over those who offer up their work and their selves to our judgment. We thrive on negative criticism, which is fun to write and to read. But the bitter truth we critics must face is that in the grand scheme of things, the average piece of junk is probably more meaningful than our criticism designating it so. But there are times when a critic truly risks something, and that is in the discovery and defense of the new. The world is often unkind to new talent, new creations. The new needs friends.”

People want to hang on to things that work--stories that work, methods that work, strategies that work. You figure something out, it works, so you keep doing it—this is what an organization that is committed to learning does. And as we become successful, our approaches are reinforced, and we become even more resistant to change.

Up had to go through these changes--changes that unfolded over not months but years--to find its heart. Which meant that the people working on Up had to be able to roll with that evolution without panicking, shutting down, or growing discouraged. It helped that Pete understood what they were feeling.
“It wasn’t until I finished directing Monsters, Inc. that I realized failure is a healthy part of the process,” he told me. “Throughout the making of that film, I took it personally—I believed my mistakes were personal shortcomings, and if I were only a better director I wouldn’t make them.” To this day, he says, “I tend to flood and freeze up if I’m feeling overwhelmed. When this happens, it’s usually because I feel like the world is crashing down and all is lost. One trick I’ve learned is to force myself to make a list of what’s actually wrong. Usually, soon into making the list, I find I can group most of the issues into two or three larger all-encompassing problems. So it’s really not all that bad. Having a finite list of problems is much better than having an illogical feeling that everything is wrong.”
This could just be my Lutheran, Scandinavian upbringing, but I believe life should not be easy. We’re meant to push ourselves and try new things—which will definitely make us feel uncomfortable.

Status Quo

“Better the devil you know than the devil you don’t.” For many, these are words to live by. Politicians master whatever system it took to get elected, and afterward there is little incentive to change it.

Which brings us to one of my core management beliefs: If you don’t try to uncover what is unseen and understand its nature, you will be ill prepared to lead.

That couldn’t have happened if the producer of the movie--and the company’s leadership in general--hadn’t been open to a new viewpoint that challenged the status quo. That kind of openness is only possible in a culture that acknowledges its own blind spots. It’s only possible when managers understand that others see problems they don’t—and that they also see solutions.
You might say I’m an advocate for humility in leaders. But to be truly humble, those leaders must first understand how many of the factors that shape their lives and businesses are—and will always be—out of sight.

I think we’re out of the woods now, but it took a while. And all because a flawed mental model, constructed in response to a single event, had taken hold. Once a model of how we should work gets in our head, it is difficult to change.

Friday, April 12, 2019

Azure Cosmos DB: Microsoft's Cloud-Born Globally Distributed Database

It has been almost 9 months since I started my sabbatical work with the Microsoft Azure Cosmos DB team. 

I knew what I signed up for then, and I knew it was overwhelming.
It is hard not to get overwhelmed. Cosmos DB provides a global highly-available low-latency all-in-one database/storage/querying/analytics service to heavyweight demanding businesses. Cosmos DB is used ubiquitously within Microsoft systems/services, and is also one of the fastest-growing services used by Azure developers externally. It manages 100s of petabytes of indexed data, and serves 100s of trillions of requests every day from thousands of customers worldwide, and enables customers to build highly-responsive mission-critical applications.
But I underestimated how much there is to learn, and how long it would take to develop a good sense of the big picture. By "developing a good sense of the big picture", I mean learning/internalizing the territory myself, and, when looking at the terrain, being able to appreciate the excruciatingly tiny and laborious details in each square-inch as well.

Cosmos DB has many big teams working on many different parts/fronts at any point. I am still amazed by the activity going on simultaneously on so many different fronts. Testing, security, core development, resource governance, multitenancy, query and indexing, scale-out storage, scale-out computing, Azure integration, serverless functions, telemetry, customer support, devops, etc. Initially all I cared for was the core development front and the concurrency control protocols there. But over time, via diffusion and osmosis, I came to learn, understand, and appreciate the challenges on other fronts as well.

It may sound weird to say this after 9 months, but I am still a beginner. Maybe I have been slow, but when you join a very large project, it is normal that you start working on something small, and on a parallel thread you gradually develop a sense of an emerging big picture through your peripheral vision. This may take many months, and at Microsoft, big teams are aware of this, and give you your space as you bloom.

This is a lot like learning to drive. You start in the parking lots, in less crowded suburbs, and then as you get more skilled you go into the highway networks and get a fuller view of the territory. It is nice to explore, but you wouldn't be able to do it until you develop your skills. Even when someone shows you the map view and tells you that these places exist, you don't fully realize and appreciate them. They are just concepts to you, and you have a very superficial familiarity with them until you start to explore them yourself.

As a result, things seem to move slowly if you are an engineer working on a small thing, say a datacenter automatic failover component, in one part of the territory. Getting this component right may take months of your time. But it is OK. You are building a new street in one of the suburbs, and this adds up to the big picture. The big picture keeps growing at a steady rate, and the colors get more vibrant, even though individually you feel like you are progressing slowly.

"A thousand details add up to one impression." -Cary Grant 
"The implication is that few (if any) details are individually essential, while the details collectively are absolutely essential." -McPhee

When I started in August, I had the suspicion that I needed to unpack this term: "cloud-native".
(The term "cloud-native" is a loaded key term, and the team doesn't use it lightly. I will try to unpack some of it here, and I will revisit this in my later posts.)
Again, I underestimated how long it would take me to get a better appreciation of this. After learning about the many fronts on which things are progressing, I was able to get a better understanding of how important and challenging this is.

So in the rest of the post, I will try to give an overview of things I learned about what a cloud-native distributed database service does. It is possible to write a separate blog post for each paragraph, and I hope to expand on each in the future.

What does a cloud-born distributed database look like?

On the quest to achieve cost-effective, reliable and assured, general-purpose and customizable/flexible operation, a globally distributed cloud database faces many challenges and mutually conflicting goals in different dimensions. But it is important to hit as high a mark in as many of the dimensions as possible.

While several databases hit one or two of these dimensions, hitting all these dimensions together is very challenging. For example, the scale challenge becomes much harder when the database needs to provide guaranteed performance at any point in the scale curve. Providing virtually unlimited storage becomes more challenging when the database also needs to meet stringent performance SLAs at any point. Finally, achieving these while providing tenant-isolation and managing resources to prevent any tenant from impacting the performance of others is extra challenging, but is required for providing a cost-efficient cloud database.

Trying to add support and realize substantial improvements for all of these dimensions does not work as an afterthought. Cosmos DB is able to hit a high mark in all these dimensions because it is designed from the ground up to meet all these challenges together. Cosmos DB provides a frictionless cloud-native distributed database service via:

  • Global distribution (with guaranteed consistency and latency) by virtue of transparent multi-master replication,
  • Elastic scalability of throughput and storage worldwide (with guaranteed SLAs) by virtue of horizontal partitioning, and
  • Fine-grained multi-tenancy by virtue of a highly resource-governed system stack all the way from the database engine to the replication protocol.

To realize these, Cosmos DB uses a novel nested distributed replication protocol, robust scalability techniques, and well-designed resource governance abstractions, and I will try to introduce these next.

Scalability, Global Distribution, and Resource Governance

To achieve high scalability and availability, Cosmos DB uses a novel protocol to replicate data across nodes and datacenters with minimal latency overheads and maximum throughput. For scalability, data is automatically sharded guided by storage and throughput triggers, and the shards are assigned to different partitions. Each partition is served by a multi-node replica-set in each region, with one node acting as a primary replica to handle all write operations and replication within the replica-set. Clients read the data from a quorum of secondary replicas. (Soft-state scale-out storage and computation implementations provide independent scaling for storage-heavy and computation-heavy workloads.)

Since the replica-set maintains multiple copies of the data, it can mask some replica failures in a region without sacrificing availability, even in the strong consistency mode. (While two simultaneous replica failures may be rare, it is more common to observe one replica failure while another replica is unavailable due to rolling upgrades, and masking two replica failures helps the cloud database operate smoothly without interruption.) Each region contains an independent configuration manager to maintain system configuration and perform leader election for the partitions. Based on the membership changes, the replication protocol also reconfigures the size of read and write quorums.
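To make the quorum reconfiguration idea concrete, here is a minimal sketch in Python. It assumes simple majority quorums, which is my simplification for illustration; the post does not state the actual quorum sizes Cosmos DB uses, and the function names are hypothetical.

```python
def quorum_sizes(n: int) -> tuple[int, int]:
    """Return (read_quorum, write_quorum) for n replicas.

    Assumption: both quorums are simple majorities. Real systems may
    pick asymmetric sizes as long as the two quorums overlap.
    """
    if n < 1:
        raise ValueError("a replica-set needs at least one member")
    majority = n // 2 + 1
    return majority, majority


def quorums_intersect(n: int, read_quorum: int, write_quorum: int) -> bool:
    """Every read quorum overlaps every write quorum iff R + W > N."""
    return read_quorum + write_quorum > n


def tolerated_write_failures(n: int, write_quorum: int) -> int:
    """Number of replicas that can be down while writes still succeed."""
    return n - write_quorum


# Recompute quorum sizes as the membership shrinks from 5 to 3.
for members in (5, 4, 3):
    r, w = quorum_sizes(members)
    print(members, r, w, tolerated_write_failures(members, w))
```

With five replicas and majority quorums, two failures can be masked; after a membership change down to three replicas, the quorums shrink to two and one failure can be masked.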

In addition to the local replication inside a replica-set, there is also geo-replication, which implements distribution across any number of Azure regions (50+ of them). Geo-replication is achieved by a nested consensus distribution protocol across the replica-sets in different regions. Cosmos DB provides multi-master active-active replication in order to allow reads and writes from any region. For most consistency models, when a write originates in some region, it becomes immediately available in that region while being sent to an arbiter (ARB) for ordering and conflict resolution. The ARB is a virtual process that can co-locate in any of the regions. It uses the distribution protocol to copy the data to primaries in each region, which then replicate the data in their respective regions. The Cosmos DB distribution and replication protocols are verified at the design level with the TLA+ model checker, and the implementations are further tested for consistency problems using (an MS Windows port of) Jepsen tests.

Cosmos DB is engineered from the ground up with resource governance mechanisms in order to provide an isolated provisioned throughput experience (backed up by SLAs), while achieving high density packing (where 100s of tenants share the same machine and 1000s share the same cluster). To this end, Cosmos DB defines an abstract rate-based currency for throughput, called the Request Unit or RU (plural, RUs), that provides a normalized model for accounting the resources consumed by a request, and charges customers for throughput across various database operations consistently and in a hardware-agnostic manner. Each RU combines a small share of CPU, memory and storage IOPS. Tenants in Cosmos DB control the desired performance they need from their containers by specifying the maximum throughput RUs for a given container. Viewed from the lens of resource governance, Cosmos DB is a massively distributed queuing system with cascaded stages of components, each carefully calibrated to deliver predictable throughput while operating within the allotted budget of system resources, guided by the principles of Little's Law and Universal Scalability Law. The implementation of resource governance leverages backend partition-level capacity management and across-machine load balancing.
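As a rough illustration of a rate-based throughput currency, here is a token-bucket style sketch in Python. This is not how Cosmos DB implements RU accounting; the `RUBudget` class, its fields, and the per-request costs are hypothetical, and the bucket simply refills at the provisioned rate and is capped at one second's worth of RUs. The small `littles_law_in_flight` helper states Little's Law (average in-flight requests = arrival rate × average latency).

```python
from dataclasses import dataclass, field


@dataclass
class RUBudget:
    """Hypothetical per-container budget of request units (RUs) per second."""
    provisioned_rus_per_sec: float
    available: float = field(init=False)
    last_refill: float = 0.0

    def __post_init__(self) -> None:
        # Start with a full one-second budget.
        self.available = self.provisioned_rus_per_sec

    def try_consume(self, cost_rus: float, now: float) -> bool:
        """Admit a request costing cost_rus at time now (seconds), or reject it."""
        elapsed = max(0.0, now - self.last_refill)
        refill = elapsed * self.provisioned_rus_per_sec
        # Cap the bucket at the provisioned per-second budget.
        self.available = min(self.provisioned_rus_per_sec, self.available + refill)
        self.last_refill = now
        if cost_rus <= self.available:
            self.available -= cost_rus
            return True
        return False  # the caller would be rate-limited and should retry later


def littles_law_in_flight(arrival_rate_per_sec: float, avg_latency_sec: float) -> float:
    """Little's Law: L = lambda * W."""
    return arrival_rate_per_sec * avg_latency_sec


budget = RUBudget(provisioned_rus_per_sec=100)
print(budget.try_consume(60, now=0.0))  # admitted
print(budget.try_consume(60, now=0.0))  # rejected, only 40 RUs left
print(budget.try_consume(60, now=0.5))  # admitted after refilling 50 RUs
```

Passing the clock in explicitly keeps the sketch deterministic; a real limiter would read a monotonic clock instead.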

Contributions of Cosmos DB

By leveraging the cloud-native datastore foundations above, Cosmos DB provides an almost "one size fits all" database. Cosmos DB provides local access to data with low latency and high throughput, offers tunable consistency guarantees within and across datacenters, and provides a wide variety of data models and APIs leveraging powerful indexing and querying support. Since Cosmos DB supports a spectrum of consistency guarantees and works with a diverse set of backends, it resolves the integration problems companies face when multiple teams use different databases to meet different use-cases.

The vast majority of other data-stores provide eventual consistency, strong consistency, or both. In contrast, the spectrum of consistency guarantees Cosmos DB provides meets the consistency and availability needs of all enterprise applications, web applications, and mobile apps. Cosmos DB enables applications to decide what is best for them and to make different consistency-availability tradeoffs. An eventually-consistent store allows diverging state for a period of time in exchange for more availability and lower latency, and may be suitable for relaxed consistency applications, such as recommendation systems or search engines. On the other hand, a shopping cart application may require a "read your own writes" property for correct operation. This is served by the session consistency guarantee at Cosmos DB. Extending this across many clients requires "read latest writes" semantics, which strong consistency provides. This ensures coordination among many clients of the same application and makes the job of the application developer easy. Strong consistency is preserved both within a single region and across all associated regions. The bounded staleness consistency model guarantees any read request returns a value within the most recent k versions or t time. It offers global total order except within the staleness window, therefore it is a slightly weaker guarantee than strong consistency.
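The session and bounded staleness guarantees can be phrased as simple predicates over version numbers. The sketch below is only an illustration in Python: the function names, the idea of a per-session version token, and the way lag is measured are my assumptions, not Cosmos DB's API. The bounded staleness check follows the wording above literally (k versions or t time); a stricter reading that requires both bounds is a one-word change from `or` to `and`.

```python
def satisfies_session_read(replica_version: int, session_token_version: int) -> bool:
    """Read-your-own-writes: the replica must have applied at least the
    last write this session has seen (tracked by a hypothetical token)."""
    return replica_version >= session_token_version


def within_bounded_staleness(
    version_lag: int,
    time_lag_seconds: float,
    k: int,
    t_seconds: float,
) -> bool:
    """Return True if a read is acceptably fresh.

    version_lag      = latest committed version minus the version read
    time_lag_seconds = how long ago the returned version was superseded
    A lag of 0 means the latest version, so "within the most recent k
    versions" is version_lag < k.
    """
    return version_lag < k or time_lag_seconds <= t_seconds


# A session that wrote version 7 cannot read from a replica still at version 6.
print(satisfies_session_read(replica_version=6, session_token_version=7))

# A read that is 3 versions behind fails the k=2 bound but passes the t=5s bound.
print(within_bounded_staleness(version_lag=3, time_lag_seconds=2.0, k=2, t_seconds=5.0))
```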

Another way Cosmos DB manages to become an all-in-one database is by providing multiple APIs and serving different data models. Web and mobile applications need a spectrum of choices/alternatives for their data models: some applications take advantage of simpler models, such as key-value or column-family, and yet some other applications require more specialized data models, such as document and graph stores. Cosmos DB is designed to support APIs and data models by projecting the internal document store into different representations depending on the selected model. Clients can pick between document, SQL, Azure Table, Cassandra, and Gremlin graph APIs to interact with their datastore. This not only provides great flexibility for our clients, but also allows them to effortlessly migrate their applications over to Cosmos DB.

Finally, Cosmos DB provides the tightest SLAs in the industry. In contrast to many other databases that provide only availability SLAs, Cosmos DB also provides consistency SLAs, latency SLAs, and throughput SLAs. Cosmos DB guarantees that read and write operations take under 10ms at the 99th percentile for a typical 1KB object. The throughput SLAs guarantee that clients receive the throughput equivalent to the resources provisioned to the account via RUs.
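To show what a "99th percentile under 10ms" target means operationally, here is a small Python sketch that checks a batch of latency samples. The nearest-rank percentile definition and the function names are my choices for illustration; the post does not say how the SLA is actually measured.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def meets_latency_target(samples_ms: Sequence[float], limit_ms: float = 10.0, p: int = 99) -> bool:
    """True if the p-th percentile latency is strictly under limit_ms."""
    return percentile(samples_ms, p) < limit_ms


# One slow request out of 100 still meets a p99 target; two do not.
print(meets_latency_target([2.0] * 99 + [50.0]))
print(meets_latency_target([2.0] * 98 + [50.0] * 2))
```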

Tuesday, April 9, 2019

Book Notes. Show Your Work! 10 Ways to Share Your Creativity and Get Discovered, by Austin Kleon

I recently read this book on Kindle. I really liked this short book, and in general, Austin Kleon's work. These are my notes, without context.

A new way of operating

But it’s not enough to be good. In order to be found, you have to be findable. I think there’s an easy way of putting your work out there and making it discoverable while you’re focused on getting really good at what you do.

Almost all of the people I look up to and try to steal from today, regardless of their profession, have built sharing into their routine. These people aren’t schmoozing at cocktail parties; they’re too busy for that. They’re cranking away in their studios, their laboratories, or their cubicles, but instead of maintaining absolute secrecy and hoarding their work, they’re open about what they’re working on, and they’re consistently posting bits and pieces of their work, their ideas, and what they’re learning online. Instead of wasting their time “networking,” they’re taking advantage of the network. By generously sharing their ideas and their knowledge, they often gain an audience that they can then leverage when they need it — for fellowship, feedback, or patronage.

If Steal Like an Artist was a book about stealing influence from other people, this book is about how to influence others by letting them steal from you.

You don't have to be a genius

If you look back closely at history, many of the people who we think of as lone geniuses were actually part of “a whole scene of people who were supporting each other, looking at each other’s work, copying from each other, stealing ideas, and contributing ideas.” *Scenius* doesn’t take away from the achievements of those great individuals; it just acknowledges that good work isn’t created in a vacuum, and that creativity is always, in some sense, a collaboration, the result of a mind connected to other minds.

Amateurs are not afraid to make mistakes or look ridiculous in public. They’re in love, so they don’t hesitate to do work that others think of as silly or just plain stupid.

Mediocrity is, however, still on the spectrum; you can move from mediocre to good in increments. The real gap is between doing nothing and doing something.

Think process, not product

The best way to get started on the path to sharing your work is to think about what you want to learn, and make a commitment to learning it in front of others. Find a scenius, pay attention to what others are sharing, and then start taking note of what they’re not sharing. Be on the lookout for voids that you can fill with your own efforts, no matter how bad they are at first. Don’t worry, for now, about how you’ll make money or a career off it. Forget about being an expert or a professional, and wear your amateurism (your heart, your love) on your sleeve. Share what you love, and the people who love the same things will find you.

Whether you share it or not, documenting and recording your process as you go along has its own rewards: You’ll start to see the work you’re doing more clearly and feel like you’re making progress. And when you’re ready to share, you’ll have a surplus of material to choose from.

Share something small every day

Once a day, after you’ve done your day’s work, go back to your documentation and find one little piece of your process that you can share. Where you are in your process will determine what that piece is. If you’re in the very early stages, share your influences and what’s inspiring you. If you’re in the middle of executing a project, write about your methods or share works in progress. If you’ve just completed a project, show the final product, share scraps from the cutting-room floor, or write about what you learned. If you have lots of projects out into the world, you can report on how they’re doing; you can tell stories about how people are interacting with your work.

Don’t show your lunch or your latte; show your work.

Don’t worry about everything you post being perfect. Science fiction writer Theodore Sturgeon once said that 90 percent of everything is crap. The same is true of our own work. The trouble is, we don’t always know what’s good and what sucks. That’s why it’s important to get things in front of others and see how they react.

Don’t say you don’t have enough time. We’re all busy, but we all get 24 hours a day. People often ask me, “How do you find the time for all this?” And I answer, “I look for it.” You find time the same place you find spare change: in the nooks and crannies.

Absolutely everything good that has happened in my career can be traced back to my blog.

Tell good stories

“The problem with hoarding is you end up living off your reserves. Eventually, you’ll become stale. If you give away everything you have, you are left with nothing. This forces you to look, to be aware, to replenish... Somehow the more you give away, the more comes back to you.” — Paul Arden

There’s not as big of a difference between collecting and creating as you might think. A lot of the writers I know see the act of reading and the act of writing as existing on opposite ends of the same spectrum: The reading feeds the writing, which feeds the reading.

If you want to be more effective when sharing yourself and your work, you need to become a better storyteller.

Teach what you know

Teaching doesn’t mean instant competition. Just because you know the master’s technique doesn’t mean you’re going to be able to emulate it right away.

In their book, Rework, Jason Fried and David Heinemeier Hansson encourage businesses to emulate chefs by out-teaching their competition. “What do you do? What are your ‘recipes’? What’s your ‘cookbook’? What can you tell the world about how you operate that’s informative, educational, and promotional?” They encourage businesses to figure out the equivalent of their own cooking show.

The minute you learn something, turn around and teach it to others. Share your reading list.

When you share your knowledge and your work with others, you receive an education in return.

Make stuff you love and talk about stuff you love and you’ll attract people who love that kind of stuff. It’s that simple.

Sunday, April 7, 2019

How I pulled the fire alarm in my apartment complex

This is an embarrassing story for me. But I guess I should talk about it for the sake of transparency. You have a right to know what kind of person's paper reviews, TLA+ models, distributed systems musings, Paxos commentary, and life/research advice you are reading. Yep, I am the kind of goofy guy that triggers the fire alarm for the entire complex unknowingly but inevitably.

The Background

This happened on Sunday, August 5, 2018. (Yes, I have been too embarrassed to post this any earlier.) It was only 6 days after we moved into our apartment as part of my sabbatical at Microsoft Cosmos DB.

So that morning, we were about to leave home for grocery shopping. With 3 kids, leaving the house is a... process. You get them ready, you convince them to leave the house. And they somehow can detect desperation in you and drag their feet if you would like them to leave soon. It seems like my kids derive pleasure from being the last one to leave the house, the last one to put on shoes, or the last one to get into the car. Anyway, this takes a good 20 minutes some days.

So the process is happening and they are slowly preparing to leave. And I leave the apartment to start waiting for them outside the door, in order to show them how serious I am about leaving and that we are about to leave.

Just outside our door, there is this red box for the fire alarm. This is normally where the doorbell should be in any sensibly designed building. But for some reason, in our apartment complex, this is where they decided the fire alarm should be.

I am very distressed about this fire alarm being right next to our door. I am scared that one of my kids, maybe the 7 year old, or the 3.5 year old would open this box and then set off the fire alarm.

Trainwreck in motion  

So I am thinking that I should take a closer look at this box, and see if my kids might accidentally set it off. This way I can warn them about this danger-box.

The box is dusty, so I dust it off. And I proceed to open the box, to see what kind of button or arm mechanism the box has inside. This is because I want to tell my kids to stay away from it. You know, for science. (Yeah, that is not very solid reasoning, which will become clear to me soon.)

Anyhow, I open the box; it opens easily. And in the 3 seconds after opening the box, I realize I have gone too far and done something very wrong. Of course the arm is the box itself, there is no arm or button in the box!!

And 1, 2, 3... The shit hits the fan. All the fire alarms in the apartment complex start screaming!

I try to close the box, but of course it is triggered. And can't be closed.

The fire alarm is too loud. And first I am in denial. This can't be happening. I shouldn't have set off the fire alarm for the entire complex by opening this box, I am innocent.

Everyone starts leaving their apartments and looking around to see what is wrong. I am still standing in front of the box, trying to figure out what to do. I can't think of anything except that this cannot be happening.

The old lady above our apartment walks outside and asks me what happened. I tell her I set off the fire alarm accidentally. She says she will look for the number of the apartment manager so they can come reset it. Another lady next door comes out alarmed, and I also admit to her that I triggered the alarm.

Now there is a crowd in front of the apartment complex. And I have to come clean to them. An old guy from the adjoining apartment block says "Jesus" after hearing my explanation. I think he might have said "what a dork" under his breath. And he tells others "He set off the alarm". Again, I am pretty desensitized to all this because I am still shocked by this thing.

The alarms are blaring and the old lady tells me we should call 911 and ask the firefighters to come reset the thing. I call 911, and ask for the firefighters. We are next to the fire station so it takes them 3 minutes to get here. Of course they come with the big red truck, and all in suits. They cannot be unprepared. They always need to arrive with all their equipment and tools. Those may be needed.

I show them the box. One of them takes a look. But then of course before resetting the box, they need to first disarm the control panel so the sound can stop. Our neighbor tells them where it is. Turns out the plans they brought for the apartment complex are outdated, and they didn't have this marked up in there. So a couple of them go that way. And after a minute, the sounds stop. Oh, bliss, finally.

One of the firefighters says to the waiting crowd, "Thank you everyone for adhering to the rules. Going out is the right thing to do when the fire alarm goes off". The firefighter is friendly but rubs it in my face that I set off the alarm.

He then proceeds to the box with a ring of 20-30 keys. He tries them one by one on the box to find a fitting key. These must be the master keys for the companies that build fire alarms. One of them fits eventually. And the box is reset, triggered, and shut down again.

The mousetrap is set again.

Then another firefighter goes back to the control panel behind the building to arm the system again. They are done. This is probably all within 15 minutes from when I set it off to when they are done. The firefighter chief says this was not all in vain, they get to update their map and plan for the complex. Chaos engineering, anyone?

And a good thing is that there is no fee for setting off the fire alarm accidentally. Phew...

The aftermath 

My son, 11 years old, is very disappointed in me. He says "Dad, how could you not know? Even kids know about the fire alarms and to leave them alone." He spends the entire day being very disappointed in me. But things are normal after a day. And I start joking with him afterwards by pretending to reach out to the box to get him mad.

My wife is only mildly angry with me. After the shock is over, and we are finally on our way to shopping for groceries, she tells me: "Why are you like this? And why am I not surprised you would do something this goofy?"

When I tell the story to my PhD students, Ailidani and Aleksey, they are amused. But again they are not very surprised. Somehow this is something they expect their advisor would do. Set off the fire alarm, while checking it.

But in my defense, this box is like a mousetrap for the ADD-me. It is not at all obvious to me that the arm is the box cover. When I see a box, my instinct is to open it to see what is inside.

What kind of affordance is this? It is definitely not obvious that this is an arm.

Friday, April 5, 2019

The aging puzzle

Recently I came across this site. Looks like Sebastian Baltes has been doing interesting work. He studies the human aspects of software engineering. He interviews software engineers to learn about their work habits and processes, for example, how they achieve software development expertise by practicing through some tasks, and how this expertise helps them perform better in other software development tasks.

One of the things he investigated is how aging affects performance in software development. He inspects this through self-reporting by the developers, so the data is subjective rather than objectively measured. The developers reported they felt their short-term memory became limited and it got harder to write code as they aged. Some of the developers said "when you are young you are more competitive, but as you get old, you don't feel like competing that much."

Is this true? What do we have to look forward to as we age?

What does the data say?

"Young people are just smarter." 
--Mark Zuckerberg (2007), when he was 22

It turns out, among some other things, Mark Zuckerberg got this very wrong. Longstanding beliefs say the adult brain is best in its youth, but research now suggests otherwise. The middle-aged mind preserves many of its youthful skills and even develops some new strengths. I was surprised to learn that bilateralization of the brain is a real thing:
Several groups, including Grady’s, have also found that older adults tend to use both brain hemispheres for tasks that only activate one hemisphere in younger adults. Younger adults show similar bilateralization of brain activity if the task is difficult enough, Reuter-Lorenz says, but older adults use both hemispheres at lower levels of difficulty. 
The strategy seems to work. According to work published in Neuroimage (Vol. 17, No. 3) in 2002, the best-performing older adults are the most likely to show this bilateralization. Older adults who continue to use only one hemisphere don’t perform as well.

There have been studies of the effects of aging on professors' publication records. These studies show that there was no slowdown of publications with age.

But it looks like there is no data on whether or how developers' performance degrades as they age.

My speculations

The old professors I interacted with throughout my career were sharp, and some of them surprisingly sharp. I don't think aging puts a big strain on the brain. As you age, the body starts to suffer first, not the brain. If you take care of your body, most importantly if you are able to keep slim and sleep well at night, you can get a lot of mileage from your brain.

I think the self-reported observations from old knowledge workers may have many underlying causes. One cause could be confirmation bias. People may be getting more sensitive about age (which is especially the case in an ageist work environment), and pay more attention to this. Inevitably they see what they look for, and take this as real.

For some of the older developers, the loss of meaning could be a problem. When things get too monotonous and development work loses its novelty, it would be hard to extract meaning from the job.

It is often said that old people are slow to adopt new things. Douglas Adams has a famous quote, which you may have seen:
1. Anything that is in the world when you're born is normal and ordinary and is just a natural part of the way the world works.
2. Anything that's invented between when you're 15 and 35 is new and exciting and revolutionary and you can probably get a career in it.
3. Anything invented after you're 35 is against the natural order of things. 

While this is witty, this is not true for the 35+ year olds I interact with in academia and in the tech industry.

I think the first part is true. As a kid, you adapt quickly and since you don't have much experience, you don't question or reject something. But as you grow up, you develop some taste, so you may reject some things, even when you are between 15 and 35. And after 35, maybe you have too many experiences and scars, which may make you more cautious and closed-minded about things.

In any case this may be an important point. To avoid falling behind technology and keep your edge as you get older, it would help to keep your curiosity alive. The other day, I got into an elevator with my 4 year old daughter, and she was utterly delighted by the elevator. I envied her and wished I could get that excited about things. But it is possible to keep curiosity alive and it is possible to look at things in a new light by being more mindful about this.

Finally, it is a fact that young people are statistically more likely to be risk takers.
But there is one talent that does decline over time—our willingness to take risks. For evolutionary reasons, risk-taking peaks between the ages of 17 to 27, then drops off precipitously.
Well, risk taking is not necessarily good, as it often does not pay off. Survivorship bias means that only the successful risk-takers get all the publicity. On the other hand, I agree that old people may tend to get overly conservative and cautious. But, do you know why old people check 3 times if they locked the door or turned off the oven? Because they have been bitten by it before.

To avoid becoming overly cautious and overly risk-averse, we may need to reset our attitudes every 5-10 years or so. Some people say psychedelics help for covering over the old beaten tracks, and resetting bad habits/thoughts. I think just taking time off, going on a journey, and reflecting on our behavior/attitudes could be very effective solutions for this.

MAD Questions 

This entire thing was already very speculative anyways. So I will be lazy and leave it with one MAD question.

How do we dig our way out of ageism?

Many people say (and I agree) that ageism in the software industry is a real problem.
The software industry is overwhelmingly young. The median age of Google and Amazon employees is 30, whereas the median age of American workers is 42. A 2018 Stack Overflow survey of 100,000 programmers around the world found that three-quarters of them were under 35. Periodic posts on Hacker News ask, "What happens to older developers?" Anxious developers in their late thirties chime in and identify themselves as among the "older." 
Kevin Stevens, a 55-year-old programmer, faced a similar attitude when he applied for a position at Stack Exchange six years ago. He was interviewed by a younger engineer who told him, "I'm always surprised when older programmers keep up on technology." Stevens was rejected for the job. He now works as a programmer at a hospitality company where he says his age is not an issue.

How do we dig our way out of this situation? Could the capitalist free market offer a solution eventually? How likely is it that we will see a company that disrupts the ageist companies by hiring and making better use of older developers?
