Tuesday, June 19, 2018

Academic/research impact

impact (n)
1. the action of one object coming forcibly into contact with another
2. the effect or influence of one person, thing, or action, on another

Impact (with the second meaning :-) is an often-discussed and desirable thing in the academic and research world. Impact is especially important for promoting an associate professor to full professor status.

Unfortunately there is no well-defined description of impact or an objective way to quantify it. There are approximate metrics, each with significant flaws:
  • Citation count is easily quantifiable --Google Scholar shows your citation count and h-index. However, this can be gamed. Some journals have boosted their impact factors by asking accepted articles to cite other articles from the journal. Publishing in popular/crowded areas (e.g., wireless networks) can also boost citation count. It is possible to have a high citation count with little impact, and a low citation count with high impact. As a personal anecdote, much of the work I am most proud of has a low citation count, and I did not anticipate how some of my quick "low-hanging-fruit" papers would gather a lot of citations.
  • Name recognition/identity is not quantifiable and varies depending on who you ask. It can also be gamed by owning a very tiny corner of a field without having much impact.
  • Number of PhD students graduated may not have a strong correlation with impact.
  • Best paper awards are great, but let's be cynical here. The awards show you know how to write good papers: one may even argue that you are a crowd-pleaser, too much into group-think/community-consensus. Truly innovative papers fail to get best paper awards; heck, they get rejected for many years. They only get test-of-time and hall-of-fame awards in retrospect.
  • Grants are nice, but being cynical you can argue that what they truly measure is how good you are at telling reviewers what they want to hear.

OK, enough of being cynical. Of course these metrics capture a lot of value and approximate impact in most cases. But my point is, it is easy to fake impact-potential. It is easy to play the game and improve those metrics, without having impact.

Recognizing impact

OK, it may be hard to truly recognize impact except in retrospect. I will now argue that even in retrospect, there is contention in recognizing impact. What are the dimensions of impact, and which dimensions are more important? There is disagreement even about that.

I recently performed a series of twitter polls on this. Here are the results with my commentary.

Although the majority vote favored improving a huge fraction of lives by a tiny amount over improving a tiny fraction of lives by a huge amount, it is easy to make a case for the latter as well. In fact, I was thinking that would be the winning option---I would have voted for it. Many web services improve a huge number of lives in a tiny way, but how much does that matter? How badly off would you be with a little bit of inconvenience? On the other hand, a huge improvement in a tiny fraction of people's lives (disabled, poor, sick, etc.) is a big deal for them. Isn't that more impactful? (I tear up watching these videos every time. Even the colorblind seeing color for the first time... It is a big deal for them.)

This poll returned an even split between a small current improvement and a bigger improvement potential. I would have voted for the big improvement potential: the power of the transformational is orders of magnitude greater than that of the incremental. However, I don't deny that small incremental improvements can form a critical mass and stepping stones, and can lead to transformational improvements as well.



These two showed a strong preference for novelty over completeness. It is possible to make a case the other way as well. Let's say the first/simple solution gets stuck in 1% of the cases and the improved solution covers that. This may make or break the practicality of the solution at large scale. Corner cases and performance problems are deal breakers at large scale.

This surprised me in a good way. There was a great appreciation that defining/identifying a problem is very important, and that with that out of the way the solution becomes possible and often easy.

This also showed there is good appreciation of theory that enables later practical applications.

This one was a bit disappointing for me. Atomic clocks enabled GPS, so I think they had more impact than GPS. The atomic clock is a general tool that enables other applications: long-baseline interferometry in radio astronomy, detecting gravitational waves, and even georeplicated distributed databases.

DARPA has been boasting about funding atomic clock research as a use-inspired basic research project. At the time of funding, the applications of precise clocks were unclear, but DARPA funded physicists to work on it. And once atomic clocks were built, applications started to emerge. Around 2000-2010, DARPA mentioned this in most of its presentations (I haven't been to DARPA presentations recently). DARPA has been proud of this and wanted to solicit other work like it. These presentations included this diagram and urged for more use-inspired basic research.

Purity

There is something to be said for purity/abstractness. Abstractness can imply generality and may mean more impact. On the other hand, there may also be a phase transition in terms of impact/practicality along the abstractness spectrum. Sometimes things get too abstract and abstruse. But it may be hard to differentiate when that happens and draw a line. Maybe the solution is use-inspired research/theory. It may help to have some practical relevance considerations while doing theory. It may help to do theory that can serve as a tool for enabling other work.

Is it possible to optimize/plan for impact?

After my Twitter poll frenzy, a friend forwarded me this talk: Why Greatness Cannot Be Planned: The Myth of the Objective. I highly recommend the talk. While I don't agree with all the conclusions there, it has merit and gets several things right.

The talk (and I guess the book as well) argues that having research objectives does not help in achieving them and can even be harmful, because the stepping stones don't resemble the end objective. So the advice given is: don't plan for impact/objectives, just do interesting stuff!

That has also been my conclusion after mulling over research impact for many years in the back of my mind. Instead of trying to chase impact, it is more impactful to do interesting work that you enjoy, and to do what only you can do. I wrote this in 2014:

So publishing less is bad, and publishing more does not guarantee you make an impact. Then what is a good heuristic to adopt to be useful and to have an impact? 
I suggest that the rule is to "be daring, original, and bold". We certainly need more of that in academia. Academia moves more like a herd; there are flocks here and there mowing the grass together. And staying with the herd is a conservative strategy. That way you avoid becoming an outlier, and it is easier to publish and get funded because you don't need to justify/defend your research direction; it is already accepted as a safe research direction by your community. (NSF will ask to see intellectual novelty in proposals, but the NSF panel reviewers will be unhappy if a proposal is out in left field and attempting to break new ground. They will find a way to reject the proposal unless a panelist champions it and challenges the other reviewers about their concocted reasons for rejecting. As a result, it is rare to see a proposal that suggests truly original/interesting ideas and directions.) 
To break new ground, we need more mavericks that leave the herd and explore new territory in the jungle. Looking at the most influential names in my field of study, distributed systems, I see that Lamport, Dijkstra, Lynch, Liskov were all black sheep. 
The interesting thing is, to be the black sheep, you don't need to put on an act. If you have a no-bullshit attitude about research and don't take fashionable research ideas/directions at face value, you will soon become the black sheep. But, as Feynman used to say, "what do you care what other people think?" Ignore everybody, and work on what you think is the right thing.
Word of caution!! While being different and being the black sheep can speed up your way to the top, it works only after you have paid the price. It is a very steep climb up there, and you need many grueling years of preparation.

Of course, doing stuff that is interesting is a very subjective thing. That may be OK when choosing what to work on. But evaluating "interesting/novel" is too subjective. It takes us back full circle to admitting there is no well-defined description or an objective way to quantify impact. As I speculate in MAD question 3, I believe we deserve better metrics and more research into impact metrics.

MAD questions

1. What is the most impactful research you know?  What are its prominent characteristics?  Is it an enabling tool for other work? Is it a simple idea (paradigm shift)? Is it a synthesis of other ideas? Is it a tour-de-force complex technical solution?

You can vote/comment on this by Wednesday June 20.
2. What heuristics do you employ for increasing impact in your research?

3. In team sports, we select an MVP as the player with a high profile/visibility, but in reality the real MVP could have a low profile in terms of scoring and the metrics tracked, yet still provide more value. When I was a child, assists were not tracked in soccer; maybe it was too much of a hassle. What about a defender who played strategically and blocked attacks before they developed? How do we credit that? Moneyball (Michael Lewis) shed some light on this.
The central premise of Moneyball is that the collective wisdom of baseball insiders (including players, managers, coaches, scouts, and the front office) over the past century is subjective and often flawed. Statistics such as stolen bases, runs batted in, and batting average, typically used to gauge players, are relics of a 19th-century view of the game and the statistics available at that time. Before sabermetrics was introduced to baseball, teams were dependent on the skills of their scouts to find and evaluate players. Scouts are those who are experienced in the sport, usually having been involved as players or coaches. The book argues that the Oakland A's' front office took advantage of more analytical gauges of player performance to field a team that could better compete against richer competitors in Major League Baseball (MLB).

Coming back to research/academic impact, don't we need/deserve a better, more informed approach for tracking and identifying impact? Are there severely underrated works/researchers whose huge impact went unnoticed?

Wednesday, June 13, 2018

About self-stabilization and blockchain consensus

This is a half-baked exploratory piece about how some self-stabilization ideas may relate to blockchain consensus in open/permissionless environments. I will write about some similarities and some major differences between the self-stabilizing distributed algorithms work and blockchain consensus deployments.

Self-stabilization

Let's start with a brief review of self-stabilization.

Stabilization is a type of fault tolerance that handles faults in a principled, unified manner instead of on a case-by-case basis. Instead of trying to figure out how much faults can disrupt the system's operation, stabilization assumes arbitrary state corruption, which covers all possible worst-case collusions of faults and program actions. Stabilization then advocates designing recovery actions that take the program back to invariant states starting from any arbitrary state. This makes stabilization suitable for dealing with unanticipated faults.

The most distinct property of self-stabilization is its emphasis on liveness and providing eventual safety. When the invariant is violated, liveness still holds, and the system makes progress from faulty states, moving closer to invariant states until eventually safety is reestablished. In that sense stabilization is related to eventual-consistency approaches as well.

If you would like to see an example, here is Dijkstra's self-stabilizing token ring program. In this problem, the processes are arranged in a ring and there is a unique token circulating in the ring. (You can think of the token as providing mutual exclusion to the processes: whichever process holds the token can access the critical section/shared resource.)
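To make this concrete, here is a minimal Python sketch of Dijkstra's K-state token ring (my own rendering; the value array, the random scheduler, and the helper names are illustrative choices, not from the original program text). A process "holds the token" when it is enabled to act; starting from an arbitrary, possibly corrupted state, the ring converges to exactly one enabled process.

```python
# Minimal sketch of Dijkstra's K-state self-stabilizing token ring.
# Node 0 acts when its value equals its predecessor's; every other node acts
# when its value differs from its predecessor's. Being enabled to act is
# "holding the token". From any arbitrary state, the ring converges to
# exactly one enabled node, and that single token then circulates forever.
import random

def enabled_nodes(x):
    n = len(x)
    return [i for i in range(n)
            if (i == 0 and x[0] == x[n - 1]) or (i != 0 and x[i] != x[i - 1])]

def run_token_ring(n=5, steps=200):
    K = n + 1                                        # K > n guarantees stabilization
    x = [random.randrange(K) for _ in range(n)]      # arbitrary (corrupted) start state
    for _ in range(steps):
        i = random.choice(enabled_nodes(x))          # nondeterministic scheduler
        if i == 0:
            x[0] = (x[0] + 1) % K                    # node 0 injects a fresh value
        else:
            x[i] = x[i - 1]                          # others copy their predecessor
    return enabled_nodes(x)

print(run_token_ring())   # typically prints a single node id: the unique token holder
```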

The similarity between self-stabilization and blockchain consensus is that both are OK with a finite duration of divergence in state/consistency. In (most) blockchain consensus protocols, you may have a fork, which is resolved eventually. In Bitcoin the fork is resolved with the addition of new blocks and the longest-chain rule over time. Avalanche uses a DAG, not a chain, so a "fork" is resolved using confidence values that increase with time. (In other words, the unpopular branches in the DAG atrophy over time.)

In the stabilization literature, there are approaches distantly related to blockchain consensus. I would say two important/early ones are Error-detecting codes and fault-containing self-stabilization (2000) and Probabilistic self-stabilization (1990).

Also, undecided-state dynamics, the population protocols of Aspnes et al., and self-* properties through gossiping are somewhat related work for the Avalanche family of consensus protocols.

State-management perspective

There are big differences between the self-stabilization and blockchain attitudes toward state.

Stabilization likes to use soft state and minimal state. That helps with efficient/lightweight eventual consistency.

Blockchain achieves eventual consistency with hard state, and lots of it. This is achieved through full replication at each node and the use of error-checking codes. While the stabilization approach does not like keeping history (because it can be corrupted), the blockchain approach embraces history. However, blockchain maintains history in such a way that corruption is evident and can be weeded out! The error-checking codes (or confidences in Avalanche) are chained to provide increasing toughness/tolerance to tampering for older entries.

Another difference is in terms of partial/local state versus global state: in stabilization, nodes often have partial/local state, and the global state is the composition of these local states. In decentralized consensus, each node has the full state, the entire chain or the entire DAG. So in some sense stabilization acts distributedly across nodes, and blockchain consensus acts in a decentralized manner per node.

Dynamic networks and churn perspective

I would also like to take some time to discuss how some self-stabilization work and blockchain consensus deal with open environments.

Self-stabilizing graph algorithms provide some tolerance/cushion for dynamic (time-varying) networks. The edge costs may change (rewiring the graph) and the stabilizing algorithm reacts/adapts to the new graph. Some examples are stabilizing clustering and shortest-path-tree algorithms. While self-stabilization tolerates some churn, if the churn is high, the self-stabilizing algorithm may not be able to catch up.

In contrast, dynamic networks are not a big problem for blockchain consensus, especially since communication is often done via an epidemic broadcast/gossip or pull-based gossip as in Avalanche. The blockchain consensus protocols just need a way to propagate the transactions and chain state for replication at the nodes.

There is also the issue of Sybil resistance. If participation is very inexpensive, a Sybil attack is possible in an open/permissionless environment. The attack works by introducing a large number of Byzantine sock-puppets to the system, violating the assumed ratio of Byzantine nodes in the system. Sybil resistance is orthogonal to what I discussed and can be achieved for both approaches by incorporating PoW or PoS techniques.

MAD questions

1. Is it possible to use blockchain full-state replication as a building block for larger-scale stabilizing systems?
Maybe the blockchain full-state replication could emulate a Virtual Node (VN) in an open/trustless region/zone, and you could have a stabilizing solution built/deployed over these infallible/uncrashable VNs.

2. Is the open/permissionless environment overrated? If we had federation-like systems arranged hierarchically, with quick reconfiguration/stabilization to choose suitable quorums for consensus, would that be enough?

Sunday, June 10, 2018

Book Review. Endurance: A year in space, a lifetime of discovery

I didn't know what to expect when I picked this book from the library. A book by an astronaut could turn out to be boring and mundane for me. The thing is, I am interested in space, but I wouldn't say I am passionate about it. I never wished I could be an astronaut as a child (or as an adult). I guess I wanted to be the engineer that designed those systems, rather than the astronaut that piloted them.

Long story short, I enjoyed reading this book a lot. I never got bored; on the contrary, I was very engaged. The book interleaved Scott Kelly's growing up and his one-year stay at the International Space Station (ISS) in alternating chapters. At the end of the book, Scott's life timeline has caught up to the beginning of his year-long ISS stay, and his ISS-stay timeline concludes with the Soyuz return capsule entering the atmosphere.

I learned a lot about the space program. It still weirds me out that while we are very much earth-bound in our lives, technology, perspectives, problems, and dreams/aspirations, there are some dozen people who get to inhabit space and look onward to Mars. The future is not uniformly distributed.

The book also included a lot of life lessons and reflections on human relationships. I guess being in space and looking down at earth would give a great vantage point for such reflection.

I had also read Seveneves by Neal Stephenson and The Martian by Andy Weir and loved those as well. I guess I should be looking for more books about space. Please recommend some.

In this book, Scott Kelly credits "The Right Stuff" by (the recently deceased) Tom Wolfe, in several places, with turning his life around and getting him on track to becoming an astronaut despite all the odds lined up against him. So I should also plan on reading some work by Tom Wolfe.

From the book

Here are some random interesting passages from the book. Bring your own context. I will start skipping passages when I am tired of typing.

Page 30:
After a while the bus slows, then comes to a stop well before the launchpad. We nod at one another, step off, and take up our positions. We've all undone the rubber-band seals [on our Sokol suits] that had been so carefully and publicly leak-checked just an hour before. I center myself in front of the right rear tire and reach into my Sokol suit. I don't really have to pee, but it's a tradition: When Yuri Gagarin was on his way to the launchpad for his historic first spaceflight, he asked to pull over---right about here--- and peed on the right rear tire of the bus. Then he went to space and came back alive. So now we all must do the same. The tradition is so well respected that women space travelers bring a bottle of urine or water to splash on the tire rather than getting entirely out of their suits.

[ I guess the danger and unpredictability of the situation is enough to make scientist/engineer-minded people very superstitious. ]

Page 40-41:
In my freshman year, I started out with great hope that I could turn things around and be a good student, as I had every previous school year. This determination always lasted just a few days, until I realized once again that it was impossible for me to concentrate in class or to study on my own. Soon I was waking up each morning and struggling to think of a reason to go to class, knowing I wouldn't absorb any of the professor's lecture. Often, I didn't go. How was I going to graduate, let alone do well enough to be accepted by any medical school?

Everything changed that afternoon when I picked up The Right Stuff. I'd never read anything like it before. I'd heard the word "voice" used to describe literature, but this was something I could actually hear in my head. Even out in the middle of the swamp, Wolfe wrote in this rot-bog of pine trunks, scum slicks, dead dodder vines, and mosquito eggs, even out in this great overripe sump, the smell of "burned beyond recognition" obliterated everything else. I felt the power of those words washing over me, even if some of the words I had to look up in the dictionary. Perilous, neophyte, virulent. I felt like I had found my calling. I wanted to be like the guys in this book, guys who could land a jet on an aircraft carrier at night and then walk away with a swagger. I wanted to be a naval aviator. I was still a directionless, undereducated eighteen-year-old with terrible grades who knew nothing about airplanes. But The Right Stuff had given me the outline of a life plan.

Page 50:
Unlike the early days of spaceflight, when piloting skill was what mattered, twenty-first-century astronauts are chosen for our ability to perform a lot of different jobs and to get along well with others, especially in stressful and cramped circumstances for long periods of time. Each of my crewmates is not only a close coworker in an array of different high-intensity jobs but also a roommate and a surrogate for all humanity.

Page 140:
The increased fluid pressure may squish our eyeballs out of shape and cause swelling in the blood vessels of our eyes and optic nerves. ... It's possible, too, that high CO2 is causing or contributing to changes in our vision ... High sodium in our diets could also be a factor ... Only male astronauts have suffered damage to their eyes while in space, so looking at the slight differences in the head and neck veins of male and female astronauts might also help scientists start to nail down the causes. If we can't, we just might have to send an all-women crew to Mars.

Page 150:
Launch time comes and goes. Shortly after, my laptop's internet connection starts working again. I look up the video for the SpaceX launch, but the connection isn't strong enough to stream the video. I get a jerky, frozen image. Then my eye stops on a headline: "SpaceX Rocket Explodes During Cargo Launch to Space Station."
You've got to be fucking kidding me.
The flight director gets on a privatized space-to-ground channel and tells us the rocket has been lost.

Page 159: [ Upon getting a second chance after being disqualified from flying F14s ]
"You know, you can fly the airplane okay, but you're not flying it all the time," he told me. "You're on altitude and airspeed, but you're not on top of it." I had been trained to keep my altitude within a 200 foot range, so I didn't worry if I was 10 feet off the precise altitude, or 20, or 50. But Scrote pointed out that this imprecision in the end would lead me far from where I needed to wind up, and fixing it would take a lot of my attention. I had to always be making small, constant corrections if I wanted to make the situation better. He was right. My flying got better, and I've been able to apply what I learned from him to a lot of other areas of life as well.

Page 161:
Being in an F-14 squadron in the 1990s was like a cross between playing a professional sport and being in a rock-and-roll band. The movie Top Gun didn't quite capture the arrogance and bravado of it all. The level of drunkenness and debauchery was unbelievable (and is, thankfully, no longer the standard).

Page 164: [ A Marine saying about failures and mistakes ]
There are those who have, and those who will.

Page 290:
This wasn't my first time training with the Russians, of course... By now, I was intimately familiar with the way the Russian space agency handles training similarly to NASA, such as an emphasis on simulator training, and the way they don't, like their emphasis on the theoretical versus the practical---to an extreme. If NASA were to train an astronaut how to mail a package, they would take a box, put an object in the box, show you the route to the post office, and send you on your way with postage. The Russians would start in the forest with a discussion on the species of tree used to create the pulp that will make up the box, then go into excruciating detail on the history of box making. Eventually you would get to the relevant information about how the package is actually mailed, if you didn't fall asleep first. It seems to me this is part of their system of culpability---everyone involved in training needs to certify that the crew was taught everything they could possibly need to know. If anything should go wrong, it must then be the crew's fault.

Page 294: [ Real artists ship! The Russians ship! ]
Once Sasha was back in his seat and it seemed clear we weren't going to catch fire, we talked about our predicament. I decided not to voice concern about the flammability risk.
"It's too bad we won't launch today," I said.
"Da," Sasha agreed. "We will be the first crew to scrub after strapping in since 1969." This is an incredible statistics, considering how often the space shuttle used to scrub, right up to the seconds before launch, even after the main engines had lit.
A voice from the control center interrupted us. "Guys, start your Sokol suit leak checks."
What? Sasha and I looked at each other with identical What-the-fuck? expressions. We were now inside five minutes to launch. Sasha raced to get strapped back into his seat properly.

Page 300-301: [January 8, 2011 During Scott's second ISS mission]
Mission control told me that the chief of the Astronaut Office, Peggy Whitson, needed to talk to me and would be calling on a private line in five minutes. I had no idea why, but I knew the reason wouldn't be anything good. Five minutes is a long time to think about what emergency might have occurred on the ground. Maybe my grandmother had died. Maybe one of my daughters had been hurt.
...
Peggy came on the line. "I don't know how to tell you this," she said, "so I'm just going to tell you. Your sister-in-law, Gabby was shot."
[ His sister-in-law is Gabrielle Giffords, the former congresswoman from Arizona. ]

Page 311: [ Upon being disqualified from the year-long ISS mission ]
When I got home that night, I told Amiko about being medically disqualified. Rather than looking disappointed, as I expected, she looked puzzled.
"So they are going to send someone who has been on two long flights and has not suffered vision damage?" she asked.
"Right," I said.
"But if the point of this mission is to learn more about what happens to your body on a long mission," she asked, "why would they send someone who is known to be immune to one of the things they intend to study?"
This was a good point.

... I decided to present my case to management. They listened, and to my surprise they reversed their decision.

When I was preparing for the press conference to announce Misha and me as the one-year crew members, I asked what I thought was an innocent question about genetic research. I mentioned something we haven't previously discussed: Mark would be a perfect control study throughout the year. [ Scott's identical twin brother Mark Kelly is also an astronaut. ]
It turns out my mentioning this had enormous ramifications. Because NASA was my employer, it would be illegal for them to ask me for my genetic information. But once I had suggested it, the possibilities of studying the genetic effects of spaceflight transformed the research. The Twins Study became an important aspect of the research being done on station. A lot of people assumed that I was chosen for this mission because I have an identical twin, but that was just serendipitous.

Page 350:
I've been thinking about the whole arc of my life that had brought me here, and I always think about what it meant to me to read The Right Stuff as a young man. I feel certain that I wouldn't have done any of things I have if I hadn't read that book---if Tom Wolfe hadn't written it. On a quiet Saturday afternoon, I call Tom Wolfe to thank him. He sounds truly amazed to hear from me. I tell him we're passing over the Indian Ocean, how fast we're going, how our communication system works. We talk about books and about New York and about what I plan to do first when I get back (jump into my swimming pool). We agree to have lunch when I'm back on Earth, and that's now one of the things I'm looking forward to most.

Page 360:
As much as I worked on scientific experiments, I think I learned at least as much about practical issues of how to conduct a long-range exploration mission. This is what crew members on ISS are always doing---we are not just solving problems and trying to make things better for our own spaceflights, but also studying how to make things better for the future. ... And the larger struggles of my mission---most notably, CO2 management and upkeep of the Seedra--- will have a larger impact on future missions on the space station and future space vehicles. NASA has agreed to manage CO2 at a much lower target level, and better versions of CO2 scrubbers are being developed that will one day replace the Seedra...

MAD questions

1. What type of computer/software systems should we have in space?
The RAM is exposed to radiation, so memory can be corrupted. A self-stabilizing program that tolerates memory corruption could be useful in some situations. I also heard that triple modular redundancy and replicas are preferred for some computer/sensor systems. How about Byzantine-tolerant systems? Would they be useful for computers in space?
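As a toy illustration of the triple modular redundancy idea mentioned above (entirely my own sketch, not drawn from any actual flight system), three replicas compute the same value and a majority vote masks a single corrupted result:

```python
# Toy sketch of triple modular redundancy (TMR): run the computation on three
# replicas and take a majority vote, so a single radiation-induced corruption
# in one replica is masked.
from collections import Counter

def tmr(compute, corrupt_one=False):
    results = [compute(), compute(), compute()]
    if corrupt_one:
        results[1] ^= 0xFF                 # simulate a bit-flip in one replica
    value, votes = Counter(results).most_common(1)[0]
    if votes < 2:
        raise RuntimeError("no majority: more than one replica failed")
    return value

print(tmr(lambda: 42, corrupt_one=True))   # still prints 42
```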

The reliability and assurance needed for computers in space stations should be in a league of their own. I wonder if there is a nice description of modern programming techniques/styles for computers deployed in space.

2. Could there be a book out there for you that could change your life? Give your life a whole new meaning/purpose?

3. Or could something you write/contribute help change someone's life?

Sunday, June 3, 2018

Snowflake to Avalanche: A Novel Metastable Consensus Protocol Family for Cryptocurrencies

This paper is by Team Rocket---the authors are pseudonymous; I presume a guy, a gal, and a cat are involved. I learned about the paper when Emin Gun Sirer announced it on Twitter. I started speculating about the authors last week, and this is my latest guess.

OK, back to the facts. Here is the synopsis from the paper. "This paper introduces a brand new family of consensus protocols suitable for cryptocurrencies, based on randomized sampling and metastable decision. The protocols provide a strong probabilistic safety guarantee, and a guarantee of liveness for correct clients."

Below I first summarize the protocols, and at the end I provide my comments/evaluation. The analysis in the paper is very strong and spans 8 pages, but I skip that section in my review.

Introduction

Traditional consensus protocols incur high communication overhead and require accurate knowledge of membership. So while they work great for fewer than 20 participants, they do not work for large numbers of participants and open/permissionless environments. The Nakamoto consensus family of protocols addresses that problem, but these protocols are costly and wasteful. Bitcoin currently consumes around 63.49 TWh/year, about twice as much as all of Denmark.

This paper introduces a new family of consensus protocols, inspired by gossip algorithms: the system operates by repeatedly sampling the participants at random and steering the correct nodes towards the same consensus outcome. The protocols do not use proof-of-work (PoW), yet they achieve safety through an efficient metastable mechanism. So this family avoids the worst parts of both traditional and Nakamoto consensus protocols.

Similar to Nakamoto consensus, the protocols provide a probabilistic safety guarantee, using tunable security parameters to make the possibility of a consensus failure arbitrarily small.

The protocols guarantee liveness only for virtuous transactions. Liveness for conflicting transactions issued by Byzantine clients is not guaranteed. This point is important; here is the explanation:
"In a cryptocurrency setting, cryptographic signatures enforce that only a key owner is able to create a transaction that spends a particular coin. Since correct clients follow the protocol as prescribed, they are guaranteed both safety and liveness. In contrast, the protocols do not guarantee liveness for rogue transactions, submitted by Byzantine clients, which conflict with one another. Such decisions may stall in the network, but have no safety impact on virtuous transactions. This is a very sensible tradeoff for building cryptocurrency systems." (While this is OK for cryptocurrency systems, it would be a problem for general consensus where conflicting requests from clients will be present.)

OK, then, do well-formed, non-conflicting transactions make consensus trivial in cryptocurrency systems? Would non-conflicting transactions reduce consensus to plain gossip? Then what does the paper contribute? With plain gossip, a Byzantine client can first introduce one transaction, and then introduce another transaction. With plain gossip the last write wins, and double-spending would ensue. So plain gossip won't do; the protocol needs to sample and establish/maintain some sort of communal memory of transactions such that an established transaction is impossible to change. Nakamoto consensus uses PoW-based chain-layering/amberification to achieve that. This paper shows how this amberification can be achieved via sampling-based gossip and DAG-layering without PoW! Avalanche is a nice name to denote this irreversible process.

The model

The paper adopts Bitcoin's unspent transaction output (UTXO) model: The clients issue cryptographically signed transactions that fully consume an existing UTXO and issue new UTXOs. Two transactions conflict if they consume the same UTXO and yield different outputs. Correct clients never issue conflicting transactions. It is also impossible for Byzantine clients to forge conflicts with transactions issued by correct clients.

On the other hand, Byzantine clients can issue multiple transactions that conflict with one another, and the correct clients should only consume at most one of those transactions. The goal of the Avalanche family of consensus protocols is to accept a set of non-conflicting transactions in the presence of Byzantine behavior.

The Avalanche family of protocols provide the following guarantees with high probability (whp):

  • Safety. No two correct nodes will accept conflicting transactions.
  • Liveness. Any transaction issued by a correct client (aka virtuous transaction) will eventually be accepted by every correct node.


Slush: Introducing Metastability

The paper starts with a non-Byzantine protocol, Slush, and then builds up Snowflake, Snowball, and Avalanche, with better Byzantine fault-tolerance (BFT) and irreversibility properties.


Slush is presented using a decision between two conflicting colors, red and blue. A node starts out initially in an uncolored state. Upon receiving a transaction from a client, an uncolored node updates its own color to the one carried in the transaction and initiates a query.

To perform a query, a node picks a small, constant-sized ($k$) sample of the network uniformly at random, and sends a query message. Upon receiving a query, an uncolored node adopts the color in the query, responds with that color, and initiates its own query, whereas a colored node simply responds with its current color.

Once the querying node collects $k$ responses, it checks whether a fraction $\alpha*k$ are for the same color, where $\alpha > 0.5$ is a protocol parameter. If the $\alpha*k$ threshold is met and the sampled color differs from the node's own color, the node flips to that color. It then goes back to the query step and initiates a subsequent round of queries, for a total of $m$ rounds. Finally, the node decides on the color it ended up with at round $m$. The analysis in the paper shows that $m$ grows logarithmically with $n$.
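To make the loop concrete, here is a minimal single-process simulation sketch of Slush in Python. The parameters k, alpha, and m follow the paper; the synchronous-round simulation and the omission of uncolored-node bootstrapping are my own simplifications.

```python
import random
from collections import Counter

# Simulation sketch of Slush: in each round, every node samples k peers and
# flips to a color once at least alpha*k of the sample agrees on it.
def slush(colors, k=10, alpha=0.6, m=15):
    n = len(colors)
    for _ in range(m):                                   # m query rounds
        new_colors = list(colors)
        for i in range(n):
            sample = [colors[j] for j in random.sample(range(n), k)]
            for c, votes in Counter(sample).items():
                if votes >= alpha * k and c != colors[i]:
                    new_colors[i] = c                    # flip to the alpha*k majority
        colors = new_colors
    return colors                                        # each node decides its color at round m

# Start from a near-even split; the sampling feedback typically tips the whole
# network to a single color within a few rounds.
print(Counter(slush(["red"] * 51 + ["blue"] * 49)))
```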

This simple protocol illustrates the basic idea, but it has many shortcomings. It assumes synchronized rounds available to all. In line 15 of the pseudocode, the "accept color" step comes only at the end of $m$ rounds; there are no early accepts. Finally, Slush does not provide a strong safety guarantee in the presence of Byzantine nodes, because the nodes lack state: Byzantine nodes can try to flip the memoryless nodes to opposite colors.

Snowflake: BFT

Snowflake augments Slush with a single counter that captures the strength of a node's conviction in its current color. In the Snowflake protocol in Figure 2:

  1. Each node maintains a counter $cnt$;
  2. Upon every color change, the node resets $cnt$ to 0;
  3. Upon every successful query that yields $\geq \alpha*k$ responses for the same color as the node, the node increments $cnt$.



Here the nodes can accept colors in an asynchronous manner, not all at the end of $m$ rounds: each node can accept when its own counter exceeds $\beta$. When the protocol is correctly parameterized for a given threshold of Byzantine nodes and a desired $\epsilon$ guarantee, it can ensure both safety and liveness.
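Here is a hedged sketch of a single Snowflake node's loop. It is my rendering of the three rules above, not the paper's Figure 2 verbatim: sample_peers(k) is an assumed helper that queries k random peers and returns their colors, and I reset cnt on an unsuccessful query, matching the "consecutive successful queries" reading of the acceptance rule.

```python
from collections import Counter

# Sketch of one Snowflake node: accept a color after beta consecutive
# successful queries for it; flip and reset the counter on a majority for the
# other color; reset the counter when no alpha*k majority is observed.
def snowflake_node(initial_color, sample_peers, k=10, alpha=0.8, beta=15):
    color, cnt = initial_color, 0
    while cnt < beta:
        counts = Counter(sample_peers(k))
        winners = [c for c, v in counts.items() if v >= alpha * k]
        if not winners:
            cnt = 0                       # unsuccessful query: conviction is broken
            continue
        c = winners[0]                    # alpha > 0.5, so at most one winner exists
        if c != color:
            color, cnt = c, 0             # rule 2: color change resets cnt
        else:
            cnt += 1                      # rule 3: successful query for own color
    return color

# Example: peers overwhelmingly answer "red", so the node flips and accepts "red".
print(snowflake_node("blue", lambda k: ["red"] * k))
```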

Things already got interesting here. The analysis shows that there exists a phase-shift point after which correct nodes are more likely to tend towards a decision than a bivalent state. Further, there exists a point of no return after which a decision is inevitable. The Byzantine nodes lose control past the phase shift, and the correct nodes begin to commit past the point of no return, adopting the same color whp.

Snowball: Adding confidence

Snowflake's notion of state is ephemeral: the counter gets reset with every color flip. That is too much history to forget based on one sampling result. Snowball augments Snowflake with momentum by adding confidence counters that capture the number of queries that have yielded a threshold result for their corresponding color (Figure 3; a sketch follows the list):

  1. Upon every successful query, the node increments its confidence counter for that color.
  2. A node switches colors when the confidence in its current color becomes lower than the confidence value of the new color.
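A sketch of a single Snowball node, mirroring the Snowflake sketch above but with the per-color confidence counters added (again my own rendering, not the paper's Figure 3 verbatim; sample_peers(k) is the same assumed helper):

```python
from collections import Counter, defaultdict

# Sketch of one Snowball node: d[] holds per-color confidence (rule 1), and
# the node switches colors only when another color's confidence overtakes its
# current color's (rule 2). Acceptance still uses beta consecutive wins.
def snowball_node(initial_color, sample_peers, k=10, alpha=0.8, beta=15):
    color, last_winner, cnt = initial_color, initial_color, 0
    d = defaultdict(int)                  # confidence counter per color
    while cnt < beta:
        counts = Counter(sample_peers(k))
        winners = [c for c, v in counts.items() if v >= alpha * k]
        if not winners:
            cnt = 0
            continue
        c = winners[0]
        d[c] += 1                         # rule 1: bump confidence of the sampled winner
        if d[c] > d[color]:
            color = c                     # rule 2: switch to the more confident color
        if c == last_winner:
            cnt += 1                      # another consecutive win for the same color
        else:
            last_winner, cnt = c, 1
    return color
```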



Avalanche: Adding a DAG

Adding a DAG improves efficiency, because a single vote on a DAG vertex implicitly votes for all transactions on the path to the genesis vertex. It also improves security, because the DAG intertwines the fate of transactions, similar to the Bitcoin blockchain. This makes past decisions (which are buried under an avalanche) much harder to undo.

When a client creates a transaction, it names one or more parents in the DAG. These parent edges may not correspond to application-specific dependencies: e.g., a child transaction need not spend or have any relationship with the funds received in a parent transaction. The paper includes a detailed discussion of parent selection in the implementation section.

In the cryptocurrency application, transactions that spend the same funds (double-spends) conflict and form a conflict set, out of which only a single one can be accepted. As mentioned above, conflict sets are disparate from how the DAG is constructed, yet the protocol needs to maintain and check conflict sets for the safety of consensus.

Avalanche embodies a Snowball instance for each conflict set. While Snowball used repeated queries and multiple counters to capture the amount of confidence built in conflicting transactions (colors), Avalanche takes advantage of the DAG structure and uses a transaction's progeny (descendants). When a transaction T is queried, all transactions reachable from T by following the DAG edges are implicitly part of the query. A node will only respond positively to the query if T and its entire ancestry are currently the preferred option in their respective conflict sets. If more than a threshold of responders vote positively, the transaction is said to collect a chit, $c_T=1$; otherwise $c_T=0$. Nodes then compute their confidence as the sum of chit values in the progeny of that transaction. Nodes query a transaction just once and rely on new vertices and chits, added to the progeny, to build up their confidence.



As Figure 5 shows, when node u discovers a transaction T through a query, it starts a one-time query process by sampling $k$ random peers. The query starts by adding $T$ to the node's transaction set, initializing $c_T=0$, and then sending a message to the selected peers. Each correct node u keeps track of all transactions it has learned about in a set $T_u$, partitioned into mutually exclusive conflict sets $P_T$, $T \in T_u$. Since conflicts are transitive, if $T_i$ and $T_j$ are conflicting, then $P_{T_i} = P_{T_j}$.



Figure 6 shows what happens when a node receives a query for transaction T from peer j. The node determines whether T is currently strongly preferred and, if so, returns a positive response to peer j. Transaction T is strongly preferred if every single ancestor of T is preferred among its competing transactions (listed in its corresponding conflict set).

Note that the conflict set of a virtuous transaction is always a singleton. Figure 7 illustrates a sample DAG built by Avalanche, where the shaded regions indicate conflict sets. Sampling in Avalanche creates a positive feedback for the preference of a single transaction in its conflict set. For example, because T2 has larger confidence than T3, its descendants are more likely to collect chits in the future compared to T3. So T9 would have an advantage over T6 and T7 in its conflict set.


Figure 4 illustrates the Avalanche protocol's main loop, executed by each node. In each iteration, the node attempts to select a transaction T that has not yet been queried. If no such transaction exists, the loop stalls until a new transaction is added. It then selects k peers and queries them. If more than $\alpha*k$ of those peers return a positive response, the chit value is set to 1. After that, it updates the preferred transaction of each conflict set of the transactions in its ancestry. Next, T is added to the set Q so it will never be queried again by the node.
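To tie these pieces together, here is a compressed sketch of the per-node bookkeeping in Python (my own simplification of the prose above, not the paper's figures). The progeny argument is an assumed precomputed map from each transaction to the set of its descendants, including itself.

```python
# Compressed sketch of Avalanche's per-node bookkeeping. A Tx carries its
# parent edges, a chit set once by its one-time query, and a reference to the
# conflict set of transactions spending the same UTXO.
class Tx:
    def __init__(self, parents=()):
        self.parents = list(parents)
        self.chit = 0
        self.conflict_set = {self}        # singleton for virtuous transactions

def ancestry(t):
    """All transactions reachable from t via parent edges, including t."""
    seen, stack = set(), [t]
    while stack:
        x = stack.pop()
        if x not in seen:
            seen.add(x)
            stack.extend(x.parents)
    return seen

def confidence(t, progeny):
    """Sum of chit values over t's progeny (descendants including t)."""
    return sum(d.chit for d in progeny[t])

def is_preferred(t, progeny):
    return t is max(t.conflict_set, key=lambda x: confidence(x, progeny))

def is_strongly_preferred(t, progeny):
    # respond positively to a query only if t and its entire ancestry are the
    # preferred option in their respective conflict sets
    return all(is_preferred(a, progeny) for a in ancestry(t))

def on_query_result(t, positive_responses, k, alpha):
    # one-time query in the main loop: t collects a chit iff >= alpha*k positives
    t.chit = 1 if positive_responses >= alpha * k else 0
```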


Similar to Bitcoin, Avalanche leaves determining the acceptance point of a transaction to the application. Committing a transaction can be performed through safe early commitment. For virtuous transactions, T is accepted when it is the only transaction in its conflict set and has a confidence greater than threshold $\beta_1$. Alternatively, T can also be accepted after a number $\beta_2$ of consecutive successful queries. If a virtuous transaction fails to get accepted due to a liveness problem with its parents, it can be accepted if reissued with different parents.
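And a tiny illustration of these two acceptance rules, reusing confidence() from the sketch above (consecutive_successes is assumed to be a per-transaction count of consecutive successful queries tracked by the node):

```python
def is_accepted(t, progeny, consecutive_successes, beta1, beta2):
    # safe early commitment: t is alone in its conflict set with confidence > beta1 ...
    if len(t.conflict_set) == 1 and confidence(t, progeny) > beta1:
        return True
    # ... or t has accumulated beta2 consecutive successful queries
    return consecutive_successes.get(t, 0) >= beta2
```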

Implementation

Team Rocket implemented a bare-bones payment system by porting Bitcoin transactions to Avalanche. They say: "Deploying a full cryptocurrency involves bootstrapping, minting, staking, unstaking, and inflation control. While we have solutions for these issues, their full discussion is beyond the scope of this paper."

As I mentioned before, this section talks in depth about parent selection:
The goal of the parent selection algorithm is to yield a well-structured DAG that maximizes the likelihood that virtuous transactions will be quickly accepted by the network. While this algorithm does not affect the safety of the protocol, it affects liveness and plays a crucial role in determining the shape of the DAG. A good parent selection algorithm grows the DAG in depth with a roughly steady "width". The DAG should not diverge like a tree or converge to a chain, but instead should provide concurrency so nodes can work on multiple fronts. 
Perhaps the simplest idea is to mint a fresh transaction with parents picked uniformly at random among those transactions that are currently strongly preferred. But this strategy will yield large sets of eligible parents, consisting mostly of historical, old transactions. When a node samples the transactions uniformly that way, the resulting DAG will have large, ever-increasing fan-out. Because new transactions will have scarce progenies, the voting process will take a long time to build the required confidence on any given new transaction.
Not only are the ancestors important, progeny is also important for low-latency transaction acceptance. The best transactions to choose lie somewhere near the frontier, but not too far deep in history. The adaptive parent selection algorithm chooses parents by starting at the DAG frontier and retreating towards the genesis vertex until finding an eligible parent.

Evaluation

The basic payment system is implemented in 5K lines of C++ code. Experiments are conducted on Amazon EC2, running from hundreds to thousands of c5.large virtual machine instances.



For throughput, maintaining a partial order (DAG) that just captures the spending relations allows for more concurrency in processing than a classic BFT log replication system where all transactions have to be linearized. Also, the lack of a leader in Avalanche helps prevent bottlenecks.



Figure 21 shows that all transactions are confirmed within approximately 1 second. Figure 22 shows transaction latencies for different numbers of nodes and that the median latency is more-or-less independent of network size.


Avalanche's latency is only slightly affected by misbehaving clients, as shown in Figure 23.


For emulated georeplication, measurements show an average throughput of 1312 tps, with a standard deviation of 5 tps, and the median transaction latency is 4.2 seconds, with a maximum latency of 5.8 seconds.

The paper includes a comparison paragraph to Algorand and Bitcoin:
Algorand uses a verifiable random function to elect committees, and maintains a totally-ordered log while Avalanche establishes only a partial order. Algorand is leader-based and performs consensus by committee, while Avalanche is leaderless. Both evaluations use a decision network of size 2000 on EC2. Our evaluation uses c5.large with 2 vCPU, 2 Gbps network per VM, while Algorand uses m4.2xlarge with 8 vCPU, 1 Gbps network per VM. The CPUs are approximately the same speed, and our system is not bottlenecked by the network, making comparison possible. The security parameters chosen in our experiments guarantee a safety violation probability below 10−9 in the presence of 20% Byzantine nodes, while Algorand's evaluation guarantees a violation probability below 5 × 10−9 with 20% Byzantine nodes. 
The throughput is 3-7 tps for Bitcoin, 364 tps for Algorand (with 10 Mbyte blocks), and 159 tps (with 2 Mbyte blocks). In contrast, Avalanche achieves over 1300 tps consistently on up to 2000 nodes. As for latency, finality is 10–60 minutes for Bitcoin, around 50 seconds for Algorand with 10 Mbyte blocks and 22 seconds with 2 Mbyte blocks, and 4.2 seconds for Avalanche.

MAD questions

1. Is this a new protocol family?

Yes. Nakamoto consensus used PoW to choose leaders. Other protocols use PoX (e.g., proof-of-lottery, proof-of-stake, PoW) to choose committees, which then run PBFT. Traditional consensus protocols require known membership.

In contrast, Avalanche is a leaderless protocol family that works in an open/permissionless setting. It doesn't use any PoX scheme, but instead uses randomized sampling and metastability to ascertain and persist transactions.

The analysis of the protocols is very strong, and it discusses the phase-shift point and the point of no return for these protocols. This is a very interesting way to think about consensus. It is also a very fresh way to think about self-stabilization. I have a good understanding of the self-stabilization literature, and I haven't seen this approach in that domain either. I would say the approach would also draw interest from the broader self-organizing systems area.

The DAG analysis in the implementation section is also interesting. I don't know much about hashgraph-based solutions, so I don't know how this DAG construction relates to those.

2. What is the incentive to participate?

The paper already discussed a cryptocurrency implementation using Avalanche, but minting, staking, credit distribution, etc., were left for future work. The incentive to participate would come from cryptocurrency minting and staking. The credit assignment would be interesting and would probably involve several new research problems as well.

3. Where does the Sybil attack tolerance of Avalanche come from?

Avalanche tolerates Byzantine nodes using a tunable parameter to increase/decrease tolerance factor. The paper also reports results with 20% Byzantine nodes.

However, if participation is very inexpensive, a Sybil attack is possible, where a large number of Byzantine sock-puppets is introduced to the system, violating the BFT ratios. I guess a proof-of-stake-based approach can be used in Avalanche to prevent the introduction of an enormous number of Byzantine nodes to the network.

Making Sybil nodes a bit costly helps, and that can be complemented with keeping the number of correct nodes high. If the protocol can be really resource-light, people wouldn't mind having it run in the background on their laptops, the same way they don't mind leaving background Dropbox synchronization running. With some incentive it is possible to have many, many participants, which also increases tolerance against Sybil attacks.

4. What is the resource (computation and storage) requirements at the participants?

On the topic of resource-lightness of the protocol, the paper mentions that transaction validation is the performance bottleneck: "To test the performance gain of batching, we performed an experiment where batching is disabled. Surprisingly, the batched throughput is only 2x as large as the unbatched case, and increasing the batch size further does not increase throughput. The reason for this is that the implementation is bottlenecked by transaction verification. Our current implementation uses an event-driven model to handle a large number of concurrent messages from the network. After commenting out the verify() function in our code, the throughput rises to 8K tps, showing that either contract interpretation or cryptographic operations involved in the verification pose the main bottleneck to the system."

In Avalanche, each participant node needs to maintain the DAG. But since the DAG is a pretty flexible data structure in Avalanche, I think it shouldn't be hard to shard the DAG across groups of participants.

I also wonder about the cost of "conflict set" maintenance, as I didn't get a good understanding of how the conflict sets are maintained. The paper mentions an optimization for conflict set maintenance in the implementation section: "A conflict set could be very large in practice, because a rogue client can generate a large volume of conflicting transactions. Instead of keeping a container data structure for each conflict set, we create a mapping from each UTXO to the preferred transaction that stands as the representative for the entire conflict set. This enables a node to quickly determine future conflicts, and the appropriate response to queries."
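A minimal sketch of that representative-based bookkeeping (the map and the register helper are my own names, for illustration):

```python
# Sketch of the UTXO -> representative optimization quoted above: instead of a
# container per conflict set, map each spent UTXO to the preferred transaction
# that stands as the representative for that conflict set.
preferred_for_utxo = {}        # utxo id -> representative (preferred) transaction

def register(tx, inputs):
    """Record tx's spent UTXOs; return the representatives it conflicts with."""
    conflicts = {preferred_for_utxo[u] for u in inputs if u in preferred_for_utxo}
    for u in inputs:
        preferred_for_utxo.setdefault(u, tx)   # first transaction seen becomes the representative
    return conflicts
```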

5. Can some of the assumptions be used for constructing an attack?

The paper says: "The analysis assumes a synchronous network, while the deployment and evaluation is performed in a partially synchronous setting. We conjecture that the results hold in partially synchronous networks, but the proof is left to future work."

I think I buy this. The protocols coming after Slush weakened the synchrony assumption. The epidemic random sampling mechanism helps propagate transactions through the network. So, with a large enough number of correct nodes and some weak guarantees about processing speed, I think this can work. Well, we should see the proof.

The epidemic random sampling mechanism requires a decentralized service so that a node can connect with sufficiently many correct nodes to acquire a statistically unbiased view of the network. I guess peer-to-peer finger tables would be sufficient to achieve that. This service should also be guarded against Byzantine nodes, as it could be used to skew consensus results by routing nodes to Byzantine pools for sampling. I am wondering if asynchrony can also be used to introduce/reinforce network partitioning.