Wednesday, July 4, 2018

If You’re Not Writing a Program, Don't Use a Programming Language

This article by Leslie Lamport has appeared recently at the Distributed Computing & Education Column.

This article focuses more on the distinction between the TLA+ modeling language versus programming languages. Earlier I had written a blog post about more general benefits of modeling and model-checking.

This is not a technical article so instead of trying to summarize/review the main points, I just include some choice quotes from the article below. The most important takeout message is at the end, which I emphasize with the bold font.

Algorithms are not programs, and they can be expressed in a simpler and more expressive language. That language is the one used by almost every branch of science and engineering to precisely describe and reason about the objects they study: the language of mathematics.

Programming is too often taken to mean coding, and the algorithm is almost always developed along with the code. To understand why this is bad, imagine trying to discover Euclid's algorithm by thinking in terms of code rather than in terms of mathematics.

The [TLA+] abstraction helped a lot in coming to a much cleaner architecture (we witnessed first-hand the brainwashing done by years of C programming). One of the results was that the code size is about 10× less than in [the previous version].  
(Eric Verhulst, Raymond T. Boute, José Miguel Sampaio Faria, Bernard H. C. Sputh, and Vitaliy Mezhuyev. Formal Development of a Network-Centric RTOS. Springer, New York, 2011.)
The common obsession with languages might lead readers to think this result was due to some magical features of TLA+. It wasn’t. It was due to TLA+ letting users think mathematically.

A correctness property of an algorithm is often best expressed as a higher-level algorithm. Proving correctness then means proving that the original algorithm refines the higher-level one. This usually involves both data refinement and step refinement.

I expect that this kind of refinement sounds like magic to most readers, who won't believe that it can work in practice. I will simply report that among the refinement proofs I have written is a machine-checked correctness proof of the consensus algorithm at the heart of a subtle fault-tolerant distributed algorithm by Castro and Liskov that uses 3F + 1 processes, up to F of which may be malicious (Byzantine). The proof shows that the Castro-Liskov consensus algorithm refines a version of the 2F + 1 process Paxos consensus algorithm that tolerates F benignly faulty processes. Steps of malicious processes, as well as many steps taken by the good processes to prevent malicious ones from causing an incorrect execution of Paxos, refine stuttering steps of the Paxos algorithm. I found that viewing the Castro-Liskov algorithm as a refinement of Paxos was the best way to understand it.

Today, programming is generally equated with coding. It's hard to convince students who want to write code that they should learn to think mathematically, above the code level, about what they’re doing. Perhaps the following observation will give them pause. It's quite likely that during their lifetime, machine learning will completely change the nature of programming. The programming languages they are now using will seem as quaint as Cobol, and the coding skills they are learning will be of little use. But mathematics will remain the queen of science, and the ability to think mathematically will always be useful.

Related posts on TLA+

Hillel maintains a nice tutorial for TLA+. He has an upcoming book on TLA+.

I have written up a lot on TLA+. Here are two gentle introduction posts I had written in 2014 and 2015. I gave many examples of modeling with TLA+, as I think lack of examples is one of the biggest obstacles before TLA+ achieving wider adoption. Some include modeling of 2-phase commit transactionssynchronized round consensuschain replication, hygenic dining philosophers, and Paxos and Flexible Paxos. I also use TLA+ in my research papers to provide an unambiguous pseudocode for algorithms, as well as a way to model check their correctness. 

This summer, I will be joining Microsoft Azure Cosmos DB team for my sabbatical for 6 months. I will get a chance to use TLA+ in practice in the context of a very large-scale globally distributed, multi-model database service. So you can expect many more TLA+ posts in this space soon.

MAD questions

1. Instead of the succinctness distinction between a programming language and a specification language, maybe the difference in benefits comes more from the process. You should approach programming in a two phase manner: think before you code. Before you start writing any pretensibly executable code, think first and write a design/algorithm/model first using that programming language as pseudocode.

But then, wouldn't this have worked for the RTOS project cited above? They could have written the design/algorithm in pseudocode C first, and would have achieved the same benefits. But maybe trying to write the design/algorithm in the programming language bring a lot of baggage with it (brainwashing as the quotation above mentions) that this is impossible. Maybe this is the curse of the Whorfian syndrome Lamport wrote about in another article. "Computer scientists collectively suffer from what I call the Whorfian syndrome -- the confusion of language with reality."

2. Would a succinct programming language nullify (or weaken) the arguments in this article? What is your favorite succinct programming language?

Paul Graham swears by the benefits of Lisp. He quotes ESR there: "Lisp is worth learning for the profound enlightenment experience you will have when you finally get it; that experience will make you a better programmer for the rest of your days, even if you never actually use Lisp itself a lot."

Maybe the benefits come from declarative versus operational thinking? So declarative languages would have an advantage over operational languages.

I am not a programming language guy. And I don't want to start a PL frame war, so maybe I should stop. (This may be the most dangerous MAD question I had written :-)

Monday, July 2, 2018

Blockchain consensus and ubiquitous computing: match made in heaven?

Today I mused a bit on a futuristic research outlook for blockchain/decentralized consensus. Here is what occurred to me: a merger of blockchain consensus with ubiquitous/spatial computing.

Partially-replicated decentralized consensus

Here is the first piece of the puzzle.

A drawback with current blockchain/decentralized consensus work is that they require full-state-replication (of the entire chain or DAG) at each participating node. This is a big scalability bottleneck.

Replication should be on a need to know basis. What is the point of replicating transactions/state that is relevant for Buffalo, NY to nodes in Delhi, India.

Assume we fix that. (That will involve a good amount of research on graph algorithms.) And we use lightweight energy-efficient decentralized consensus algorithms, such as Avalanche, so that these can work on lightweight portable renewable power devices.

Now on to the next piece of the puzzle.

Ubiquitous/spatial computing

Think about next generation IoT. A lot of systems problems need to be solved there to fully realize Mark Weiser's ubiquitous computing vision, but we are getting there. Let's assume we have these smart devices, and even smart-material ubiquitously. Let's assume these devices/material can act as a smart witness to a transaction/state. 
The imagery I have of this is the biblical judgment day, where even the nonliving things (walls, rocks, shoes) witness against people for some of their sins.

On-prem witnessing can be useful for providing an unchangeable record of things. (Of course there should be privacy protection mechanisms. Only God can judge me, and I don't like to have my wall tell on some of my private actions.)

Of course, if it is worth recording, it is worth screwing with as well. A Byzantine adversary may want to change the record to falsify a story, to provide an alibi, to get out of a handshake contract, etc.

Initially, you like to keep the consensus replicated in a vicinity (house or block) but if there is attack, an on-demand replication to other areas would be performed to strengthen consensus and tolerate more attack.

With partial/local replication, another issue becomes important as well:
indexing and search functionality. Search is not supported sufficiently in decentralized systems, and there is need for research on providing good/fast search capability in this model.

MAD questions

I told you this is a futuristic research agenda. If you think this is interesting, if you have things to add,  drop a comment, or email me.

What would be some concrete applications?
I haven't given a concrete application here. But this type of platform (built on a convergence on ubiquitous computing and decentralized consensus) may be useful for building crowdsourced sensing and coordination applications in open adversarial environments. An example may be the Darpa red balloon challenge.

What are more practical applications? What is something you couldn't do with today's platforms? Where does the Byzantine tolerance requirement come from?

Tuesday, June 19, 2018

Academic/research impact

impact (n)
1. the action of one object coming forcibly into contact with another
2. the effect or influence of one person, thing, or action, on another

Impact (with the second meaning :-) is an often talked about desirable thing in academic and research world. Impact is especially important for promoting an associate professor to a full professor status.

Unfortunately there is no well-defined description or an objective way to quantify impact. There are approximated metrics, each with significant flaws:
  • Citation count is easily quantifiable --Google Scholar shows your citation count and h-index. However, this can be tricked. Some journals have boosted their impact factors by asking accepted articles to cite other articles from the journal. Publishing in popular/crowded areas (e.g., wireless networks) can also boost citation count. It is possible to have high citation count with little impact, and little citation count with high impact. As a personal anectode, many of the work I am most proud of has low citation count, and I have not anticipated how some of my quick "low-hanging-fruit" papers gathered a lot of citations.
  • Name recognition/identity is not quantifiable and varies depending on who you ask. Also this can be tricked by owning a very tiny corner without having much impact.
  • Number of PhD students graduated may not have a strong correlation with impact.
  • Best paper awards are great, but let's be cynical here. The awards show you know how to write good papers: one may even argue, you are a crowd-pleaser, you are too much into group-think/community-consensus. Truly innovative papers fail to get best paper awards, heck, they get rejected for many years. They only get test-of-time and hall-of-fame awards in retrospective. 
  • Grants are nice, but being cynical you can argue, what they truly measure is how good you are at telling what the reviewers want to hear.

OK, enough of being cynical. Of course these metrics capture a lot of value and approximate impact in most cases. But my point is, it is easy to fake impact-potential. It is easy to play the game and improve those metrics, without having impact.

Recognizing impact

OK, it may be hard to truly recognize impact except in retrospect. I will now argue that even in retroscpect, there is contention in recognizing impact. What are the dimensions of impact, which dimensions are more important? There is disagreement even about that.

I recently performed a series of twitter polls on this. Here are the results with my commentary.

Although majority votes favored improving a huge fraction of lives in tiny amount over improving a tiny fraction of lives in huge amount, it is easy to make a case for the latter as well. In fact, I was thinking that would be the winning option---I would have voted for that. Many web services improve huge number of lives in a tiny way, but how much does that matter? How bad would you be off with a little bit of inconvenience. On the other hand a huge improvement in a tiny fraction of people's lives (disabled, poor, sick, etc.) is a big deal for them. Isn't that more impactful? (I tear up watching these videos every time. Even the colorblind seeing color the first time... It is a big deal for them.)

This poll returned an even split between small current improvement over a bigger improvement potential. I would have voted for big improvement potential: the power of transformational is orders of magnitude more than the incremental. However, I don't deny that small incremental improvements may also form critical mass and stepping stones and can lead to transformational improvements as well.

These two showed strong preference for novelty over completeness. It is possible to make a case the other way as well. Let's say the first/simple solution gets stuck 1% of the case and the improved solution covers that. This may make or break the practicality of the solution at large scale. Corner cases and performance problems are deal breakers at large scale.

This surprised me in a good way. There was a great appreciation that defining/identifying a problem is very important, and with that out of the way the solution becomes possible and often easy.

This also showed there is good appreciation of theory that enables later practical applications.

This one was a bit disappointing for me. Atomic clocks enabled GPS, so I think it had more impact than GPS. Atomic clocks is a general tool and enables other applications applications: long-baseline interferometry in radioastronomy, detecting gravity-waves, and even used in georeplicated distributed databases.

DARPA has been boasting about its funding atomic clocks research as a use-inspired basic research project. At the time of funding, the applications of precise clocks was unclear, but DARPA funded physicists to work on it. And once atomic clocks were built, applications started to emerge. Around 2000-2010 DARPA has been mentioning this in most of their presentations (I haven't been to DARPA presentations recently). DARPA has been proud of this, and wanted to solicit other work like this. These presentations included this diagram and urged for other use-inspired basic research.

PurityThere is something to be said for purity/abstractness. Abstractness can imply generality and may mean more impact. On the other hand, there may also be a phase transition in terms of impact/practicality in the abstractness spectrum. Sometimes things get too abstract and abstruse. But it may be hard to differentiate when that happens and draw a line. Maybe the solution is use-inspired research/theory. It may help to have some practical relevance considerations while doing theory. It may help to do theory that can help as a tool for enabling other work.

Is it possible to optimize/plan for impact?

After my twitter poll frenzy, a friend forwarded me this talk: why greatness cannot be planned: the myth of the objective. I highly recommend the talk. While I don't agree with all conclusions there, it has merit, and gets several things right.

The talk (and I guess the book as well) argues that having research objectives does not help in achieving them and can even be harmful, because the stepping stones don't resemble the end objective. So the advice given is: don't plan for impact/objectives, just do interesting stuff!

That has also been my conclusion thinking about research impact many years in the back of my mind. Instead of trying to chase impact, it is more impactful to do interesting work that you enjoy, and to do what only you can do. I wrote this in 2014:

So publishing less is bad, and publishing more does not guarantee you make an impact. Then what is a good heuristic to adopt to be useful and to have an impact? 
I suggest that the rule is to "be daring, original, and bold". We certainly need more of that in the academia. The academia moves more like a herd, there are flocks here and there mowing the grass together. And staying with the herd is a conservative strategy. That way you avoid becoming an outlier, and it is easier to publish and get funded because you don't need justify/defend your research direction; it is already accepted as a safe research direction by your community. (NSF will ask to see intellectual novelty in proposals, but the NSF panel reviewers will be unhappy if a proposal is out in left field and is attempting to break new ground. They will find a way to reject the proposal unless a panelist champions the proposal and challenges the other reviewers about their concocted reasons for rejecting. As a result, it is rare to see a proposal that suggests truly original/interesting ideas and directions.) 
To break new ground, we need more mavericks that leave the herd and explore new territory in the jungle. Looking at the most influential names in my field of study, distributed systems, I see that Lamport, Dijkstra, Lynch, Liskov were all black sheep. 
The interesting thing is, to be the black sheep, you don't need to put on an act. If you have a no bullshit attitude about research and don't take fashionable research ideas/directions just by face value, you will soon become the black sheep. But, as Feynman used to say "what do you care what other people think?". Ignore everybody, work on what you think is the right thing.
Word of caution!! While being different and being black sheep can speed up your way to the top, it works only after you paid the price. It is a very steep climb up there, and you need many grueling years of preparation.

Of course, doing stuff that is interesting is a very subjective thing.  That may be ok, when choosing what to work on. But evaluating "interesting/novelty" is too subjective. It takes us back in full circle in admitting there is no well-defined description or an objective way to quantify impact.  As I speculate in MAD question 3, I believe we deserve better metrics and more research into impact metrics.

MAD questions

1. What is the most impactful research you know?  What are its prominent characteristics?  Is it an enabling tool for other work? Is it a simple idea (paradigm shift)? Is it a synthesis of other ideas? Is it a tour-de-force complex technical solution?

You can vote/comment on this by Wednesday June 20.
2. What heuristics do you employ for increasing impact in your research?

3. In team sports, we select an mvp as the player with a high profile/visibility, but in reality the real mvp could have low profile in terms of scoring and the metrics tracks but still provides more value. When I was a child, assists were not tracked in soccer, maybe it was too much of a hassle. What about a defender that played strategically and blocked attacks before they developed? How do we credit that? Moneyball (Michael Lewis) shed some light on this.
The central premise of Moneyball is that the collective wisdom of baseball insiders (including players, managers, coaches, scouts, and the front office) over the past century is subjective and often flawed. Statistics such as stolen bases, runs batted in, and batting average, typically used to gauge players, are relics of a 19th-century view of the game and the statistics available at that time. Before sabermetrics was introduced to baseball, teams were dependent on the skills of their scouts to find and evaluate players. Scouts are those who are experienced in the sport, usually having been involved as players or coaches. The book argues that the Oakland A's' front office took advantage of more analytical gauges of player performance to field a team that could better compete against richer competitors in Major League Baseball (MLB).

Coming back to research/academic impact, don't we need/deserve a better more informed approach for tracking and identifying impact? Are there severely underrated work/researchers that have had huge impact that went unnoticed?

Wednesday, June 13, 2018

About self-stabilization and blockchain consensus

This is a half-baked exploratory piece about how some self-stabilization ideas may relate with blockchain consensus in open/permissionless environments. I will write about some similarities and some major differences between the self-stabilizing distributed algorithms work and blockchain consensus deployments.


Let's start with a brief review self-stabilization.

Stabilization is a type of fault-tolerance that handles faults in a principled unified manner instead of on a case-by-case basis. Instead of trying to figure out how much faults can disrupt the system's operation, stabilization assumes arbitrary state corruption, which covers all possible worst-case collusions of faults and program actions. Stabilization then advocates designing recovery actions that takes the program back to invariant states starting from any arbitrary state. This makes stabilization suitable for dealing with unanticipated faults.

The most distinct property of self-stabilization is its emphasis on liveness and providing eventual safety. When the invariant is violated, liveness still holds, and the system makes progress from faulty-states moving closer to invariant states until eventually safety is reestablished. In that sense stabilization is related to eventual-consistency approaches as well.

If you like to see an example, here is Dijkstra's self-stabilizing token ring program. In this problem, the processes are arranged in a ring fashion and there is a unique token circulating in this ring. (You can think of the token may be providing mutual exclusion to the processes; whichever process has the token can access the critical-section/shared-resource).

The similarity between self-stabilization and blockchain consensus is that they are both OK with finite-duration divergence of state/consistency.  In (most) blockchain consensus protocols, you may have a fork, which is resolved eventually. In Bitcoin the fork is resolved with addition of new blocks and using the longest-chain rule with time. Avalanche uses DAG not a chain, so the "fork" is resolved using increased confidence values with time. (In other words, the unpopular branches in DAG atrophy over time.)

In stabilization literature, there are distantly related approaches for blockchain consensus. I would say two important/early ones are Error-detecting codes and fault-containing self-stabilization (2000) and Probabilistic self-stabilization (1990).

Also, the undecided-state dynamics, population protocols by Aspnes, and
self-* properties through gossiping are somewhat related work for Avalanche family consensus protocols.

State-management perspective

There are big differences in self-stabilization and blockchain attitude for state.

Stabilization likes to use soft-state and minimal state. That helps with efficient/lightweight eventual consistency.

Blockchain achieves eventual-consistency with hard-state, and lots of it. This is achieved through full replication at each node and using error-checking codes. While the stabilization approach does not like keeping history (because it can be corrupted), blockchain approach embraces history. However, blockchain maintains history in such a way that corruption is evident and can be weeded out! The error-checking codes (or confidences in Avalanche) are chained to provide increasing toughness/tolerance to tampering for older entries.

Another difference is in terms of partial/local state versus global state: In stabilization nodes often have partial/local state, and the global state is composition of these local stated. In decentralized consensus, each node has full state, the entire chain or the entire DAG. So in some sense stabilization acts distributedly across nodes, and blockchain consensus acts in a decentralized manner per node.

Dynamic networks and churn perspective

I like to also take some time to discuss how some self-stabilization work and blockchain consensus deal with open environments.

Self-stabilizing graph algorithms provide some tolerance/cushion for dynamic (time-varying) network. The edge costs may change (rewiring the graph) and the stabilizing algorithm reacts/adapts to the new graph. Some examples are  stabilizing clustering and shortest path tree algorithms. While self-stabilization tolerates some churn, if the churn is high, the self-stabilizing algorithm may not be able to catch up.

In contrast, dynamic networks is not a big problem for blockchain consensus, especially since communication is often done via an epidemic broadcast/gossip or pull-based gossip as in Avalanche. The blockchain consensus protocols just need a way to propagate the transactions and chain state for replication at the nodes.

There is also the issue of Sybil-resistance. If participation is very inexpensive, Sybil attack is possible in an open/permissionless environment. The attack works by introducing a large number of Byzantine sock-puppets to the system and by violating the byzantine nodes ratio in the system. Sybil resistance is orthogonal to what I discussed and can be achieved for both approaches by incorporating PoW or PoS techniques.

MAD questions

1. Is it possible to use blockchain full-state replication as composing block for building larger-scale stabilizing systems?
Maybe the blockchain full-state replication may emulate a Virtual Node (VN) in an open/trustless region/zone, and you can have a stabilizing solution built/deployed over these unfallable/uncrashable VNs.

2. Is open/permissionless environment overrated? If we have federation like systems hierarchically-arranged, and quick reconfiguration/stabilization to choose suitable quorums for consensus, would that be enough?

Sunday, June 10, 2018

Book Review. Endurance: A year in space, a lifetime of discovery

I didn't know what to expect when I picked this book from the library. A book by an astronaut could turn out to be boring and mundane for me. The thing is, I am interested in space, but I wouldn't say I am passionate about it. I never wished I could be an astronaut as a child (or as an adult). I guess I wanted to be the engineer that designed those systems, rather than the astronaut that piloted them.

Long story short, I enjoyed reading this book a lot. I never got bored, on the contrary I was very engaged. The book interleaved Scott Kelly's growing up and his 1 year stay at the International Space Station (ISS) at every other chapter. At the end of the book, Scott's life timeline has caught up to the beginning of his year-long ISS stay, and his ISS stay timeline concluded with the Soyuz return capsule entering the atmosphere.

I learned a lot about the space program. It still weirds me out that while we are very much earth-bound with our lives, technology, perspectives, problems, and dreams/aspirations, and yet there are some dozen people that get to inhabit space and look onward to Mars. The future is not uniformly distributed.

The book also included a lot of life lessons and reflections on human relationships. I guess, being in space and looking down to earth, would give a great vantage point on such reflection.

I had also read Seveneves by Neal Stephenson and The Martian by Andy Weir and loved those as well. I guess I should be looking for more books about space. Please recommend some.

In this book, Scott Kelly credits in several places "The Right Stuff" book by (the recently deceased) Tom Wolfe to have turned around his life and got him on track to being an astronaut despite all odds lined against him. So I should also plan on reading some work by Tom Wolfe.

From the book

Here are some random interesting passages from the book. Bring your own context. I will start skipping passages when I am tired of typing.

Page 30:
After a while the bus slows, then comes to a stop well before the launchpad. We nod at one another, step off, and take up our positions. We've all undone the rubber-band seals [on our Sokol suits] that had been so carefully and publicly leak-checked just an hour before. I center myself in front of the right rear tire and reach into my Sokol suit. I don't really have to pee, but it's a tradition: When Yuri Gagarin was on his way to the launchpad for his historic first spaceflight, he asked to pull over---right about here--- and peed on the right rear tire of the bus. Then he went to space and came back alive. So now we all must do the same. The tradition is so well respected that women space travelers bring a bottle of urine or water to splash on the tire rather than getting entirely out of their suits.

[ I guess the danger and unpredictability of the situation is enough to make to make scientist/engineered minded people very superstitious. ]

Page 40-41:
In my freshman year, I started out with great hope that I could turn things around and be a good student, as I had every previous school year. This determination always lasted just a few days, until I realized once again that it was impossible for me to concentrate in class or to study on my own. Soon I was waking up each morning and struggling to think of a reason to go to class, knowing I wouldn't absorb any of the professor's lecture. Often, I didn't go. How was I going to graduate, let alone do well enough to be accepted by any medical school?

Everything changed that afternoon when I picked up The Right Stuff. I'd never read anything like it before. I'd heard the word "voice" used to describe literature, but this was something I could actually hear in my head. Even out in the middle of the swamp, Wolfe wrote in this rot-bog of pine truncks, scum slicks, dead dodder vines, and mosquito eggs, even out in this great overripe sump, the smell of "burned beyond recognition" obliterated everything else. I felt the power of those words washing over me, even if some of the words I had to look up in the dictionary. Perilous, neophyte, virulent. I felt like I had found my calling. I wanted to be like the guys in this book, guys who could land a jet on an aircraft carrier at night and then walk away with a swagger. I wanted to be a naval aviator. I was still a directionless, undereducated eighteen-year-old with terrible grades who knew nothing about airplanes. But The Right Stuff had given me the outline of a life plan.

Page 50:
Unlike the early days of spaceflight, when piloting skill was what mattered, twenty-first-century astronauts are chosen for our ability to perform a lot of different jobs and to get along well with others, especially in stressful and cramped circumstances for long periods of time. Each of my crewmates is not only a close coworker in an array of different high-intensity jobs but also a roommate and a surrogate for all humanity.

Page 140:
The increased fluid pressure may squish our eyeballs out of shape and cause swelling in the blood vessels of our eyes and optic nerves. ... It's possible, too, that high CO2 is causing or contributing to changes in our vision ... High sodium in our diets could also be a factor ... Only male astronauts have suffered damage to their eyes while in space, so looking at the slight differences in the head and neck veins of male and female astronauts might also help scientists start to nail down the causes. If we can't we just might have to send an all-women crew to Mars.

Page 150:
Launch time comes and goes. Shortly after, my laptop's internet connection starts working again. I look up the video for the SpaceX launch, but the connection isn't strong enough to stream the video. I get a jerky, frozen image. then my eye stops on a headline: "SpaceX Rocket Explods During Cargo Launch to Space Station."
You've got to be fucking kidding me.
The flight director gets on a privatized space-to-ground channel and tells us the rocket has been lost.

Page 159: [ Upon getting a second chance after being disqualified from flying F14s ]
"You know, you can fly the airplane okay, but you're not flying it all the time," he told me. "You're on altitude and airspeed, but you're not on top of it." I had been trained to keep my altitude within a 200 foot range, so I didn't worry if I was 10 feet off the precise altitude, or 20, or 50. But Scrote pointed out that this imprecision in the end would lead me far from where I needed to wind up, and fixing it would take a lot of my attention. I had to always be making small, constant corrections if I wanted to make the situation better. He was right. My flying got better, and I've been able to apply what I learned from him to a lot of other areas of life as well.

Page 161:
Being in an F-14 squadron in the 1990s was like a cross between playing a professional sport and being in a rock-and-roll band. The movie Top Gun didn't quite capture the arrogance and bravado of it all. The level of drunkenness and debauchery was unbelievable (and is, thankfully, no longer the standard).

Page 164: [ A Marine saying about failures and mistakes ]
There are those who have, and those who will.

Page 290:
This wasn't my first time training with the Russians, of course... By now, I was intimately familiar with the way the Russian space agency handles training similarly to NASA, such as an emphasis on simulator training, and the way they don't, like their emphasis on the theoretical versus the practical---to an extreme. If NASA were to train an astronaut how to mail a package, they would take a box, put an object in the box, show you the route to the post office, and send you on your way with postage. The Russians would start in the forest with a discussion on the species of tree used to create the pulp that will make up the box, then go into excruciating detail on the history of box making. Eventually you would get to the relevant information about how the package is actually mailed, if you didn't fall asleep first. It seems to me this is part of their system of culpability---everyone involved in training needs to certify that the crew was taught everything they could possibly need to know. If anything should go wrong, it must then be the crew's fault.

Page 294: [ Real artists The Russians ship! ]
Once Sasha was back in his seat and it seemed clear we weren't going to catch fire, we talked about our predicament. I decided not to voice concern about the flammability risk.
"It's too bad we won't launch today," I said.
"Da," Sasha agreed. "We will be the first crew to scrub after strapping in since 1969." This is an incredible statistics, considering how often the space shuttle used to scrub, right up to the seconds before launch, even after the main engines had lit.
A voice for the control center interrupted us. "Guys, start your Sokol suit leak checks."
What? Sasha and I looked at each other with identical What-the-fuck? expressions. We were now inside five minutes to launch. Sasha raced to get strapped back into his seat properly.

Page 300-301: [January 8, 2011 During Scott's second ISS mission]
Mission control told me that the chief of the Astronaut Office, Peggy Whitson, needed to talk to me and would be calling on a private line in five minutes. I had no idea why, but I knew the reason wouldn't be anything good. Five minutes is  along time to think about what emergency might have occurred on the ground. Maybe my grandmother had died. Maybe one of my daughters had been hurt.
Peggy came on the line. "I don't know how to tell you this," she said, "so I'm just going to tell you. Your sister-in-law, Gabby was shot."
[ His sister-in-law is Gabrielle Giffords, the former congresswoman from Arizona. ]

Page 311: [ Upon being disqualified from the year-long ISS mission ]
When I got home that night, I told Amiko about being medically disqualified. Rather than looking disappointed, as I expected, she looked puzzled.
"So they are going to send someone who has been on two long flights and has not suffered vision damage?" she asked.
"Right," I said.
"But if the point of this mission is to learn more about what happens to your body on a long mission," she asked, "why would they send someone who is known to be immune to one of the things they intend to study?"
This was a good point.

... I decided to present my case to management. They listened, and to my surprise they reversed their decision.

When I was preparing for the press conference to announce Misha and me as the one-year crew members, I asked what I thought was an innocent question about genetic research. I mentioned something we haven't previously discussed: Mark would be a perfect control study throughout the year. [ Scott's identical twin brother Mark Kelly is also an astronaut. ]
It turns out my mentioning this had enormous ramifications. Because NASA was my employer, it would be illegal for them to ask me for my genetic information. But once I had suggested it, the possibilities of studying the genetic effects of spaceflight transformed the research. The Twins Study became an important aspect of the research being done on station. A lot of people assumed that I was chosen for this mission because I have an identical twin, but that was just serendipitous.

Page 350:
I've been thinking about the whole arc of my life that had brought me here, and I always think about what it meant to me to read The Right Stuff as a young man. I feel certain that I wouldn't have done any of things I have if I hadn't read that book---if Tom Wolfe hadn't written it. On a quiet Saturday afternoon, I call Tom Wolfe to thank him. He sounds truly amazed to hear from me. I tell him we're passing over the Indian Ocean, how fast we're going, how our communication system works. We talk about books and about New York and about what I plan to do first when I get back (jump into my swimming pool). We agree to have lunch when I'm back on Earth, and that's now one of the things I'm looking forward to most.

Page 360:
As much as I worked on scientific experiments, I think I learned at least as much about practical issues of how to conduct a long-range exploration mission. This is what crew members on ISS are always doing---we are not just solving problems and trying to make things better for our own spaceflights, but also studying how to make things better for the future. ... And the larger struggles of my mission---most notably, CO2 management and upkeep of the Seedra--- will have a larger impact on future missions on the space station and future space vehicles. NASA has agreed to manage CO2 at a much lower target level, and better versions of CO2 scrubbers are being developed that will one day replace the Seedra...

MAD questions

1. What type of computer/software systems should we have in space?
The RAM is exposed to radiation, so memory can be corrupted. A self-stabilizing program that tolerates memory corruption could be useful in some situations. I also heard triple modular redundancy and replicas are preferred for some computer/sensor systems. How about Byzantine tolerant systems? Would that be useful for computers in space?

The reliability and assurance needed for computers in space stations should be in a league of their own. I wonder if there is a nice description of modern programming techniques/styles for computers deployed in space.

2. Could there be a book out there for you that could change your life? Give your life a whole new meaning/purpose?

3. Or could something you write/contribute can help change someone's life?

Sunday, June 3, 2018

Snowflake to Avalanche: A Novel Metastable Consensus Protocol Family for Cryptocurrencies

This paper is by Team-Rocket ---the authors are pseudonymous, I presume a guy, a gal, and a cat is involved. I learned about the paper when Emin Gun Sirer announced it on Twitter. I started speculating about the authors last week, and this is my latest guess.

OK, back to the facts. Here is the synopsis from the paper. "This paper introduces a brand new family of consensus protocols suitable for cryptocurrencies, based on randomized sampling and metastable decision. The protocols provide a strong probabilistic safety guarantee, and a guarantee of liveness for correct clients."

Below I first summarize the protocols and at the end I will provide my comments/evaluations. The analysis of the paper is very strong and span 8 pages, but I skip that section in my review.


Traditional consensus protocols incur high communication overhead and accurate knowledge of membership. So while they work great for less than 20 participants, they do not work for large numbers of participants and open/permissionless environments. Nakamoto consensus family protocols address that problem, but they are costly and wasteful. Bitcoin currently consumes around 63.49 TWh/year, about twice as all of Denmark.

This paper introduces a new family of consensus protocols, inspired by gossip algorithms: The system operates by repeatedly sampling the participants at random, and steering the correct nodes towards the same consensus outcome. The protocols do not use proof-of-work (PoW) yet achieves safety through an efficient metastable mechanism. So this family avoids the worst parts of traditional and Nakamoto consensus protocols.

Similar to Nakamoto consensus, the protocols provide a probabilistic safety guarantee, using tunable security parameters to make the possibility of a consensus failure arbitrarily small.

The protocols guarantee liveness only for virtuous transactions. Liveness for conflicting transactions issued by Byzantine clients is not guaranteed. This point is important, here is the explanation:
"In a cryptocurrency setting, cryptographic signatures enforce that only a key owner is able to create a transaction that spends a particular coin. Since correct clients follow the protocol as prescribed, they are guaranteed both safety and liveness. In contrast, the protocols do not guarantee liveness for rogue transactions, submitted by Byzantine clients, which conflict with one another. Such decisions may stall in the network, but have no safety impact on virtuous transactions. This is a very sensible tradeoff for building cryptocurrency systems." (While this is OK for cryptocurrency systems, it would be a problem for general consensus where conflicting requests from clients will be present.)

OK, then, does this well-formed non-conflicting transactions make consensus trivial in cryptocurrency systems? Would nonconflicting transactions reduce consensus to plain gossip? Then, what does the paper contribute? With plain gossip, the Byzantine client can first introduce one transaction, and then introduce another transaction. With plain gossip the last write wins and double-spending would ensue. So plain gossip won't do; the protocol needs to sample and establish/maintain some sort of communal memory of transactions such that an established transaction should be impossible to change. Nakamoto uses PoW-based chain-layering/amberification to achieve that. This paper shows how this amberification can be achieved via sampling-based gossip and DAG-layering without PoW! Avalanche is a nice name to denote this irreversible process.

The model

The paper adopts Bitcoin's unspent transaction output (UTXO) model: The clients issue cryptographically signed transactions that fully consume an existing UTXO and issue new UTXOs. Two transactions conflict if they consume the same UTXO and yield different outputs. Correct clients never issue conflicting transactions. It is also impossible for Byzantine clients to forge conflicts with transactions issued by correct clients.

On the other hand, Byzantine clients can issue multiple transactions that conflict with one another, and the correct clients should only consume at most one of those transactions. The goal of the Avalanche family of consensus protocols is to accept a set of non-conflicting transactions in the presence of Byzantine behavior.

The Avalanche family of protocols provide the following guarantees with high probability (whp):

  • Safety. No two correct nodes will accept conflicting transactions.
  • Liveness. Any transaction issued by a correct client (aka virtuous transaction) will eventually be accepted by every correct node.

Slush: Introducing Metastability

The paper starts with a non-Byzantine protocol, Slush, and then builds up Snowflake, Snowball, and Avalanche, with better Byzantine fault-tolerance (BFT) and irreversibility properties.

Slush is presented using a decision between two conflicting colors, red and blue. A node starts out initially in an uncolored state. Upon receiving a transaction from a client, an uncolored node updates its own color to the one carried in the transaction and initiates a query.

To perform a query, a node picks a small, constant-sized ($k$) sample of the network uniformly at random, and sends a query message. Upon receiving a query, an uncolored node adopts the color in the query, responds with that color, and initiates its own query, whereas a colored node simply responds with its current color.

Once the querying node collects k responses, it checks if a fraction $\alpha*k$ are for the same color, where $\alpha > 0.5$ is a protocol parameter. If the $\alpha*k$ threshold is met and the sampled color differs from the node's own color, the node flips to that color. It then goes back to the query step, and initiates a subsequent round of query, for a total of $m$ rounds. Finally, the node decides the color it ended up with at time $m$. The paper shows in the analysis that m grows logarithmically with $n$.

This simple protocol illustrates the basic idea but it has many shortcomings. It assumes synchronized rounds available to all. In Line 15, the "accept color" comes at the end of m rounds; there are no early accepts. Finally, Slush does not provide a strong safety guarantee in the presence of Byzantine nodes, because the nodes lack state: Byzantine nodes can try to flip the memoryless nodes to opposite colors.

Snowflake: BFT

Snowflake augments Slush with a single counter that captures the strength of a node's conviction in its current color. In the Snowflake protocol in Figure 2:

  1. Each node maintains a counter $cnt$;
  2. Upon every color change, the node resets $cnt$ to 0;
  3. Upon every successful query that yields $\geq \alpha*k$ responses for the same color as the node, the node increments $cnt$.

Here the nodes can accept colors in an asynchronous manner, not all at the end of $m$ rounds. Each can accept when its own counter exceeds $\beta$. When the protocol is correctly parameterized for a given threshold of Byzantine nodes and a desired $\epsilon$ guarantee, it can ensure both safety and liveness.

Things already got interesting here. The analysis shows that there exists a phase-shift point after which correct nodes are more likely to tend towards a decision than a bivalent state. Further, there exists a point-of-no-return after which a decision is inevitable. The Byzantine nodes lose control past the phase shift, and the correct nodes begin to commit past the point-of- no-return, to adopt the same color, whp.

Snowball: Adding confidence

Snowflake's notion of state is ephemeral: the counter gets reset with every color flip. That is too much history to forget based on one sampling result. Snowball augments Snowflake with momentum by adding confidence counters that capture the number of queries that have yielded a threshold result for their corresponding color (Figure 3):

  1. Upon every successful query, the node increments its confidence counter for that color.
  2. A node switches colors when the confidence in its current color becomes lower than the confidence value of the new color.

Avalanche: Adding a DAG

Adding a DAG improves efficiency, because a single vote on a DAG vertex implicitly votes for all transactions on the path to the genesis vertex. Secondly, it also improves security, because the DAG intertwines the fate of transactions, similar to the Bitcoin blockchain. This makes past decisions (which are buried under an avalanche) much harder to undo.

When a client creates a transaction, it names one or more parents in the DAG. This may not correspond to application-specific dependencies: e.g., a child transaction need not spend or have any relationship with the funds received in the parent transaction. The paper includes a detailed discussion of parent selection in the implementation section.

In the cryptocurrency application, transactions that spend the same funds (double-spends) conflict, and form a conflict set, out of which only a single one can be accepted. As we mentioned above, a conflict set is disparate from how the DAG is constructed, yet the protocol needs to maintain and check for conflict sets for the safety of consensus.

Avalanche embodies a Snowball instance for each conflict set. While Snowball used repeated queries and multiple counters to capture the amount of confidence built in conflicting transactions (colors), Avalanche takes advantage of the DAG structure and uses a transaction's progeny/descendents. When a transaction T is queried, all transactions reachable from T by following the DAG edges are implicitly part of the query. A node will only respond positively to the query if T and its entire ancestry are currently the preferred option in their respective conflict sets. If more than a threshold of responders vote positively, the transaction is said to collect a chit, cT=1, otherwise, cT=0. Nodes then compute their confidence as the sum of chit values in the progeny of that transaction. Nodes query a transaction just once and rely on new vertices and chits, added to the progeny, to build up their confidence.

As Figure 5 shows, when node u discovers a transaction T through a query, it starts a one-time query process by sampling $k$ random peers. A query starts by adding $T$ to Transaction set, initializing $cT=0$, and then sending a message to the selected peers. Each correct node u keeps track of all transactions it has learned about in set $T_u$, partitioned into mutually exclusive conflict sets $PT$, $T \in T_u$. Since conflicts are transitive, if $T_i$ and $T_j$ are conflicting, then $PT_i = PT_j$.

Figure 6 shows what happens when a node receives a query for transaction T from peer j. The node determines if T is currently strongly preferred and returns a positive response to peer j. The transaction T is strongly preferred, if every single ancestor of T is  preferred among its competing transactions (listed in its corresponding conflict set).

Note that the conflict set of a virtuous transaction is always a singleton.  Figure 7 illustrates a sample DAG built by Avalanche, where the shaded regions indicate conflict sets. Sampling in Avalanche will create a positive feedback for the preference of a single transaction in its conflict set. For example, because T2 has larger confidence than T3, its descendants are more likely collect chits in the future compared to T3. So T9 would have an advantage over T6 and T7 in its conflict set.

Figure 4 illustrates the Avalanche protocol main loop, executed by each node. In each iteration, the node attempts to select a transaction T that has not yet been queried. If no such transaction exists, the loop will stall until a new transaction is added. It then selects k peers and queries those peers. If more than $\alpha*k$ of those peers return a positive response, the chit value is set to 1. After that, it updates the preferred transaction of each conflict set of the transactions in its ancestry. Next, T is added to the set Q so it will never be queried again by the node.

Similar to Bitcoin, Avalanche leaves determining the acceptance point of a transaction to the application. Committing a transaction can be performed through a safe early commitment. For virtuous transactions, T is accepted when it is the only transaction in its conflict set and has a confidence greater than threshold $\beta_1$. Alternatively, T can also be accepted after a $\beta_2$ number of consecutive successful queries. If a virtuous transaction fails to get accepted due to a liveness problem with parents, it could be accepted if reissued with different parents.


The Team Rocket implemented a bare-bones payment system by porting Bitcoin transactions to Avalanche. They say: "Deploying a full cryptocurrency involves bootstrapping, minting, staking, unstaking, and inflation control. While we have solutions for these issues, their full discussion is beyond the scope of this paper."

As I mentioned before, this section talks in depth about parent selection:
The goal of the parent selection algorithm is to yield a well-structured DAG that maximizes the likelihood that virtuous transactions will be quickly accepted by the network. While this algorithm does not affect the safety of the protocol, it affects liveness and plays a crucial role in determining the shape of the DAG. A good parent selection algorithm grows the DAG in depth with a roughly steady "width". The DAG should not diverge like a tree or converge to a chain, but instead should provide concurrency so nodes can work on multiple fronts. 
Perhaps the simplest idea is to mint a fresh transaction with parents picked uniformly at random among those transactions that are currently strongly preferred. But this strategy will yield large sets of eligible parents, consisting mostly of historical, old transactions. When a node samples the transactions uniformly that way, the resulting DAG will have large, ever-increasing fan-out. Because new transactions will have scarce progenies, the voting process will take a long time to build the required confidence on any given new transaction.
Not only are the ancestors important, progeny is also important for low-latency transaction acceptance. The best transactions to choose lie somewhere near the frontier, but not too far deep in history. The adaptive parent selection algorithm chooses parents by starting at the DAG frontier and retreating towards the genesis vertex until finding an eligible parent.


The basic payment system is implemented in 5K lines of C++ code. Experiments are conducted on Amazon EC2 by running from hundreds to thousands of virtual machine instances using c5.large instances.

For throughput, maintaining a partial order (DAG) that just captures the spending relations allows for more concurrency in processing than a classic BFT log replication system where all transactions have to be linearized. Also, the lack of a leader in Avalanche helps prevent bottlenecks.

Figure 21 shows that all transactions are confirmed within approximately 1 second. Figure 22 shows transaction latencies for different numbers of nodes and that the median latency is more-or-less independent of network size.

Avalanche's latency is only slightly affected by misbehaving clients, as shown in Figure 23.

For emulated georeplication, measurements show an average throughput of 1312 tps, with a standard deviation of 5 tps, and the median transaction latency is 4.2 seconds, with a maximum latency of 5.8 seconds.

The paper includes a comparison paragraph to Algorand and Bitcoin:
Algorand uses a verifiable random function to elect committees, and maintains a totally-ordered log while Avalanche establishes only a partial order. Algorand is leader-based and performs consensus by committee, while Avalanche is leaderless. Both evaluations use a decision network of size 2000 on EC2. Our evaluation uses c5.large with 2 vCPU, 2 Gbps network per VM, while Algorand uses m4.2xlarge with 8 vCPU, 1 Gbps network per VM. The CPUs are approximately the same speed, and our system is not bottlenecked by the network, making comparison possible. The security parameters chosen in our experiments guarantee a safety violation probability below 10−9 in the presence of 20% Byzantine nodes, while Algorand's evaluation guarantees a violation probability below 5 × 10−9 with 20% Byzantine nodes. 
The throughput is 3-7 tps for Bitcoin, 364 tps for Algorand (with 10 Mbyte blocks), and 159 tps (with 2 Mbyte blocks). In contrast, Avalanche achieves over 1300 tps consistently on up to 2000 nodes. As for latency, finality is 10–60 minutes for Bitcoin, around 50 seconds for Algorand with 10 Mbyte blocks and 22 seconds with 2 Mbyte blocks, and 4.2 seconds for Avalanche.

MAD questions

1. Is this a new protocol family?

Yes. Nakamato consensus used PoW to choose leaders. Other protocols uses PoX (e.g., proof-of-lottery, proof-of-stake, PoW) to choose committees which then run PBFT.  Traditional consensus protocols require known membership.

In contrast, Avalanche is a leaderless protocol family that works in open/permissionless setting. It doesn't use any PoX scheme, but uses randomized sampling and metastability to ascertain and persist transactions.

The analysis of the protocols are very strong, and discuss phase-shift point and point-of-no-return for these protocols. This is a very interesting approach to think about consensus. This is also a very fresh approach to thinking about self-stabilization as well. I have a good understanding of self-stabilization literature but I haven't seen this approach in that domain either. I would say the approach would also see interest from the broad self-organizing systems area.

The DAG analysis in the implementation section is also interesting. I don't know much about the hashgraph-based solutions so I don't know how this DAG construction relates to those.

2. What is the incentive to participate?

The paper already discussed a cryptocurrency implementation using Avalanche. But minting, staking, credit distribution, etc, was left for future work. The incentive to participate would come from the cryptocurrency minting and staking. The credit assignment would be interesting and probably would involve several new research problems as well.

3. Where does the Sybil attack tolerance of Avalanche come from?

Avalanche tolerates Byzantine nodes using a tunable parameter to increase/decrease tolerance factor. The paper also reports results with 20% Byzantine nodes.

However, if participation is very inexpensive, Sybil attack is possible, where large number of Byzantine sock-puppets can be introduced to the system violating the BFT ratios. I guess a proof-of-stake based approach can be used in Avalanche to prevent the introduction of an enormous number of Byzantine nodes to the network.

Making Sybil nodes a bit costly help, and that can be complemented with keeping the number of correct nodes high. If the protocol can be really resource light, people wouldn't mind having this in the background in their laptops the same way they don't mind background Dropbox synchronization open. With some incentive it is possible to have many many many participants which also increase tolerance against Sybil attack.

4. What is the resource (computation and storage) requirements at the participants?

On the topic of resource-lightness of the protocol, the paper mentions that transaction validation is the performance bottleneck: "To test the performance gain of batching, we performed an experiment where batching is disabled. Surprisingly, the batched throughput is only 2x as large as the unbatched case, and increasing the batch size further does not increase throughput. The reason for this is that the implementation is bottlenecked by transaction verification. Our current implementation uses an event-driven model to handle a large number of concurrent messages from the network. After commenting out the verify() function in our code, the throughput rises to 8K tps, showing that either contract interpretation or cryptographic operations involved in the verification pose the main bottleneck to the system."

In Avalanche, each participant node needs to maintain the DAG. But since the DAG is a pretty flexible data structure in Avalanche, I think it shouldn't be hard to shard the DAG across groups of participants.

I also wonder about the cost of "conflict set" maintenance as I didn't get a good understanding of how the conflict sets are maintained. The paper mentions an optimization for conflict set maintenance in the implementation section: "A conflict set could be very large in practice, because a rogue client can generate a large volume of conflicting transactions. Instead of keeping a container data structure for each conflict set, we create a mapping from each UTXO to the preferred transaction that stands as the representative for the entire conflict set. This enables a node to quickly determine future conflicts, and the appropriate response to queries."

5. Can some of the assumptions be used for constructing an attack?

The paper says: "The analysis assumes a synchronous network, while the deployment and evaluation is performed in a partially synchronous setting. We conjecture that the results hold in partially synchronous networks, but the proof is left to future work."

I think I buy this. The protocols coming after Slush weakened the synchronicity assumption. The epidemic random sampling mechanism help propagate transactions to the network. So, with enough number of correct nodes, and some weak guarantees about processing speed, I think this can work. Well, we should see the proof.

The epidemic random sampling mechanism requires a decentralized service so that the node to connect with sufficiently many correct nodes to acquire a statistically unbiased view of the network. I guess peer-to-peer finger tables would be sufficient to achieve that. This service should also be guarded against Byzantine nodes as this can be used to affect consensus results by routing nodes to Byzantine pools for sampling. I am wondering if asynchronicity can also be used to introduce/reinforce network partitioning.

Tuesday, May 29, 2018

Some blockchain applications and Reviewcoin

Conjecture 34: Forall x, we have "x on blockchain".
I don't have a proof but I have seen dental on blockchain, kiwis on blockchain, shoes on blockchain, etc.

Apart from the many silly x-on-blockchain attempts, I have heard some serious and promising applications. The Brave browser basic attention token seems to be a serious effort and can help redefine the ad-economy on the browsers. I also recently heard about Abra digital currency bankteller, which is another solid company and application.

To continue, Jackson Palmer said these are his favorite decentralized technology projects: WebTorrentDat / BeakerMastodon, and Scuttlebutt. Finally, there are well circulated requests for apps, which I think can make some uptick in the blockchain applications game.

My blockchain app suggestions

I didn't want to be left out of the action. Ideas are free. Here are some things that occurred to me.

Medical school students can do an ICO and fund their schooling

This will be a utility token, where the owner can get 30 minutes visitation/consulting worth per token. Of course as the student becomes a more experienced and skilled doctor, the token will be more valuable, because 30 minutes of the doctor is now more valuable. Of course with the token comes a token economy, where you can buy/sell tokens for doctors. 

The same idea can apply to some other graduate schools as well, including electrical engineering, computer science, etc. Maybe some labs can raise funding for students/projects using this model.

Being a philanthropist never was so profitable/greedy. And being greedy was never more philanthropic.

But come to think of it, I don't like this. Would this lead to maintaining a stock ticker for people in your life? (Hmm, his grades this semester are not that good, time to short his tokens.) People into stock trading all day may be OK with this, but definitely not for me.

Review coin

This idea occurred to me recently: We should build a blockchain solution for conference reviewer crediting system. If you don’t help with reviewing, you don’t get to have your papers reviewed.

The biggest advantage of doing this in blockchain rather than using a traditional centralized databases would be that now this will belong to everyone in the academic community rather than a specific organization. IEEE versus ACM is a thing. There are many other organizations. Even some universities may not want to submit to a system if it is maintained by a "rival" institution. (Though I must say the owned by Cornell gained adoption from all/most institutions as a repository of electronic preprints, so it looks like it can be done.)

With blockchains technology it may be possible to get the incentive model right as well, since similar problems have been studied in blockchains for cryptocurrencies.

There are many challenges here though. We like to keep the reviewers anonymous. We also want to keep the power of the anonymous reviewers in check and avoid bad/empty/trivial reviews and enforce a certain quality-level in the reviews.

To keep the review quality in check we can require multisig on a review. The extra endorsements may come from PC chairs or some assigned reviewer-reviewers to keep to a more decentralized system.

The participants (authors, reviewers, reviewer-reviewers) can be anonymous. The reviews do not need to go into blockchain/dag, only hash of the reviews can go in there for verification while maintaining some privacy. The participants will have private key and public keys, so it is not possible to initiate a transaction on behalf of another as in Bitcoin. On the other hand, we are still worried about double-spending and attacks on the blockchain.

To fend off attacks, the Reviewcoin blockchain could be powered with PoW, PoS, or leaderless-gossip-based (as in the Avalanche paper). This is a crazy thought but would it at all be possible to make this blockchain reviewer-work powered? There are some works that look at human-work puzzles and proof-of-useful work for blockchains. I speculate how a review-work powered method could be adopted for this problem in MAD questions below.

Reviewcoin can enable a better compensated yet more open model for reviewing papers. The model can also be applied to reviewing whitepapers, component designs, or even code-reviewing. The party that needs to get something reviewed should hodl some Reviewcoin, which could be earned by either doing meaningful reviews, or helping with Reviewcoin blockchain infrastructure running strong, or via purchasing from Reviewcoin owners.

By offering more Reviewcoins it can be possible to recruit more skilled/capable/experience reviewers for the work. Offering more coins should not get you more favorable reviews though, and the reviewer-reviewers should be the enforcer for fairness appointed by the system.

MAD questions

1. Is linking reviews with compensation opening a can of worms?
Probably... But the existing system is not great either. Academics volunteer for technical program committees of prestigious conferences (1) to help, (2) to get some name recognition as your name gets listed on conference website, and (3) to keep current with new research in the field. This is a good-will based model; these reviewers would consult to outside starting from $200 an hour yet they spend tens of hours for free to review for a conference. Inevitably this good-will based approach is starting to face many problems. Freeloading is a problem, as we are seeing some conferences getting 100s and even 1000s of paper submissions. Incentives is becoming a problem: for journals and weak conferences, it is hard to find reviewers, because the name recognition part is not that strong. Finally, experts are overstretched with volunteering.

So maybe we should consider opening a can of worms, if it can help us build a more sustainable system going forward.

2. Human proof-of-work for Reviewcoin
To certify that a review is valid and not random junk to get some Reviewcoins, the review can be checked and signed by program committee chairs if this is for a conference. We can have authenticated academics serving for PC chairs, so their signature certifies the review is correct and the reviewer gets some Reviewcoin for the service. To extend to non-conference unorganized reviews,  the review certification can be done through a multisig from a randomly selected set of participants. If the Byzantine nodes (Sybil/bot participants) are in the minority, they will not be able to issue false certification of reviews. It may be possible to use a proof-of-stake of participants to limit the number of Sybil nodes. Also the Reviewcoin stake/credibility of participants can be used in a weighted manner for reliability.

3. Where did May go?
Time is an illusion. End-of-semester doubly so. (With apologies to Douglas Adams.)

Wednesday, May 23, 2018

TUX2: Distributed Graph Computation for Machine Learning

The TUX2 paper appeared in NSDI 17 and was authored by Wencong Xiao, Beihang University and Microsoft Research; Jilong Xue, Peking University and Microsoft Research; Youshan Miao, Microsoft Research; Zhen Li, Beihang University and Microsoft Research; Cheng Chen and Ming Wu, Microsoft Research; Wei Li, Beihang University; Lidong Zhou, Microsoft Research.

TUX2 introduces some new concepts to graph process engines to adapt them better for machine learning (ML) training jobs. Before I can talk about the contributions of TUX2, you need to bear with me as I explain how current graph processing frameworks fall short for ML training.

Background and motivation

Graph processing engines often takes a "think like a vertex" approach. A dominant computing model in "think like a vertex" approach is the Gather-Apply-Scatter (GAS) model. You can brush up on graph processing engines by reading my reviews of Pregel and Facebook graph processing.

Modeling ML problems as bipartite graphs

Many ML problems can be modeled with graphs and attacked via iterative computation on the graph vertices. The Matrix Factorization (MF) algorithm, used in recommendation systems, can be modeled as a computation on a bipartite user-item graph where each vertex corresponds to a user or an item and each edge corresponds to a user's rating of an item.

A topic-modeling algorithm like Latent Drichlet Allocation (LDA) can be modeled as a document-word graph. If a document contains a word, there is an edge between them; the data on that edge are the topics of the word in the document.  Even logistic regression can be modeled as a sample-feature graph.

Gaps in addressing ML via graph processing

1. The graphs that model ML problems often have bi-partite nature and heterogeneous vertices, with distinct roles (e.g., user vertices and item vertices). However, the standard graph model used in graph processing frameworks assumes a homogeneous set of vertices.

2. For ML computation, an iteration of a graph computation might involve multiple rounds of propagation between different types of vertices, rather than a simple series of GAS phases. The standard GAS model is unable to express such computation patterns efficiently.

3. Machine learning frameworks have been shown to benefit from the Stale Synchronous Parallel (SSP) model, a relaxed consistency model with bounded staleness to improve parallelism. The graph processing engines use Bulk Synchronous Parallel (BSP) model by default.

TUX2 Design

To address the gaps identified above, TUX2
1. supports heterogeneity in the data model,
2. advocates a new graph model, MEGA (Mini-batch, Exchange, GlobalSync, and Apply), that allows flexible composition of stages, and
3. supports SSP in execution scheduling.

Next we discuss the basic design elements in TUX2, and how the above 3 capabilities are built on them.

The vertex-cut approach 

TUX2 uses the vertex-cut approach (introduced in PowerGraph), where the edge set of a high-degree vertex can be split into multiple partitions, each maintaining a replica of the vertex. One of these replicas is designated the master; it maintains the master version of the vertex's data. All the remaining replicas are called mirrors, and each maintains a local cached copy.

Vertex-cut is very useful for implementing the parameter-server model: The master versions of all vertices' data can be treated as the distributed global state stored in a parameter server. The mirrors are distributed to workers, which also has the second type of vertices and use the mirror vertices to iterate on these second type of vertices.

Wait, the second type of vertices? Yes, here we harken back to the bipartite graph model. Recall that we had bipartite graph with heterogeneous vertices, with some vertices having higher degrees. Those higher degree vertices are master vertices and held at the server, and the low degree vertices are data/training for those master vertices and they cache the master vertices as mirror vertices and train on them. And, in some sense, the partitions of low-order vertex type in the bipartite graph corresponds to mini-batch.

The paper has the following to say on this. In a bipartite graph, TUX2 can enumerate all edges by scanning only vertices of one type. The choice of which type to enumerate sometimes has significant performance implications. Scanning the vertices with mirrors in a mini-batch tends to lead to a more efficient synchronization step, because these vertices are placed contiguously in an array. In contrast, if TUX2 scans vertices without mirrors in a mini-batch, the mirrors that get updated for the other vertex type during the scan will be scattered and thus more expensive to locate. TUX2 therefore allows users to specify which set of vertices to enumerate during the computation.

Each partition is managed by a process that logically plays both

  • a worker role, to enumerate vertices in the partition and propagate vertex data along edges, and 
  • a server role, to synchronize states between mirror vertices and their corresponding masters. 
Inside a process, TUX2 uses multiple threads for parallelization and assigns both the server and worker roles of a partition to the same thread. Each thread is then responsible for enumerating a subset of mirror vertices for local computation and maintaining the states of a subset of master vertices in the partition owned by the process.

Figure 3 illustrates how TUX2 organizes vertex data for a bipartite graph, using MF on a user-item graph as an example. Because user vertices have much smaller degree in general, only item vertices are split by vertex-cut partitioning. Therefore, a master vertex array in the server role contains only item vertices, and the worker role only manages user vertices. This way, there are no mirror replicas of user vertices and no distributed synchronization is needed. In the worker role, the mirrors of item and user vertices are stored in two separate arrays.

In each partition, TUX2 maintains vertices and edges in separate arrays. Edges in the edge array are grouped by source vertex. Each vertex has an index giving the offset of its edge-set in the edge array. Each edge contains information such as the id of the partition containing the destination vertex and the index of that vertex in the corresponding vertex array. This graph data structure is optimized for traversal and outperforms vertex indexing using a lookup table. Figure 2 shows how data are partitioned, stored, and assigned to execution roles in TUX2.

Scheduling minibatches with SSP

TUX2 executes each iteration on a minibatch with a specified size. Each worker first chooses a set of vertices or edges as the current minibatch to execute on. After the execution on the mini-batch finishes, TUX2 acquires another set of vertices or edges for the next minibatch, often by continuing to enumerate contiguous segments of vertex or edge arrays.

TUX2 supports SSP in the mini-batch granularity. It tracks the progress of each mini-batch iteration to enable computation of clocks. A worker considers clock t completed if the corresponding mini-batch is completed on all workers (including synchronizations between masters and mirrors) and if the resulting update has been applied to and reflected in the state. A worker can execute a task at clock t only if it knows that all clocks up to t−s−1 have completed, where s is the allowed slack.

The MEGA model 

TUX2 introduces a new stage-based MEGA model, where each stage is a computation on a set of vertices and their edges in a graph. Each stage has user-defined functions (UDF) to be applied on the vertices or edges accessed during it. MEGA defines four types of stage: Mini-batch, Exchange, GlobalSync, and Apply.

MEGA allows users to construct an arbitrary sequence of stages. Unlike GAS, which needs to be repeated in order (i.e., GAS-GAS-GAS-GAS), in MEGA you can flexibly mix and match (e.g., E-A-E-A-G). For example, in algorithms such as MF and LDA, processing an edge involves updating both vertices. This requires two GAS phases, but can be accomplished in one Exchange phase in META. For LR, the vertex data propagations in both directions should be followed by an Apply phase, but no Scatter phases are necessary; this can be avoided in the MEGA model because MEGA allows an arbitrary sequence of stages.

Below are examples of  Matrix factorization (MF) and Latent Dirichlet Allocation (LDA) programmed with the META model. (LDA's stage sequence is the same as MF's.)

Implementation and Evaluation

TUX2 is implemented in ~12,000 lines of C++ code. TUX2 takes graph data in a collection of text files as input. Each process picks a separate subset of those files and performs bipartite-graph-aware algorithms to partition the graph in a distributed way. Each partition is assigned to, and stored locally with, a process. Unfortunately the evaluations with TUX2 do not take into account graph partitioning time, which can be very high. 

The evaluations show that data layout matters greatly in the performance of ML algorithms. Figure 8 compares the performance of BlockPG, MF, and LDA with two different layouts: one an array-based graph data layout in TUX2 and the other a hash-table-based lay-out often used in parameter-server-based systems (but implemented in TUX2 for comparison). The y-axis is the average running time of one iteration for BlockPG, and of 10 iterations for MF and LDA to show the numbers on a similar scale. These results show that the graph layout improves performance by up to 2.4× over the hash-table-based layout.

The paper also includes a comparison with Petuum, but the evaluations have several caveats. The evaluations do not include comparison of convergence/execution time; execution time per iteration does not always determine the convergence time. The evaluations do not take into account the partitioning time of the graph for TUX2. And finally, some comparisons used early unpatched version of Petuum MF algorithm whose data placement issues are resolved later.

MAD questions

1. What is the net gain here?
I like this paper; it made me ask and ponder on many questions, which is good.

I don't think TUX2 pushes the state of the art in ML. ML processing frameworks are already very efficient and general with the iterative parameter-server computing model, and they are getting better and more fine grained.

On the other hand, I think TUX2 is valuable because it showed how the high-level graph computing frameworks can be adapted to implement the low-level parameter-server approach and address ML training problems more efficiently. This may provide some advantages for problems that are/need to be represented as graphs, such as for performing ML training on Sparql data stores.

Moreover by using higher-level primitives, TUX2 provides some ease of programmability. I guess this may be leveraged further to achieve some plug and play programming of ML for certain class of programs.

So I find this to be a conceptually very satisfying paper as it bridges the graph processing model to parameter-server model. I am less certain about the practicality part.

2. How does graph processing frameworks compare with dataflow frameworks?
There are big differences between dataflow frameworks and graph processing frameworks. In the dataflow model, there is a symbolic computation graph, the graph nodes represent mathematical operations, while the graph edges represent the data that flow between the operations. That is a very different model than the graph processing model here.

In MEGA, there are only 4 stages, where the Apply stage can take in user defined functions. This is higher-level (and arguably more programming friendly) than a dataflow framework such as TensorFlow which has many hundreds of predefined operators as vertices.

3. How does TUX2 apply to Deep Learning (DL)?
The paper does not talk about whether TUX2 can apply to DL and how it can apply.

It may be possible to make DL fit the TUX2 model with some stretching. Deep neural network (DNN) layers (horizontally or vertically partitioned) could be the high-rank vertices hold in the servers. And the images are low-ranked vertices hold in partitions.

But this will require treating the DNN partions as a meta-vertex and schedule executions for each sub-vertes in the meta-vertex in one cycle. I have no clue about how to make backpropagation work here though.

Moreover, for each image, the image may need to link to entire NN, so the bipartite graph may collapse into a trivial one and trivial data-parallelism. It may be possible to make the convolutional layers can be distributed. It may even be possible to insert early exits and train that way.

So, it may be possible but it is certainly not straightforward. I am not even touching the subject of the performance of such a system.