Avalanche is a descendent of Texel

When I first skimmed the Texel paper (which was released on August 28 2019), I could see parallels between Texel and Avalanche, and noted those in the third paragraph of my review. But I had missed the footnote in the Texel paper which said that Texel was originally written in 2010. I thought Texel was new, and was speculating that it may be a more deterministic version of Avalanche, that is applied to crash tolerant distributed consensus. After writing two more blog posts on modeling Texel in TLA+ and understanding it better, I now think Texel formed a basis that Avalanche descended from.

Texel provided an asynchronous and truly-leaderless solution to consensus. Instead of appointing a leader to bring nodes to consensus (as in Paxos), Texel shows how each node can make its own mind and still achieve consensus in an asynchronous system. By adopting a leaderless solution to asynchronous consensus, Texel avoids the disadvantages of solutions that appoint a leader for achieving consensus. In a leader-based solution, what if the leader fails? In order to avoid getting stuck forever, the nodes should use a failure detector to suspect if the leader is unavailable. Failure detectors are a liability in large-scale systems with a lot of turn-over. Another big drawback with having a leader is that for large-scale systems, the leader becomes a performance bottleneck.

Avalanche operationalized Texel's leaderless approach for large-scale decentralized consensus. It extended Texel's leaderless consensus approach in terms of scalability and quick finality (by using sampling and metastability), and applied the resultant decentralized algorithm in the blockchain domain.

But Texel did not consider Byzantine faults

Avalanche considers Byzantine faults which Texel did not consider. The question is, what can a Byzantine node do in blockchains? Answer: it can try to perform double-spending. That translates to the node proposing two different transactions with the same UTXO for itself (the transactions need to be signed by the private-key of the initiator).

The safety (i.e., agreement) property of the Texel approach says that no node in the system can decide different things for the same value (transaction). This, translated to Avalanche terms, means that no two correct nodes will decide two different transactions with the same UTXO. And this rules out double-spending for a Byzantine initiator. Even when other Byzantine nodes in the system may try to conspire with the Byzantine initiator and push some correct nodes to adopt different supporting values, with the threshold for supporting value adoption higher than the number of possible Byzantine nodes, Texel approach and its respective adaptation in Avalanche can avoid this problem.

Avalanche also does a very clever judo move on Texel's liveness problem and turns it into a feature. In my reviews for Texel, I mentioned that liveness (termination of consensus) is a problem for Texel. In the blockchain domain, Avalanche adopts a similar approach to supporting value selection, and runs into liveness problem when two different values are competing to be decided on for the same consensus instance. In the blockchain domain, this corresponds to a Byzantine node pushing two different transaction with the same UTXO. And in this case the liveness violation is a feature not a bug. Since the correct clients follow the protocol as prescribed (and avoid double-spending), they are guaranteed both safety and liveness. In contrast, the protocols do not guarantee liveness (but still guarantees safety) for double-spending transactions submitted by Byzantine clients, which conflict with one another. As the Avalanche paper says "such decisions may stall in the network, but have no safety impact on virtuous transactions."

What about the Sybil nodes problem? Avalanche deals with Sybil problem  using a PoS solution. It can even adopt a PoW solution as well, because dealing with Sybil nodes is an orthogonal problem to solving consensus.

Scaling Texel

In Texel for adopting a supporting value, you need to read it from more than F nodes. In the decentralized consensus setting Avalanche considers, N and F can be huge, thousands of nodes. So for finding a supporting value, a node in Avalanche samples a bunch of nodes, which is much smaller than F nodes. But the random sampling of nodes still enables tolerating the F faulty nodes. Since F is a fraction of N, it cannot have too much effect in the sampling based selection of a supporting value for a node in Avalanche.

How does Avalanche deal with the cancellation of experimentation problem in Texel? Again sampling and the use of metastability concept helps with this. Having a large scale system becomes an advantage here because the likelihood/risk of reading from inconsistent cuts of each other from overlapping experiments and getting affected by this diminishes. This way Avalanche avoids the agreement violation problem due to inconsistent snapshot read (if concurrent and overlapping experiments are not canceled).

Avalanche also applies the metastability concept to make the consensus finalization decision faster, and without the need to contacting N-F nodes.

Closing the loop

I will assign Texel as part of a TLA+ modeling project in my distributed systems class. My distributed systems class topics are as follows:
  1. Introduction, 2 phase-commit
  2. Reasoning about distributed programs, safety/progress
  3. Consensus, Paxos
  4. Failure detectors, Faults and fault-tolerance
  5. Time: logical clocks, State: distributed snapshots
  6. Datacenter computing, Cloud computing
  7. NoSQL databases, CAP theorem, Distributed databases
  8. Big data, Big data analytics
  9. Decentralized ledgers and blockchains
The course starts with consensus and ends with blockchains. Showing a Texel to Avalanche transition is a good way to tie consensus (where Texel shows an alternative solution to leader based consensus as in Paxos) to blockchain (where Avalanche shows how to scale and operationalize Texel to large-scale decentralized environments).

MAD questions

1. Could it be that Avalanche and Texel are unrelated?
No. Absolutely not!

Tobler's first law of geography says "everything is related to everything else, but near things are more related than distant things."

"Everything is related", so Avalanche and Texel are related :-)

"Near things are more related than distant things". Since both Texel an Avalanche have strong Cornel links, they are even more related.

Banter aside, I noticed that Texel in turn has several parallels to Ben-Or. Nothing comes out of void. So you can also make an argument that Avalanche is a descendent of Ben-Or as well. But, as the law said "everything is related", so I am still in the right. Here are the similarities I see between Texel and Ben-Or.
  1. Texel does not use rounds, but consistent-cuts. Ben-Or uses rounds, but the rounds in Ben-Or are leaderless rounds: they are non-client-restricted rounds, in contrast to client-restricted rounds in leader-based solutions.
  2. Similar to a node experimenting in Texel to find a supporting value by talking to F+1 nodes, in Ben-Or a node goes through the first phase of a round to identify a supporting value by talking to F+1 nodes.
  3. Similar to the node finalizing a decision by finding it at N-F nodes in Texel, in Ben-Or a node finalizes its decision by finding it at N-F nodes.
As one difference, upon failure to get a decision, Ben-Or makes some nodes to change their vote before the next experimenting phase. This helps jolt/tilt the system toward a decision, so that eventually probabilistically the system converges to consensus.

Comments

Popular posts from this blog

Learning about distributed systems: where to start?

Hints for Distributed Systems Design

Foundational distributed systems papers

Metastable failures in the wild

The demise of coding is greatly exaggerated

Scalable OLTP in the Cloud: What’s the BIG DEAL?

The end of a myth: Distributed transactions can scale

SIGMOD panel: Future of Database System Architectures

Why I blog

There is plenty of room at the bottom