About self-stabilization and blockchain consensus

This is a half-baked exploratory piece about how some self-stabilization ideas may relate with blockchain consensus in open/permissionless environments. I will write about some similarities and some major differences between the self-stabilizing distributed algorithms work and blockchain consensus deployments.

Self-stabilization

Let's start with a brief review self-stabilization.

Stabilization is a type of fault-tolerance that handles faults in a principled unified manner instead of on a case-by-case basis. Instead of trying to figure out how much faults can disrupt the system's operation, stabilization assumes arbitrary state corruption, which covers all possible worst-case collusions of faults and program actions. Stabilization then advocates designing recovery actions that takes the program back to invariant states starting from any arbitrary state. This makes stabilization suitable for dealing with unanticipated faults.

The most distinct property of self-stabilization is its emphasis on liveness and providing eventual safety. When the invariant is violated, liveness still holds, and the system makes progress from faulty-states moving closer to invariant states until eventually safety is reestablished. In that sense stabilization is related to eventual-consistency approaches as well.

If you like to see an example, here is Dijkstra's self-stabilizing token ring program. In this problem, the processes are arranged in a ring fashion and there is a unique token circulating in this ring. (You can think of the token may be providing mutual exclusion to the processes; whichever process has the token can access the critical-section/shared-resource).

The similarity between self-stabilization and blockchain consensus is that they are both OK with finite-duration divergence of state/consistency.  In (most) blockchain consensus protocols, you may have a fork, which is resolved eventually. In Bitcoin the fork is resolved with addition of new blocks and using the longest-chain rule with time. Avalanche uses DAG not a chain, so the "fork" is resolved using increased confidence values with time. (In other words, the unpopular branches in DAG atrophy over time.)

In stabilization literature, there are distantly related approaches for blockchain consensus. I would say two important/early ones are Error-detecting codes and fault-containing self-stabilization (2000) and Probabilistic self-stabilization (1990).

Also, the undecided-state dynamics, population protocols by Aspnes et.al., and
self-* properties through gossiping are somewhat related work for Avalanche family consensus protocols.

State-management perspective

There are big differences in self-stabilization and blockchain attitude for state.

Stabilization likes to use soft-state and minimal state. That helps with efficient/lightweight eventual consistency.

Blockchain achieves eventual-consistency with hard-state, and lots of it. This is achieved through full replication at each node and using error-checking codes. While the stabilization approach does not like keeping history (because it can be corrupted), blockchain approach embraces history. However, blockchain maintains history in such a way that corruption is evident and can be weeded out! The error-checking codes (or confidences in Avalanche) are chained to provide increasing toughness/tolerance to tampering for older entries.

Another difference is in terms of partial/local state versus global state: In stabilization nodes often have partial/local state, and the global state is composition of these local stated. In decentralized consensus, each node has full state, the entire chain or the entire DAG. So in some sense stabilization acts distributedly across nodes, and blockchain consensus acts in a decentralized manner per node.

Dynamic networks and churn perspective

I like to also take some time to discuss how some self-stabilization work and blockchain consensus deal with open environments.

Self-stabilizing graph algorithms provide some tolerance/cushion for dynamic (time-varying) network. The edge costs may change (rewiring the graph) and the stabilizing algorithm reacts/adapts to the new graph. Some examples are  stabilizing clustering and shortest path tree algorithms. While self-stabilization tolerates some churn, if the churn is high, the self-stabilizing algorithm may not be able to catch up.

In contrast, dynamic networks is not a big problem for blockchain consensus, especially since communication is often done via an epidemic broadcast/gossip or pull-based gossip as in Avalanche. The blockchain consensus protocols just need a way to propagate the transactions and chain state for replication at the nodes.

There is also the issue of Sybil-resistance. If participation is very inexpensive, Sybil attack is possible in an open/permissionless environment. The attack works by introducing a large number of Byzantine sock-puppets to the system and by violating the byzantine nodes ratio in the system. Sybil resistance is orthogonal to what I discussed and can be achieved for both approaches by incorporating PoW or PoS techniques.

MAD questions

1. Is it possible to use blockchain full-state replication as composing block for building larger-scale stabilizing systems?
Maybe the blockchain full-state replication may emulate a Virtual Node (VN) in an open/trustless region/zone, and you can have a stabilizing solution built/deployed over these unfallable/uncrashable VNs.

2. Is open/permissionless environment overrated? If we have federation like systems hierarchically-arranged, and quick reconfiguration/stabilization to choose suitable quorums for consensus, would that be enough?

Comments

Popular posts from this blog

Hints for Distributed Systems Design

Learning about distributed systems: where to start?

Making database systems usable

Looming Liability Machines (LLMs)

Advice to the young

Foundational distributed systems papers

Distributed Transactions at Scale in Amazon DynamoDB

Linearizability: A Correctness Condition for Concurrent Objects

Understanding the Performance Implications of Storage-Disaggregated Databases

Designing Data Intensive Applications (DDIA) Book