Paper review. A secure sharding protocol for open blockchains.

This paper appeared in ACM CCS'16. It is authored by Loi Luu, Viswesh Narayanan, Chaodong Zheng, Kunal Baweja, Seth Gilbert, and Prateek Saxena.

Here is a video of the conference presentation.

The problem

Bitcoin's transaction throughput does not scale: its PoW blockchain consumes massive computational power yet can process only up to 7 transactions per second.

The paper proposes Elastico, a new distributed agreement protocol for permissionless blockchains, based on a non-PoW Byzantine consensus protocol.

The challenge is that classical/non-PoW Byzantine consensus protocols do not work in an open environment: they assume a closed set of known identities, whereas an open/permissionless network has no admission control and is vulnerable to Sybil attacks.


The main idea

The key idea in Elastico is to partition the network into smaller committees, each of which processes a disjoint set of transactions (or a "shard"). The number of committees grows linearly in the total computational power of the network. Each committee has a reasonably small number of members, around 100, so they can run a classical byzantine consensus protocol to decide their agreed set of transactions in parallel.
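To make the sharding idea concrete, here is a minimal Python sketch of one natural way to partition transactions into disjoint shards by the low-order bits of their hashes (Elastico similarly uses the low-order bits of PoW-generated identities to assign nodes to committees). The shard count and transaction format below are illustrative, not the paper's parameters.

  import hashlib

  SHARD_BITS = 4                 # illustrative: 2^4 = 16 shards/committees
  NUM_SHARDS = 2 ** SHARD_BITS

  def shard_of(tx: str) -> int:
      """Assign a transaction to a shard using the low-order bits of its hash."""
      digest = hashlib.sha256(tx.encode()).digest()
      return int.from_bytes(digest, "big") % NUM_SHARDS

  txs = ["alice->bob:5", "bob->carol:2", "carol->dave:9"]
  shards = {}
  for tx in txs:
      shards.setdefault(shard_of(tx), []).append(tx)
  print(shards)   # disjoint transaction sets that committees can process in parallel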

Sharding protocols are commonly used in distributed databases and in cloud infrastructure in trusted environments. Elastico provides the first sharding protocol for permissionless blockchains tolerating a constant fraction of byzantine network nodes.

The model


The agreement property is a relaxation of the one in the original byzantine consensus problem: it allows the honest processors to be in "probabilistic agreement", i.e., processors agree on a value with some high probability, rather than being in exact agreement.

This probabilistic part comes from step 5 of the algorithm below: there is still a proof-of-work component in Elastico, hence the agreement is probabilistic.
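Roughly, the relaxed agreement guarantee has the form: with probability at least 1 - 2^-λ, for a security parameter λ, all honest processors agree on the same set of values. (This is my paraphrase; the exact statement and constants are in the paper's model section.)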

The Elastico protocol


  1. Identity establishment and committee formation: each node uses IP, public key and PoW to locally generate an identity. 
  2. Overlay setup for committees: Nodes communicate to discover the identities of other nodes in their committee. To make this discovery efficient, a directory committee is formed first; the paper gives the details.
  3. Intra-committee consensus: Nodes run a standard byzantine agreement protocol, PBFT, within their assigned committees to agree on a single set of transactions.
  4. Final consensus broadcast: The final committee computes a final block from all the values received from other committees by running PBFT and broadcasts it to the network.
  5. Epoch randomness generation: the final committee runs a distributed commit-and-xor scheme to generate an exponentially biased but bounded set of random values. These are broadcast and used in the PoW of the next epoch. (A simplified sketch of this step follows the list.)
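To give a feel for step 5, here is a toy Python sketch of the commit-and-xor idea. It is a simplification under my own assumptions: it glosses over how the commitments get agreed on (PBFT within the final committee) and how each node selects which reveals to XOR, and the committee size and byte lengths are illustrative rather than the paper's parameters.

  import hashlib
  import secrets

  COMMITTEE_SIZE = 8   # illustrative; the paper's committees have around 100 members
  R_BYTES = 32

  # Commit phase: each final-committee member picks a random string and broadcasts
  # its hash; the set of commitments is what the final committee agrees on.
  rands = [secrets.token_bytes(R_BYTES) for _ in range(COMMITTEE_SIZE)]
  commitments = {hashlib.sha256(r).hexdigest() for r in rands}

  # Reveal phase: members broadcast their random strings; keep only the reveals
  # that match a previously agreed commitment.
  valid = [r for r in rands if hashlib.sha256(r).hexdigest() in commitments]

  def xor_bytes(a: bytes, b: bytes) -> bytes:
      return bytes(x ^ y for x, y in zip(a, b))

  # Each node XORs a subset of the valid reveals (here simply the first half plus
  # one) to obtain an epoch randomness value used in the next epoch's PoW puzzles.
  epoch_randomness = valid[0]
  for r in valid[1 : COMMITTEE_SIZE // 2 + 1]:
      epoch_randomness = xor_bytes(epoch_randomness, r)
  print(epoch_randomness.hex())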

The results

Due to its use of committees, Elastico is expected to scale transaction rates almost linearly with the computational power available for mining: the more computational power in the network, the more transaction blocks are selected per unit time. The scalability experiments are done on Amazon EC2 with up to 1,600 nodes and confirm the theoretical scaling properties:

"With the same network implementation as in Bitcoin, the scale up (blocks per epoch) for 100, 200, 400, 800 and 1,600 nodes with equal computational power 2 are as theoretical expectation, namely 1, 1.89, 3.61, 6.98 and 13.5 times respectively. Finally, Elastico’s clean-slate design decouples the consensus from block-data broadcasts, hence the bandwidth spent by each node remains almost constant, regardless of the size of the network. Our simulations are necessarily on a smaller scale than Bitcoin; however, if we project our results to a full deployment to a network of Bitcoin’s scale, we can expect a scale up of 10,000 in the number of agreed values per epoch. This agreement throughput is 4 orders of magnitude larger than Bitcoin's."

The related work



Bitcoin-NG, Aspen, and ByzCoin are more recent related work; in the paper's comparison table they satisfy the same properties as Elastico.

MAD questions

1. Which is more secure: a probabilistic PoW blockchain or Elastico with the "#byzantine<1/3" assumption?

This may be a false dichotomy, because even in the PoW case the byzantine nodes are assumed to be less than 1/3rd of the network to avoid selfish mining attacks. OK, given that, let's try to analyze further. In either case the leader has limited power: it cannot invent transactions for others, it can only decide whose transactions to include or not. So only a double-spending attack can be performed. PoW guards against this well if you wait for 6 more blocks to be added before considering a transaction finalized/irreversible. But for Elastico, if the "byzantine<n/3" assumption is violated, the double-spending attack is easier to pull off, because there is no PoW and 6-block rule to guard the chain against it.

2. The agreement in Elastico is probabilistic, because there is still a PoW component in step 5 of the algorithm: Epoch Randomness Generation. Does that mean Elastico does not provide *instant-irreversibility* of the chain, and history can be rewritten? Even when the byzantine ratio is less than 1/3?

I didn't see this discussed in the paper. Elastico does not use PoW to select a leader that adds a block, but rather to select committees. So Elastico may indeed be providing instant-irreversibility of an added block, when byzantine<1/3.

On the other hand, maybe there is a slight chance of violating instant-irreversibility. What if a different set of nodes is chosen for the committees, nodes that do not have a memory of the last block in the log? There may be a slight chance to pull this off by selecting enough nodes whom the last block broadcast has not reached yet. The ByzCoin paper, which I will summarize later, provides instant-irreversibility using a cleaner approach.

3. Why do we need the committees to run PBFT? Aren't they just putting together transactions in a block? You can do that without using PBFT provided that the transactions satisfy some integrity constraints/checks, right?

I guess this is just to make sure that the block produced is the work of a group, and not of just one individual. Otherwise, a byzantine node masquerading as the group could unduly influence the outcome. So even when byzantine<1/3 overall, agreement could be violated through the collusion of such byzantine nodes.

4. This is more of a nitpick than a question. It looks like the paper could have provided a clearer discussion of the bound on f, the fraction of byzantine nodes. At one point the paper says: "Here, 1/4 is an arbitrary constant bounded away from 1/3, selected as such to yield reasonable constant parameters." Then later it amends: "Note that we select f = 1/4 in order to achieve a practical value of committee size. Theoretically, Elastico can work with any f less than 1/3 by increasing the committee size c accordingly to f. The 1/3 bound is because we need to run a consensus protocol (e.g., PBFT) at every committee in Step 3, which can tolerate at most 1/3 fraction of malicious committee members."
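To see why f needs to be bounded away from 1/3, here is a small back-of-envelope sketch of my own (not from the paper). Assuming each of the c committee members is byzantine independently with probability f (the committee being a random sample of the network), it computes the probability that a committee exceeds the roughly-1/3 fault tolerance of PBFT.

  from math import comb

  def p_committee_unsafe(c: int, f: float) -> float:
      """Probability that more than c/3 of c sampled members are byzantine,
      with each member byzantine independently with probability f."""
      threshold = c // 3   # PBFT tolerates roughly c/3 faulty members
      return sum(comb(c, k) * f**k * (1 - f)**(c - k)
                 for k in range(threshold + 1, c + 1))

  print(p_committee_unsafe(100, 0.25))   # small: a few percent per committee
  print(p_committee_unsafe(100, 0.32))   # much larger: f too close to 1/3 for c = 100

With f = 1/4 and c around 100 the per-committee failure probability is small (and can be driven down further by increasing c), whereas with f close to 1/3 the committee size would have to grow considerably.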

5. How does Aspen compare with Elastico?
Aspen also has parallel tracks/channels for processing blocks, but in Aspen each parallel track is dedicated to a specific channel (service). In Elastico, the parallel tracks are shards whose purpose is mostly to improve throughput, perhaps by sharding on user IDs. Elastico also provides faster irreversibility. On the other hand, Aspen's sharding protocol is much simpler/cleaner than Elastico's.
