SOSP19 Day 1, Blockchain session

The second session on Day 1 was on blockchains. There were three papers in this session.

Teechain: A Secure Payment Network with Asynchronous Blockchain Access

This paper is by Joshua Lind (Imperial College London), Oded Naor (Technion), Ittay Eyal (Technion), Florian Kelbert (Imperial College London), Emin Gun Sirer (Cornell University), Peter Pietzuch (Imperial College London).

Bitcoin's throughput is a measly 4 transactions per second, as limited by Nakamoto consensus. This has led to work on off-chain scaling via payment networks, e.g., the Lightning Network. A payment network consists of nodes that establish pairwise connections. To establish a multihop connection, these pairwise connections are used as intermediaries. There are three phases to a payment network: set up a multihop connection, process payments for some time, and perform a settlement where the final balance is written back to the blockchain.
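The three phases can be sketched in a few lines of Python. This is a toy model under my own assumptions, not the Lightning or Teechain protocol: the PaymentChannel class and the route_payment helper are hypothetical names, and a real multihop payment needs a mechanism to make the hops atomic, which this sketch leaves out.

```python
class PaymentChannel:
    """Toy pairwise channel between two parties (hypothetical model)."""

    def __init__(self, deposits):
        # Phase 1 (setup): each party locks a deposit into the channel.
        self.balances = dict(deposits)
        self.settled = False

    def pay(self, sender, receiver, amount):
        # Phase 2 (payments): only local balances change; nothing hits the chain.
        if self.settled:
            raise RuntimeError("channel already settled")
        if amount <= 0 or self.balances[sender] < amount:
            raise ValueError("invalid or unfunded payment")
        self.balances[sender] -= amount
        self.balances[receiver] += amount

    def settle(self):
        # Phase 3 (settlement): the final balances are what gets written on chain.
        self.settled = True
        return dict(self.balances)


def route_payment(channels, path, amount):
    """Send `amount` along `path`, using each pairwise channel as an intermediary.

    Assumption: hops are applied one by one; a real network must make them atomic.
    """
    hops = list(zip(path, path[1:]))
    # Check funding on every hop first so a failure does not leave a partial route.
    for sender, receiver in hops:
        if channels[frozenset((sender, receiver))].balances[sender] < amount:
            raise ValueError(f"hop {sender}->{receiver} is underfunded")
    for sender, receiver in hops:
        channels[frozenset((sender, receiver))].pay(sender, receiver, amount)


channels = {
    frozenset(("alice", "bob")): PaymentChannel({"alice": 10, "bob": 10}),
    frozenset(("bob", "carol")): PaymentChannel({"bob": 10, "carol": 10}),
}
route_payment(channels, ["alice", "bob", "carol"], 3)
final_ab = channels[frozenset(("alice", "bob"))].settle()
final_bc = channels[frozenset(("bob", "carol"))].settle()
```

Here alice pays carol through bob without a direct alice-carol channel, and only the two settle() results would ever need to be written to the blockchain.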

To guard against a roll-back attack and ensure that the correct balance is written, the blockchain is used as the root of trust. If a node misbehaves and writes an incorrect amount, the other node reacts within delta time, corrects it by providing proof of the Lightning transactions, resolves the situation, and gets an incentive for doing so.

The problem with this is that the reaction time delta requires synchronous access to the blockchain. Spam/congestion attacks can make some transactions take more than 7 days to be written to the chain. So what should be the appropriate value of delta, the reaction time? Security of the payment channel should not rely on read/write latencies.
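A back-of-the-envelope sketch of why a fixed delta is fragile. The function name, the time unit (hours), and the specific numbers other than the 7-day congestion figure are my own illustrative assumptions.

```python
HOUR = 1
DAY = 24 * HOUR


def challenge_lands_in_time(stale_posted_at, victim_notices_at, write_latency, delta):
    """Return True if the correcting transaction is on chain before the window closes.

    All arguments are in hours. The dispute window opens when the stale
    balance is posted and closes `delta` hours later.
    """
    lands_at = victim_notices_at + write_latency
    return lands_at <= stale_posted_at + delta


delta = 1 * DAY  # assumed reaction window
normal = challenge_lands_in_time(0, 2 * HOUR, write_latency=1 * HOUR, delta=delta)
congested = challenge_lands_in_time(0, 2 * HOUR, write_latency=7 * DAY, delta=delta)
```

With a one-day window, the honest party wins under normal write latency but loses under a 7-day congestion attack, even though it reacted just as quickly in both cases.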

To address this issue, the paper proposes Teechain, the first payment network with asynchronous blockchain access. Teechain removes the blockchain as the root of trust by introducing another root of trust, a treasury. The treasury controls funds, balances, and payments. But how do we realize treasuries for blockchains while avoiding centralization and trust?

Teechain uses committees for realizing treasuries for blockchains. A treasury committee consists of n parties in the network, and requires m out of n parties to agree before accessing funds. But how large should m and n be? Large committees are problematic for scaling. To reduce the size of committees, the paper proposes using trusted execution environments (TEEs), such as Intel SGX enclaves, to secure committee members. The enclaves guard against software and hardware attacks. The paper does not take TEEs as the entire solution, because some attacks, like Foreshadow, could still be possible with TEEs. Instead, the paper uses TEEs to increase the attack costs and combines TEEs with treasury committees to solve the problem securely.
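The m-out-of-n rule itself is simple to state in code. This is only a sketch of the threshold check: the TreasuryCommittee class is a hypothetical name, and approvals are modeled as plain member names, whereas a real committee would verify signatures from each member.

```python
class TreasuryCommittee:
    """Toy m-out-of-n committee guarding access to funds (hypothetical model)."""

    def __init__(self, members, m):
        self.members = frozenset(members)
        if not 1 <= m <= len(self.members):
            raise ValueError("need 1 <= m <= n")
        self.m = m

    def authorizes(self, approvals):
        # Count only approvals from distinct, actual committee members.
        valid = self.members & set(approvals)
        return len(valid) >= self.m


committee = TreasuryCommittee(["tee1", "tee2", "tee3"], m=2)
one_vote = committee.authorizes({"tee1"})
two_votes = committee.authorizes({"tee1", "tee3"})
outsider = committee.authorizes({"tee1", "mallory"})
```

An attacker has to compromise m members to move funds, which is why raising the per-member attack cost with TEEs lets n and m stay small.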

OK, how does the treasury committee maintain agreement? Chain replication is used for propagating the transactions to the committee. Even though the committee members use TEEs, the authors do not rule out that some nodes can still be Byzantine, so there is an m-out-of-n agreement at the end. If chain replication is applied naively, some attacks are possible. So Teechain uses a variant called force-freeze chain replication, where if the chain configuration is changed or if there is a fault, the decision is an abort decision, and the state is dumped.
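Here is how I picture the force-freeze idea, as a toy sketch based only on the talk. The class and method names are hypothetical, faults are modeled as a simple alive flag, and "dumping the state" is modeled as snapshotting the updates that reached the tail; the actual protocol in the paper is surely more involved.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.state = {}
        self.alive = True


class ForceFreezeChain:
    """Toy chain replication that freezes on any fault or reconfiguration."""

    def __init__(self, replicas):
        self.replicas = list(replicas)  # ordered head to tail
        self.acked = {}                 # updates that reached the tail
        self.frozen = False
        self.dump = None

    def _freeze(self):
        # Abort decision: stop accepting updates and dump the acknowledged state.
        self.frozen = True
        self.dump = dict(self.acked)

    def update(self, key, value):
        if self.frozen:
            raise RuntimeError("chain is frozen")
        for replica in self.replicas:   # propagate head to tail
            if not replica.alive:
                self._freeze()
                return False
            replica.state[key] = value
        self.acked[key] = value         # acknowledged only after the tail applied it
        return True

    def request_reconfiguration(self):
        # Assumption: any configuration change is treated like a fault.
        self._freeze()


chain = ForceFreezeChain([Replica("head"), Replica("mid"), Replica("tail")])
first = chain.update("alice", 5)
chain.replicas[1].alive = False
second = chain.update("alice", 9)
```

After the middle replica fails, the second update is aborted and chain.dump holds only the balance that every replica had acknowledged.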

The paper includes evaluations to show that Teechain scales out. Evaluations were done using a complete graph and a hub-and-spoke graph. Committee sizes of n=3 or 4 were used, and the results showed that 1 million tx/sec is possible using 30 machines.

Teechain is available as opensource from

Fast and Secure Global Payments with Stellar

This paper is by Marta Lokhava (Stellar), Giuliano Losa (Galois), David Mazières (Stanford), Graydon Hoare (Stellar), Nicolas Barry (Stellar), Eliezer Gafni (UCLA), Jonathan Jove (Stellar), Rafał Malinowski (Stellar), Jed McCaleb (Stellar).

In 2018, I wrote a review of the Stellar arXiv paper. This paper extends that with formal verification, evaluation, and lessons learned from several years in deployment.

However, the presentation was not technical. It was a very high-level presentation that avoided the description of the protocol and formal verification. I think the presenter, David, must have given this presentation a hundred times before to VCs, because it was pitch perfect. Every possible criticism of the protocol was proactively flipped and explained as an advantage. Here is how the presentation went.

Things we take for granted, such as a bank account in a stable currency, access to well-regulated investments, cheap international money transfers, globally accepted fee-free credit cards, are not available in many places. Stellar provides more equitable access to assets. It is the first solution to provide:
  1. open membership
  2. issuer-enforced finality: still need secure servers, but issuer owns or designates them
  3. cross-issuer atomicity
The Stellar transaction model is based on replicated state machines (RSMs). Each RSM executes transactions to keep ledger state. Transactions guarantee atomicity. But now that the RSM is distributed, how does Stellar guarantee ledger integrity? Stellar uses a shared RSM. The idea is to follow the graph transitively until it converges. The hypothesis is that any two nodes transitively follow a common node. This holds true for the Internet with its hierarchical domains.
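The hypothesis is easy to check on a small "follows" graph. This is my own sketch, not anything from the talk: the function names are hypothetical, the graph is a plain dict, and I assume a node counts as following itself so that a node and the node it follows share a common node.

```python
from collections import deque
from itertools import combinations


def transitively_followed(start, follows):
    """Return every node reachable from `start` along follows edges."""
    seen = set()
    queue = deque(follows.get(start, ()))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(follows.get(node, ()))
    return seen


def hypothesis_holds(follows):
    """True if every pair of nodes transitively follows some common node."""
    nodes = set(follows) | {n for targets in follows.values() for n in targets}
    # Assumption: each node is included in its own reach set.
    reach = {n: transitively_followed(n, follows) | {n} for n in nodes}
    return all(reach[a] & reach[b] for a, b in combinations(nodes, 2))


hierarchy = {"leaf1": ["mid"], "leaf2": ["mid"], "mid": ["top"], "top": []}
islands = {"x": ["y"], "p": ["q"]}
hierarchy_ok = hypothesis_holds(hierarchy)
islands_ok = hypothesis_holds(islands)
```

The hierarchical graph satisfies the hypothesis because everything eventually leads to the same top node, while the two disconnected islands do not.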

The Byzantine agreement in Stellar follows from this hypothesis. The key idea is that the broadcast protocol steps are conditioned on other nodes' steps: take the step if all nodes are mutually satisfied. For availability, the "follows" relation must be generalized to sets of peers, called quorum slices. A quorum is a set of nodes that contains at least one slice of each of its nonfaulty members.
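The quorum definition translates almost directly into code. A minimal sketch, assuming every node in the candidate set is nonfaulty and that slices are given as a dict from node name to a list of sets; the is_quorum name and the example configuration are mine.

```python
def is_quorum(candidate, slices):
    """True if `candidate` is nonempty and contains a slice of each member.

    `slices` maps a node name to that node's list of quorum slices.
    Assumption: all members of `candidate` are treated as nonfaulty.
    """
    candidate = frozenset(candidate)
    if not candidate:
        return False
    return all(
        any(frozenset(s) <= candidate for s in slices[node])
        for node in candidate
    )


slices = {
    "a": [{"a", "b"}],
    "b": [{"b", "c"}],
    "c": [{"b", "c"}],
}
bc = is_quorum({"b", "c"}, slices)
ab = is_quorum({"a", "b"}, slices)
abc = is_quorum({"a", "b", "c"}, slices)
```

Note that {a, b} satisfies a but is not a quorum, because b's only slice also requires c.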

Stellar has top-tier, middle-tier, and leaf-tier servers. As in the Internet, no central authority appoints the top tier. The production network has been running for 4 years.

In the Q&A period, one question was: "How are you dealing with dynamic reconfiguration of quorums?" David said, "We get it for free, and can unilaterally change quorum slices at any time." But this answer is not clear to me, because reconfiguration could reduce your safety. As far as I remember from the Stellar arXiv report, the onus is on the user to make sure that her quorum slices are set up correctly. So the reconfiguration should be more involved than that.

Another question was: "Can you do smart contracts where state lies in different slices?" David claimed it is possible, but again, without reading the paper, I don't understand this answer. The state could be partitioned (and not fully replicated) across all nodes, and ordering across all slices may be involved.

Notary: A Device for Secure Transaction Approval

This paper is by Anish Athalye (MIT CSAIL), Adam Belay (MIT CSAIL), Frans Kaashoek (MIT CSAIL), Robert Morris (MIT CSAIL), Nickolai Zeldovich (MIT CSAIL).

Smartphones suffer from bugs and attacks. So hardware wallets (or cryptocurrency wallets) are adopted for approving critical financial transactions. The Ledger app store contains 50+ third-party apps. Unfortunately, existing hardware wallets also have OS bugs and potential hardware bugs.

The contribution of this paper is to introduce Notary, a device for secure transaction approval. Notary uses an agent separation architecture, and the authors have developed a physical hardware wallet prototype for it.

The separation architecture provides isolation: there is a kernel system-on-chip (SoC) and a separate agent SoC. The kernel SoC does not run any third-party code. It is the agent SoC that runs third-party code. The agent SoC does not have access to the OS. The agent also does not have full access to hardware, as it lacks access to the persistent storage.

There are only 2 wires between the kernel SoC and the agent SoC: UART and RST. Using RST, the kernel can reset the agent SoC. More specifically, the kernel clears the state in the agent SoC after every transaction. This way Notary uses a clean-state deterministic start to ensure noninterference across transactions and to avoid any bugs/attacks.
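A software analogy for the reset-after-every-transaction idea. This is only an illustration of the noninterference goal: the class names are hypothetical, Python dictionaries stand in for the agent's volatile state, and the real guarantee comes from hardware reset, not from code like this.

```python
class AgentSoC:
    """Runs third-party code; has only volatile memory (no persistent storage)."""

    def __init__(self):
        self.memory = {}

    def reset(self):
        # Stand-in for the RST wire: wipe all volatile state.
        self.memory = {}

    def run(self, agent_code, request):
        return agent_code(self.memory, request)


class KernelSoC:
    """Never runs third-party code itself; it only drives the agent."""

    def __init__(self):
        self.agent = AgentSoC()

    def approve(self, agent_code, request):
        try:
            return self.agent.run(agent_code, request)
        finally:
            # Clear the agent after every transaction, even if the agent crashed.
            self.agent.reset()


def leaky_agent(memory, request):
    memory["last_request"] = request  # tries to leave state behind
    return "approved"


def snooping_agent(memory, request):
    return memory.get("last_request")  # tries to read the previous transaction


kernel = KernelSoC()
first_result = kernel.approve(leaky_agent, "pay 1 BTC to X")
snooped = kernel.approve(snooping_agent, "anything")
```

Because the agent starts from a clean state each time, the second agent sees nothing left over from the first transaction.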

Verification is done at the Verilog level to show that Notary clears registers under any path. An SMT-compatible format is used for symbolic circuit simulation. The authors developed a RISC-V based prototype and evaluated it with two agents: Bitcoin and web-app approval.

