Compositional Programming and Testing of Dynamic Distributed Systems

This paper is authored by Ankush Desai, Amar Phanishayee, Shaz Qadeer, and Sanjit A. Seshia, and it appeared at OOPSLA 2018.

The paper describes the ModP framework extension over the P language. P is a language developed for safe event-driven programming. P models processes as state machines: the state machines communicate via message passing, and events cause state machines to transition between states. 

The killer feature of P is unifying modeling with programming! P enables the programmers to write specifications and enables systematic testing both via random testing and exhaustive symbolic execution testing.

P is used in event driven systems such as device drivers and robotics. The P GitHub page states that P is used in Microsoft (P# and Coyote) and also used extensively at AWS for model-based testing of complex distributed systems.  

Compositional testing of distributed systems

A problem with systematic testing is that monolithic testing of large systems fail due to combinatorial state space explosion. Achieving full coverage testing of a large monolithic system takes forever. The ModP framework extension in this paper addresses this problem. 

To achieve scalability of systematic testing, *decomposition* is used. ModP shows how to decompose the monolithic system-level testing problem to component-level testing problems, and how to achieve scalable model-based integration testing of distributed systems.

In order to lift the component-level testing to the  whole system-level, ModP employs the theory of assume-guarantee (AG) reasoning (also known as rely-guarantee reasoning). This is a classic idea in formal methods proposed more than 30 years ago. Assume-guarantee starts with each component specifying its assume from its environment and its provided guarantee in return about its transitions. Note that a component's environment is formed by the other components it will be composed with. Thus if a component's assume is satisfied by the combined guarantees provided by the other components, we can bank on the guarantee from that component. Therefore the AG theorem states that it is possible to take the conjunction of all components' AG specifications and derive the system AG specification.

In the ModP testing approach, each component is first automatically tested for showing refinement with respect to the corresponding abstract modules and specifications. Then each component is composed with the abstract versions of the other components (as in mocking but more rigorously now) and is tested for system level specifications. This testing can be done using randomized property testing or via using better coverage symbolic execution testing.

Composing the concrete implementations Two Phase Commit and MultiPaxos SMR and testing them together monolithically would not work due to state space explosion. So ModP composes Two Phase Commit with the abstract version of MultiPaxos SMR, the SMR abstraction, and tests them first. Then ModP composes MultiPaxos SMR with the abstract version of Two Phase Commit, SMR client to make sure there is no incompatibility there either. Finally by using the AG theorem, ModP infers the concrete versions of the two components composed would also provide the same correctness properties. An added bonus is that if we later want to swap Chain Replication with MultiPaxos, we do not need to repeat the Two Phase Commit and SMR abstraction test again. We just perform Chain Replication and SMR client tests. 

The ModP approach adds rigor to the traditional integration testing, and achieves a very good coverage. The cost to the programmer is to write the abstract modules and specifications to test with. It does not provide as good coverage as proof based approaches like Dafny and Armada. However, proof-based approaches incur high overhead for the programmer in coming up with the proof hints and even then do not scale well to large systems, especially to distributed systems, due to the combinatorial state space explosion problem. 

In the context of distributed systems, the AG reasoning also gets somewhat complicated due to the dynamicity involved: the processes are created *dynamically* and the communication topology may also change *dynamically*. Most of the discussion in the paper is about the technical workarounds that were required to make the AG theory work elegantly in this dynamic model. To make this work, ModP introduced interfaces, input-event-hiding rule, and a well-formedness check that reinforces output (send event and interface) disjointness.zoo


The paper demonstrated the power of ModP's model-based integration testing framework by building a transaction-commit service using two phase commit and MultiPaxos or Chain Replication. A significant finding in the paper (Figure 13) is that there is a 100 folds reduction in execution traces for testing when substituting an abstract module for the environment instead of testing against the concrete module.

Fig 14 shows that, by enabling compositional testing, ModP achieves faster bug finding with prioritized random search as compared to monolithic testing. Moreover while the symbolic execution completes in 1.3 hours in the decomposed system, it does not finish on the monolithic even after 10 hours.

The conference presentation from Ankush Desai is excellent, and I refer you to that, the paper pdf, and the GitHub repo if you are interested more in the P language. 


Popular posts from this blog

Foundational distributed systems papers

Your attitude determines your success

Progress beats perfect

Cores that don't count

Silent data corruptions at scale

Learning about distributed systems: where to start?

Read papers, Not too much, Mostly foundational ones

Sundial: Fault-tolerant Clock Synchronization for Datacenters


Using Lightweight Formal Methods to Validate a Key-Value Storage Node in Amazon S3 (SOSP21)