Armada: Low-Effort Verification of High-Performance Concurrent Programs

- October 22, 2020

This paper appeared in PLDI 2020 and is authored by Jacob R. Lorch, Yixuan Chen, Manos Kapritsos, Bryan Parno, Shaz Qadeer, Upamanyu Sharma, James R. Wilcox, and Xueyuan Zhao.

The paper presents the Armada framework for code-level verification of concurrent (multithreaded rather than distributed) programs. Armada is roughly a stepwise-refinement extended version of Dafny. So that is where I will start explaining.

Dafny

Dafny is an open source iterative compiled language for writing verified code. The code is verified with respect to the specifications the developer writes by adding high-level annotations alongside the code, such as ensures P and invariant P. At https://rise4fun.com/Dafny/tutorial you will find an interactive tutorial, that will get you up and running proving programs with Dafny. It is really fun, you should try this out now!

Dafny leverages Z3 for discharging proof obligations. Ok, so what is Z3?

Z3 is a popular open source Satisfiability modulo theories (SMT) solver. SMT is a decision problem for logical formulas expressed in classical first-order logic with equality (propositional logic+ quantifiers + relations). You can think of Z3 as formalizing constraint programming. Again please give this interactive Z3 tutorial a try at https://rise4fun.com/z3/tutorial.

*check-sat* determines whether the current formulas on the Z3 stack are satisfiable or not.

If the formulas are satisfiable, Z3 returns *sat*
If they are unsatisfiable, Z3 returns *unsat*
Z3 may also return *unknown* when it can't determine whether a formula is satisfiable or not

If you like to use Z3 as part of a higher level language, Z3 bindings are available for Python, C, C++, Java, Rust, etc.

Armada

Armada leverages on Dafny and provides an open source framework for developing verified concurrent systems code via stepwise refinement. It provides 8 verified libraries for performing refinement-based proofs, including region-based pointer reasoning, rely-guarantee, TSO elimination, and reduction.

Armada's goal is to prove that the implementation refines the specification (i.e., all finite behaviors of the implementation simulate the specification). To bridge the semantic gap between the implementation and the specification, the developer writes a series of N Armada programs.

each pair of adjacent levels should be similar enough to facilitate automatic generation of a refinement proof that respects R
the developer supplies a short proof recipe that gives Armada enough information to automatically generate such a proof
Armada uses Dafny to verify all proof material generated

Here is an example of Armada specification and implementation of Traveling Salesman Problem.

And here is an example of an intermediate stepwise refinement for TSP.

Armada considers 8 strategies for the refinement framework: Reduction, Rely-guarantee reasoning, TSO elimination, Weakening, Nondeterministic weakening, Combining, Ghost variable introduction, and Variable hiding.

The implementation of Armada consists of

a state-machine translator to translate Armada programs to state-machine descriptions (built over Dafny's parser and type inference engine)
a framework for proof generation (built over Dafny's abstract syntax tree AST code)
a library of lemmas useful for invocation by proofs of refinement (6KLOC Dafny)
and an extension of Dafny to translate Armada AST into C compatible with CompCertTSO

The evaluation is performed on four examples: Barrier, Pointers, MCSLock, Lock-free queue from liblfds library. The graph shows that after Armada verification the CompCertTSO verified compilation of the code is as efficient as that of compiling with GCC an unverified compiler, which is as efficient as a hand-written version of liblfds implementation (using modulo rather than bitmask).

Discussion

Armada is a solid step in the right direction, but it still has several drawbacks. The framework requires a *competent* human in the loop for performing the verification. The human should decide how to divide the refinement steps and work with the framework to show the refinement proofs for each step. My impression is that Armada is still for small programs. Also Armada can only prove safety properties but not liveness, and it cannot verify hyperproperties due to the first order logic requirement imposed by Z3.

It may be cumbersome trying to feed the proof recipes/hints to the proof system without feedback about why this didn't work. Without feedback from the proof system, how long would a developer be able to continue the process? This is another factor that may make adoption of this difficult by non-expert users.

On the other hand, Armada provides a good option for verifying critical multithreaded programs. Just by using the same language/framework, you can generate provably verified code. It is nice that everything is under the same roof. The trusted code base in this framework is the Armada compiler, Dafny, Z3, Boogie.