My seminar on cloud computing systems has started today with two papers. Here is the summary of the first paper we covered in the seminar.
Designing a computer system is very different from designing an algorithm: The external interface (that is, the requirement) is less precisely defined, more complex, and more subject to change. The system has much more internal structure, and hence many internal interfaces. The measure of success is much less clear. In this 1983 paper, Butler Lampson gives hints for computer system design based on his experience on building several systems in Xerox PARC labs. It is amazing how relevant and how fresh these hints are after 30 years of their publication. While these hints were not specifically targeting distributed systems design, several of these hints are applicable (and widely used) for cloud computing systems.
Most of the text below is verbatim copied from the paper. I omitted about half of the hints, and details/justifications about the hints. The actual paper is 27 pages long, and is well worth a read. A related paper is by Jim Waldo, called "On System Design". Another related paper, targeting cloud computing system design is by James Hamilton, called "On designing and deploying Internet scale services". We will cover that paper at the end of the semester in our seminar.
Hints pertaining to Functionality
"An interface separates an implementation of some abstraction from the clients who use the abstraction." I think most people are familiar with this definition. But, I guess, the next observation may come as a revelation to many (except those that have studied formal methods and program verification). "The interface between two programs consists of the set of assumptions that each programmer needs to make about the other program in order to demonstrate the correctness of his program." The reason this definition of interface is lesser-known could be because this aspect of interfaces do not show up frequently on traditional APIs. But this type of rely-guarantee interfaces is very relevant and important for correct system design. A component relies on some assumptions to be able to provide its guarantees.
Defining interfaces is the most important part of system design. Usually it is also the most difficult, since the interface design must satisfy three conflicting requirements: an interface should be simple, it should be complete, and it should admit a sufficiently small and fast implementation.
Keep it simple: Do one thing at a time, and do it well. An interface should capture the minimum essentials of an abstraction.
Make it fast, rather than general or powerful, and leave it to the client: The Unix system encourages the building of small programs that take one or more character streams as input, produce one or more streams as output, and do one operation. When this style is imitated properly, each program has a simple interface and does one thing well, leaving the client to combine a set of such programs with its own code and achieve precisely the effect desired.
Handle normal and worst cases separately: The requirements for the two are quite different; The normal case must be fast. The worst case must make some progress. Caches and hints are examples of special treatment for the normal case, but there are many others.
Hints pertaining to Speed
Split resources in a fixed way if in doubt, rather than sharing them. Use static analysis if you can. Cache answers to expensive computations, rather than doing them over. Safety first, optimize later.
Use hints to speed up normal execution: A hint, like a cache entry, is the saved result of some computation. It is different in two ways: it may be wrong, and it is not necessarily reached by an associative lookup. Because a hint may be wrong, there must be a way to check its correctness before taking any unrecoverable action. It is checked against the 'truth', information that must be correct but can be optimized for this purpose rather than for efficient execution. Like a cache entry, the purpose of a hint is to make the system run faster. Usually this means that it must be correct nearly all the time. Hints are very relevant for speculative execution, and you can find several examples of hints and speculative execution in computer systems. One example of a hint is in the Ethernet, in which lack of a carrier signal on the cable is used as a hint that a packet can be sent. If two senders take the hint simultaneously, there is a collision that both can detect; both stop, delay for a randomly chosen interval, and then try again.
Shed load to control demand, rather than allowing the system to become overloaded: There are many ways to shed load. An interactive system can refuse new users, or even deny service to existing ones.
Hints pertaining to Fault-tolerance
End-to-end: Error recovery at the application level is absolutely necessary for a reliable system, and any other error detection or recovery is not logically necessary but is strictly for performance. (Caveats: There are two problems with the end-to-end strategy. First, it requires a cheap test for success. Second, it can lead to working systems with severe performance defects that may not appear until the system becomes operational and is placed under heavy load.)
Log updates to record the truth about the state of an object: A log is a very simple data structure that can be reliably written and read, and cheaply forced out onto disk or other stable storage that can survive a crash. Because it is append-only, the amount of writing is minimized. To use the technique, record every update to an object as a log entry consisting of the name of the update procedure (a.k.a. operation) and its arguments. The operation specified by the log entry can be re-executed later, and if the object being updated is in the same state as when the update was first done, it will end up in the same state as after the update was first done. By induction, this means that a sequence of log entries can be re-executed, starting with the same objects, and produce the same objects that were produced in the original execution.
Make actions atomic or restartable: An atomic action (often called a transaction) is one that either completes or has no effect. The advantages of atomic actions for fault-tolerance are obvious: if a failure occurs during the action it has no effect, so that in recovering from a failure it is not necessary to deal with any of the intermediate states of the action.
Butler Lampson (Turing award winner in 1992) is a big proponent of using formal methods in systems design. His ideas there are heavily influenced by Nancy Lynch's work on I/O automata and simulation relations. The basic idea is to model the system at a high level using an I/O automaton and prove safety and progress properties on this model. Then the system is refined gradually by writing more concrete I/O automata and proving that the concrete I/O automata "refine" (a.k.a. implement) the behavior of the abstract I/O automaton using forward and backward simulation techniques. (Refinement means that all the collective behavior of the concrete automata is a behavior of the abstract automaton.) See Lampson's course notes for details.
A related way of looking at system design is by Leslie Lamport. In Lamport's approach, refinement is not proved by simulation relations of I/O automata, rather by mathematical precondition/postcondition reasoning on state machines. Lamport has designed the TLA framework to help in this process. Another highlight of Lamport's approach is its emphasis on invariant-based reasoning/design for taming/overcoming the concurrency problems in distributed systems. Invariant-based design provides a non-operational way to reason about concurrent systems and avoids the complexities and bugs of operational reasoning (a.k.a. handwaving) for concurrent systems. For invariant-based reasoning, it is enough to consider each operation/procedure once (instead of several times in a trace for operational reasoning) and prove that the procedure preserves the invariant. After proving each procedure preserves the invariant, we are guaranteed by induction that regardless of execution sequence of the procedures the invariant continues to hold in the system. So the safety conditions in the invariant are satisfied throughout the execution, and progress conditions can be proved assuming/using this invariant as a starting point and showing a variant function. For details see my Spring'10 distributed systems course notes.
Both approaches are instances of the stepwise refinement idea of Dijkstra (Turing award winner in 1972). The problem in both approaches are, as we move to more concrete models things get very complicated and get unavoidably operational due to the large amount of state, libraries, and context introduced. As a result, these approaches cannot scale without some compiler support. Yet, not all is in vain. The discipline of proving at the design level prevents major design errors. During implementation some implementation level errors could be introduced, however, those are not major errors and easier to fix compared to design errors. I hope to write a couple posts about invariant-based design and stepwise refinement later.