Thursday, November 29, 2012

From Clarity to Efficiency for Distributed Algorithms

Y. A. Liu, S. D. Stoller, B. Lin, M. Gorbovitski, Appeared in OOPSLA'12.

This paper describes a high-level language, DistAlgo, for clear description of distributed algorithms (e.g., Paxos, mutual execution). In contrast to embarrassingly parallel tasks which involve very regular and simple synchronization patterns (e.g., MapReduce tasks), distributed algorithms generally involve complex and tight synchronization patterns. DistAlgo specifies these complex synchronization conditions via using logical quantifications over message histories of processes involved in the algorithm. But, unfortunately, this turns out to be extremely inefficient if executed straightforwardly: each quantifier will cause a linear factor in running time, and any use of the history of messages sent and received will cause space usage to be unbounded.

To address this inefficiency, this paper focuses on optimizing implementations of DistAlgo programs by incrementalization of these expensive synchronization conditions. To this end, their method first transforms these quantifications are into set queries, and then transforms further to minimize their nesting. Finally, from here, the method systematically extracts single quantified order comparisons, and transforms them into efficient incremental updates of auxiliary variables as messages are sent and received. (An earlier paper that focuses more on the syntax of DistAlgo, and uses Paxos as a running example, appeared at SSS'12.).

To give a glance of DistAlgo and how it does this transformations, I am including some figures from the paper as is. The paper uses Lamport's mutual exclusion algorithm as a running example and shows its DistAlgo and automatically optimized DistAlgo implementation. DistAlgo fills a gap and seems like it can be very useful in the coming years for writing distributed programs. Distributed algorithms can be expressed in DistAlgo clearly at a high level, like in pseudocode, but also precisely, like in formal specification languages, and be executed as part of real applications, as in programming languages. DistAlgo extends Python with processes as objects, send/receive of messages, yield points, synchronization conditions as logical quantifications, and configuration of distributed system topology. DistAlgo compiler and optimizer are implemented in Python (building on Python's parser module and AST package) and generates executable Python code.

1 comment:

Hugh said...

Does this help clarify use enabled features as well such as display data such as energy usage, energy output, data rack storage, etc.? We develop specialized IT data center asset management software for enterprises and would love to see how this would be integrated for us.