Eidetic systems

This paper appeared in OSDI'14. The authors are all from the University of Michigan: David Devecsery, Michael Chow, Xianzheng Dou, Jason Flinn, and Peter M. Chen.

This paper presents transformative systems work: it introduces a practical eidetic system implementation on a Linux computer/workstation. The paper is a tour de force. It undertakes a huge implementation effort to build a very useful and novel eidetic memory/system service, and the authors should be commended for their audaciousness.

An eidetic computer system can recall any past state that existed on that computer, including all versions of all files, the memory and register state of processes, interprocess communication, and network input. An eidetic computer system can explain the lineage of each byte of current and past state. (This is related to the concept of data provenance, which I mention briefly at the end of my review.)

Motivation

One use case for an eidetic system is to track where and how erroneous information entered the system. As a case study, the paper considers tracking down a faulty bibtex reference using a backward query. After tracking down the faulty bibtex reference, you can perform a forward query on the eidetic system to figure out which documents are contaminated with this faulty information, and then fix them.
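
To make the backward/forward distinction concrete, here is a minimal sketch over a static, hand-written lineage graph. The file names and the DERIVED_FROM map are hypothetical; Arnold does not keep such a graph precomputed but derives lineage by replaying executions. Backward search walks "derived from" edges to external inputs; forward search walks the inverted edges to everything downstream.

```python
from collections import deque

# Hypothetical lineage graph: each state maps to the states it was derived from.
DERIVED_FROM = {
    "paper.pdf": ["paper.tex", "refs.bib"],
    "talk.pdf": ["slides.tex", "refs.bib"],
    "refs.bib": ["webpage.html"],   # user copied a citation from a web page
    "paper.tex": [],
    "slides.tex": [],
    "webpage.html": [],             # external input (arrived over the network)
}

def backward_query(target):
    """Return the states with no parents (external inputs) that target depends on."""
    seen, sources = {target}, set()
    queue = deque([target])
    while queue:
        state = queue.popleft()
        parents = DERIVED_FROM.get(state, [])
        if not parents:
            sources.add(state)
        for p in parents:
            if p not in seen:
                seen.add(p)
                queue.append(p)
    return sources

def forward_query(source):
    """Return every state that transitively derives from source."""
    children = {}
    for child, parents in DERIVED_FROM.items():
        for p in parents:
            children.setdefault(p, []).append(child)
    seen = set()
    queue = deque([source])
    while queue:
        for c in children.get(queue.popleft(), []):
            if c not in seen:
                seen.add(c)
                queue.append(c)
    return seen

print(backward_query("refs.bib"))               # -> {'webpage.html'}
print(sorted(forward_query("webpage.html")))    # -> ['paper.pdf', 'refs.bib', 'talk.pdf']
```

In this toy graph, the forward query from the bad web page flags both PDFs, which are the documents you would then go and fix.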

Another use case for an eidetic system is performing a postmortem of an attack to determine whether it leaked any important information. In the evaluation section, the paper uses the Heartbleed attack as another case study; Heartbleed occurred while the authors were testing/evaluating their eidetic system implementation.

With a good GUI for querying, the eidetic system concept could significantly enhance the Mac OS X Time Machine with data lineage/provenance, backward querying, and forward querying/correction. This would augment time travel with analytics, giving you a time machine on steroids. (Accomplishment unlocked: +100 points for serious use of time machine and time travel in writing.)

Design and implementation

The authors develop the eidetic system, Arnold, by modifying the Linux kernel to record all nondeterministic data that enters a process: the order, return values, and memory addresses modified by a system call, the timing and values of received signals, and the results of querying the system time. Arnold and the accompanying eidetic system tools (for replay, etc.) are available as open source.
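
The core record/replay idea can be sketched in user space: route every nondeterministic call through one interface, log its result while recording, and feed the logged results back during replay. This is only an illustration under that assumption; Arnold does this in the kernel at the system-call and signal level, and the Recorder, Replayer, and program names here are hypothetical.

```python
import random
import time

class Recorder:
    """Executes nondeterministic calls and logs (call name, result) in order."""
    def __init__(self):
        self.log = []

    def call(self, fn, *args):
        result = fn(*args)
        self.log.append((fn.__name__, result))
        return result

class Replayer:
    """Returns logged results in order instead of re-executing the calls."""
    def __init__(self, log):
        self.log = log
        self.pos = 0

    def call(self, fn, *args):
        name, result = self.log[self.pos]
        if name != fn.__name__:
            raise RuntimeError(f"replay diverged: expected {name}, got {fn.__name__}")
        self.pos += 1
        return result

def program(sys_):
    """Deterministic except for the calls routed through sys_.call."""
    start = sys_.call(time.time)
    nonce = sys_.call(random.randint, 1, 1_000_000)
    return f"{start:.6f}:{nonce * 2}"   # recomputed, never logged

rec = Recorder()
first = program(rec)
assert program(Replayer(rec.log)) == first   # replay reproduces the run exactly
```

Only the two call results are stored; the derived string is recomputed on replay, which is the storage-for-recomputation trade the paper describes.
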
The key technologies that enable Arnold to provide the properties of an eidetic system efficiently are deterministic record and replay, model-based compression, deduplicated file recording, operating system tracking of information flow between processes, and retrospective binary analysis of information flow within processes.
Arnold uses deterministic record and replay, and trades storage for recomputation whenever possible. That is, Arnold saves only nondeterministic choices or new input, and reproduces everything else by recomputation. The major space-saving technique Arnold uses is model-based compression: Arnold constructs a model for predictable operations and records only the instances in which the returned data differs from the model. Another optimization is copy-on-RAW (read-after-write) recording: "To deduplicate the read file data, Arnold saves a version of a file only on the first read after the file is written. Subsequent reads log only a reference to the saved version, along with the read offset and return code." These techniques enable Arnold to fit 4 years of eidetic recording for a desktop/workstation into a 4TB off-the-shelf hard disk (which costs $150).
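
Both space-saving tricks are easy to sketch. Below, compress/decompress keep only the entries a model mispredicts, and CopyOnRAWRecorder saves a file's contents only on the first read after a write. The "same as last read" model and the treatment of never-saved files (saved on their first read) are my assumptions for illustration, not Arnold's actual models or policies.

```python
def compress(values, predict):
    """Model-based compression: store only the entries the model mispredicts."""
    exceptions, history = [], []
    for i, v in enumerate(values):
        if predict(history) != v:
            exceptions.append((i, v))
        history.append(v)
    return len(values), exceptions

def decompress(n, exceptions, predict):
    """Rerun the same model and patch in the recorded exceptions."""
    fixes, history = dict(exceptions), []
    for i in range(n):
        history.append(fixes[i] if i in fixes else predict(history))
    return history

def same_as_last(history):
    # Hypothetical model: a read returns as many bytes as the previous one (4096 first).
    return history[-1] if history else 4096

class CopyOnRAWRecorder:
    """Saves a file version only on the first read after it is written."""
    def __init__(self):
        self.files = {}    # path -> current contents (simulated file system)
        self.dirty = set() # paths written since their last saved version
        self.saved = []    # saved file versions (the expensive part of the log)
        self.latest = {}   # path -> index of its latest saved version
        self.log = []      # per-read records: (version index, offset, return code)

    def write(self, path, data):
        self.files[path] = data
        self.dirty.add(path)

    def read(self, path, offset, count):
        if path in self.dirty or path not in self.latest:
            self.saved.append(self.files[path])
            self.latest[path] = len(self.saved) - 1
            self.dirty.discard(path)
        data = self.files[path][offset:offset + count]
        self.log.append((self.latest[path], offset, len(data)))
        return data

    def replay_read(self, entry):
        version, offset, n = entry
        return self.saved[version][offset:offset + n]

n, exc = compress([4096] * 5 + [123, 0], same_as_last)
print(exc)   # -> [(5, 123), (6, 0)]
```

Seven return values shrink to two exceptions. Likewise, three reads of an unchanged file cost one saved version plus three small log records.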


Querying and Replaying

Arnold uses the replay group abstraction to record and replay efficiently. A replay group consists of frequently communicating processes and can be replayed independently of any other group. Arnold employs Pin binary instrumentation to analyze replayed executions and track the lineage of data within a replay group. Interprocess communication across groups is tracked with a dependency graph, which records the communications between different replay groups. Bundling frequently communicating processes into a group means that a large number of conversations need not be recorded in the dependency graph. As such, the selection of replay groups (and replay group size) gives rise to a tradeoff between storage efficiency and query efficiency. It would be nice if the paper listed the replay groups used in Arnold in a table; this information would help in understanding the replay group concept better.
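
To illustrate the tradeoff, here is a minimal sketch that merges processes into groups when they exchange at least `threshold` messages (using union-find), and then keeps only cross-group messages as dependency-graph edges. The threshold heuristic and the process names are hypothetical; the paper's actual grouping policy is not reproduced here.

```python
from collections import Counter

def form_replay_groups(processes, messages, threshold):
    """Union processes that exchange at least `threshold` messages."""
    parent = {p: p for p in processes}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]   # path halving
            p = parent[p]
        return p

    pair_counts = Counter(frozenset((a, b)) for a, b in messages if a != b)
    for pair, n in pair_counts.items():
        if n >= threshold:
            a, b = tuple(pair)
            parent[find(a)] = find(b)
    groups = {}
    for p in processes:
        groups.setdefault(find(p), set()).add(p)
    return [frozenset(g) for g in groups.values()]

def dependency_edges(groups, messages):
    """Only messages that cross group boundaries go into the dependency graph."""
    group_of = {p: g for g in groups for p in g}
    return [(group_of[a], group_of[b]) for a, b in messages
            if group_of[a] != group_of[b]]

procs = ["make", "gcc", "as", "firefox"]
msgs = [("make", "gcc")] * 10 + [("gcc", "as")] * 8 + [("firefox", "make")]
coarse = form_replay_groups(procs, msgs, threshold=5)
fine = form_replay_groups(procs, msgs, threshold=20)
print(len(coarse), len(dependency_edges(coarse, msgs)))   # -> 2 1
print(len(fine), len(dependency_edges(fine, msgs)))       # -> 4 19
```

Bigger groups shrink the dependency graph (1 edge instead of 19), but a query touching `gcc` would then have to replay `make` and `as` too.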

Arnold tracks even user-propagated lineage, such as a user reading a webpage and, as a result, entering text into an editor. (Of course, this introduces some false positives, since such lineage must be inferred speculatively.) Tracking this actually required a lot of work: "Understanding GUI output turned out to be tricky, however, because most programs we looked at did not send text to the X server, but instead sent binary glyphs generated by translating the output characters into a particular font. Arnold identifies these glyphs as they are passed to standard X and graphical library functions. It traces the lineage backward from these glyphs using one of the above linkages (e.g., the index linkage)."

Finally, on querying in Arnold, the paper says: "A backward query proceeds in a tree-like search, fanning out from one or more target states. The search continues until it is stopped by the user or all state has been traced back to external system inputs. As the search fans out, Arnold replays multiple replay groups in parallel. In addition, if no lineage is specified, it may test multiple linkages for the same group in parallel, terminating less restrictive searches if a more restrictive search finds a linkage."
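
The last sentence of that quote describes a small concurrency pattern: run every linkage at once, prefer the most restrictive one that finds something, and stop the looser ones. Here is a sketch of that pattern with threads and cooperative cancellation. The linkage names, their ordering, and fake_trace (a stand-in for a Pin-instrumented replay) are all hypothetical.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical linkage names, ordered from most to least restrictive.
LINKAGES = ["copy", "index", "taint"]

def choose_linkage(group, trace, linkages=LINKAGES):
    """Trace one replay group under every linkage in parallel.

    Returns (linkage, sources) for the most restrictive linkage that finds a
    source, and signals the less restrictive searches to stop early.
    """
    stops = [threading.Event() for _ in linkages]
    with ThreadPoolExecutor(max_workers=len(linkages)) as pool:
        futures = [pool.submit(trace, group, name, stop)
                   for name, stop in zip(linkages, stops)]
        for i, (name, fut) in enumerate(zip(linkages, futures)):
            sources = fut.result()          # wait in restrictiveness order
            if sources:
                for stop in stops[i + 1:]:  # cooperative cancellation: Python
                    stop.set()              # threads cannot be killed outright
                return name, sources
    return None, set()

def fake_trace(group, linkage, stop):
    """Stand-in for an instrumented replay; looser linkages run longer."""
    cost = {"copy": 0.01, "index": 0.05, "taint": 2.0}[linkage]
    deadline = time.monotonic() + cost
    while time.monotonic() < deadline:
        if stop.is_set():
            return set()                    # abandoned: a stricter linkage won
        time.sleep(0.001)
    # Pretend the strict copy linkage finds nothing for this group.
    return set() if linkage == "copy" else {"webpage.html"}
```

Calling `choose_linkage("editor-group", fake_trace)` should return `("index", {"webpage.html"})` in well under the 2 seconds the taint search would otherwise take, because the index result stops it.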

Unfortunately, user-friendly GUI-based querying tools are not available yet. But that would be asking too much of this paper, which already packs a lot of contributions into a single publication. The evaluation section gives some results about backward and forward querying performance in Arnold.

Related work on data provenance

Data provenance has traditionally been studied within the database field. However, recent work on data provenance has started to consider capturing provenance for applications performing arbitrary computations (not restricted to the small set of valid transformations in database systems). The paper "A primer on provenance" provides a nice, accessible survey of data provenance work.

Future work

This paper presents an eidetic system on a single computer. An obvious future direction is to build an eidetic distributed system. By leveraging Arnold, such a system now seems within reach. Our work on hybrid logical clocks can also help here, by relating and efficiently tracking causality across distributed nodes running Arnold. Since hybrid logical clocks work with loosely synchronized time (à la NTP) and are resilient to uncertainty (they track causality efficiently without blocking on clock synchronization uncertainty), they can be adopted for implementing a distributed eidetic system in practice.
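
For reference, here is a minimal sketch of the hybrid logical clock update rules: each timestamp is a pair (l, c), where l tracks the largest physical time heard of and c breaks ties. The physical clock is passed in as a callable (an assumption to keep the example deterministic); how these timestamps would be attached to Arnold's recorded network messages is left open.

```python
class HLC:
    """Hybrid logical clock: timestamps (l, c) compared lexicographically."""
    def __init__(self, physical_clock):
        self.pt = physical_clock   # callable returning this node's physical time
        self.l = 0
        self.c = 0

    def send_or_local(self):
        """Timestamp a local event or an outgoing message."""
        l_old = self.l
        self.l = max(l_old, self.pt())
        self.c = self.c + 1 if self.l == l_old else 0
        return (self.l, self.c)

    def receive(self, l_m, c_m):
        """Merge the timestamp (l_m, c_m) carried by an incoming message."""
        l_old = self.l
        self.l = max(l_old, l_m, self.pt())
        if self.l == l_old == l_m:
            self.c = max(self.c, c_m) + 1
        elif self.l == l_old:
            self.c += 1
        elif self.l == l_m:
            self.c = c_m + 1
        else:
            self.c = 0
        return (self.l, self.c)

# Node A's clock runs ahead (100) of node B's (90).
a = HLC(lambda: 100)
b_time = [90]
b = HLC(lambda: b_time[0])
m = a.send_or_local()   # -> (100, 0)
r = b.receive(*m)       # -> (100, 1): ordered after the send despite B's slow clock
```

Note that B never blocks waiting for its clock to catch up; the c component absorbs the skew, and l snaps back to physical time once B's clock passes 100.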

A remaining kink for a distributed eidetic system could be the cost of querying. Querying and replay are already slow and hard on a single eidetic system, and they are likely to become more complicated in a distributed system, since replay must be coordinated across all the machines involved.

Comments

dmbarbour said…
Eidetic systems sound very powerful for post-mortem. But I suspect it will not hold so well under different application loads, e.g. heavy with video games or streaming video.

Rather than Eidetic memory, simply having a really good memory would be sufficient for a lot of post-mortems. E.g. you could introduce an exponential decay model to preserve a logarithmic history.
