Boxwood: Abstractions as the foundation for storage infrastructure

This paper is from Microsoft Research and appeared in OSDI'04. This review will mostly be a stream of consciousness, because I have not yet understood all of the paper and cannot put it in context as much as I would like to.

While reading the Boxwood paper, I started to notice how similar it is to the GFS problem and the GFS approach. Boxwood appeared at OSDI'04, and GFS appeared at SOSP'03. Boxwood refers to GFS but does not compare or contrast itself with GFS. Maybe the reason is that in 2004 the Boxwood authors could not see the similarities. This could be because, as I mentioned in my GFS review, the GFS paper did not talk about the Paxos replication of the master chunk-manager in the 2003 paper; that came a couple of years later in the Chubby and Paxos-made-live papers. When citing GFS, the Boxwood authors only state that GFS "will be layered over the facilities of Boxwood". But that is impractical, as it would duplicate a lot of the services; GFS also has a chunk manager, failure detector, lock service, replication, Paxos, etc.

Let me give a brief overview of the Boxwood design, so that I can compare Boxwood with GFS further. Boxwood is designed for LANs. It assumes Gigabit Ethernet, and uses synchronous replication of data on two disks (I guess the master and shadow chunk managers).
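To make the synchronous-replication point concrete for myself, here is a toy sketch (my own illustration, not Boxwood code): a write is acknowledged only after it is durable on both copies. The file names and helper function are hypothetical, with two local files standing in for the two disks.

```python
import os

# Toy illustration of synchronous replication: the write is acknowledged only
# after it has been made durable on both replicas (two local files stand in
# for the two disks here).
def replicated_write(primary_path, secondary_path, offset, data):
    for path in (primary_path, secondary_path):
        mode = "r+b" if os.path.exists(path) else "w+b"
        with open(path, mode) as f:
            f.seek(offset)
            f.write(data)
            f.flush()
            os.fsync(f.fileno())   # synchronous: wait for the media, not just the cache
    return len(data)               # "ack" to the client only after both copies persist

if __name__ == "__main__":
    replicated_write("primary.img", "secondary.img", 0, b"hello boxwood")
```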


The Boxwood design section in the paper starts with a description of the Paxos service. Then it continues with the failure detector service, which is not really very interesting since it is a well-studied and mature topic. Then it describes the distributed lock service. Curiously, this service is implemented as a master-shadow replication, and Paxos is employed only to keep the id of the master. This is curious because, when implementing the same service, the GFS-Chubby approach was to replicate the master via Paxos to four other replicas, which yields a much simpler design and a robust system (masking fault tolerance against two node failures).
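The contrast I am drawing can be stated as a toy sketch: in the Boxwood-style design, the consensus service stores only the identity of the current lock master, while the lock table itself lives on the master and is mirrored to a shadow. The class and method names below are mine, not from the paper.

```python
# My own toy sketch, not from the paper:
# Boxwood-style lock service: Paxos stores only *who* the lock master is;
# the lock table lives on the master and is mirrored to a shadow node.
class PaxosCell:
    """Stand-in for a Paxos-replicated register holding one small value."""
    def __init__(self):
        self.value = None

    def propose(self, value):
        self.value = value          # real Paxos would run a consensus round here
        return self.value

    def read(self):
        return self.value

class LockMaster:
    def __init__(self, node_id, paxos_cell):
        self.node_id = node_id
        self.locks = {}             # lock name -> holder; mirrored to a shadow via RPC
        paxos_cell.propose(node_id) # only the master's identity goes through consensus

    def acquire(self, name, client):
        if name in self.locks:
            return False
        self.locks[name] = client   # in Boxwood this update would also go to the shadow
        return True

if __name__ == "__main__":
    cell = PaxosCell()
    master = LockMaster("node-A", cell)
    print(cell.read(), master.acquire("chunk-42", "client-1"))
```

In the GFS-Chubby style, by contrast, the whole lock table would itself be the Paxos-replicated state across five replicas, so no separate shadow protocol is needed.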

Then comes the RLDev (replicated logical device) component. The paper writes: "The list of RLDevs, the segments belonging to them, the identity of machines that host the primary and the secondary segments, and the disks are all part of the global state maintained in Paxos." This state amounts to pretty much what the master chunk-manager should hold. So the question is again: why not maintain this state simply by replicating the master chunk-manager via Paxos, as GFS did?
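As a reading aid, the global state that the quote enumerates could be sketched roughly as below. The field names are my own guesses, not taken from the paper.

```python
from dataclasses import dataclass, field

# Rough sketch of the Paxos-maintained global state the quote enumerates;
# field names are my own guesses, not the paper's.
@dataclass
class Segment:
    segment_id: int
    primary_host: str      # machine hosting the primary copy of the segment
    secondary_host: str    # machine hosting the secondary copy

@dataclass
class RLDev:
    rldev_id: int
    segments: list = field(default_factory=list)   # Segments belonging to this RLDev

@dataclass
class GlobalState:
    rldevs: dict = field(default_factory=dict)     # rldev_id -> RLDev
    disks: dict = field(default_factory=dict)      # host -> list of its disks

if __name__ == "__main__":
    state = GlobalState()
    state.rldevs[0] = RLDev(0, [Segment(0, "node-A", "node-B")])
    state.disks["node-A"] = ["sda", "sdb"]
    print(state)
```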

Next comes the chunk manager. As Figure 3 shows, the chunk manager is replicated with a shadow node. "The chunk manager pair relies on a shared RLDev and RPCs to keep the mapping information consistent." Again, why not do this simply via Paxos replication of the chunk manager, as GFS did? What is the advantage (if any) of this approach?
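Here is my own toy sketch of what I understand the primary/shadow chunk-manager arrangement to be: a mapping update is first persisted to the shared RLDev and then pushed to the shadow over an RPC before the primary acknowledges it. None of the names below come from the paper.

```python
# My own toy sketch of the primary/shadow chunk manager: updates go to the
# shared RLDev (a plain list stands in for it) and are mirrored to the shadow
# before the primary considers the allocation done.
class ChunkManager:
    def __init__(self, shared_rldev, shadow=None):
        self.mapping = {}            # chunk handle -> list of RLDev offsets
        self.rldev = shared_rldev    # stand-in for the shared replicated logical device
        self.shadow = shadow

    def allocate(self, handle, offsets):
        self.rldev.append(("alloc", handle, offsets))   # durable record first
        self.mapping[handle] = offsets
        if self.shadow is not None:
            self.shadow.apply(handle, offsets)          # "RPC" keeping the shadow in sync
        return handle

    def apply(self, handle, offsets):                   # invoked on the shadow
        self.mapping[handle] = offsets

if __name__ == "__main__":
    rldev_log = []
    shadow = ChunkManager(rldev_log)
    primary = ChunkManager(rldev_log, shadow=shadow)
    primary.allocate("chunk-1", [(0, 64 * 1024)])
    print(primary.mapping == shadow.mapping, rldev_log)
```

With Paxos replication of the chunk manager instead, the mapping itself would be the replicated state and the separate shadow-sync step would disappear, which is exactly the simplification I am wondering about.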


Next comes the transaction and logging service, but the paper is extremely scarce on details about it. The authors implemented Boxwood, as well as BoxFS, a filesystem built using Boxwood B-trees and exported using the NFS v2 protocol. Both are implemented as user-level processes. Evaluation results for BoxFS are given, but I am not sure how to interpret those results, as BoxFS makes simplifying assumptions (such as a 30-second data cache flush) over NFS to achieve acceptable performance.

So, wrapping up, I guess there may be several things I don't understand in this paper. The presentation is unfortunately not clear and simple for me. Instead of telling us what its most significant contribution/lesson is, the paper tells us about everything in the system. The introduction emphasizes one thing (the distributed data structure abstractions provided), yet the internal sections of the paper emphasize another (the Paxos and lock services, and the chunk manager on top of them). Maybe I should use this paper as a discussion paper alongside GFS in my seminar in Spring'11; as a group we can build a better understanding and comparison of both systems.
