Refuse to Crash with Re-FUSE

This paper appeared in Eurosys'10. This is a well written paper: the paper holds your hand, and takes you for a walk in the park. At each step of the path, you can easily predict what is coming next. I like this kind of easy-reading papers, compared to the cryptic or ambiguous papers which make you wander around or try to guess which paths to take through a jungle of junctions.

The goal of this work is to provide support for restartable user-level filesystems. But, before I can tell you more about that, we first need to discuss user-filesystems. User-filesystems provides a way to add custom features (such as encryption, deduplication, access to databases, access to Amazon S3, etc.) on top of existing kernel-level filesystems. FUSE is a popular software that facilitates building user-filesystems on top of kernel-level filesystems. FUSE is available for Linux, FreeBSD, NetBSD, and MacOSX, and more than 200 userfilesystems have already been implemented using FUSE. GlusterFS, HDFS, ZFS are some of the well-known user-level filesystems implemented on top of FUSE.

FUSE works by wrapping the virtual filesystem (VFS) layer in UNIX systems at both sides. FUSE has a kernel file-system module (KFM) below the VFS layer that acts as a pseudo filesystem and queues application requests that arrive through the VFS layer. FUSE also has a libfuse module that exports a simplified filesystem interface between the user-level filesystem and the KFM.

Re-FUSE modifies FUSE to enable support for transparent restartability of the user-filesystem. The fault-model considered is transient fail-stop failures of the user-filesystem. Re-FUSE is based on three basic principles: request-tagging, system-call logging, and non-interruptible system calls. After a crash of the user-filesystem, Re-FUSE does not attempt to roll it back to a consistent state, but rather continues forward from the inconsistent state towards a new consistent state. Re-FUSE does so by enabling partially-completed requests to continue executing from where they were stopped at the time of crash.

Re-FUSE is tested with 3 popular representative user-filesystems implemented on top of FUSE. For testing robustness fault-injection (both controlled and random) is used; Re-FUSE enables the user-filesystem to mask failure and carry-on uninterrupted after a crash. Re-FUSE at most around 2-3% overhead in the normal operation, and recovers the filesystem in 10-100ms after a crash.


Popular posts from this blog

Graviton2 and Graviton3

Foundational distributed systems papers

Learning a technical subject

Learning about distributed systems: where to start?

Strict-serializability, but at what cost, for what purpose?

CockroachDB: The Resilient Geo-Distributed SQL Database

Amazon Aurora: Design Considerations + On Avoiding Distributed Consensus for I/Os, Commits, and Membership Changes

Warp: Lightweight Multi-Key Transactions for Key-Value Stores

Anna: A Key-Value Store For Any Scale

Your attitude determines your success