Refuse to Crash with Re-FUSE

This paper appeared in Eurosys'10. It is a well-written paper: it holds your hand and takes you for a walk in the park. At each step of the path, you can easily predict what is coming next. I like this kind of easy-to-read paper, compared to the cryptic or ambiguous papers that make you wander around or guess which path to take through a jungle of junctions.

The goal of this work is to provide support for restartable user-level filesystems. But before I can tell you more about that, we first need to discuss user-filesystems. User-filesystems provide a way to add custom features (such as encryption, deduplication, access to databases, access to Amazon S3, etc.) on top of existing kernel-level filesystems. FUSE is a popular framework that facilitates building user-filesystems on top of kernel-level filesystems. FUSE is available for Linux, FreeBSD, NetBSD, and Mac OS X, and more than 200 user-filesystems have already been implemented using FUSE. GlusterFS, HDFS, and ZFS are some of the well-known user-level filesystems implemented on top of FUSE.

FUSE works by wrapping the virtual filesystem (VFS) layer in UNIX systems on both sides. FUSE has a kernel file-system module (KFM) below the VFS layer that acts as a pseudo filesystem and queues the application requests that arrive through the VFS layer. FUSE also has a user-space libfuse module that exports a simplified filesystem interface between the user-level filesystem and the KFM.
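To make the split between the two FUSE components concrete, here is a toy model in Python. It is only a sketch of the request flow described above, not the real libfuse API: the names `KernelModule`, `UserLibrary`, `submit`, `serve`, and `make_memory_fs` are all made up for illustration, and the "filesystem" is just an in-memory dict.

```python
from collections import deque


class KernelModule:
    """Hypothetical stand-in for the KFM: queues requests coming from the VFS layer."""

    def __init__(self):
        self.pending = deque()
        self._next_id = 0

    def submit(self, op, *args):
        # Tag each request with an id and queue it for the user-level side.
        self._next_id += 1
        self.pending.append((self._next_id, op, args))
        return self._next_id


class UserLibrary:
    """Hypothetical stand-in for libfuse: maps queued requests onto simple callbacks."""

    def __init__(self, handlers):
        self.handlers = handlers  # operation name -> callback of the user-filesystem

    def serve(self, kfm):
        # Drain the queue, call the matching handler, and collect replies by request id.
        replies = {}
        while kfm.pending:
            req_id, op, args = kfm.pending.popleft()
            replies[req_id] = self.handlers[op](*args)
        return replies


def make_memory_fs():
    """A trivial user-level 'filesystem' that keeps file contents in a dict."""
    files = {}

    def write(path, data):
        files[path] = files.get(path, "") + data
        return len(data)

    def read(path):
        return files.get(path, "")

    return {"write": write, "read": read}


if __name__ == "__main__":
    kfm = KernelModule()
    kfm.submit("write", "/notes", "hello")
    kfm.submit("read", "/notes")
    print(UserLibrary(make_memory_fs()).serve(kfm))
```

The point of the model is that the user-filesystem only ever sees the simplified callbacks; the queueing happens on the other side of the interface.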

Re-FUSE modifies FUSE to enable transparent restartability of the user-filesystem. The fault model considered is transient fail-stop failures of the user-filesystem. Re-FUSE is based on three basic principles: request-tagging, system-call logging, and non-interruptible system calls. After a crash of the user-filesystem, Re-FUSE does not attempt to roll it back to a consistent state, but rather continues forward from the inconsistent state towards a new consistent state. Re-FUSE does so by enabling partially-completed requests to continue executing from where they were stopped at the time of the crash.
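The roll-forward idea can be sketched with a small Python toy. This is my own simplification, not the paper's implementation: it assumes each "system call" is tagged with the id of the request that issued it plus a sequence number, and that results are recorded in a log that survives the crash. When the request is re-executed after a restart, calls that already completed are answered from the log instead of being performed again. All names here (`SyscallLog`, `handle_request`, `run_with_restart`, `Crash`) are hypothetical, and the sketch does not model the third principle, non-interruptible system calls.

```python
class Crash(Exception):
    """Simulated fail-stop crash of the user-level filesystem."""


class SyscallLog:
    """Results of completed calls, keyed by (request tag, sequence number)."""

    def __init__(self):
        self._results = {}
        self.executed = 0  # how many calls were really performed

    def run(self, request_id, seq, fn, *args):
        key = (request_id, seq)
        if key in self._results:
            # Completed before the crash: replay the logged result.
            return self._results[key]
        result = fn(*args)
        self.executed += 1
        self._results[key] = result
        return result

    def forget(self, request_id):
        # The request finished, so its log entries are no longer needed.
        for key in [k for k in self._results if k[0] == request_id]:
            del self._results[key]


def append(store, name, value):
    """A non-idempotent 'system call': running it twice would duplicate data."""
    store.append((name, value))
    return len(store)


def handle_request(request_id, log, store, crash_after=None):
    """Toy request handler that issues three calls and may crash part-way."""
    steps = [("a", 1), ("b", 2), ("c", 3)]
    for seq, (name, value) in enumerate(steps):
        if crash_after is not None and seq == crash_after:
            raise Crash()
        log.run(request_id, seq, append, store, name, value)
    log.forget(request_id)
    return "ok"


def run_with_restart(request_id, log, store, crash_after):
    """Run the request; on a crash, restart and re-execute it against the same log."""
    try:
        return handle_request(request_id, log, store, crash_after)
    except Crash:
        return handle_request(request_id, log, store, None)


if __name__ == "__main__":
    log, store = SyscallLog(), []
    print(run_with_restart(7, log, store, crash_after=2), store, log.executed)
```

In this toy, the request crashes after two of its three calls; the re-execution replays those two from the log and performs only the third, so the store ends up with each entry exactly once.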

Re-FUSE is tested with three popular, representative user-filesystems implemented on top of FUSE. To test robustness, fault injection (both controlled and random) is used; Re-FUSE enables the user-filesystem to mask the failure and carry on uninterrupted after a crash. Re-FUSE adds at most around 2-3% overhead during normal operation, and recovers the filesystem within 10-100ms after a crash.
