Paper review. IPFS: Content addressed, versioned, P2P file system

This week we discussed the IPFS whitepaper by Juan Benet in my Distributed Systems Seminar.

Remember peer-to-peer systems? IPFS is "peer-to-peer systems reloaded" with improved features. IPFS is a content-addressed distributed file system that combines Kademlia + BitTorrent + Git ideas. IPFS also offers better privacy/security features: it provides cryptographic hash content addressing, file integrity and versioning, and filesystem-level encryption and signing support.

The question is will it stick? I think it won't stick, but this work will still be very useful because we will transfer the best bits of IPFS to our datacenter computing as we did with other peer-to-peer systems technology. The reason I think it won't stick has nothing to do with the IPFS development/technology, but has everything to do with the advantages of centralized coordination and the problems surrounding decentralization. I rant more about this later in this post. Read on for the more detailed review on IPFS components, killer app for IPFS, and MAD questions.

IPFS components

Identities

Nodes are identified by a NodeId, the cryptographic hash3 of a public-key, created with S/Kademlia’s static crypto puzzle. Nodes store their public and private keys (encrypted with a passphrase).

Network

Transport: IPFS can use any transport protocol, and is best suited for WebRTC DataChannels(for browser connectivity) or uTP.
Reliability: IPFS can provide reliability if underlying networks do not provide it, using uTP or SCTP.
Connectivity: IPFS also uses the ICE NAT traversal techniques.
Integrity: IPFS optionally checks integrity of messages using a hash checksum.
Authenticity: IPFS optionally checks authenticity of messages using HMAC with sender’s public key.

Routing

To find other peers and objects, IPFS uses a DSHT based on S/Kademlia and Coral. Coral DSHT improves over by Kademlia based on the three rules of real-estate: location, location, location. Coral stores addresses to peers who can provide the data blocks taking advantage of data locality. Coral can distribute only subsets of the values to the nearest nodes avoiding hot-spots. Coral organizes a hierarchy of separate DSHTs called clusters depending on region and size. This enables nodes to query peers in their region first, "finding nearby data without querying distant nodes" and greatly reducing the latency of lookups.

Exchange

In IPFS, data distribution happens by exchanging blocks with peers using a BitTorrent inspired protocol: BitSwap. Unlike BitTorrent, BitSwap is not limited to the blocks in one torrent. The blocks can come from completely unrelated files in the filesystem. In a sense, nodes come together to barter in the marketplace. BitSwap incentivizes nodes to seed/serve blocks even when they do not need anything in particular. To avoid leeches (freeloading nodes that never share), peers track their balance (in bytes verified) with other nodes, and peers send blocks to debtor peers according to a function that falls as debt increases. For bartering, potentially, a virtual currency like FileCoin (again by Juan Benet) can be used.

Objects

IPFS builds a Merkle DAG, a directed acyclic graph where links between objects are cryptographic hashes of the targets embedded in the sources. (This video explains Merkle Trees superbly.) Merkle DAGs provide IPFS many useful properties:
1. Content addressing: All content is uniquely identified by its multihash checksum.
2. Tamper resistance: all content is verified with its checksum.
3. Deduplication: all objects that hold the exact same content are equal, and only stored once.

Files

IPFS also defines a set of objects for modeling a versioned filesystem on top of the Merkle DAG. This object model is similar to Git’s:
1. block: a variable-size block of data.
2. list: an ordered collection of blocks or other lists.
3. tree: a collection of blocks, lists, or other trees.
4. commit: a snapshot in the version history of a tree.

Naming

IPNS is the DNS for IPFS. We have seen that NodeId is obtained by hash(node.PubKey). Then IPNS assigns every user a mutable namespace at: /ipns/<NodeId>. A user can publish an Object to this /ipns/<NodeId> path signed by her private key. When other users retrieve the object, they can check the signature matches the public key and NodeId. This verifies the authenticity of the Object published by the user, achieving mutable state retrieval.

Unfortunately since <NodeId> is a hash, it is not human friendly to pronounce and recall. For this DNS TXT IPNS Records are employed. If /ipns/<domain> is a valid domain name, IPFS looks up key ipns in its DNS TXT records:
ipfs.benet.ai. TXT "ipfs=XLF2ipQ4jD3U ..." 
# the above DNS TXT record behaves as symlink
ln -s /ipns/XLF2ipQ4jD3U /ipns/fs.benet.ai

There is even the Beaker browser to help you surf IPFS. But its usability is not great.  If IPFS wants to manage the web, it should further improve its IPNS and content discovery game. Where is the search engine for IPFS content? Do we need to rely on links from friends like the 1993's Web?

What is the killer app for IPFS?

The introduction of the paper discusses HTTP and Web, and then says:
"Industry has gotten away with using HTTP this long because moving small files around is relatively cheap, even for small organizations with lots of traffic. But we are entering a new era of data distribution with new challenges: (a) hosting and distributing petabyte datasets, (b) computing on large data across organizations, (c) high-volume high-definition on-demand or real-time media streams, (d) versioning and linking of massive datasets, (e) preventing accidental disappearance of important files, and more. Many of these can be boiled down to "lots of data, accessible everywhere." Pressed by critical features and bandwidth concerns, we have already given up HTTP for different data distribution protocols. The next step is making them part of the Web itself. 
What remains to be explored is how [Merkle DAG] data structure can influence the design of high-throughput oriented file systems, and how it might upgrade the Web itself. This paper introduces IPFS, a novel peer-to-peer version-controlled filesystem seeking to reconcile these issues."
How common are petabyte or even gigabyte files on the Internet? There is definitely an increase in size trend due to the popularity of the multimedia files. But when will this become a pressing issue? It is not a pressing issue right now because CDNs help a lot for reducing traffic for the Internet. Also bandwidth is relatively easy to add compared to latency improvements. Going for a decentralized model globally comes with several issues/headaches, and I don't know how bad the bandwidth problems would need to get before starting to consider that option. And it is not even clear that the peer-to-peer model would provide more bandwidth savings than CDNs at the edge model.

I am not convinced that the Web is the killer application for IPFS, although at the end, the paper gets ambitious:
"IPFS is an ambitious vision of new decentralized Internet infrastructure, upon which many different kinds of applications can be built. At the bare minimum, it can be used as a global, mounted, versioned filesystem and namespace, or as the next generation file sharing system. At its best, it could push the web to new horizons, where publishing valuable information does not impose hosting it on the publisher but upon those interested, where users can trust the content they receive without trusting the peers they receive it from, and where old but important files do not go missing. IPFS looks forward to bringing us toward the Permanent Web."
Decentralization opens a Pandora's box of issues. Centralized is efficient and effective. Coordination wants to be centralized. A common and overhyped misconception is not centralized is not scalable and centralized is a single point of failure. After close to two decades of work in cluster computing and cloud computing, we have good techniques in place for achieving scalability and fault-tolerance for centralized (or logically centralized, if you like) systems. For scalability, shard it, georeplicate it, and provide CDNs for reading. For fault-tolerance, slap Paxos on it, or use chain replication systems (where Paxos guards the chain configuration), or use the globe-spanning distributed datastores available today. Case in point, Dropbox is logically-centralized but is very highly available and fault-tolerant, while serving to millions of users. Facebook is able to serve billions of users.

If you want to make the natural disaster tolerance argument to motivate the use of IPFS, good luck trying to use IPFS over landlines when power and ISPs are down, and good luck trying to form a multihop wireless ad hoc network over laptops using IPFS. Our only hope in a big natural disaster is cell towers and satellite communication. Disaster tolerance is serious work and I hope governments around the world are funding sufficient research into operational, planning, and communications aspects of that.

In Section 3.8, the whitepaper talks about the use cases for IPFS:
1. As a mounted global filesystem, under /ipfs and /ipns.
2. As a mounted personal sync folder that automatically versions, publishes, and backs up any writes.
3. As an encrypted file or data sharing system.
4. As a versioned package manager for all software.
5. As the root filesystem of a Virtual Machine.
6. As the boot filesystem of a VM (under a hypervisor).
7. As a database: applications can write directly to the Merkle DAG data model and get all the versioning, caching, and distribution IPFS provides.
8. As a linked (and encrypted) communications platform.
9. As an integrity checked CDN for large files (without SSL).
10. As an encrypted CDN.
11. On webpages, as a web CDN.
12. As a new Permanent Web where links do not die.
I don't think any of these warrant going full peer-to-peer. There are centralized solutions for them, or centralized solutions are possible for them.

An important use case for IPFS is to circumvent government censorship. But isn't it easier to use VPNs then to use IPFS for this purpose. (Opera browser comes with VPN build-in, and many easy to use VPN apps are available.) If the argument is that the governments can ban VPNs or prosecute people using VPN software, those issues also apply to IPFS unfortunately. Technology is not always the solution especially when dealing with big social issues.

IPFS may be a way of sticking it to the man. But the invisible hand of the free market forces also help here; when one big corporation starts playing foul and upsets the users, new companies and startups quickly move in to disrupt the space and fill in the void.

Again, I don't want to come across wrong. I think IPFS is great work, and Juan Benet and IPFS contributors accomplished a gigantic task, with a lot of impact on future systems (I believe the good parts of IPFS will be "adopted" to improve Web and datacenter/cloud computing). I just don't believe dialing the crank to 11 on decentralization is the right strategy for wide adoption. I don't see the killer application that makes it worthwhile to move away from the convenience of the more centralized model to open the Pandora's box with a fully-decentralized model.

MAD questions 

1) Today's networking ecosystem evolved for the client-server model, what kind of problems could this create for switching to peer-to-peer model? As a basic example, the uplink at residential (or even commercial) spaces is an order of magnitude less than downlink assuming they are consumers of traffic not originators of traffic. Secondly, ISPs (for good or bad) evolved to take on traffic shaping/engineering responsibilities peering with other ISPs. It is a complex system. How does popular IPFS use interact with that ecosystem.

2) As a related point, smartphones gained primary citizenship status in today's Internet. How well can peer-to-peer and IPFS get along with smartphones? Smartphones are very suitable to be thin clients in the cloud computing model, but they are not suitable to act as peers in a peer-to-peer system (both for battery and connection bandwidth reasons). To use a technical term, the smartphones will be leeches in a peer-to-peer model. (Well unless there is good token/credit system in place, but it is unrealistic to expect that soon.)

3) On the academic side of things, designing a decentralized search engine for IPFS sounds like a great research problem. Google had it easy in the datacenter but can you design a decentralized keyword/content based search engine (or one day old indexes) maintained in a P2P manner over IPFS nodes? Popularity of a file in the system (how many copies it has in the system) can play a role in its relevance ranking for the keyword. Also could a bloom filter like data structure be useful in a p2p search?

4) Here are some more pesky problems with decentralization. I am not clear if satisfactory answers exist on these. Does IPFS mean I may be storing some illegal content originated by other users?
How does IPFS deal with the volatility? Just closing laptops at night may cause unavailability under an unfortunate sequence of events. What is the appropriate number of replicas for a data to avoid this fate? Would we have to over-replicate to be conservative and provide availability?
If IPFS is commonly deployed, how do we charge big content providers that benefit from their content going viral over the network? Every peer chips in distributing that content, but the content generator benefits let's say by way of sales. Would there need to be a token economy that is all seeing and all fair to solve this issue?

5) Is it possible to use Reed-Solomon erasure coding with IPFS? Reed-Solomon codes are very popular in the datacenters as they provide great savings for replication.

6) IPFS does not tolerate Byzantine behavior, right? The crypto puzzle needed for Node Id can help reduce the false spammers, as it makes them do some work. But after joining, there is no guarantee that the peers will play it fair: they can be Byzantine to wreak havoc on the system. But how much problems can they cause? Using cryptos and signatures prevent many problems. But can the Byzantine nodes somehow collude to cause data loss in the system, making the originator think the data is replicated, but then deleting this data? What other things can go wrong?

Comments

Unknown said…
This comment has been removed by a blog administrator.
Rodrigo said…
Great writeup Murat. One thing I haven't seen discussed much is the intersection of IPFS with the original Oceanstore design. I haven't looked deeply enough, and maybe the differences are obvious. Do you know?
troymc said…
I wonder why IPFS uses the hash of the PubKey, rather than the PubKey itself, as the NodeId.
Tristan said…
Since the PubKey is a pair of numbers, you'd need some way to convert that into a single ID. Hashing is probably just as easy as serializing the public key and it lets you specify the number of bits. The public key length is probably much too long for practical usage.
Anonymous said…
Just speculating (haven't read the paper), but using a hash of the PubKey for the NodeId would allow them to work with multiple key sizes by keeping the NodeId constant.
Anonymous said…
The author has answered some of your concerns regarding this post, which was posted on HackerNews. I am just posting it here for yours and your readers’ convenience. You can access it at https://news.ycombinator.com/item?id=16432502
@Rodrigo
IPFS and Oceanstore have some design commonalities, such as nodes having their IDs generated from the pubkey (see SFS paper about why this is done) or the fact that hashing is used to address its contents. The way data is modelled in IPFS is much more similar to git, even though versioning is still not working very well (there are some open issues in their github repo about this).
Oceanstore data model is a bit simpler and does not allow branching, so versions grow linearly. This is achieved by using a Byzantine agreement protocol among the primary replica nodes, which agree on what update to accept and then propagate the changes to the other nodes in the network.

Oceanstore has also a multi-layer architecture --> clients, primary replica, secondary-replica nodes, archival nodes; while IPFS treats all nodes as the same as in a regular P2P structured overlay network (based on S/Kademlia).

By the way, you might also be interested in other systems such as DAT, Perkeep, Tahoe-LAFS, or MaidSAFE.

Popular posts from this blog

Hints for Distributed Systems Design

Learning about distributed systems: where to start?

Making database systems usable

Looming Liability Machines (LLMs)

Foundational distributed systems papers

Advice to the young

Linearizability: A Correctness Condition for Concurrent Objects

Understanding the Performance Implications of Storage-Disaggregated Databases

Scalable OLTP in the Cloud: What’s the BIG DEAL?

Designing Data Intensive Applications (DDIA) Book