Onix: A Distributed Control Platform for Large-scale Production Networks

The Onix work (OSDI'10) builds on Nox. Essentially, Onix takes Nox and distributes it over multiple servers.

Let me start with a brief refresher on Nox. (Or you can read my previous post on Nox.) The main idea in Nox (and OpenFlow) was to facilitate innovation by separating the control plane from the forwarding (data) plane. (In the current networking architecture, the control and data planes are both implemented in the same place, the routers.) Nox introduced "software defined networking" (SDN): Nox uses a centralized controller to make the decisions (i.e., the control plane). The routers implement only the data plane, and just follow directions from the controller while forwarding data. A drawback of Nox was that, since it uses a single controller, that controller is a single point of failure. Although the Nox work pointed out how this single controller could be distributed, it didn't pursue it further.

Enter Onix. Onix investigates how to distribute that single controller. In Onix the controller consists of a network information base (nib) and two other components: switch import/export and distribution import/export. All active network elements are stored in the nib as key-value pairs. The nib is a decentralized component, distributed over several Onix nodes. Once you change the local nib instance on one Onix node, those modifications are propagated to the other nibs. The switch import/export component talks to the switches to configure them according to the instructions from the Onix node (it sort of acts as an interpreter). The distribution import/export component makes the multiple nibs consistent with each other in an asynchronous manner.
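To make the "write locally, propagate later" behavior concrete, here is a toy sketch. It is not the Onix API: the NibReplica class, its write and flush methods, and the entity names are all hypothetical, and the flush call just stands in for whatever the distribution import/export component does in the background.

```python
from collections import deque


class NibReplica:
    """Toy stand-in for one Onix node's local nib copy (hypothetical, not the Onix API)."""

    def __init__(self, name):
        self.name = name
        self.entities = {}     # entity id -> dict of key-value attributes
        self.peers = []        # other replicas that should hear about local writes
        self.outbox = deque()  # local updates not yet shipped to peers

    def write(self, entity_id, key, value):
        # A local write returns immediately; propagation happens later.
        self.entities.setdefault(entity_id, {})[key] = value
        self.outbox.append((entity_id, key, value))

    def flush(self):
        # Stand-in for the distribution import/export component:
        # ship every queued update to every peer.
        while self.outbox:
            entity_id, key, value = self.outbox.popleft()
            for peer in self.peers:
                peer.entities.setdefault(entity_id, {})[key] = value


a, b = NibReplica("A"), NibReplica("B")
a.peers.append(b)

a.write("switch-1", "port-3", "up")
stale = b.entities.get("switch-1")        # B has not heard about the write yet
a.flush()
fresh = b.entities["switch-1"]["port-3"]  # B sees the write after propagation
```

Between the write and the flush, the two replicas disagree. That window is exactly what the asynchronous design accepts in exchange for not blocking on remote nodes.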


Developers only see the nib. Exclusive locks on the nib can only be acquired on the local instance, and all nib operations are asynchronous. No distributed locking mechanism is provided in the API. The limitation of this API is that it relies on application-specific logic to detect and resolve conflicts in the network state.
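What "application-specific conflict resolution" might look like, as a minimal sketch: the merge function and the prefer_down policy below are my own illustration, not something from the paper. The point is only that the platform hands both versions to the application, and the application decides.

```python
def merge(local, remote, resolve):
    """Merge two attribute dicts for the same entity.

    When both sides hold a different value for the same key, the
    application-supplied resolve(key, local_val, remote_val) picks the winner.
    """
    merged = dict(local)
    for key, remote_val in remote.items():
        if key in merged and merged[key] != remote_val:
            merged[key] = resolve(key, merged[key], remote_val)
        else:
            merged[key] = remote_val
    return merged


def prefer_down(key, local_val, remote_val):
    # Example policy (an assumption): treat a port as down
    # if any replica reports it down.
    return "down" if "down" in (local_val, remote_val) else local_val


local = {"port-1": "up", "port-2": "up"}
remote = {"port-2": "down", "port-3": "up"}
result = merge(local, remote, prefer_down)
```

A different application could plug in a different policy (say, highest version number wins) without touching the merge logic.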

For scalability, Onix lets you have hierarchies. Onix node C can coordinate Onix nodes A and B; in this case C's nib has two elements, A and B. It is possible that A and B share some of the switches they are responsible for (those switches appear in the nibs of both A and B).
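A tiny sketch of the hierarchy idea, under my own assumptions about the representation: each child nib is a dict keyed by switch id, and the hypothetical aggregate function collapses a child into a single element of the parent's nib.

```python
def aggregate(name, nib):
    """Summarize a child node's nib as one logical element for the parent."""
    return {"name": name, "switches": sorted(nib)}


# Child nibs: switch id -> attributes (left empty here for brevity).
nib_a = {"s1": {}, "s2": {}, "s3": {}}
nib_b = {"s3": {}, "s4": {}}

# The parent's nib holds one element per child, not one per switch.
nib_c = {name: aggregate(name, nib) for name, nib in (("A", nib_a), ("B", nib_b))}

# Switches that both children are responsible for.
shared = set(nib_a) & set(nib_b)
```

Here C tracks two elements however many switches sit under A and B, and s3 is the shared switch that shows up in both child nibs.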

As for reliability, Onix can detect link failures, and it then relies on user-written code to handle them accordingly. Onix provides two data stores: a transactional data store (for durability of the local storage) and a one-hop DHT based on simple consistent hashing (for holding volatile network state in a fast manner).
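For readers who have not seen consistent hashing, here is a minimal sketch of a one-hop lookup: every node knows the whole ring, so finding the owner of a key is a local computation. The Ring class, the node names, the use of SHA-1, and the virtual-node count are my assumptions for illustration, not details from the paper.

```python
import bisect
import hashlib


def _h(s):
    # Map a string to a point on the ring.
    return int(hashlib.sha1(s.encode()).hexdigest(), 16)


class Ring:
    """Consistent-hashing ring with virtual nodes (hypothetical sketch)."""

    def __init__(self, nodes, vnodes=16):
        # Each node is placed at several points to even out the load.
        self.points = sorted((_h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self.hashes = [p for p, _ in self.points]

    def owner(self, key):
        # The owner is the first ring point clockwise from the key's hash,
        # wrapping around at the end of the ring.
        i = bisect.bisect_right(self.hashes, _h(key)) % len(self.hashes)
        return self.points[i][1]


ring = Ring(["onix-1", "onix-2", "onix-3"])
owner = ring.owner("switch-42/link-state")

# If onix-3 leaves, only the keys it owned need a new home.
smaller = Ring(["onix-1", "onix-2"])
keys = [f"switch-{i}/link-state" for i in range(200)]
moved = [k for k in keys if ring.owner(k) != smaller.owner(k)]
```

Every key in moved was owned by onix-3 before it left; keys owned by the other two nodes stay put, which is what makes this a good fit for state that has to survive membership changes cheaply.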

The paper discusses four applications built with Onix: a network management application, a distributed virtual switch application, multi-tenant virtualized data centers (VPN implementation), and a scale-out carrier-grade IP router. The paper also includes evaluation experiments.

The Onix paper ends on a cautionary note: "What we should make clear, however, is that Onix does not, by itself, solve all the problems of network management. The designers of management applications still have to understand the scalability implications of their design. Onix provides general tools for managing state, but it does not magically make problems of scale and consistency disappear. We are still learning how to build control logic on the Onix API, but in the examples we have encountered so far management applications are far easier to build with Onix than without it."

There are a lot of smart people pushing for a very programmable network. Hellerstein's work on declarative network programming is a nice complement to Nox and Onix. I certainly appreciate that Nox and OpenFlow make evaluation and development of new protocols possible in the network. That is very useful for network researchers. Even at the enterprise level, some programmability over the network would be welcomed, I guess. (Right now, the only control we have over networks is via BGP-level rules, and there is no control at the LAN level.) But the problem is that it is very hard to find and hire people with the expertise to program the enterprise network in the very fine-grained programming environment these groups are providing. Maybe it is better to provide a network that just works as plug and play (like in the VL2 approach), rather than trying to provide complete control over every aspect of the network. It is unclear which approach is better.
