Hybrid clocks crate in Rust

I have been learning Rust in my spare time. It has a very steep learning curve, but in return you get to use an elegant, fast, and safe programming language you can use anywhere from embedded systems to datacenter computing. Rust stole like an artist the best concepts from several languages and combined them under one roof with style. Rust has seen a lot of traction recently, and I think Rust is here to stay for a long time. 

In the last couple months, I have been writing small programs to practice Rust, and recently I decided I should go bigger and consider reading a real code base, but something with manageable size. I chose to focus on the hybrid clocks crate because it was the right size, and it solved an important problem. 

The hybrid clocks crate showcases many of the best practices of Rust. It has a thoughtful design. It shows practical use of struct and trait implementations and trait objects. It was initially hard for me to wrap my mind around the design, because the design has a couple nested layers. After I understood it,  I decided to put my notes out here, so it could serve as documentation. 


The design 

The crate implements hybrid logical clocks and provides a strictly-monotonic clock that can be used to determine if one event happens-before another.

The way you use the crate is to first create a clock by calling a Clock implementation/subtype, and then by getting a timestamp by calling the now method on the clock. 

The now() method is provided after a level of indirection, dispensed by one of the underlying three clocksources.  I explain this in the trait objects heading below.


Timestamp

Let's start with the Timestamp struct that the now() method returns. This is defined in lib.rs, as that is where most of the action is. 

The epoch here is a counter like the Raft epoch. If a node likes to reset the clock, and continue, it can increase the epoch, and start a new era. The crate does not prescribe a way to use this. 


Clock 

Clock uses the timestamp struct via saving the last observed time as the last_observed field. The Clock also has an epoch, for incrementing the epoch, and timestamping with that. 

Finally the src field is the clocksource. Clock takes any component S that implements the ClockSource trait, and assigns it to src. There are three different implementations of ClockSource in the crate:

  • Manual Clock is what you set and update, it doesn't use a physical system clock.
  • WallNS is the system clock UNIX time in nanoseconds. 
  • WallMS is the same thing but for microseconds I think.


ClockSource Trait

The ClockSource Trait, in mod.rs, is the high level trait.

(The type Delta is used later in the crate for OffSetLimiter interface for setting the max_offset for limiting the window of observed timestamps to update the clock with. My first thought about it was for comparison, but then I realized that, just by deriving from Ord, the Timestamp already allows comparisons with <, =, >.)

The now method in this trait is routed to the underlying three clocksource implementations in lib.rs. 

The now method in lib.rs first calls the read_pt to read the time and then calls do_observe to update the clock. There is another method called observe that takes a message and calls do_observe the same way to get the clock updated. I think that is really clever and efficient code, combining two cases in to one.  

This gives monotonicity for a single clock source as well, because do_observe is used with respect to the read_pt of physical time. Even when physical time from read_pt would go back the result would not go back. This is made possible because we are keeping the last result in last_observed as part of the clock struct.


Do observe provides a very nice use of match. It combines cases cleverly, and updates the clock with minimal checks. Here is what we had in the paper to do that. And the next one is in TLA+.

I do have a nit pick about do_observe when adapting observation.clone() at line 142 though. In order to make the new hybrid clock greater than the observed, the code should increment the count by one. But, this is not a bug, because clock.observe doesn't return a value. Instead after do_observe, clock.now() needs to be called to get the value, and that ensures the new clock is strictly larger than the observed.


Trait objects

Let's dig down to the trait object use case. Let's look at the Clock<WallNS> and Clock<S: ClockSource> relation. The other two clocks Clock<WallMS> and Clock<ManualClock> have similar relation. 

Clock<WallNS> has a Clock::new(WallNS) to return a clock. The statement "let mut clock_a = Clock::wall_ns()?;" is piped to this in lib.rs:



The Clock<WallNS> implementation, calls the Clock struct implementation with WallNS as the src. So it is piped to the "impl<S: ClockSource> Clock<S>, ::new()" function,  which sets the last_observed in the Clock struct the first invocation of src.now(), calling back wall_ns.now to get the physical time. Similarly read_pt method also calls the now method which is dispensed by the underlying correct clock source.

By the way, WallNS an empty struct type "pub struct WallNS;", which just has a ClockSource implementation in wall_ns.rs: 


OffsetLimiter

The crate has another struct OffsetLimiter that builds on ClockSource. OffsetLimiter is used for implementing  wrapper around Clock that will refuse updates outside of the epsilon tolerance window of the clock update.


Testing

I will talk about the tests in the crate, and how I wrote some other use case programs for testing hybrid clocks at a later post. 

Comments

Popular posts from this blog

Hints for Distributed Systems Design

Learning about distributed systems: where to start?

Making database systems usable

Looming Liability Machines (LLMs)

Advice to the young

Foundational distributed systems papers

Distributed Transactions at Scale in Amazon DynamoDB

Linearizability: A Correctness Condition for Concurrent Objects

Understanding the Performance Implications of Storage-Disaggregated Databases

Designing Data Intensive Applications (DDIA) Book