Monday, October 24, 2016

It is about time

A couple months ago, I had visitors from SpectraCom. They gifted me a GPS timeserver. I am now the guy with the most accurate time in the department.

We met with these engineers for a couple of hours to discuss time synchronization. It was a very enjoyable discussion and time flew by. We talked about how to achieve time synchronization at a box, how to distribute it, and its applications. I took some notes, and thought it would be useful to share them.

Estimated reading time: 6 minutes.
Suggested song to accompany (right-click and open link in new tab): Time by Pink Floyd

Precise clocks in a box

Atomic clocks often use Rubidium, which has a drift of only 1 microsecond a day. OCXO ovenized oscillators are the second-best way to have precise clocks in a box. Temperature change has the largest effect on the crystal oscillation rate, which results in clock drift from "True Time". Ovenizing the oscillator provides a way to control and compensate for temperature change. OCXO ovenized oscillators have a drift of 25 microseconds a day.
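To put those two drift figures side by side, here is a small back-of-the-envelope sketch. It assumes the drift accumulates linearly at the quoted rate, which is a simplification (real oscillators also age and wander), and the function names are just mine for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def fractional_drift(drift_us_per_day: float) -> float:
    """Convert a drift in microseconds per day to a dimensionless rate error."""
    return (drift_us_per_day * 1e-6) / SECONDS_PER_DAY


def days_to_exceed(drift_us_per_day: float, budget_us: float) -> float:
    """Days a free-running clock can go before its error exceeds the budget.

    Assumes a constant (worst-case) drift rate and no resynchronization.
    """
    return budget_us / drift_us_per_day


rubidium = fractional_drift(1.0)   # figure quoted above for Rubidium
ocxo = fractional_drift(25.0)      # figure quoted above for OCXO

print(f"Rubidium rate error: {rubidium:.2e}")
print(f"OCXO rate error:     {ocxo:.2e}")
print(f"OCXO days to drift 1 ms: {days_to_exceed(25.0, 1000.0):.0f}")
```

Under that linear assumption, a free-running OCXO needs about 40 days to accumulate a millisecond of error, while a Rubidium clock would need 1000 days.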

GPS time (box/distribution)

There are 4 big satellite systems. GPS is the biggest and is maintained by the US. Then come GLONASS (yeah I know) by Russia, Galileo by Europe, and Beidou by China. India also has some regional satellites. Today all smartphones support GPS, and some recent ones also support GLONASS.

The satellites, which are up there at around 20,000 km altitude, have Rubidium-based atomic clocks and they distribute time sync information. That looks like an awfully long distance, but that doesn't stop the satellites from serving as the most prominent time sync solution. Why? The answer has to do with the distribution of time sync. Distributing time sync over many hops/routers on the Internet degrades its precision. When distributing with wires, you have to relay: it is infeasible to have one long physical cable. And relaying/switching/routing adds nondeterministic delays to the time sync information. For the satellites, we have wireless distribution. Despite the long distance, distribution from a satellite is still one hop. And the distance delay is deterministic, because it can be calculated precisely by dividing the distance to the satellite by the speed of light. The accuracy of GPS time signals is ±10 ns.
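The "distance divided by the speed of light" step is simple enough to write down. This is only a sketch of that arithmetic: I am assuming the satellite is directly overhead so that the distance equals the 20,000 km altitude, and I am ignoring atmospheric delays. A real receiver solves for position and clock bias from several satellites at once.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by definition of the metre


def propagation_delay_s(distance_m: float) -> float:
    """One-hop signal flight time over a straight-line distance."""
    return distance_m / SPEED_OF_LIGHT_M_PER_S


def distance_error_m(timing_error_s: float) -> float:
    """How far light travels during a given timing error."""
    return timing_error_s * SPEED_OF_LIGHT_M_PER_S


# Assumption: satellite straight overhead, so distance = altitude.
delay = propagation_delay_s(20_000e3)
print(f"Flight time from 20,000 km: {delay * 1e3:.1f} ms")

# A 10 ns timing error corresponds to roughly 3 m of distance.
print(f"10 ns of error in distance: {distance_error_m(10e-9):.2f} m")
```

The delay itself is large (tens of milliseconds), but because it is computable it can be subtracted out, which is the whole point of the one-hop argument.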

GPS is an engineering marvel. Here you can read more about (and admire) GPS. Here are some interesting highlights about GPS synchronization. Ground-based corrections are constantly issued to the satellites to account for relativistic and other effects. Ground stations (US Naval Observatory, NIST) transmit to the satellites periodically to update/correct their atomic clocks against Coordinated Universal Time (UTC). GPS is weatherproof: even big storms do not degrade GPS signals significantly. However, jamming is more of a problem, since GPS is a very low-power signal.

Assisted GPS helps smartphones lock on to the low-power GPS signals. Cell towers provide smartphones with approximate time and position information, along with GPS constellation information, so the smartphone knows which signals to lock on to and where to look for them. There are also pseudolites. These are stable on-the-ground satellite beacons. They simulate satellites and are now being considered for indoor localization systems. They spoof GPS: their signal overloads smartphones' GPS chipsets.

Time sync distribution

NTP is by far the most popular time sync distribution protocol on the Internet. However, NTP clock sync errors can amount to tens of milliseconds. The biggest source of problems for NTP distribution is asymmetry in the links. Consider a 100 Mbps link feeding into a 1 Gbps link. One way there is no delay, but coming back the other way there is queuing delay. This asymmetry introduces errors into time sync. NTP is also prone to security attacks. Having your own timeservers is good for increased security against NTP attacks. The finance sector doesn't use the public NIST NTP servers since man-in-the-middle attacks are possible. (Of course, it is also not that difficult to spoof GPS.) That all being said, I have great respect for the engineering effort that went into NTP, and for all that NTP has provided for distributed systems over the Internet. David Mills is a hero.
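To see why asymmetry hurts, it helps to look at the standard NTP offset and round-trip delay calculation from the four timestamps of one request/response exchange. The sketch below is not NTP code: `simulate_exchange` and the delay numbers are made up for illustration. The client assumes the two directions take equal time, so any difference between them shows up as half that difference in the offset estimate.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Standard NTP estimates from one exchange.

    t1: client send time (client clock)   t2: server receive time (server clock)
    t3: server send time (server clock)   t4: client receive time (client clock)
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def simulate_exchange(true_offset, fwd_delay, back_delay, t1=0.0):
    """Hypothetical helper: timestamps for a server clock ahead by true_offset."""
    t2 = t1 + fwd_delay + true_offset      # arrival, read on the server clock
    t3 = t2                                # assume zero server processing time
    t4 = (t3 - true_offset) + back_delay   # arrival, read on the client clock
    return t1, t2, t3, t4


true_offset = 0.050  # server is 50 ms ahead (made-up value)

# Symmetric path: 5 ms each way, the estimate matches the true offset.
sym_offset, sym_delay = ntp_offset_and_delay(
    *simulate_exchange(true_offset, 0.005, 0.005))

# Asymmetric path: 1 ms out, 9 ms back (queuing on the return path).
asym_offset, asym_delay = ntp_offset_and_delay(
    *simulate_exchange(true_offset, 0.001, 0.009))

print(f"symmetric:  offset {sym_offset * 1e3:.1f} ms, delay {sym_delay * 1e3:.1f} ms")
print(f"asymmetric: offset {asym_offset * 1e3:.1f} ms, delay {asym_delay * 1e3:.1f} ms")
print(f"error from asymmetry: {(asym_offset - true_offset) * 1e3:.1f} ms")
```

Both exchanges have the same 10 ms round trip, so the client cannot tell them apart from the delay alone, yet the asymmetric one is off by (1 ms - 9 ms) / 2 = -4 ms.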

PTP (IEEE 1588) is another time sync distribution protocol. PTP stands for Precision Time Protocol. It comes from industrial networking, where it started as a multicast protocol. PTP enables hardware timestamping and has measures to eliminate link delay asymmetry. The time provider sends MAC-stamped time to the client, so the client can measure the in-flight time between the timeserver and itself. (In NTP, the time provider does not know the client and is stateless with respect to it: the client asks and gets a response from an NTP server that is oblivious to the client.) PTP does not have a standard reference implementation.

Applications of time sync

A big customer of time synchronization systems is the power grid, which uses time synchronization to manage load balancing, distribution, and load shedding. Cell towers are also big customers of time synchronization. Cell towers used to have on-the-wire proprietary synchronization, updated with SyncE or PTP. GPS-based synchronization has been replacing those quickly. As I mentioned earlier, the finance industry is also a big client for time synchronization systems.

Time sync also has emerging applications in cloud/datacenter computing. The most prominent is probably Google Spanner, which uses atomic clocks and GPS clocks to support externally-consistent distributed transactions at global scale.

I have been working on better clocks for distributed systems, and hybrid logical clocks and hybrid vector clocks resulted from that work. I am continuing that work to further explore the use of clocks for improving auditability of large scale distributed systems, as part of our project titled "Synchrony-aware Primitives for Building Highly Auditable, Highly Scalable, Highly Available Distributed Systems" (funded by NSF XPS, from 2015-2019, PI: Murat Demirbas and coPI: Sandeep Kulkarni):

"Auditability is a key property for developing highly scalable and highly available distributed systems; auditability enables identifying performance bottlenecks, dependencies among events, and latent concurrency bugs. In turn, for the auditability of a system, time is a key concept. However, there is a gap between the theory and the practice of distributed systems in terms of the use of time. The theory of distributed systems shunned the notion of time and considered asynchronous systems, whose event ordering is captured by logical clocks. The practical distributed systems employed NTP synchronized clocks to capture time but did so in ad hoc undisciplined ways. This project will bridge this gap and provide synchrony-aware system primitives that will support building highly auditable, highly scalable, and highly available distributed systems. The project has applications to cloud computing, distributed NewSQL databases, and globally distributed web services."
