NTP: The Protocol That Keeps Time from Collapsing


Most engineers think about time as a number returned by now().

This is acceptable until your TLS certificates appear to be from the future, your database replicas reject writes, your cron jobs run twice, and your forensic timeline reads like fiction.

Time is infrastructure. NTP is the part of that infrastructure that keeps distributed systems from arguing with physics.

The Supreme Leader classifies NTP as a strategic protocol: invisible when correct, catastrophic when neglected.

I. The Problem: Every Clock Lies Differently

Quartz drift is real. Thermal effects are real. Virtualized clocks are chaotic under load. Suspend/resume events produce nonsense. Hardware clocks and kernel clocks do not agree by default.

Without synchronization, machines in the same fleet diverge. With enough divergence, “before” and “after” become undecidable at incident scale.

NTP addresses this by estimating offset and network delay across multiple sources, then disciplining local time gradually.
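
The offset and delay estimate comes from four timestamps per exchange. A minimal sketch of that arithmetic follows; the `Exchange` type and function name are illustrative, not taken from any NTP implementation, and the offset figure assumes the network path is roughly symmetric.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Exchange:
    """Four timestamps of one client/server exchange, in seconds."""
    t1: float  # client transmit time, read from the client clock
    t2: float  # server receive time, read from the server clock
    t3: float  # server transmit time, read from the server clock
    t4: float  # client receive time, read from the client clock


def offset_and_delay(x: Exchange) -> Tuple[float, float]:
    """Return (offset, delay); offset is server clock minus client clock."""
    # Averaging the two one-way differences cancels the path delay,
    # but only if the outbound and return paths take equally long.
    offset = ((x.t2 - x.t1) + (x.t3 - x.t4)) / 2.0
    # Round-trip time minus the time the server spent holding the packet.
    delay = (x.t4 - x.t1) - (x.t3 - x.t2)
    return offset, delay
```

Any asymmetry between the two directions shows up as offset error of up to half the delay, which is one reason a single measurement is never trusted on its own.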

It does not make clocks perfect. It makes clocks coherent enough for civilization.

II. Official History

NTP was developed by David L. Mills. Early work appears in the 1980s, with RFC 958 (1985) and RFC 1059 (1988), followed by later revisions culminating in NTPv4 (RFC 5905, 2010).

The protocol survived the transition from research networks to global commercial Internet because it solved a universal systems problem: independent machines need a common time base without trusting a single box blindly.

NTP’s longevity is not nostalgia. It is proof that the failure model was understood correctly.

III. How NTP Works in Practice

NTP runs primarily over UDP/123 and organizes sources by strata.

Stratum | Meaning | Typical source
0 | Reference clocks (not directly network-served) | GPS, atomic clocks, radio time receivers
1 | Directly attached to stratum 0 | Time servers in labs, IXPs, providers
2+ | Downstream synchronized servers | Enterprise, ISP, cloud, campus servers

A client usually queries multiple upstreams, then applies selection and filtering to reject outliers and converge on stable offset.
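
The filter-then-select idea can be sketched in a few lines. This is a deliberate simplification and not the selection algorithm specified in RFC 5905: it keeps the lowest-delay sample per source, then drops sources whose offset sits far from the median. The function names and the 50 ms tolerance are assumptions made for illustration.

```python
from statistics import median
from typing import Dict, List, Tuple

Sample = Tuple[float, float]  # (offset_seconds, delay_seconds)


def best_sample(samples: List[Sample]) -> Sample:
    """Pick the sample with the lowest round-trip delay.

    Low-delay samples spent the least time in queues, so their
    offset estimate is the least distorted.
    """
    return min(samples, key=lambda s: s[1])


def select_sources(
    per_source: Dict[str, List[Sample]], tolerance: float = 0.050
) -> Dict[str, float]:
    """Keep sources whose best offset is within `tolerance` of the median."""
    best = {name: best_sample(s)[0] for name, s in per_source.items() if s}
    if not best:
        return {}
    mid = median(best.values())
    return {name: off for name, off in best.items() if abs(off - mid) <= tolerance}
```

With three sources where one reports an offset of 2.5 seconds and the others agree within a millisecond, the outlier is rejected and the two agreeing sources survive.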

Useful telemetry fields include:

  • Offset: how far local clock is from selected source
  • Delay: round-trip network latency
  • Jitter: variation in timing measurements
  • Root distance/dispersion: uncertainty budget through upstream chain
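
Two of the fields above reduce to simple arithmetic. The sketch below is a simplified reading, not a daemon's exact computation: jitter is taken as the RMS of successive offset differences, and root distance as half the root delay plus the root dispersion. Both function names are illustrative.

```python
import math
from typing import Sequence


def rms_jitter(offsets: Sequence[float]) -> float:
    """RMS of the differences between consecutive offset measurements."""
    if len(offsets) < 2:
        return 0.0
    diffs = [b - a for a, b in zip(offsets, offsets[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def root_distance(root_delay: float, root_dispersion: float) -> float:
    """Simplified uncertainty budget back to the reference clock, in seconds."""
    return root_delay / 2.0 + root_dispersion
```

A small offset with a large root distance means the clock agrees closely with a source that is itself uncertain.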

You do not want “the lowest latency server.” You want stable, sane, and diverse sources.

IV. Minimal Sane Configuration

A practical chrony baseline:

# Use diverse pools/sources
pool pool.ntp.org iburst maxsources 4
server time.cloudflare.com iburst nts

# Step the clock if the offset exceeds 1 s during the first 3 updates, then slew
makestep 1.0 3

# Keep RTC in sync with system clock
rtcsync

Operational checks:

chronyc tracking
chronyc sources -v
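
For monitoring, the numbers matter more than the eyeball. A rough sketch of pulling the second-valued fields out of captured `chronyc tracking` text follows. It assumes the human-readable "Name : value seconds" layout; the function name and the sign convention (negative when the system clock is slow) are choices made here, not part of chrony.

```python
import re
from typing import Dict

# Matches lines such as "Root delay      : 0.013639022 seconds".
_FIELD = re.compile(
    r"^(?P<name>[^:]+?)\s*:\s*(?P<value>-?\d+(?:\.\d+)?)\s+seconds(?P<rest>.*)$"
)


def parse_tracking(text: str) -> Dict[str, float]:
    """Extract fields reported in seconds from `chronyc tracking` text."""
    fields: Dict[str, float] = {}
    for line in text.splitlines():
        m = _FIELD.match(line.strip())
        if not m:
            continue  # skip fields in other units (ppm, stratum, dates)
        value = float(m.group("value"))
        # "System time" is printed unsigned with a slow/fast suffix;
        # this sketch reports "slow" as a negative number.
        if "slow of NTP time" in m.group("rest"):
            value = -value
        fields[m.group("name")] = value
    return fields
```

The text can come from a subprocess call or a collected log; keeping the parser separate from the collection makes it easy to test against saved output.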

Interpretation discipline matters more than green status icons. A source list is not health if all sources share one hidden failure domain.

V. Incidents That Taught the Industry to Respect Time

Date | Incident | Mechanism | Impact
2012-06-30 | Leap second event | Kernel/userspace timing bugs triggered high CPU loops on many Linux systems | Widespread service instability across major platforms
2016-12-31 / 2017-01-01 | Cloudflare leap-second DNS incident | Edge time handling bug around leap second processing | Partial DNS resolution failures until mitigation
Ongoing (2010s) | NTP amplification abuse | Abusable query modes used for DDoS reflection | Major volumetric attacks, hardening campaigns followed

These were not “NTP is bad” stories. They were “time handling assumptions were naive” stories.

The Supreme Leader notes that temporal bugs are politically similar to paperwork bugs: harmless until suddenly constitutional.

VI. Leap Seconds: Tiny Unit, Large Blast Radius

UTC occasionally inserts leap seconds to stay within 0.9 seconds of time as measured by Earth’s rotation (UT1).

Software stacks historically assume time moves forward at uniform cadence. A leap second violates that assumption. If components disagree on whether to step, slew, smear, or ignore, event ordering can break.
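
The uniform-cadence assumption usually hides in duration arithmetic. A small sketch of the defensive pattern: measure elapsed time with a monotonic clock, which is not affected by wall-clock steps. The `measure` helper and its injectable `clock` parameter are illustrative.

```python
import time
from typing import Callable


def measure(
    work: Callable[[], None], clock: Callable[[], float] = time.monotonic
) -> float:
    """Run `work` and return the elapsed seconds according to `clock`."""
    start = clock()
    work()
    return clock() - start
```

Passing `time.time` as the clock reproduces the failure mode: if the wall clock is stepped backwards while `work` runs, the result is negative, and any code that assumed a non-negative duration inherits the problem.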

Common strategies:

  • Step: apply one-second correction abruptly
  • Slew: adjust gradually over time
  • Leap smear: spread the correction over a window to avoid discontinuity

None is universally “correct” in isolation. Consistency across your own fleet matters more than ideological preference.

If half your systems smear and half step without design intent, you have built a temporal split-brain.
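
A linear smear is the easiest of these strategies to reason about in code. The sketch below assumes a 24-hour window centered on the leap second, which is one common choice and not the only one; the function name and parameters are illustrative.

```python
def smear_fraction(now: float, leap_at: float, window: float = 86400.0) -> float:
    """Fraction of the leap second already absorbed at time `now`.

    `now` and `leap_at` are seconds on the same timescale. The smear
    starts at leap_at - window/2 and ends at leap_at + window/2.
    """
    start = leap_at - window / 2.0
    progress = (now - start) / window
    # Clamp so the result is 0.0 before the window and 1.0 after it.
    return min(1.0, max(0.0, progress))
```

Two hosts that disagree on `window` or on the centering will disagree with each other by a fraction of a second for hours, which is the split-brain described above in numeric form.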

VII. Security and Trust in Time Distribution

Classic NTP was not designed for today’s threat assumptions.

Security improvements now include:

  • Better default hardening in modern daemons
  • Restrictive query modes to reduce reflection abuse
  • NTS (Network Time Security, RFC 8915) for authenticated time exchange, with keys established over TLS

Reality check:

  • Authenticated time still depends on PKI, network reachability, and sane upstream selection.
  • If your time sources are all in one provider and that provider has a routing/control event, your trust chain narrows dangerously.

Time diversity is as important as DNS diversity and BGP policy hygiene.

VIII. Why Product Teams Should Care

NTP errors leak directly into product behavior:

  • JWT and OAuth token validity windows fail unexpectedly
  • Certificate checks break
  • Log ordering becomes unreliable for incident response
  • Cache eviction and TTL behavior drift
  • Distributed consensus and leader election become unstable
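
The token case is the easiest to demonstrate. A minimal sketch of a validity-window check with an explicit clock-skew allowance follows; the function and parameter names are hypothetical and do not belong to any particular JWT library.

```python
def token_is_valid(
    now: float, not_before: float, expires_at: float, leeway: float = 0.0
) -> bool:
    """Check a validity window, tolerating `leeway` seconds of clock skew.

    All arguments are seconds since the epoch on the verifier's clock.
    """
    return (not_before - leeway) <= now < (expires_at + leeway)
```

A verifier whose clock runs two seconds slow rejects a freshly issued token unless `leeway` covers the gap. Leeway hides small skew; it does not replace synchronization, because every second of allowance is also a second of extended token lifetime.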

“It’s just one second” is an amateur statement in distributed systems.

One second is enough to violate invariants you did not know you had.

The Decree

NTP is not optional plumbing. It is temporal governance.

If you cannot answer these quickly, your platform is operating on hope:

  • Which daemons run time sync on each tier?
  • What is our leap-second policy (step/slew/smear) and is it consistent?
  • Do we have source diversity across providers and paths?
  • Are we monitoring offset/jitter/root distance, not just process up/down?
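
The last two questions can be turned into a check that runs per host. This is a sketch under stated assumptions: the `TimeHealth` type, the threshold defaults, and the idea of tagging each source with a provider label are all hypothetical and should be replaced with values your own platform can defend.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TimeHealth:
    offset: float                # seconds, signed
    jitter: float                # seconds
    root_distance: float         # seconds
    source_providers: List[str]  # one provider label per configured source


def find_problems(
    h: TimeHealth,
    max_offset: float = 0.100,         # assumed threshold
    max_jitter: float = 0.050,         # assumed threshold
    max_root_distance: float = 0.500,  # assumed threshold
    min_providers: int = 2,            # assumed threshold
) -> List[str]:
    """Return human-readable findings; an empty list means no finding."""
    problems: List[str] = []
    if abs(h.offset) > max_offset:
        problems.append(f"offset {h.offset:+.6f}s exceeds {max_offset}s")
    if h.jitter > max_jitter:
        problems.append(f"jitter {h.jitter:.6f}s exceeds {max_jitter}s")
    if h.root_distance > max_root_distance:
        problems.append(f"root distance {h.root_distance:.6f}s exceeds {max_root_distance}s")
    if len(set(h.source_providers)) < min_providers:
        problems.append(f"fewer than {min_providers} distinct source providers")
    return problems
```

A host with a running daemon, four sources, and one provider passes a process check and fails this one.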

The Internet tolerates many kinds of sloppiness. It does not tolerate clocks that disagree for long.

Tomorrow: we can shift to virtio or Unicode history, unless you want the filesystem branch first.

— Kim Jong Rails, Supreme Leader of the Republic of Derails