Saving Time,

Saving the Internet

Susan Sons
ISO Emeritus, NTPSec

Hacker-in-Chief, ICEI
Senior Systems Analyst, IU CACR

“Never doubt that a small group of thoughtful, committed citizens can change the world. Indeed, it is the only thing that ever has.”

― Margaret Mead

Where did the internet come from?




Graphical Browsing, Mosaic (NCSA, with Fed $)

W3C (DARPA, European Commission)

First Public TCP/IP Code (AT&T)

Golden Gate Bridge Maintenance

  • 13 ironworkers
  • 3 pusher ironworkers
  • 28 painters
  • 5 painter laborers
  • 1 chief bridge painter

What ISPs maintain:

  • Customer Support Staff
  • Network Administration Staff
  • Cables
  • Routing Hardware
  • Power For the Hardware
  • Connectivity Through Peer & Upstream Providers
  • Advertising
  • Lobbyists
  • Billing and Payment Infrastructure

Code rot is real.

  • ISPs
  • Online Retailers
  • Online Marketplaces
  • Everyone selling SaaS or PaaS
  • Computing Companies
  • Educational Institutions
  • Banks & Finance
  • Cellular Carriers
  • Libraries
  • Disaster Relief

Powered by Open Source Infrastructure Software

  • Media & Entertainment
  • Medicine
  • Shipping
  • Military & Government
  • Science
  • Companies With Remote Workers
  • Everyone Who Needs GPS
  • The Power Grid
  • The Repair Industry
  • Travel and Travelers

...and, I'm willing to bet, at least 90% of the people in this room.

NTP: a case study in digital firefighting

What is NTP?

NTP is...

  • Network Time Protocol: the primary way most computers throughout the world find out what time it is, and maintain synchronization with one another and the actual passage of time.

  • The reference implementation, in software, of that protocol: both the server and client side, plus the algorithms that use that information to regulate system clocks.
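To make "the algorithms that use that information" a little more concrete, here is a minimal sketch of the textbook offset and delay calculation described in the NTP specification. It is not code from the reference implementation, which layers filtering and clock discipline on top of this. The names `Exchange`, `clock_offset`, and `round_trip_delay` and the sample numbers are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exchange:
    """Four timestamps (in seconds) from one client/server round trip."""
    t0: float  # client clock when the request left
    t1: float  # server clock when the request arrived
    t2: float  # server clock when the reply left
    t3: float  # client clock when the reply arrived


def clock_offset(x: Exchange) -> float:
    """Estimate how far the client clock is behind (+) or ahead (-) of the server."""
    return ((x.t1 - x.t0) + (x.t2 - x.t3)) / 2


def round_trip_delay(x: Exchange) -> float:
    """Time spent on the network, excluding the server's processing time."""
    return (x.t3 - x.t0) - (x.t2 - x.t1)


# Hypothetical numbers: the client is 5 s slow, each network leg takes 0.5 s,
# and the server spends 0.25 s handling the request.
sample = Exchange(t0=100.0, t1=105.5, t2=105.75, t3=101.25)
print(clock_offset(sample), round_trip_delay(sample))
```

The offset estimate is only exact when both network legs take the same time, which is one reason a real client samples many exchanges rather than trusting a single one.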

In February 2015, NTP was also a gigantic mess.

The Security Nightmare

NTP was critical.

NTP was insecure.

“given enough eyeballs,
all bugs are shallow”

--Linus's Law, as formulated by Eric S. Raymond

I learned how deep the rabbit hole went...

No OSS gets broken to the point of crisis without a driving set of systemic social problems.


If these are not addressed, any improvement to the code will be short-term.

To his credit...

NTP's maintainer asked for help.

In NTP's Case:

  • Poor resource allocation
  • Hostility to new contributors
  • Clinging to broken process and tooling as a mechanism of control

The Rescue

Step Zero:

Decide that you are going to be responsible.

Facts of Life:

Software Rescue Edition

A clear, concrete, finite scope:

necessary, not optional

Expect and forgive drama.

Spend time with people.

The purpose of a rescue is long-term sustainability.

How do you set a scope when you know there are unseen bugs lurking everywhere, and you are not deeply familiar with the code base?

The code is the easy part.

  • Fixing bugs is temporary.

  • Make bugs easier to fix.

  • Eliminate or prevent classes of bugs.

  • Rescue should result in a long tail of bug fixing.

High-Return Technical Improvements:

  • Code Access
  • Build Process
  • Testing Infrastructure and Automation
  • Documentation
  • Refactors that accomplish:
    • Major code reduction
    • Major improvements in internal compartmentation
    • Major tightening of internal APIs
    • Migration away from dangerous dependencies
  • Bugs that are immediate security crises.

What this meant for the NTP rescue's technical goals:

  • Migrate from BitKeeper to Git.
  • Replace brittle build system with a modern, WAF-based build.
  • Update documentation enough to start onboarding new developers.
  • Fix as many security problems as possible before our time and money ran out.

Code Longevity:

  • Repository & Access
  • Build System
  • Tests
  • Documentation
  • Communication Channels
  • Personnel

People, Drama, and Sustainability

I got lucky.

We needed programmers:


  • Familiar with ancient C code

  • Experienced in Linux/UNIX systems programming

  • Capable of working on highly critical code

  • With some idea how time works

  • Who care about open source and security

  • Who can spend a lot of time on this.

We also needed:

  • A way to keep those programmers fed

  • Help with documentation and toolchain work

  • Means to demonstrate to the existing NTP community that we weren't abandoning them

  • An understanding of the existing install base that we didn't have

  • The means to maintain the code, documentation, and community post-rescue

  • Some way to convince people to actually deploy the thing

NTP Classic

Two administrative staff.

One fundraiser.

One developer.


2-4 semi-active community members.

Rescue Team

Susan Sons,  PM / ISO

Eric Raymond,  lead dev

Gary Miller,  developer

NaLette Brodnax,  docs

Amar Takhar,  tools dev


...and a handful of concerned community members.

Much to my personal disappointment...

...I didn't find myself writing code on this one.

Managing a critical software rescue:

  • Deep understanding of
    • the problem domain
    • software engineering process
    • people

      The worst mistake one can make is to misidentify the problem.
  • Relationships
  • Resilience and Calm
  • Coding and Software Architecture Expertise

I can't teach you my whole process in this talk, but...

So, how did the story end?

As of June 2017...

  • NTPSec has a healthy and growing team.
  • Thanks to a reduction in C code of over 75% (from 227 kLOC to 56 kLOC, with about 7 kLOC of Python added), NTPSec was already immune to over 75% of the NTP Classic vulnerabilities found in the last year BEFORE they were discovered.
  • NTPSec patches security vulnerabilities, on average, in under 12 hours from discovery.  Note that publication is sometimes slowed to coordinate with NTP Classic releases.
  • NTPSec's vulnerability response has pressured NTP Classic, under threat of funders pulling out, to speed up its own response from months-to-years to days-to-weeks.
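The code-reduction figure quoted above can be checked with quick arithmetic. This is a small sketch using only the two line counts from the slide; the helper name `percent_reduction` is hypothetical.

```python
def percent_reduction(before: int, after: int) -> float:
    """Percentage by which `after` is smaller than `before`."""
    return (before - after) / before * 100


# Figures from the slide, in thousands of lines of C code.
c_before_kloc = 227
c_after_kloc = 56

reduction = percent_reduction(c_before_kloc, c_after_kloc)
print(f"C code reduction: {reduction:.1f}%")
```

The result lands just above 75%, consistent with the "over 75%" claim. The roughly 7 kLOC of added Python is deliberately left out here, since the claim is about C code specifically.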

NTPSec's core team has been through a lot, but we still meet up about once a year and hang out, because it was a wild ride with good people.  I was given an emeritus title when I stepped down last spring, in the hope that I'd remain "part of the family".

I moved on...

Pony Factor

How many currently active committers account for >50% of the code base?

Based on research by Daniel Gruno of
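As a rough illustration of the question above, the sketch below computes a Pony Factor from a list of commit authors. It rests on two assumptions that are not in the slide: contribution is measured by commit count, and the input has already been filtered down to currently active committers. The function name `pony_factor` and the sample history are hypothetical.

```python
from collections import Counter


def pony_factor(commit_authors: list[str]) -> int:
    """Smallest number of committers who together made more than half the commits."""
    counts = Counter(commit_authors)
    total = sum(counts.values())
    covered = 0
    # Walk committers from most to least prolific until they cover a majority.
    for rank, (_author, n) in enumerate(counts.most_common(), start=1):
        covered += n
        if covered * 2 > total:
            return rank
    return 0  # an empty history has no committers


# Hypothetical history: one author name per commit.
history = ["alice"] * 6 + ["bob"] * 3 + ["carol"] * 2 + ["dave"]
print(pony_factor(history))
```

In this sample, alice alone made exactly half of the commits, which is not a majority, so it takes two committers to pass 50%. A low number means the project depends heavily on very few people.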

Why does it matter?

  • NTP
  • OpenSSL (think Heartbleed)
  • Bash (think Shellshock)
  • Costs of personnel turnover
  • Costs of neglect
  • Risk of malicious compromise

Pony Factor of some widely used OSS projects:

As of Fall 2016

Image credit: Dave Nalley

What happens when OSS infrastructure fails?

Do something about crumbling, insecure internet infrastructure

Questions/comments/etc welcome:

Many Thanks!

To Wikimedia Foundation for their awesome library of freely reusable media, which spared you from my toddler-like drawing ability.

To Indiana University's Center for Applied Cybersecurity Research, and specifically the NSF-funded Center for Trustworthy Scientific Cyberinfrastructure, who funded the NTP Rescue project.  Also to the Internet Civil Engineering Institute, who aided with organization and developer resources.

To Cornerstones of Trust, for bringing me here to tell you this story.

To the NTP Security Project team, who made sure the rescue effort didn't go to waste.  NTPSec is poised to replace NTP Classic in the coming year in installations around the world.

To the countless individual humans along the way who did NOT say

"this is somebody else's problem".

Using and Sharing This Work:

"Saving Time, Saving the Internet" by Susan Sons is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.


Please credit Susan Sons and the Internet Civil Engineering Institute (ICEI) when using this presentation.

Permissions beyond the scope of this license may be available; send inquiries to


The most current version of this presentation is available from

Keynote from Cornerstones of Trust 2017