Saving the Internet
ISO Emeritus, NTPSec
Senior Systems Analyst, IU CACR
“Never doubt that a small group of thoughtful, committed citizens can change the world. Indeed, it is the only thing that ever has.”
― Margaret Mead
Where did the internet come from?
CSNET & NSFNET (NSF)
HTTP & HTML (CERN)
Graphical Browsing, Mosaic (NCSA, with Fed $)
W3C (DARPA, European Commission)
First Public TCP/IP Code (AT&T)
Golden Gate Bridge Maintenance
- 13 ironworkers
- 3 pusher ironworkers
- 28 painters
- 5 painter laborers
- 1 chief bridge painter
What ISPs maintain:
- Customer Support Staff
- Network Administration Staff
- Routing Hardware
- Power For the Hardware
- Connectivity Through Peer & Upstream Providers
- Billing and Payment Infrastructure
Code rot is real.
- Online Retailers
- Online Marketplaces
- Everyone selling SaaS or PaaS
- Computing Companies
- Educational Institutions
- Banks & Finance
- Cellular Carriers
- Disaster Relief
Powered by Open Source Infrastructure Software
- Media & Entertainment
- Military & Government
- Companies With Remote Workers
- Everyone Who Needs GPS
- The Power Grid
- The Repair Industry
- Travel and Travelers
...and, I'm willing to bet, at least 90% of the people in this room.
NTP: a case study in digital firefighting
What is NTP?
- Network Time Protocol: the primary way most computers throughout the world find out what time it is and stay synchronized with one another and with the actual passage of time.
- The reference implementation, in software, of that protocol: both the server and client side, plus the algorithms that use that information to regulate system clocks.
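To make "finding out what time it is" concrete, here is a minimal sketch of the standard four-timestamp calculation an NTP client uses to estimate how far its clock is from a server's. The function name and the sample timestamps are hypothetical, chosen only for illustration; the real reference implementation does far more (filtering, peer selection, clock discipline).

```python
def ntp_offset_and_delay(t0, t1, t2, t3):
    """Estimate clock offset and round-trip delay from one NTP exchange.

    t0: client clock when the request left the client
    t1: server clock when the request arrived
    t2: server clock when the reply left the server
    t3: client clock when the reply arrived
    All values are in seconds. The estimate assumes the network delay
    is roughly the same in both directions.
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2.0   # how far the client clock is behind the server
    delay = (t3 - t0) - (t2 - t1)            # round trip, minus server processing time
    return offset, delay


# Hypothetical exchange: client is 5 s behind the server, 0.1 s of
# network latency each way, 0.01 s of processing on the server.
offset, delay = ntp_offset_and_delay(t0=100.0, t1=105.1, t2=105.11, t3=100.21)
print(f"offset is close to {offset:.3f} s, round-trip delay is close to {delay:.3f} s")
```

A positive offset means the client clock is behind the server and needs to move forward.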
In February 2015, NTP was also a gigantic mess.
The Security Nightmare
NTP was critical.
NTP was insecure.
“given enough eyeballs, all bugs are shallow”
I learned how deep the rabbit hole went...
No OSS gets broken to the point of crisis without a driving set of systemic social problems.
If these are not addressed, any improvement to the code will be short-term.
To his credit...
NTP's maintainer asked for help.
In NTP's Case:
Poor resource allocation
Hostility to new contributors
Clinging to broken process and tooling as a mechanism of control
Decide that you are going to be responsible.
Facts of Life:
Software Rescue Edition
A clear, concrete, finite scope:
necessary, not optional
Expect and forgive drama.
Spend time with people.
The purpose of a rescue is long-term sustainability.
How do you set a scope when you know there are unseen bugs lurking everywhere, and you are not deeply familiar with the code base?
The code is the easy part.
Fixing bugs is temporary.
Make bugs easier to fix.
Eliminate or prevent classes of bugs.
Rescue should result in a long tail of bug fixing.
High-Return Technical Improvements:
- Code Access
- Build Process
- Testing Infrastructure and Automation
Refactors that accomplish:
- Major code reduction
- Major improvements in internal compartmentation
- Major tightening of internal APIs
- Migration away from dangerous dependencies
- Bugs that are immediate security crises.
What this meant for the NTP rescue's technical goals:
- Migrate from BitKeeper to git.
- Replace the brittle build system with a modern, waf-based build.
- Update documentation enough to start onboarding new developers.
- Fix as many security problems as possible before our time and money ran out.
- Repository & Access
- Build System
- Communication Channels
People, Drama, and Sustainability
I got lucky.
We needed programmers
Familiar with ancient C code
Experienced in Linux/UNIX systems programming
Capable of working on highly critical code
With some idea how time works
Who care about open source and security
Who can spend a lot of time on this.
We also needed:
A way to keep those programmers fed
Help with documentation and toolchain work
Means to demonstrate to the existing NTP community that we weren't abandoning them
An understanding of the existing install base that we didn't have
The means to maintain the code, documentation, and community post-rescue
Some way to convince people to actually deploy the thing
Two administrative staff.
2-4 semi-active community members.
Susan Sons, PM / ISO
Eric Raymond, lead dev
Gary Miller, developer
NaLette Brodnax, docs
Amar Takhar, tools dev
...and a handful of concerned community members.
Much to my personal disappointment...
...I didn't find myself writing code on this one.
Managing a critical software rescue:
- Deep understanding of:
  - the problem domain
  - software engineering process
The worst mistake one can make is to misidentify the problem.
- Resilience and Calm
- Coding and Software Architecture Expertise
I can't teach you my whole process in this talk, but...
So, how did the story end?
As of June 2017...
NTPSec has a healthy and growing team.
NTPSec cut its C code by over 75% (from 227 kLOC to 56 kLOC, with about 7 kLOC of Python added). As a result, over the last year NTPSec was immune to more than 75% of NTP Classic vulnerabilities BEFORE they were discovered.
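The code-reduction figure can be checked with a few lines of arithmetic. This is only a sketch using the rounded numbers quoted on the slide; the helper name is hypothetical, and the "about 7k" of Python is treated as exactly 7 kLOC for the net figure.

```python
def percent_reduction(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return 100.0 * (before - after) / before


C_BEFORE_KLOC = 227    # C code before the rescue (from the slide)
C_AFTER_KLOC = 56      # C code after the rescue (from the slide)
PYTHON_ADDED_KLOC = 7  # assumption: "about 7k" taken as exactly 7

c_reduction = percent_reduction(C_BEFORE_KLOC, C_AFTER_KLOC)
net_reduction = percent_reduction(C_BEFORE_KLOC, C_AFTER_KLOC + PYTHON_ADDED_KLOC)

print(f"C code reduction:   {c_reduction:.1f}%")    # 75.3%
print(f"Net code reduction: {net_reduction:.1f}%")  # 72.2%
```

Even counting the added Python, the code base shrank by roughly 72%, which means far less surface area for bugs to hide in.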
NTPSec patches security vulnerabilities, on average, in under 12 hours from discovery. Note that publication is sometimes delayed to coordinate with NTP Classic releases.
- NTPSec's vulnerability response has pressured NTP Classic to cut its own response time from months or years to days or weeks, under threat of funders pulling out.
NTPSec's core team has been through a lot, but we still meet up about once a year and hang out, because it was a wild ride with good people. I was given an emeritus title when I stepped down last spring, in the hope that I'd remain "part of the family".
I moved on...
How many currently active committers account for >50% of the code base?
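That question defines a project's "Pony Factor". A minimal sketch of how it could be computed follows, assuming commit counts per active committer stand in for share of the code base. The function name and the sample data are hypothetical, not taken from Dave Nalley's analysis.

```python
def pony_factor(commits_by_author, threshold=0.5):
    """Smallest number of committers who together exceed `threshold`
    of all commits. Commit counts are an assumed proxy for share of
    the code base."""
    total = sum(commits_by_author.values())
    if total <= 0:
        return 0
    covered = 0
    # Start with the most prolific committers and count how many are needed.
    ranked = sorted(commits_by_author.values(), reverse=True)
    for rank, count in enumerate(ranked, start=1):
        covered += count
        if covered > threshold * total:
            return rank
    return len(ranked)


# Hypothetical projects, for illustration only.
fragile = {"alice": 60, "bob": 25, "carol": 10, "dave": 5}
healthier = {"alice": 30, "bob": 30, "carol": 20, "dave": 20}

print(pony_factor(fragile))    # 1: one person accounts for most of the work
print(pony_factor(healthier))  # 2
```

The lower the number, the fewer people a project can afford to lose before most of its working knowledge walks out the door.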
Breakdown by Dave Nalley:
Why does it matter?
- OpenSSL (think Heartbleed)
- Bash (think Shellshock)
- Costs of personnel turnover
- Costs of neglect
- Risk of malicious compromise
Pony Factor of some widely used OSS projects:
As of Fall 2016. Image credit: Dave Nalley
What happens when OSS infrastructure fails?
Do something about crumbling, insecure internet infrastructure
This deck is at: https://slides.com/hedgemage/cot2017
To the Wikimedia Foundation for their awesome library of freely reusable media, which spared you from my toddler-like drawing ability.
To Indiana University's Center for Applied Cybersecurity Research, and specifically the NSF-funded Center for Trustworthy Scientific Cyberinfrastructure, who funded the NTP Rescue project. Also to the Internet Civil Engineering Institute, who aided with organization and developer resources.
To Cornerstones of Trust, for bringing me here to tell you this story.
To the NTP Security Project team, who made sure the rescue effort didn't go to waste. NTPSec is poised to replace NTP Classic in the coming year in installations around the world.
To the countless individual humans along the way who did NOT say "this is somebody else's problem".
Using and Sharing This Work:
"Social Engineering: Hacking Humans" by Susan Sons is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Please credit Susan Sons and the Internet Civil Engineering Institute (ICEI) when using this presentation.
Permissions beyond the scope of this license may be available; send inquiries to firstname.lastname@example.org.
The most current version of this presentation is available from
Saving Time, Saving the Internet
By Susan Sons