Succession and Endings

planning for endings before they're emergencies

Learning Objectives

Before

At Tahia's annual review, she mentioned that she was applying for faculty positions, and expected to hear back within six months. Jess went home that evening and tried to answer a simple question: if Tahia left tomorrow, what could the project still do?

The answer was sobering. Tahia was the only person who knew how to package a release. She held the credentials for the Zenodo account that issued DOIs. She had written the data pipeline that fed the simulator's climate inputs, and while the code was in the repository, the configuration files that made it work with the university's HPC cluster were in a directory on Tahia's laptop. The simulator's automated tests passed on the CI server and on Tahia's machine, but not on anyone else's.

Preparing for Endings Before They Happen (10 min)

The lottery factor is a rough measure of project fragility: how many people would need to be unexpectedly unavailable before the project couldn't function? A lottery factor of one means the project is one departure, illness, or winning ticket away from a crisis. Most research software projects have a lottery factor of one for at least one critical function, and most project leads don't find out until the person has left.

Preparing for a transition does not mean planning for failure. It means making the project's health independent of any individual's continued participation. The closure workshop covers this in depth; the minimum version for an ongoing project is three things:

A shared credentials document
Every account, service, and resource the project depends on must be stored in a password manager that at least two people can access. This takes you from "we use Zenodo" to "the Zenodo account is managed through the lab's shared 1Password vault, the recovery key is with the department administrator."
A release checklist
The exact steps to make a release—including how to update the version number following semantic versioning—with enough detail that someone who has never done it can follow the checklist and succeed. Run through it with a second person or an LLM before you need to. The moment you need to use it for real is not the moment to discover that it is incomplete and out of date..
An advance directive
A one-page document that set out the conditions under which this project should be wound down, who has the authority to make that decision, and what must be preserved before it can be considered closed. Writing this when the project is healthy is not morbid: it is the same reasoning that makes you test your backup before you lose data.

"Document everything" is not a succession plan. People reliably short-change documentation in favor of doing other work, and what they do document usually doesn't address tomorrow's actual needs. A succession plan is a specific document about a specific transition: who does what, in what order, and how to verify it worked. It is written with the person who will use it in mind, not the person who already knows how everything works.

After

Jess spent two afternoons with Tahia documenting the release process and the HPC configuration. They discovered that two steps in the process required a personal account rather than a shared project account. They also discovered that one of the configuration files had been modified six months ago in a way that hadn't been committed to the repository. The documentation surfaced both problems before they became emergencies. Tahia ended up staying for another year, but the project was no longer fragile in the same way.

Exercises

Lottery Factor Audit (10 min)

Answer these questions for your project:

  1. Who can create a release?
  2. Which credentials exist only on someone's personal account rather than a shared project account?
  3. What would your project be unable to do if you were unavailable for three months?
  4. What would be unrecoverable if a key team member left without any notice?

Write Your Advance Directive (10 min)

Write a one-page document covering:

Give the document to an LLM and prompt it to identify three realistic scenarios it doesn't cover.