Security
Most security failures in research software projects are not the result of sophisticated attacks. They happen because someone reused a password, clicked a link they shouldn't have, or gave a service more access than it needed. The good news is that a small number of practices eliminates most of the risk. The bad news is that most projects never adopt them.
Threat Models
Before deciding what to protect, you need to know what you are protecting against. The three most common threats for research software projects are:
Casual threats are opportunistic. Someone notices that your project uses a public API key checked into GitHub and uses it to run compute jobs at your expense. Someone tries a list of common passwords against every account on your server. The attack isn't targeted; your project just happened to be vulnerable.
Intimate threats come from people who have legitimate access but abuse it. A departing collaborator copies data they shouldn't. A disgruntled contributor alters code in a way that's hard to detect. These are rarer but tend to cause more damage because the attacker already knows the systems.
Insider threats from institutions—a university IT department with broad access, a cloud provider that can read your data—are usually covered by contract and policy rather than technical controls. Know what your institution's policies say.
Most research software projects face casual threats almost exclusively. Design for those, and you will also reduce your exposure to the other two.
Authentication
Weak passwords are the single most common point of failure. A few rules that actually help:
- Use a password manager.
- Generate a unique random password for every service. Yes, this means you won't be able to remember them. That's the point. If you can remember a password, an attacker can probably guess it.
- Use a passphrase for anything you have to type from memory.
- Four unrelated words strung together ("correct horse battery staple") are both more memorable and harder to crack than a short password with substitutions.
- Enable two-factor authentication everywhere it is available.
- An attacker who gets your password still can't log in without your second factor. Use an authenticator app rather than SMS, which can be intercepted.
- Recognize phishing.
- A common attack is to trigger a password reset and then call the target pretending to be IT support—asking them to read back the confirmation code to "verify their identity." If you give them the code, they own your account. Never share confirmation codes with anyone who contacts you.
As a developer, require strong passwords and support two-factor authentication for any system where users have accounts. Use a well-tested library; don't build authentication yourself.
Least Privilege
The principle of least privilege says that each component of a system should have access to only what it needs and nothing more. In practice this means:
-
API keys and credentials should be stored in environment variables or a secrets manager, never in source code or configuration files checked into version control. If a key is accidentally committed, rotate it immediately— assume it has been compromised, because automated scanners watch public repositories and will find it.
-
Service accounts for automated processes (CI runners, deployment scripts) should have the minimum permissions needed. A CI runner that only needs to run tests shouldn't have write access to production infrastructure.
-
When someone leaves the project, revoke their access promptly. Credentials that nobody is monitoring are credentials that will eventually be misused.
Protecting Research Data
Research data often has privacy or regulatory implications that general software advice doesn't address. A few things to know:
- Don't store what you don't need.
- If your software processes data that contains personal information, consider whether you actually need to retain it. Data you don't have can't be breached.
- Encrypt sensitive data at rest and in transit.
- Cloud storage is not encrypted by default on most platforms. Check your configuration. Use HTTPS for anything transmitted over a network.
- Know your institution's requirements.
- IRB approvals, data use agreements, and export control rules may constrain where data can be stored and who can access it. These are legal obligations, not suggestions. If you aren't sure what applies to your project, ask your institution's research data office before setting up your infrastructure.
Security Theater
Much of what gets called security is actually security theater: practices that create the appearance of protection without meaningfully reducing risk. Requiring passwords to be changed every 90 days makes people choose worse passwords. Mandatory complex password rules ("must include a number, a capital letter, and a hieroglyph") produce passwords like "Password1!" that are in every attacker's dictionary. The predictable result is security fatigue: people stop caring about security precisely because the demands are unreasonable.
Real security comes from a small number of practices applied consistently: unique passwords via a password manager, two-factor authentication, least privilege, and encryption. Everything else is mostly noise [Smalls2021].
What to Do When Something Goes Wrong
If credentials are compromised or a breach is suspected:
-
Rotate affected credentials immediately, even before you understand the full scope of the problem.
-
Notify affected users promptly and specifically. "We have become aware of a security incident" is not useful. "On Tuesday, an attacker gained access to our user database. Email addresses were exposed. Passwords were stored as hashes and were not directly readable, but you should change your password on any other service where you used the same one" is useful.
-
Figure out how it happened before restoring normal operations. Patching the symptom without addressing the cause means it will happen again.
-
Document what happened and what you did about it. This is both good practice and often legally required.
The hardest thing about security incidents is that you need to have thought through your response before they happen. A breach at 2 a.m. is not the moment to figure out who has the credentials to rotate, or how to contact all your users.