Pkg: attack vectors

security
package-manager

#1

The "Pkg ecosystem: Learning from others' mistakes" thread has gotten a bit long and wandering, as these discussions tend to do. I’d like to have a very focused thread about attack vectors against the security of the package ecosystem. To that end, here are the attack vectors I’ve come up with so far. What are some other attack vectors?


Attack: find an existing bug in some package that you can exploit

Mitigation:

  • fix the bug
  • yank versions that have it

Prevention:

  • testing
  • fuzzing
  • basically anything that improves program correctness in general

Attack: create a back door through normal development process

Mitigation:

  • close the back door
  • yank versions that have it
  • blacklist the person who created it

Prevention:

  • identify risky changes and bring more attention to them to check if they’re malicious
  • use signatures to hold people responsible

Attack: introduce a back door by replacing an innocent package version with a malicious one

Mitigation:

  • see prevention
  • is there any other mitigation step here?

Prevention:

  • serve code from trusted servers over secure protocols (e.g. HTTPS)
  • identify versions by permanent secure hashes and don’t allow them to be changed
  • verify that code has correct hashes on installation
  • be ready to use newer hashes when old ones reach end of life (e.g. SHA1)

Attack: typo squatting

Mitigation:

  • delete squatting packages from registries

Prevention:
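One candidate prevention step is to screen new registrations for names within a small edit distance of existing packages. A rough Python sketch (the threshold of 1 is an arbitrary assumption, not an existing registry policy):

```python
def edit_distance(a, b):
    """Classic Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def suspicious_names(new_name, registry_names, threshold=1):
    """Flag existing names within `threshold` edits of a proposed name."""
    return [p for p in registry_names
            if p != new_name
            and edit_distance(new_name.lower(), p.lower()) <= threshold]
```

A registry bot could run this on every new-package PR and require human sign-off when the list is non-empty.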


Attack: package deletion

Mitigation:

  • find a fork of the package
  • make the fork the official repo

Prevention:

  • automatically fork all registered packages
  • allow installation of packages from the automatic forks

#2

For those wondering what “yanking” a version means, see https://doc.rust-lang.org/cargo/reference/publishing.html#cargo-yank, which we have just mimicked with https://github.com/JuliaLang/Pkg.jl/pull/726.


#3

Attack: Yank a package with lots of downstream dependents in a temper tantrum, effectively leading to a lot of immediate downstream trouble (Pkg cannot install many packages until they update). A.k.a. left-pad.

Mitigation:

  • Restore the package in the registry, through human intervention
  • Fix all downstream to either inline the functionality or instead rely on a fork
  • If large parts of the ecosystem are downstream of leftpad-style packages, nuke it from orbit. It’s the only way to be sure

Prevention:

  • Don’t give authors unilateral power to yank packages they own
  • Don’t let large parts of the ecosystem sit downstream of semi-maintained packages that nobody knows about
  • Decide on official recommendations / defaults with regards to version pinning

#4

The recent NPM attack would probably fall into the second category (introduce a back door through the normal development process), but the listed prevention steps wouldn’t prevent such an attack on a Julia package.

Maybe an additional prevention item should be to provide tight compatibility requirements for dependencies? Unfortunately, that would mean no caret or tilde version specifiers, and a whole lot more work for package authors and registry maintainers.


#5

Good point. Authors already don’t have the unilateral ability to yank anything—the only way to yank a version currently is to make a manual change to the registry, which only people with permission to update the registry can do. What package owners can do, however, is delete repositories on GitHub. We do need a backup mechanism where we maintain permanent forks of all packages.


#6

The idea is that yanking is a tool for the registry maintainer, not the package author. The package author can of course create a PR to yank a release, but it is then up to the maintainer to merge.


#7

It would—that was the category of attack I saw the NPM situation falling into.

but the listed prevention steps wouldn’t prevent such an attack on a Julia package.

It would, given the right interpretation of the admittedly vague prevention:

  • identify risky changes and bring more attention to them to check if they’re malicious

In the NPM situation, the risky changes were the ones introduced by a previously unknown maintainer which people had no reason to trust. The main failure was that hardly anyone was aware of the transition of maintainership; had more people been made aware, they could have looked at the changes and, given enough eyes, someone might have caught it.

Accordingly, a good trust system to protect against this class of attacks is one which brings new versions of packages authored by unknown individuals with no systemic trust to the attention of many people, giving them the option to:

  1. Choose not to use the new versions.
  2. Choose to trust the new versions blindly.
  3. Review the new versions and, if they look ok, vouch for them.

Of course, the value of such a voucher depends on whether you trust the person reviewing. Review vouchers should be sharable so that everyone benefits.
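A shareable review voucher could be as simple as a signed statement binding a reviewer identity to one exact package version’s content hash. A minimal Python sketch, using an HMAC as a stand-in for a real public-key signature (all names and fields are hypothetical, not an existing Pkg feature):

```python
import hashlib
import hmac
import json

def make_voucher(reviewer, secret_key, package, version, tree_sha256):
    """Bind a reviewer identity to one exact package version's tree hash."""
    payload = json.dumps(
        {"reviewer": reviewer, "package": package,
         "version": version, "tree": tree_sha256},
        sort_keys=True).encode()
    sig = hmac.new(secret_key, payload, hashlib.sha256).hexdigest()
    return {"payload": payload.decode(), "sig": sig}

def check_voucher(voucher, secret_key):
    """Anyone holding the reviewer's key material can verify the voucher."""
    expected = hmac.new(secret_key, voucher["payload"].encode(),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, voucher["sig"])
```

Because the voucher covers a content hash rather than a version label, it cannot be silently transferred to different code published under the same version number.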

Given such a system, the process in the NPM situation would be:

  • New maintainer releases a new version
  • People updating get prompted when they upgrade
  • The default is “don’t use”—most people do that
  • Some people review the new code
  • Given enough reviewers, someone discovers the backdoor.

In this workflow, the only people who get compromised are those who either actively opted to trust blindly or did a review and missed the backdoor. In either case, those people have only themselves to hold responsible, in a very direct way.

In the much more common situation of a new maintainer who is not malicious, the workflow would be this instead:

  • New maintainer releases a new version
  • People updating get prompted when they upgrade
  • The default is “don’t use”—most people do that
  • Some people check out the new versions
  • Given enough reviewers who vouch for it, the release becomes trusted
  • Making a release that becomes trusted increases trust in the new maintainer
  • After a while their code no longer needs to go through this process.

Maybe an additional prevention item should be to provide tight compatibility requirements for dependencies? Unfortunately, that would mean no caret or tilde version specifiers, and a whole lot more work for package authors and registry maintainers.

I don’t see how compatibility requirements are relevant here. NPM uses exact dependency versions and that didn’t help.


#8

Additional prevention steps for bugs and backdoors (I guess long discussions of the pros and cons of each of them are out of scope for this laudably focused thread, but listing them as options to maybe explore should be OK):

  • Make it easier for downstream to figure out which packages/versions are “well-maintained”. E.g. by automated processes (I dunno, people voting on packages? github stars?) or by shipping an opinionated smaller registry alongside the large registry (my personal favorite).
  • Have some semi-formal “declaration of intent” with respect to maintenance for packages. The idea is that prospective downstream users can easily see that relying on a package is a questionable idea if the package authors/owners themselves state that it is probably not going to be maintained, because they moved on or it was a one-shot PoC. Additionally, this can give guidance for package authors on how to pass on the torch (try not to violate downstream’s trust in your own words; but life happens).
  • Have a way to trigger a “cry for help” from package owners: If they decide that they can’t keep up, then their downstream needs to be notified in order to either help out, fork, or switch out for a different provider of the functionality.
  • Attack-surface reduction: introduce a new type of “header package” that cannot contain executable code or exports, only abstract type and function declarations, plus possibly constants from a very small selection of white-listed types. The reason is that optional dependencies in Julia require importing the analog of RecipesBase in order to play with Plots. Header packages have no attack surface, even if they become malicious, and therefore reduce the amount of upstream code that potentially needs to be audited. @sbromberger always has this issue. This would e.g. allow LightGraphs.jl to ship plotting code for Plots.jl without risking anything for users that don’t plot, even in the unlikely case that the entire plotting ecosystem gets backdoored to hell. This would require a hardened parser for header packages (which are always assumed to be malicious).
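As an analogy for such a hardened parser, Python’s own ast module can illustrate the idea of whitelisting declaration-only syntax, where anything outside the whitelist is rejected. This is a sketch of the concept only, not a proposal for an actual Julia parser, and the whitelist is made up:

```python
import ast

# Only declaration-like constructs are permitted: class declarations,
# simple constant assignments, annotations, and the nodes they contain.
ALLOWED = (ast.Module, ast.ClassDef, ast.AnnAssign, ast.Assign,
           ast.Pass, ast.Name, ast.Constant, ast.Load, ast.Store)

def is_header_only(source):
    """Treat the input as hostile: reject on parse failure or on any
    AST node outside the declaration whitelist."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False
    return all(isinstance(node, ALLOWED) for node in ast.walk(tree))
```

Walking the full AST (rather than only top-level statements) matters: a constant assignment whose right-hand side is a function call would otherwise smuggle executable code past the check.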

#9

Thanks for the explanation re yanking, I misunderstood that.

Regarding forks: Would it be OK to automatically fork everything from the central registry and have package installs from the central registry be served from the forks?

That way, the registry maintainers can always override anything that package authors do (with respect to users who install from the registry).

Regarding current state of multiple registry support: How is that resolved? E.g. I have two registries; both contain a package called “MyExample.jl”; which one gets precedence? Is the case where both registries contain the identical package resolved painlessly (i.e. without additional user intervention)? Can a package have deps that are in a different registry, and what happens then?

Sorry for the somewhat naive questions regarding current pkg design.

At the risk of pointing out the obvious: (1) there is such a thing as “warning fatigue”. Asking end users to review code that is three levels upstream of the package they use is questionable: “MBedTLS.jl has a new maintainer! update, ignore, review diff? [uIr]” for people who just want to plot something in IJulia. The IJulia maintainers are the people who are qualified to review. (2) There is a trade-off between backdoors (default: don’t update) and ordinary security bugs (install all critical security updates as quickly as possible; the bad guys are warming up their port scanners NOW).


#10

While Pkg may do this, I’m not certain that BinDeps actually does. A few months ago, I tried installing RandomMatrices on an unreliable cellular connection. It failed to download the cmake binary, but the next time I tried to install, it actually unpacked the truncated file and complained about an “unexpected end of file”. And on the third attempt, it complained that the cmake bin directory did not exist in the unpacked directory tree. Ideally, (i) these operations would have been atomic and (ii) the hashes would have been checked at the beginning. This was on Sep 20, so things may have been fixed since then. Full error log at https://gist.github.com/garrison/c41563adf3cc3dbb5088ab0736e872cc

The moral: it may actually be nontrivial to assess whether all relevant code paths are verifying that hashes are correct. A successful attack only requires one such path.
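The failure mode described above suggests a download–verify–rename pattern: stage the artifact at a temporary path, check its hash before doing anything else with it, and only then atomically move it into place. A Python sketch (not what BinDeps actually does; names are made up):

```python
import hashlib
import os
import tempfile

def install_artifact(data, expected_sha256, dest_path):
    """Verify-then-rename: a truncated or tampered download never
    reaches dest_path, and a failed attempt leaves no partial state."""
    if hashlib.sha256(data).hexdigest() != expected_sha256:
        raise ValueError("hash mismatch; refusing to install")
    # Stage in the destination directory so the rename stays on one
    # filesystem, where os.replace is atomic on POSIX.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(dest_path) or ".")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, dest_path)
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
```

With this structure, retries always start from a clean slate instead of tripping over a half-unpacked tree.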


#11

Also: while HTTPS is a good practice, it is no substitute for verifying hashes of packages before they are to be installed. Forged SSL certificates are known to exist (e.g. for gmail in 2011).

Another way of thinking about security is to follow the Qubes philosophy of “distrusting the infrastructure.” More on this at https://www.qubes-os.org/faq/#what-does-it-mean-to-distrust-the-infrastructure

A related attack vector would be to target and compromise a build machine that creates binaries, thus leading to compromised versions of julia or important binary dependencies being distributed. One way to mitigate this is to work toward reproducible builds.


#12

As an addition: There are lots of things one can do wrong when verifying hashes/signatures of packages (see the hilarious Android vulnerabilities where the signature check and the installer used different pkzip implementations that differed on ambiguous files). One way that is very hard to get wrong is used by e.g. Chrome extensions (.crx files): hash and sign the zip/tarball, and verify before parsing; see http://www.adambarth.com/experimental/crx/docs/crx.html. They solve key distribution (the CA problem) in the very elegant way that the uuid (the true name) of the package is a cryptographic hash of the single public key that is authorized to sign it and/or its updates. This allows a very simple (and near impossible to mess up) verification flow for updates. I don’t think we can actually do this, but it is a masterpiece of defensive design.
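The identity scheme can be illustrated without real crypto: since the package id is a hash of the one authorized public key, an update that carries its key can be checked self-contained, with no CA or key server. A Python sketch (hashlib only; the actual signature verification is elided as a comment, and all field names are hypothetical):

```python
import hashlib

def package_id(public_key_bytes):
    """The package's 'true name' is a hash of its one authorized key."""
    return hashlib.sha256(public_key_bytes).hexdigest()[:32]

def verify_update(pkg_id, update):
    """An update ships its own public key. The key needs no external
    vouching: it is valid iff it hashes to the package id."""
    if package_id(update["public_key"]) != pkg_id:
        return False
    # A real implementation would now verify update["signature"]
    # over update["payload"] using update["public_key"].
    return True
```

An attacker who substitutes their own key necessarily changes the package’s identity, so existing installs simply refuse the update.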


#13

Attack: Malware binary dependency

Mitigation:

  • Keep an up-to-date virus scanner active (probably not foolproof)

Prevention:

  • Maybe something similar to the approach for package security: keep a whitelist of allowable binaries and their fingerprints

A secondary concern would be binaries that are relatively safe themselves but then download other binaries; usually a virus scanner will pick this up, though.


#14

Virus scanners are not a “probably not foolproof” mitigation for malware. They are a good tool for detecting past compromises (running a new virus scanner on old backups/memory dumps), noise reduction (running on an email server), and reducing compromises for people who ask to get compromised (e.g. people who torrent binaries).

Detecting past compromises is important: Having someone own your network for 6 months is preferable to having someone own your network for 5 years. Most importantly, this produces a very positive shift in the game theory: Deploying your fancy rootkit and getting caught has real consequences, even for nation states: somebody writes a signature, and all your other deployments get found out, which gets you kicked off the network, simplifies attribution (same actor at many sites) and possibly has political fallout.

But your point is important: How are binary deps currently handled?

I see three ways: First, distribute source or a blob and write a build script; handle these the same as Julia files. Second, don’t handle it at all; it is the user’s job to install the shared library in a way that libdl can find it. Third, use something like BinDeps and download source or binaries during build. This is a problem.

Simple idea: Require that packages declare in the registry/metadata whether they download sources or binaries during their build process (triggering an apt-get install that asks the user for permission does not count as that). Have the user interface for Pkg reflect that (“pkg xyz will download executable code from the internet, outside of anybody’s oversight. Proceed? [yN]”). Enforcement could be semi-formal (packages that do unauthorized downloads get yanked).
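The proposed prompt would be a trivial gate in the client. A Python sketch (names and wording hypothetical; the registry metadata flag is assumed to exist):

```python
def confirm_install(pkg_name, downloads_binaries, input_fn=input):
    """Gate installs of packages whose registry metadata declares
    that the build step downloads executable code. Defaults to No."""
    if not downloads_binaries:
        return True
    answer = input_fn(
        f"pkg {pkg_name} will download executable code from the internet, "
        "outside of anybody's oversight. Proceed? [yN] ")
    return answer.strip().lower() == "y"
```

Defaulting to “no” matters: the safe choice should be the one users reach by hammering Enter.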

Simple mitigation with virus scanning: Push all packages (in tarball form) to VirusTotal, and regularly do this with historic versions (this is the registry’s job, not the user’s job). If some package distributes binary blobs and their build server got pwned into introducing malware, then we at least have a chance of detecting the compromise, if the attack was automated and not targeted. And we have a decent chance of detecting the compromise a year after the fact, which is far better than never.