Pkg: attack vectors

Pkg ecosystem: Learning from other's mistakes has gotten a bit long and wandering, as these discussions tend to. I’d like to have a very focused thread about attack vectors against the security of the package ecosystem. To that end, here are the attack vectors I’ve come up with so far. What are some other attack vectors?


Attack: find an existing bug in some package that you can exploit

Mitigation:

  • fix the bug
  • yank versions that have it

Prevention:

  • testing
  • fuzzing
  • basically anything that improves program correctness in general

Attack: create a back door through normal development process

Mitigation:

  • close the back door
  • yank versions that have it
  • blacklist the person who created it

Prevention:

  • identify risky changes and bring more attention to them to check if they’re malicious
  • use signatures to hold people responsible

Attack: introduce a back door by replacing an innocent package version with a malicious one

Mitigation:

  • see prevention
  • is there any other mitigation step here?

Prevention:

  • serve code from trusted servers over secure protocols (e.g. HTTPS)
  • identify versions by permanent secure hashes and don’t allow them to be changed
  • verify that code has correct hashes on installation
  • be ready to use newer hashes when old ones reach end of life (e.g. SHA1)
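
To make the prevention list above concrete, here is a minimal sketch of a registry that records a permanent hash per version and checks it at install time. This is not how Pkg does it; the `Registry` class, its method names, and the `algorithm:hex` storage format are all assumptions made up for illustration. Storing the algorithm name next to the digest is one possible way to stay ready for a newer hash function.

```python
import hashlib


class HashMismatch(Exception):
    """Raised when content does not match the hash recorded for a version."""


def content_hash(content, algorithm="sha256"):
    # Prefix the digest with the algorithm name so a registry can move to a
    # newer hash function later without ambiguity (assumed format).
    return f"{algorithm}:{hashlib.new(algorithm, content).hexdigest()}"


class Registry:
    """Toy registry mapping (package, version) to a permanent content hash."""

    def __init__(self):
        self._hashes = {}

    def register(self, name, version, content):
        key = (name, version)
        digest = content_hash(content)
        # A registered version is permanent: registering different content
        # under the same version is rejected instead of replacing the hash.
        if key in self._hashes and self._hashes[key] != digest:
            raise ValueError(f"{name} {version} already registered with a different hash")
        self._hashes[key] = digest
        return digest

    def verify(self, name, version, content):
        expected = self._hashes[(name, version)]
        algorithm = expected.partition(":")[0]
        actual = content_hash(content, algorithm)
        if actual != expected:
            raise HashMismatch(f"{name} {version}: expected {expected}, got {actual}")
```

An installer would call `verify` on the downloaded bytes before doing anything else with them, and treat `HashMismatch` as a hard failure.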

Attack: typo squatting

Mitigation:

  • delete squatting packages from registries

Prevention:


Attack: package deletion

Mitigation:

  • find a fork of the package
  • make the fork the official repo

Prevention:

  • automatically fork all registered packages
  • allow installation of packages from the automatic forks

For those wondering what “yanking” a version means, see Publishing on crates.io - The Cargo Book, which we have just mimicked with https://github.com/JuliaLang/Pkg.jl/pull/726

Attack: Yank package with lots of downstream in a temper tantrum, effectively leading to a lot of immediate downstream trouble (pkg cannot install many packages until they update). Aka leftpad.

Mitigation:

  • Restore the package in the registry, through human intervention
  • Fix all downstream to either inline the functionality or instead rely on a fork
  • If large parts of the ecosystem are downstream of leftpad-style packages, nuke it from orbit. It’s the only way to be sure

Prevention:

  • Don’t give authors unilateral power to yank packages they own
  • Don’t let large parts of the ecosystem sit downstream of semi-maintained packages that nobody knows about
  • Decide on official recommendations / defaults with regards to version pinning

The recent NPM attack would probably fall into the second category (introduce a backdoor through normal development), but the listed prevention steps wouldn’t prevent such an attack on a Julia package.

Maybe an additional prevention item should be to provide tight compatibility requirements for dependencies? Unfortunately, that would mean no caret or tilde version specifiers, and a whole lot more work for package authors and registry maintainers.

Good point. Authors already don’t have unilateral ability to yank anything—the only way to yank anything currently is make a manual change to the registry, which only people with permission to update the registry can do. What package owners can do, however, is delete repositories on GitHub. We do need a backup mechanism where we maintain permanent forks of all packages.

The idea is that yanking is a tool for the registry maintainer, not the package author. The package author can of course create a PR to yank a release, but it is then up to the maintainer to merge.

It would—that was the category of attack I saw the NPM situation falling into.

but the listed prevention steps wouldn’t prevent such an attack on a Julia package.

It would, given the right interpretation of the admittedly vague prevention:

  • identify risky changes and bring more attention to them to check if they’re malicious

In the NPM situation, the risky changes were the ones introduced by a previously unknown maintainer which people had no reason to trust. The main failure was that hardly anyone was aware of the transition of maintainership; had more people been made aware, they could have looked at the changes and, given enough eyes, someone might have caught it.

Accordingly, a good trust system to protect against this class of attacks is one which would bring new versions of packages authored by unknown individuals that have no systemic trust to the attention of many people, giving them the option to:

  1. Choose not to use the new versions.
  2. Choose to trust the new versions blindly.
  3. Review the new versions and, if they look ok, vouch for them.

Of course, the value of such a voucher depends on whether you trust the person reviewing. Review vouchers should be sharable so that everyone benefits.

Given such a system, the process in the NPM situation would be:

  • New maintainer releases a new version
  • People updating get prompted when they upgrade
  • The default is “don’t use”—most people do that
  • Some people review the new code
  • Given enough reviewers, someone discovers the backdoor.

In this workflow, the only people who get compromised are people who either actively opted to trust blindly or did a review and missed the backdoor. In either case, those people can really only hold themselves responsible in a very direct way.

In the much more common situation of a new maintainer who is not malicious, the workflow would be this instead:

  • New maintainer releases a new version
  • People updating get prompted when they upgrade
  • The default is “don’t use”—most people do that
  • Some people check out the new versions
  • Given enough reviewers, who vouch for it, the code release becomes trusted
  • Making a release that becomes trusted increases trust in the new maintainer
  • After a while their code no longer needs to go through this process.
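
The decision rule behind both workflows can be sketched in a few lines. This is only an illustration of the idea, not an existing Pkg feature: the `Release` type, the `decide` function, and the `min_vouchers` threshold of 2 are all hypothetical names and values.

```python
from dataclasses import dataclass, field


@dataclass
class Release:
    package: str
    version: str
    author: str
    vouchers: set = field(default_factory=set)  # reviewers who vouched for it


def decide(release, trusted_authors, trusted_reviewers, min_vouchers=2):
    """Return "use" or "skip" for a release; the default is "skip"."""
    # Releases by authors who already have systemic trust pass directly.
    if release.author in trusted_authors:
        return "use"
    # A voucher only counts if it comes from a reviewer we already trust.
    credible = release.vouchers & trusted_reviewers
    if len(credible) >= min_vouchers:
        return "use"
    return "skip"
```

A release from an unknown maintainer starts out as "skip" and flips to "use" once enough trusted reviewers have vouched for it. How vouchers are shared and how an author graduates into `trusted_authors` is left open here.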

Maybe an additional prevention item should be to provide tight compatibility requirements for dependencies? Unfortunately, that would mean no caret or tilde version specifiers, and a whole lot more work for package authors and registry maintainers.

I don’t see how compatibility requirements are relevant here. NPM uses exact dependency versions and that didn’t help.

Additional prevention steps for bugs and backdoors (I guess long discussions of pro/contra for each of them is out-of-scope for this laudably focused thread, but listing them as options to maybe explore should be ok?):

  • Make it easier for downstream to figure out which packages/versions are “well-maintained”. E.g. by automated processes (I dunno, people voting on packages? github stars?) or by shipping an opinionated smaller registry alongside the large registry (my personal favorite).
  • Have some semi-formal “declaration of intent” with respect to maintenance for packages. The idea is that prospective downstream can easily see that relying on a package is a questionable idea if the package authors/owners themselves say that their package is probably not going to be maintained, because they moved on or it was a one-shot PoC. Additionally, this can give guidance for package authors on how to pass on the torch (try not to violate downstream trust in your own words; but life happens).
  • Have a way to trigger a “cry for help” from package owners: If they decide that they can’t keep up, then their downstream needs to be notified in order to either help out, fork, or switch out for a different provider of the functionality.
  • Attack surface reduction: Introduce a new type of “header package” that cannot contain executable code or exports, only abstract type and function declaration, plus possibly constants out of a very small selection of white-listed types. Reason is that optional dependencies in julia require import of the analog of RecipesBase in order to play with Plots. Header packages have no attack surface, even if they become malicious, and therefore reduce the amount of upstream code that potentially needs to be audited. @anon94023334 always has this issue. This would e.g. allow LightGraphs.jl to ship plotting code for Plots.jl without risking anything for users that don’t plot, even in the unlikely case that the entire plotting ecosystem gets backdoored to hell. This would require a hardened parser for header packages (always assumed to be malicious).

Thanks for the explanation re yanking, I misunderstood that.

Regarding forks: Would it be OK to automatically fork everything from the central registry and have package installs from the central registry be served from the forks?

That way, the registry maintainers can always override anything that package authors do (with respect to users who install from the registry).

Regarding current state of multiple registry support: How is that resolved? E.g. I have two registries; both contain a package called “MyExample.jl”; which one gets precedence? Is the case where both registries contain the identical package resolved painlessly (i.e. without additional user intervention)? Can a package have deps that are in a different registry, and what happens then?

Sorry for the somewhat naive questions regarding current pkg design.

At the risk of pointing out the obvious: (1) There is such a thing as “warning fatigue”. Asking end-users to review code that is 3 levels upstream of the package they use is questionable: “MBedTLS.jl has a new maintainer! update, ignore, review diff? [uIr]” for people who just want to plot something in ijulia. The ijulia maintainers are the people who are qualified to review. (2) There is a trade-off between backdoors (default: don’t update) and ordinary security bugs (install all critical security updates as quickly as possible; bad guys are warming up their port-scanners NOW).

While Pkg may do this, I’m not certain that BinDeps actually does. A few months ago, I tried installing RandomMatrices on an unreliable cellular connection. It failed to download the cmake binary, but the next time I tried to install, it actually unpacked the truncated file and complained about an “unexpected end of file”. And on the third attempt, it complained that the cmake bin directory did not exist in the unpacked directory tree. Ideally (i) these operations would have been atomic and (ii) the hashes would have been checked at the beginning. This was on Sep 20, so things may have been fixed since then. Full error log at gist:c41563adf3cc3dbb5088ab0736e872cc · GitHub

The moral: it may actually be nontrivial to assess whether all relevant code paths are verifying that hashes are correct. A successful attack only requires one such path.
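
As a rough sketch of what "atomic, with the hash checked first" could look like: the function below refuses to write anything unless the bytes match the expected SHA-256, and then moves the file into place with a single rename. The name `install_verified` and its signature are made up for this example; it is not what BinDeps does.

```python
import hashlib
import os
import tempfile


def install_verified(data, expected_sha256, dest_path):
    """Place `data` at `dest_path` only if its SHA-256 matches.

    The hash is checked before anything touches the destination, and the
    file appears under its final name in a single rename, so a truncated
    download never leaves a partial file behind for a later run to unpack.
    """
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_sha256:
        raise ValueError(f"hash mismatch: expected {expected_sha256}, got {actual}")
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    # Write to a temporary file in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        os.replace(tmp_path, dest_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Every code path that fetches something would have to go through a function like this for the guarantee to mean anything, which is exactly the hard part.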

Also: while HTTPS is a good practice, it is no substitute for verifying hashes of packages before they are to be installed. Forged SSL certificates are known to exist (e.g. for gmail in 2011).

Another way of framing thinking about security is to follow the Qubes philosophy of “distrusting the infrastructure.” More on this at Frequently asked questions (FAQ) | Qubes OS

A related attack vector would be to target and compromise a build machine that creates binaries, thus leading to compromised versions of julia or important binary dependencies being distributed. One way to mitigate this is to work toward reproducible builds.

As an addition: There are lots of things one can do wrong when verifying hashes/signatures of packages (see hilarious android vulnerabilities where the sig-check and installer used different pkzip implementations that differed for ambiguous files). One way that is very hard to get wrong is used by eg chrome extensions (.crx files): Hash and sign the zip / tarball, verify before parsing, see http://www.adambarth.com/experimental/crx/docs/crx.html. They solve key distribution (the CA problem) in the very elegant way that the uuid (the true name) of the package is a cryptographic hash of the single public key that is authorized to sign it and/or its updates. This allows a very simple (and near impossible to mess up) verification flow for updates. I don’t think we can actually do this, but it is a masterpiece of defensive design.
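
A stripped-down sketch of that verification flow, under stated assumptions: the identifier is taken to be a plain SHA-256 hex digest of the public key bytes (the real .crx encoding differs in detail), and `verify_signature` is a hypothetical callable standing in for a real signature check from a crypto library.

```python
import hashlib


def package_id(public_key):
    """Derive a package's identifier from the one key allowed to sign it."""
    return hashlib.sha256(public_key).hexdigest()


def accept_update(expected_id, public_key, signature, archive, verify_signature):
    """Decide whether to accept `archive`, without parsing it first.

    `verify_signature(public_key, signature, archive)` is a placeholder for
    a real signature verification routine and must return True or False.
    """
    # 1. The presented key must be the one the package is named after.
    if package_id(public_key) != expected_id:
        return False
    # 2. The signature must cover the raw archive bytes as downloaded.
    return bool(verify_signature(public_key, signature, archive))
```

Only after `accept_update` returns `True` would the archive be handed to an unpacker, so the checker and the installer never get to disagree about how to read an ambiguous file.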

Attack: Malware binary dependency

Mitigation: ? Keep an up-to-date virus scanner active, probably not foolproof

Prevention: Maybe something similar to the approach for package security, keeping a whitelist of allowable binaries and their fingerprints

A secondary concern would be binaries that are relatively safe themselves but then download other binaries, though a virus scanner will usually pick this up

Virus-scanners are not a “probably not foolproof” mitigation for malware. They are a good tool for detecting past compromises (run new virus scanner on old backup / memory dumps), noise reduction (run on email server), and reduction of compromises for people who ask to get compromised (eg people who torrent binaries).

Detecting past compromises is important: Having someone own your network for 6 months is preferable to having someone own your network for 5 years. Most importantly, this produces a very positive shift in the game theory: Deploying your fancy rootkit and getting caught has real consequences, even for nation states: somebody writes a signature, and all your other deployments get found out, which gets you kicked off the network, simplifies attribution (same actor at many sites) and possibly has political fallout.

But your point is important: How are binary deps currently handled?

I see three ways: First, distribute source or blob, write build script. Handle the same as julia files. Second, don’t handle at all. It is the user’s job to install the shared library in a way that libdl can find it. Third, use something like BinDeps and download source or binary during build. This is a problem.

Simple idea: Require that packages declare in the registry/metadata whether they download sources or binaries during their build process (triggering an apt-get install that asks the user for permission does not count as that). Have the user-interface for pkg reflect that (“pkg xyz will download executable code from the internet, outside of anybody’s oversight. Proceed? [yN]”). Enforcement could be semi-formal (packages that do unauthorized downloads get yanked).

Simple mitigation with virus-scanning: Push all packages (in tarball form) to virustotal, and regularly do this with historic versions (this is the registry’s job, not the user’s job). If some package distributes binary blobs and their build server got pwned into introducing malware, then we at least have a chance of detecting the compromise, if the attack was automated and not targeted. And we have a decent chance of detecting the compromise a year after the fact, which is far better than never.

For that to happen I think we first need a robust metadata schema for packages. For example, a way to validate and verify Project.toml metadata

name = MyPkg
uuid = xxxx...
license = SPDX
author = "Surname, FirstName and Surname, FirstName and Company"
maintainer = "Surname, FirstName"
tags = ...

Maybe have authors/maintainers registered in the registry with basic info that identifies them and allows communication, and then have them sign off on the release / verify metadata.
Without the information to contact / verify identities I find it hard to achieve the other systems. That would also allow registering changes to the package maintainers.
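
A minimal sketch of what validating such metadata could look like, assuming the file has already been parsed into a dict. The set of required fields is simply taken from the example above; `validate_metadata` and `REQUIRED_FIELDS` are hypothetical names, not part of Pkg.

```python
import uuid

# Fields from the example schema above that are treated as mandatory here.
REQUIRED_FIELDS = ("name", "uuid", "license", "author", "maintainer")


def validate_metadata(meta):
    """Return a list of problems found in a parsed metadata dict."""
    problems = []
    for field_name in REQUIRED_FIELDS:
        value = meta.get(field_name)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty field: {field_name}")
    # Check the UUID syntax only if a string was supplied at all.
    if isinstance(meta.get("uuid"), str):
        try:
            uuid.UUID(meta["uuid"])
        except ValueError:
            problems.append("uuid is not a valid UUID")
    return problems
```

A registry could refuse a registration PR whenever this list is non-empty. Checking that the license is a real SPDX identifier, or that the maintainer matches a registered identity, would need extra data this sketch does not have.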

How far is this getting along? Imho the typo-squatting one is definitely important as the ecosystem grows.

Some heuristic might also flag new ‘suspicious’ repos for human review. E.g. packages with admin rights requests. Not watertight of course: https://github.com/search?utf8=✓&q=sudo+extension%3A.jl+language%3AJulia+fork%3Atrue&type=Code&ref=advsearch&l=Julia&l=

I don’t think that any build code should do this. Either use the existing mechanisms to get binary blobs, or print a message for the user to install something (usually makes sense on Linux). Fortunately, the practice is not that widespread.

For an attacker it matters little whether it’s built in. If the vector of typo-squatting is there it’s up to the imagination of the attacker to exploit it. I’m sure there are some creative minds that will find an interesting way to exploit this.

So to come back to the subject: How far is this getting along? Could there be some help with developing a typo-squatting countermeasure? If so where should I look and who should I talk to? I have little Julia experience but a lot of coding and security experience so this could be a good opportunity to contribute :slight_smile:

There are lots of ways to compare strings. We could employ a similarity score for a package name rather than a direct lookup. Levenshtein distance is pretty dang good for this. Say, if the normalized Levenshtein similarity between the lowercased package name and every package already in the registry is <95%, then it can be registered, or something similar. Because Levenshtein is pretty slow, we could use a heuristic character-level model to find the 100 closest candidates first (could be a CNN, RNN, ruleset, etc.). Basically this would also encourage new registrants to think of good names so we don’t get 20 people with packages like “StaticArrays”, “StaticyArrays”, “StatikArrays”, etc. Could clean things up.
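
Here is a small sketch of the Levenshtein-based check, without the candidate pre-filtering step. The normalization (one minus distance divided by the longer name's length) and the 0.85 threshold are my assumptions, and `too_similar` is a made-up helper name.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Normalized similarity in [0, 1]; 1.0 means identical ignoring case."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def too_similar(candidate, registered, threshold=0.85):
    """Return the registered names whose similarity meets the threshold."""
    return [name for name in registered if similarity(candidate, name) >= threshold]
```

With this normalization, a single-character change in a 12-character name such as “StatikArrays” scores about 0.92 against “StaticArrays”, so a 95% cutoff would let it through. The threshold would need tuning against real registry names, and probably has to depend on name length.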

Could we make it so that the package manager generates a security log of what updates happened and who the contributors are (maybe change the color of new contributor names in the REPL output)? Then it is easier to get an overview of the history and who contributed, and whether there are new contributors who need to be evaluated for trust locally. Then you could also make your own local blacklist of certain contributors to keep out, if they are known to have been malicious in the past.

In general, it would just be nice to have a log I can look at to see what kind of pkg changes are happening and who is active in the community.
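
Something like the following is what I have in mind, as a rough sketch. It assumes the package manager can produce, for each installed package, its version and the set of contributor names; where that contributor data would come from is an open question, and `update_log` is a hypothetical function.

```python
def update_log(old, new, blacklist=frozenset()):
    """Describe what changed between two installed-package snapshots.

    `old` and `new` map package name -> (version, set of contributor names).
    Returns plain text lines; a REPL could color the flagged ones.
    """
    lines = []
    for name in sorted(new):
        version, contributors = new[name]
        old_version, old_contributors = old.get(name, (None, set()))
        if version == old_version:
            continue  # nothing changed for this package
        lines.append(f"{name}: {old_version or 'not installed'} -> {version}")
        # Flag contributors who were not present in the previous snapshot.
        for person in sorted(contributors - old_contributors):
            flag = "BLACKLISTED" if person in blacklist else "new contributor"
            lines.append(f"  {flag}: {person}")
    return lines
```

Appending these lines to a file on every update would give the kind of history described above, and the local blacklist is just a set of names the user maintains.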
