Pkg: attack vectors

StefanKarpinski · December 5, 2018, 5:48pm

Pkg ecosystem: Learning from other's mistakes has gotten a bit long and wandering, as these discussions tend to. I’d like to have a very focused thread about attack vectors against the security of the package ecosystem. To that end, here are the attack vectors I’ve come up with so far. What are some other attack vectors?

Attack: find an existing bug in some package that you can exploit

Mitigation:

fix the bug
yank versions that have it

Prevention:

testing
fuzzing
basically anything that improves program correctness in general

Attack: create a back door through normal development process

Mitigation:

close the back door
yank versions that have it
blacklist the person who created it

Prevention:

identify risky changes and bring more attention to them to check if they’re malicious
use signatures to hold people responsible

Attack: introduce a back door by replacing an innocent package version with a malicious one

Mitigation:

see prevention
is there any other mitigation step here?

Prevention:

serve code from trusted servers over secure protocols (e.g. HTTPS)
identify versions by permanent secure hashes and don’t allow them to be changed
verify that code has correct hashes on installation
be ready to use newer hashes when old ones reach end of life (e.g. SHA1)
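
A minimal sketch of the "verify hashes on installation" and "be ready to use newer hashes" items, written in Python for illustration only; this is not how Pkg implements it. The names `pin`, `verify`, and `ALLOWED_ALGORITHMS` are hypothetical, and the `algo:hexdigest` tag format is an assumption chosen so that a retired algorithm can be rejected explicitly.

```python
import hashlib
import hmac

# Algorithms this sketch accepts; SHA-1 is deliberately left out (assumption).
ALLOWED_ALGORITHMS = {"sha256", "sha512"}


def pin(data: bytes, algo: str = "sha256") -> str:
    """Return an algorithm-tagged digest such as 'sha256:ab12...'."""
    return f"{algo}:{hashlib.new(algo, data).hexdigest()}"


def verify(data: bytes, pinned: str) -> bool:
    """Check downloaded bytes against the digest recorded at registration."""
    algo, _, expected = pinned.partition(":")
    if algo not in ALLOWED_ALGORITHMS:
        raise ValueError(f"unsupported or retired hash algorithm: {algo!r}")
    actual = hashlib.new(algo, data).hexdigest()
    # Constant-time comparison of the two hex strings.
    return hmac.compare_digest(actual, expected)


tarball = b"pretend these are the bytes of a package tarball"
recorded = pin(tarball)                       # stored once, never changed
print(verify(tarball, recorded))              # untouched download
print(verify(tarball + b"extra", recorded))   # replaced or tampered download
```

Tagging the digest with its algorithm means a later move to a stronger hash only requires adding to the allowed set and re-pinning; old pins fail loudly instead of silently passing.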

Attack: typo squatting

Mitigation:

delete squatting packages from registries

Prevention:

review names that might be squatting
use spell checking to catch likely typos
see https://github.com/JuliaLang/Pkg.jl/issues/931 for details

Attack: package deletion

Mitigation:

find a fork of the package
make the fork the official repo

Prevention:

automatically fork all registered packages
allow installation of packages from the automatic forks

fredrikekre · December 5, 2018, 6:02pm

For those wondering what “yanking” a version means, see Publishing on crates.io - The Cargo Book, which we have just mimicked with https://github.com/JuliaLang/Pkg.jl/pull/726

foobar_lv2 · December 5, 2018, 6:37pm

Attack: Yank package with lots of downstream in a temper tantrum, effectively leading to a lot of immediate downstream trouble (pkg cannot install many packages until they update). Aka leftpad.

Mitigation:

Restore the package in the registry, through human intervention
Fix all downstream to either inline the functionality or instead rely on a fork
If large parts of the ecosystem are downstream of leftpad-style packages, nuke it from orbit. It’s the only way to be sure

Prevention:

Don’t give authors unilateral power to yank packages they own
Don’t let large parts of the ecosystem sit downstream of semi-maintained packages that nobody knows about
Decide on official recommendations / defaults with regards to version pinning

adamslc · December 5, 2018, 6:38pm

The recent NPM attack would probably fall into the second category (introduce a backdoor through normal development), but the listed prevention steps wouldn’t prevent such an attack on a Julia package.

Maybe an additional prevention item should be to provide tight compatibility requirements for dependencies? Unfortunately, that would mean no caret or tilde version specifiers, and a whole lot more work for package authors and registry maintainers.

StefanKarpinski · December 5, 2018, 7:09pm

Good point. Authors already don’t have unilateral ability to yank anything; the only way to yank anything currently is to make a manual change to the registry, which only people with permission to update the registry can do. What package owners can do, however, is delete repositories on GitHub. We do need a backup mechanism where we maintain permanent forks of all packages.

fredrikekre · December 5, 2018, 7:10pm

The idea is that yanking is a tool for the registry maintainer, not the package author. The package author can of course create a PR to yank a release, but it is then up to the maintainer to merge.

StefanKarpinski · December 5, 2018, 7:31pm

It would—that was the category of attack I saw the NPM situation falling into.

but the listed prevention steps wouldn’t prevent such an attack on a Julia package.

It would, given the right interpretation of the admittedly vague prevention:

identify risky changes and bring more attention to them to check if they’re malicious

In the NPM situation, the risky changes were the ones introduced by a previously unknown maintainer which people had no reason to trust. The main failure was that hardly anyone was aware of the transition of maintainership; had more people been made aware, they could have looked at the changes and, given enough eyes, someone might have caught it.

Accordingly, a good trust system to protect against this class of attacks is one which would bring new versions of packages authored by unknown individuals that have no systemic trust to the attention of many people, giving them the option to:

Choose not to use the new versions.
Choose to trust the new versions blindly.
Review the new versions and, if they look ok, vouch for them.

Of course, the value of such a voucher depends on whether you trust the person reviewing. Review vouchers should be sharable so that everyone benefits.

Given such a system, the process in the NPM situation would be:

New maintainer releases a new version
People updating get prompted when they upgrade
The default is “don’t use”—most people do that
Some people review the new code
Given enough reviewers, someone discovers the backdoor.

In this workflow, the only people who get compromised are people who either actively opted to trust blindly or did a review and missed the backdoor. In either case, those people can really only hold themselves responsible in a very direct way.

In the much more common situation of a new maintainer who is not malicious, the workflow would be this instead:

New maintainer releases a new version
People updating get prompted when they upgrade
The default is “don’t use”—most people do that
Some people check out the new versions
Given enough reviewers, who vouch for it, the code release becomes trusted
Making a release that becomes trusted increases trust in the new maintainer
After a while their code no longer needs to go through this process.
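
The two workflows can be sketched as one decision function. This is a rough Python illustration under stated assumptions, not an existing Pkg feature: `Release`, `decide`, and the `min_vouchers` threshold are hypothetical names, and "trusted" is reduced to simple set membership.

```python
from dataclasses import dataclass, field


@dataclass
class Release:
    package: str
    version: str
    author: str
    vouchers: set = field(default_factory=set)  # reviewers who vouched for it


def decide(release, trusted_authors, trusted_reviewers, min_vouchers=2):
    """Return 'install' or 'hold' (the default "don't use") for a new release."""
    if release.author in trusted_authors:
        return "install"
    # Only vouchers from reviewers this user already trusts are counted.
    counted = release.vouchers & trusted_reviewers
    if len(counted) >= min_vouchers:
        return "install"
    return "hold"


new_release = Release("SomePkg", "2.0.0", author="newcomer")
print(decide(new_release, {"oldhand"}, {"alice", "bob"}))   # nobody vouched yet

new_release.vouchers |= {"alice", "bob", "mallory"}
print(decide(new_release, {"oldhand"}, {"alice", "bob"}))   # two trusted vouchers
```

Counting only vouchers from already-trusted reviewers reflects the point that a voucher is worth exactly as much as the reviewer behind it; an attacker vouching for their own release changes nothing.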

Maybe an additional prevention item should be to provide tight compatibility requirements for dependencies? Unfortunately, that would mean no caret or tilde version specifiers, and a whole lot more work for package authors and registry maintainers.

I don’t see how compatibility requirements are relevant here. NPM uses exact dependency versions and that didn’t help.

foobar_lv2 · December 5, 2018, 7:39pm

Additional prevention steps for bugs and backdoors (I guess long discussions of pro/contra for each of them is out-of-scope for this laudably focused thread, but listing them as options to maybe explore should be ok?):

Make it easier for downstream to figure out which packages/versions are “well-maintained”. E.g. by automated processes (I dunno, people voting on packages? github stars?) or by shipping an opinionated smaller registry alongside the large registry (my personal favorite).
Have some semi-formal “declaration of intent” with respect to maintenance for packages. The idea is that prospective downstream users can easily see that relying on a package is a questionable idea if the package authors/owners themselves state that their package is probably not going to be maintained, because they moved on or it was a one-shot PoC. Additionally, this can give guidance for package authors on how to pass on the torch (try not to violate downstream trust in your own words; but life happens).
Have a way to trigger a “cry for help” from package owners: If they decide that they can’t keep up, then their downstream needs to be notified in order to either help out, fork, or switch out for a different provider of the functionality.
Attack surface reduction: Introduce a new type of “header package” that cannot contain executable code or exports, only abstract type and function declaration, plus possibly constants out of a very small selection of white-listed types. Reason is that optional dependencies in julia require import of the analog of RecipesBase in order to play with Plots. Header packages have no attack surface, even if they become malicious, and therefore reduce the amount of upstream code that potentially needs to be audited. @anon94023334 always has this issue. This would e.g. allow LightGraphs.jl to ship plotting code for Plots.jl without risking anything for users that don’t plot, even in the unlikely case that the entire plotting ecosystem gets backdoored to hell. This would require a hardened parser for header packages (always assumed to be malicious).

foobar_lv2 · December 5, 2018, 7:55pm

Thanks for the explanation re yanking, I misunderstood that.

Regarding forks: Would it be OK to automatically fork everything from the central registry and have package installs from the central registry be served from the forks?

That way, the registry maintainers can always override anything that package authors do (with respect to users who install from the registry).

Regarding current state of multiple registry support: How is that resolved? E.g. I have two registries; both contain a package called “MyExample.jl”; which one gets precedence? Is the case where both registries contain the identical package resolved painlessly (i.e. without additional user intervention)? Can a package have deps that are in a different registry, and what happens then?

Sorry for the somewhat naive questions regarding current pkg design.

At the risk of pointing out the obvious: (1) there is such a thing as “warning fatigue”. Asking end-users to review code that is 3 levels upstream of the package they use is questionable: “MBedTLS.jl has a new maintainer! update, ignore, review diff? [uIr]” for people who just want to plot something in ijulia. The ijulia maintainers are the people who are qualified to review. (2) There is a trade-off between backdoors (default: don’t update) and ordinary security bugs (install all critical security updates as quickly as possible, bad guys are warming up their port-scanners NOW).

garrison · December 5, 2018, 9:50pm

While Pkg may do this, I’m not certain that BinDeps actually does. A few months ago, I tried installing RandomMatrices on an unreliable cellular connection. It failed to download the cmake binary, but the next time I tried to install, it actually unpacked the truncated file and complained about an “unexpected end of file”. And on the third attempt, it complained that the cmake bin directory did not exist in the unpacked directory tree. Ideally these operations would have been (i) atomic and (ii) the hashes would have been checked at the beginning. This was on Sep 20, so things may have been fixed since then. Full error log at gist:c41563adf3cc3dbb5088ab0736e872cc · GitHub

The moral: it may actually be nontrivial to assess whether all relevant code paths are verifying that hashes are correct. A successful attack only requires one such path.
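
To make "(i) atomic and (ii) hashes checked at the beginning" concrete, here is a small Python sketch. It does not describe BinDeps or Pkg internals; `install_verified` is a hypothetical helper, and it assumes the whole download fits in memory.

```python
import hashlib
import os
import tempfile


def install_verified(data: bytes, expected_sha256: str, dest: str) -> None:
    """Write `data` to `dest` only if its hash matches, and do so atomically."""
    # (ii) Check the hash first, before anything touches the destination.
    if hashlib.sha256(data).hexdigest() != expected_sha256:
        raise ValueError("hash mismatch; refusing to install")

    # (i) Write to a temporary file in the same directory, then rename.
    dest_dir = os.path.dirname(os.path.abspath(dest))
    fd, tmp = tempfile.mkstemp(dir=dest_dir)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, dest)  # atomic when source and target share a filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

With this shape, a truncated download fails the hash check and leaves no partial file behind, so a second attempt starts from a clean state rather than from a half-unpacked one.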

garrison · December 5, 2018, 10:11pm

Also: while HTTPS is a good practice, it is no substitute for verifying hashes of packages before they are to be installed. Forged SSL certificates are known to exist (e.g. for gmail in 2011).

Another way of framing thinking about security is to follow the Qubes philosophy of “distrusting the infrastructure.” More on this at Frequently asked questions (FAQ) | Qubes OS

A related attack vector would be to target and compromise a build machine that creates binaries, thus leading to compromised versions of julia or important binary dependencies being distributed. One way to mitigate this is to work toward reproducible builds.

foobar_lv2 · December 5, 2018, 11:01pm

As an addition: There are lots of things one can do wrong when verifying hashes/signatures of packages (see hilarious android vulnerabilities where the sig-check and installer used different pkzip implementations that differed for ambiguous files). One way that is very hard to get wrong is used by eg chrome extensions (.crx files): Hash and sign the zip / tarball, verify before parsing, see http://www.adambarth.com/experimental/crx/docs/crx.html. They solve key distribution (the CA problem) in the very elegant way that the uuid (the true name) of the package is a cryptographic hash of the single public key that is authorized to sign it and/or its updates. This allows a very simple (and near impossible to mess up) verification flow for updates. I don’t think we can actually do this, but it is a masterpiece of defensive design.
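
A rough Python sketch of the idea that the package's identifier commits to its signing key. It is an illustration of the principle only, not the CRX format: `package_id` and `accept_update` are hypothetical names, the 128-bit truncation is an arbitrary choice, and the signature check itself is passed in as a caller-supplied function because the Python standard library has no public-key signature verification.

```python
import hashlib
import hmac


def package_id(public_key: bytes) -> str:
    """Derive the package's identifier from the one key allowed to sign it."""
    return hashlib.sha256(public_key).hexdigest()[:32]


def accept_update(pkg_id, archive, signature, public_key, verify_signature):
    """Decide whether to accept an update, before the archive is ever parsed.

    `verify_signature(public_key, archive, signature) -> bool` is supplied
    by the caller (assumption: backed by a real signature library).
    """
    # Step 1: the presented key must hash to the identifier we already know.
    if not hmac.compare_digest(package_id(public_key), pkg_id):
        return False
    # Step 2: the signature must cover the raw archive bytes.
    return verify_signature(public_key, archive, signature)
```

Because the identifier is a hash of the key, there is no separate lookup of "which key belongs to this package" that could be tampered with; a different key simply yields a different package.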

y4lu · December 5, 2018, 11:49pm

Attack: Malware binary dependency

Mitigation: ? Keep an up to date virus scanner active, probably not foolproof

Prevention: Maybe something similar to the approach for package security, keeping a whitelist of allowable binaries and their fingerprints

A secondary concern would be binaries that are relatively safe themselves but then download other binaries; usually a virus scanner will pick this up, though

foobar_lv2 · December 6, 2018, 12:59pm

Virus-scanners are not a “probably not foolproof” mitigation for malware. They are a good tool for detecting past compromises (run new virus scanner on old backup / memory dumps), noise reduction (run on email server), and reduction of compromises for people who ask to get compromised (eg people who torrent binaries).

Detecting past compromises is important: Having someone own your network for 6 months is preferable to having someone own your network for 5 years. Most importantly, this produces a very positive shift in the game theory: Deploying your fancy rootkit and getting caught has real consequences, even for nation states: somebody writes a signature, and all your other deployments get found out, which gets you kicked off the network, simplifies attribution (same actor at many sites) and possibly has political fallout.

But your point is important: How are binary deps currently handled?

I see three ways: First, distribute source or blob, write build script. Handle the same as julia files. Second, don’t handle at all. It is the user’s job to install the shared library in a way that libdl can find it. Third, use something like BinDeps and download source or binary during build. This is a problem.

Simple idea: Require that packages declare in the registry/metadata whether they download sources or binaries during their build process (triggering an apt-get install that asks the user for permission does not count as that). Have the user-interface for pkg reflect that (“pkg xyz will download executable code from the internet, outside of anybody’s oversight. Proceed? [yN]”). Enforcement could be semi-formal (packages that do unauthorized downloads get yanked).

Simple mitigation with virus-scanning: Push all packages (in tarball form) to virustotal, and regularly do this with historic versions (this is the registry’s job, not the user’s job). If some package distributes binary blobs and their buildserver got pwned into introducing malware, then we at least have a chance of detecting the compromise, if the attack was automated and not targeted. And we have a decent chance of detecting the compromise a year after the fact, which is far better than never.

Nosferican · December 18, 2018, 8:34am

For that to happen I think we first need a robust metadata schema for packages. For example, a way to validate and verify Project.toml metadata

name = "MyPkg"
uuid = "xxxx..."
license = "SPDX"
author = "Surname, FirstName and Surname, FirstName and Company"
maintainer = "Surname, FirstName"
tags = ...

Maybe have authors/maintainers registered in the registry with basic info that ID/allows communication and then have them sign off on the release / verify metadata.
Without the information to contact / verify identities, I find it hard to achieve the other systems. That would also allow registering changes to the package maintainers.
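
As a rough illustration of what "validate and verify Project.toml metadata" could mean, here is a Python sketch that checks an already-parsed metadata table. It is not an existing registry check: `validate_metadata` is a hypothetical name, the required fields are taken from the sketch in this post, and `KNOWN_LICENSES` is a tiny illustrative subset of SPDX identifiers rather than the full list.

```python
import uuid

REQUIRED_FIELDS = ("name", "uuid", "license", "author", "maintainer")
# Illustrative subset only; a real check would use the full SPDX list.
KNOWN_LICENSES = {"MIT", "Apache-2.0", "BSD-3-Clause", "GPL-3.0-only"}


def validate_metadata(meta: dict) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for key in REQUIRED_FIELDS:
        value = meta.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty field: {key}")

    raw_uuid = meta.get("uuid")
    if isinstance(raw_uuid, str) and raw_uuid.strip():
        try:
            uuid.UUID(raw_uuid)
        except ValueError:
            problems.append("uuid is not a well-formed UUID")

    license_id = meta.get("license")
    if isinstance(license_id, str) and license_id.strip():
        if license_id not in KNOWN_LICENSES:
            problems.append(f"license is not a recognized identifier: {license_id}")
    return problems


print(validate_metadata({"name": "MyPkg", "uuid": "not-a-uuid", "license": "MIT"}))
```

Returning a list of problems rather than raising on the first one lets a registration bot report everything wrong with a submission in a single comment.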

dcastel · September 24, 2019, 7:54pm

How far is this getting along? Imho the typo-squatting one is definitely important as the ecosystem grows.

Some heuristic might also flag new ‘suspicious’ repos for human review. E.g. packages with admin rights requests. Not watertight of course: https://github.com/search?utf8=✓&q=sudo+extension%3A.jl+language%3AJulia+fork%3Atrue&type=Code&ref=advsearch&l=Julia&l=

Tamas_Papp · September 25, 2019, 6:02am

I don’t think that any build code should do this. Either use the existing mechanisms to get binary blobs, or print a message for the user to install something (usually makes sense on Linux). Fortunately, the practice is not that widespread.

dcastel · December 6, 2019, 8:22am

For an attacker it matters little whether it’s built in. If the vector of typo-squatting is there, it’s up to the imagination of the attacker to exploit it. I’m sure there are some creative minds that will find an interesting way to exploit this.

So to come back to the subject: How far is this getting along? Could there be some help with developing a typo-squatting countermeasure? If so, where should I look and who should I talk to? I have little Julia experience but a lot of coding and security experience, so this could be a good opportunity to contribute.

anon92994695 · December 6, 2019, 11:29am

There are lots of ways to compare strings. We could employ a standard score for a package name rather than a direct lookup. Levenshtein distance is pretty dang good for this. Say, if the Levenshtein similarity between the lowercased package name and every package in the directory is <95%, then it can be registered, or something similar. Because Levenshtein is pretty slow, we could use a heuristic character-level model for finding the first 100 closest candidates (could be a CNN, RNN, ruleset etc). Basically this would also encourage new registrants to think of good names so we don’t get 20 people with packages like “StaticArrays” “StaticyArrays” “StatikArrays” etc. Could clean things up.
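
A minimal Python sketch of the name check, assuming similarity is defined as one minus the edit distance divided by the longer name's length (that definition, the function names, and the default threshold are my assumptions, not anything the registry currently does).

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, one row at a time."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1]; 1.0 means identical names."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def too_similar(candidate: str, existing, threshold: float = 0.9):
    """Return registered names that the candidate resembles too closely."""
    return [name for name in existing if similarity(candidate, name) >= threshold]


registered = ["StaticArrays", "DataFrames"]
print(too_similar("StatikArrays", registered))
print(round(similarity("StatikArrays", "StaticArrays"), 3))
```

Under this particular definition, a single-character change in a 12-letter name gives a similarity of about 0.917, so the cutoff has to be tuned against name length; the 0.9 default here is only a placeholder.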

chakravala · December 6, 2019, 12:17pm

Could we make it so that the package manager generates a security log for what updates happened and who the contributors are (change the color in the REPL output of new contributor names maybe)? Then it is easier to get an overview of the history and who contributed, and if there are new contributors who need to be evaluated for trust locally. Then, you could also make your own local blacklist of certain contributors to keep out, if they are known to be malicious in the past.

In general, it would just be nice to have a log I can look at to see what kind of pkg changes are happening and who is active in the community.
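
Such a log could look roughly like the Python sketch below. Everything here is hypothetical: the `update_log` function, the shape of the update records, and the `[NEW]`/`[BLOCKED]` tags standing in for REPL colors; Pkg does not currently expose contributor data this way.

```python
def update_log(updates, known_contributors, blocklist=frozenset()):
    """Build one log line per (update, contributor), tagging unfamiliar names.

    Each update is a dict with keys: package, old, new, contributors.
    """
    lines = []
    for u in updates:
        for person in sorted(u["contributors"]):
            if person in blocklist:
                tag = "BLOCKED"
            elif person not in known_contributors:
                tag = "NEW"
            else:
                tag = "known"
            lines.append(
                f'{u["package"]} {u["old"]} -> {u["new"]}: {person} [{tag}]'
            )
    return lines


updates = [
    {"package": "SomePkg", "old": "1.2.0", "new": "1.3.0",
     "contributors": {"alice", "newcomer"}},
]
for line in update_log(updates, known_contributors={"alice"}):
    print(line)
```

Keeping the known-contributor set and the blocklist local to each user matches the suggestion that trust is evaluated locally rather than decided centrally.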
