Pkg ecosystem: Learning from other's mistakes

mauro3 · December 1, 2018, 8:27pm

One of the ideas in this thread was that you’d trust some developers, and any commits they do (including merging PRs), you’d trust too. Recursively applying this to dependencies would then be a way to determine whether you’d trust a package or not. Maybe not as “safe” as a third-party audit, be it by a person or by some sophisticated software, but something much more realistic, at least in the short to intermediate time-frame.
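
The recursive rule can be sketched as a walk over the dependency graph. This is only an illustration: the package names, the `maintainers` and `deps` tables, and the function `is_trusted` are all hypothetical and not part of Pkg.

```julia
# Hypothetical data: who can commit to each package, and what it depends on.
maintainers = Dict(
    "A" => ["alice"],
    "B" => ["bob"],
    "C" => ["alice", "mallory"],
)
deps = Dict(
    "A" => ["B"],
    "B" => String[],
    "C" => ["A"],
)

# A package is trusted if every maintainer is trusted and every dependency
# is trusted by the same rule.
function is_trusted(pkg, trusted_devs, maintainers, deps, seen = Set{String}())
    pkg in seen && return true   # already visited; also guards against cycles
    push!(seen, pkg)
    all(in(trusted_devs), maintainers[pkg]) || return false
    return all(d -> is_trusted(d, trusted_devs, maintainers, deps, seen), deps[pkg])
end

trusted_devs = Set(["alice", "bob"])
is_trusted("A", trusted_devs, maintainers, deps)  # true
is_trusted("C", trusted_devs, maintainers, deps)  # false: mallory is not trusted
```

A single untrusted maintainer anywhere in the dependency tree is enough to make the top-level package untrusted under this rule.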

Per · December 2, 2018, 7:22am

Submitting a package means making your exploitation mechanism public, and while you wait three days to see if your new package got accepted, you’re running the risk that somebody spots the malicious code and updates the scanner.

If the package gets rejected with a motivation like “It is not clear that this package does something useful” you have no idea whether a scanner flagged it or not. Unless you spend the time and effort to write a package that is clearly useful for each variation of your exploit that you want to test.

Tamas_Papp · December 2, 2018, 7:40am

Trust is not necessarily an absolute concept. A procedure that significantly reduces the probability of adverse events can be useful even if it does not completely eliminate them, especially if the alternatives are much more costly.

It is easy to go overboard with suggested security measures, especially after witnessing breaches. Frankly, I am skeptical that we have the community resources at this point to implement serious auditing for a nontrivial set of packages, but that does not preclude a simpler mechanism from being immediately beneficial.

Per · December 2, 2018, 7:54am

I’m not sure if it is possible to write an open source scanner that can measure this reliably. For example, I might initialize my package with

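using Sockets

# some_condition and innocent_function are placeholders
if some_condition
    f = Sockets.connect    # taken on most users' machines
else
    f = innocent_function  # taken on the scanner
end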

where some_condition is something that I’ve calibrated to be true for as many users as possible, while being false on the open-source scanner.

Per · December 2, 2018, 10:29am

That is a very important point. Basically, it boils down to the ratio of invested effort to expected payoff for the malware writer. The goal is to keep this ratio higher for attacks on Julia than for other ways of making money.

taqtiqa-mark · December 3, 2018, 2:29am

Thanks for all the effort that has gone into making Julia, and for these open conversations. As others have noted about themselves, I’d like to say I am not a Julia Guru and have not internalized the Pkg internals.
In good faith:

TL;DR: Pkg supports OpenBSD-type secure distribution workflows, i.e. using a signify|minisign-type tool/library, and permits self-trusted updates/upgrades of packages.

Feature List:

Optional: No change to existing packages unless the package author wants to. This proposal is for the Pkg tool to support the OpenBSD type secure workflows.
Simple: Only support one algorithm; X25519/ED25519.
Secure upgrades: Each signed-package will only upgrade if it contains the public key for the upgrade version
Usable signatures: Signed packages will have signatures compatible with being verified by a human
Notion of curated/trusted/maintained/active packages exist at the package level only and does not leak outside a package.
Introduces package Signatory: Unsigned upstream packages can be freely adopted in secure workflows by signing (and maintaining/managing) the distribution of your own fork. Package names and versions can be the same and differ only by package signatory.

Feature detail. Quotes from here:

Secure upgrades:

After each release of ~~OpenBSD~~ a Julia package, we generate a new key pair for the release after next. That’s plus two. For example, after 5.6 was released, keys for 5.8 were generated. This way, the 5.8 keys are then included in the 5.7 release.
So, if you upgrade every release, you will have an unbroken chain of keys back to your initial installation. We don’t directly sign keys with keys, however, but the next key is implicitly signed by its inclusion in a signed release. Each key is tied to a release and only used for artifacts relating to that release.
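
The chain logic in the quote can be sketched without any real cryptography. Everything here is hypothetical: the `Release` record, its field names, and the string key labels are made up for illustration. The Ed25519 check itself is assumed to be done elsewhere (for example by a tool such as signify) and is not shown.

```julia
# Hypothetical release record. `signed_by` stands for the public key that
# successfully verified this release's signature; the verification itself is
# assumed to have happened already and is not implemented here.
struct Release
    version::String
    signed_by::String   # key that verified this release
    next_key::String    # public key shipped inside this release for the next one
end

# Accept an upgrade only if the candidate verifies under the key that the
# installed release carried for its successor.
accepts_upgrade(installed::Release, candidate::Release) =
    candidate.signed_by == installed.next_key

# Check that every consecutive pair in a sequence of releases links up.
unbroken_chain(releases) =
    all(accepts_upgrade(releases[i], releases[i + 1]) for i in 1:length(releases)-1)

r56 = Release("5.6", "key-5.6", "key-5.7")
r57 = Release("5.7", "key-5.7", "key-5.8")
r58 = Release("5.8", "key-5.8", "key-5.9")
forged = Release("5.8", "key-other", "key-5.9")

unbroken_chain([r56, r57, r58])  # true
accepts_upgrade(r57, forged)     # false: not signed with the key 5.7 shipped
```

The next key is never signed directly; it is trusted only because it arrived inside a release that was itself accepted.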

Usable signatures: The full signature can easily be verified/used.

Here’s the /etc/signify/openbsd-57-base.pub file from my system.
untrusted comment: openbsd 5.7 base public key
RWSvUZXnw9gUb70PdeSNnpSmodCyIPJEGN1wWr+6Time1eP7KiWJ5eAM
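
For reference, a minimal sketch of reading such a file with Julia's `Base64` standard library. It assumes the signify public key layout of a 2-byte algorithm tag, an 8-byte key number and a 32-byte Ed25519 public key (42 bytes, hence 56 base64 characters). The type `SignifyPublicKey` and the function `parse_signify_pubkey` are hypothetical names, not an existing API.

```julia
using Base64

# Hypothetical container for the decoded fields of a signify public key file.
struct SignifyPublicKey
    comment::String
    algorithm::String        # 2-byte algorithm tag
    key_id::Vector{UInt8}    # 8-byte key number
    key::Vector{UInt8}       # 32-byte public key
end

function parse_signify_pubkey(text::AbstractString)
    lines = split(strip(text), '\n')
    length(lines) == 2 || error("expected a comment line and a key line")
    prefix = "untrusted comment: "
    startswith(lines[1], prefix) || error("missing 'untrusted comment: ' header")
    raw = base64decode(String(strip(lines[2])))
    length(raw) == 42 || error("expected 42 decoded bytes, got $(length(raw))")
    comment = String(lines[1][ncodeunits(prefix)+1:end])
    return SignifyPublicKey(comment, String(raw[1:2]), raw[3:10], raw[11:42])
end

pub = parse_signify_pubkey("""
untrusted comment: openbsd 5.7 base public key
RWSvUZXnw9gUb70PdeSNnpSmodCyIPJEGN1wWr+6Time1eP7KiWJ5eAM
""")
pub.algorithm     # "Ed"
length(pub.key)   # 32
```

This only parses the file; it does not verify any signature.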

Additionally, Julia’s packaging system has an automatic secure-upgrade path established if we need to switch to a different algorithm than X25519/ED25519.

Adopt the X25519/ED25519 algorithm (alone) and put in place the data for the emergency when you have to abandon that choice because the algorithm is compromised.
This means Pkg will have put in place mechanisms to deal with the emergency that comes once in ‘the heat death of one universe’

This proposal should:

a. Isolate transport-protocol/source-of-package questions/issues from trust of package questions/issues. That is: I don’t care how a package gets to me or where it comes from. I care only that this is an upgrade from the same source (private key holder) as the current version.
b. Allow me to easily trust a package and once trusted, allow that package to contain the public keys that then permit auto-trusted updates/upgrades.
c. Restrict the notions of curated/maintained/active/trusted to the package level and not impose burdens on the community outside of that package’s maintainer(s). That is, if you don’t want the burdens that come with saying this package is maintained/curated/active, don’t make it a signed package.
d. Enhance the no stdlib/base philosophy, but still allow peace-of-mind knowing that package upgrades will only come from the original ~~maintainer~~ Signatory.
e. Allow for the possibility of signed packages sharing name-version space. The full signature (e.g. RWSvUZXnw9gUb70PdeSNnpSmodCyIPJEGN1wWr+6Time1eP7KiWJ5eAM) breaks ties

Background, these discussions:
Discourse thread
Ephemeral Slack thread

I doubt I could say more than the insight you will get from reading these sources, [1], [2], [3] and [4]
[1]: signify: Securing OpenBSD From Us To You
[2]: signify - sign and verify
[3]: Minisign by Frank Denis
[4]: GitHub - aperezdc/signify: OpenBSD tool to sign and verify signatures on files. Portable version.

Hope that helps?

taqtiqa-mark · December 3, 2018, 11:19pm

Some points were raised in a Slack thread:

Stefan Karpinski [9:40 AM]
have to read up on signify but I appreciate that the proposal is specific and just about making sure that the code people get is the code they meant to get

Yes, Pkg would need to have some of the functionality in signify - or use a package that provides it. Given the use case is focused maybe there is less work involved than writing signify|minisign? Nonetheless, I think there is a substantial amount of effort involved.
Hopefully the benefits are recognized as outweighing the costs?

At the risk of being pedantic: rather than “making sure that the code people get is the code they meant to get”, the proposal is really trying to bring trust-attention away from the code (that is addressed by unit tests/coverage sites), away from the location of the code or registry, and focus on the ~~maintainer(s)~~ package signatory(ies). That is: “making sure that the code people get is the code released by the ~~maintainer(s)~~ signatory that released the current version of code”.

José Bayoán Santiago Calderón:
Don’t we currently have a form of it by the Git hash?

I could be wrong, but I believe the key idea is the “look ahead” key generation. Does Pkg use the git hash in such a way that the user is assured the current package update was produced/released/signed by the same maintainer(s) that produced/released/signed the currently installed version? Also, reasonable or not, things such as http://shattered.io/ mean that in some organizations anything SHA1 is barred.
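
To make the distinction concrete, here is a rough sketch using the `SHA` standard library. A plain SHA-1 over a string stands in for whatever hash a registry records; Pkg's actual tree-hash computation is not reproduced here, and the contents and the helper `matches` are hypothetical.

```julia
using SHA

# Hypothetical release content and the hash a registry would record for it.
content = "module Example end"
registry_hash = bytes2hex(sha1(content))

# Integrity check: does the downloaded content match the recorded hash?
matches(downloaded, expected) = bytes2hex(sha1(downloaded)) == expected

matches(content, registry_hash)                        # true
matches("module Example; extra() end", registry_hash)  # false: content was altered

# If someone else publishes new content together with a new registry entry,
# the very same check passes. The hash ties content to the registry entry,
# not to whoever produced the previous release.
new_content = "module Example; changed_by_someone_else() end"
new_registry_hash = bytes2hex(sha1(new_content))
matches(new_content, new_registry_hash)                # true
```

So a hash check answers “is this the content the registry points at?”, while the look-ahead keys aim to answer “did this come from the same signatory as the version I already have?”.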

Stefan Karpinski [9:42 AM]
if you trust the content of the registry and verify the SHA1 hash, then yes, we get some of it
this appears to not require the registry, however, which is nice

The absence of key-servers/registries is a nice side-effect of forcing trust-attention to be where it matters - at the ~~maintainer(s)~~ signatory level and not the registry level. I should be able to arbitrarily switch registries/protocols etc., but not arbitrarily switch ~~signers/maintainer(s)~~ signatory, and have my packages just-work.

In one way a registry could come back… ~~Maintainer~~Signatory-Registry: verifying the initial key that kicks off a signed-package chain.
But that is a substantial issue to be debated.
For the record I’m against requiring such a ~~maintainer~~signatory-registry being present - but can see arguments why you might want that. Maybe this can be solved without a ~~maintainer~~signatory-registry in a way that prevents (makes difficult/risky) trusted ~~maintainer~~signatory handing over to un-trusted ~~maintainer~~signatory?

Essentially such ~~maintainer~~signatory-registries would always be consumed by a human and never a script. They would be a common location users come to break (excuse the pun) the chicken-egg problem of not having a trusted key and not having a trusted installation of the signed package. Since the full keys are 56 characters long they can be verified by inspection, but a wav file or some such convenience might help.

Tamas_Papp · December 4, 2018, 9:09am

I think this is a valid point. Note the two examples that started this topic, my impression is that your proposal would not protect against either of them: handing over the repo would presumably involve handing over the keys (otherwise, why not just fork?), and removing code from the registry is an orthogonal problem to verifying sources.

I truly admire the OpenBSD mindset, and signify appears to be a neat solution for protection against an adversary who is motivated and resourceful enough for MITM-style attacks in the package distribution framework. However, I think that the concern in this topic is about something much more basic.

taqtiqa-mark · December 5, 2018, 12:00am

Thanks for considering this.

Apologies for the confusion, my description wasn’t clear/precise enough. Also apologies for the wall of text…

handing over the repo would presumably involve handing over the keys (otherwise, why not just fork?)

I’m not sure why this presumption holds. Can you elaborate?
No, the keys should/would not be added to the repo.

Is any proposal possible that does not suffer from the observation “This can be circumvented if someone wants to?”
Actually, I agree with your earlier comment here:

Tamas_Papp
A procedure that significantly reduces the probability of adverse events can be useful even if it does not completely eliminate them, especially if the alternatives are much more costly.

Back to your most recent points raised:

Note the two examples that started this topic, my impression is that your proposal would not protect against either of them

Without presuming the signatory’s desire to provide a signed package that allows self trusted updates is consciously thwarted by the same signatory: Can you elaborate on where/how the proposal breaks down for the examples.

My understanding is the proposal resolves this scenario in the event-stream example*: Maintainer/Signatory A casually hands over a package repository to Signatory/Maintainer B. B makes changes to the package and pushes a new version. All users who installed A’s package silently get B’s changes when they update.
Under this proposal, as long as users installed A’s signed release (which contains A’s public keys for the next N releases), or signed their own instance of it, they will not get B’s changes.

Note I am not saying A would be doing anything terrible. Responsibility is B’s alone. So this proposal still allows, even encourages, such casual handovers: the proposal allows 1) A to guard his users by making a signed self-trusting-updates-from-A release, and 2) A’s users to guard themselves if A does not choose to make such a self-trusting release.

If this proposal was to become the default distribution for packages, and not optional, guarding against people circumventing the guard-rails is justified as a first order priority. I don’t think we are at that point, but it is worthwhile exploring how the foreseeable issues could be addressed.

The Pkg logic could be extended to ensure the Pkg signing key is:

Not saved under the project folder.
Removed from the repository history.

or Pkg refuses to create the signed release.
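
The first of these checks could look roughly like the following. It is a sketch under stated assumptions: `key_inside_project` and `check_signing_key` are hypothetical helpers, not Pkg functions, the paths are made up, and the comparison is purely textual (symlinks are not resolved). Scrubbing a key from repository history is not covered.

```julia
# Hypothetical pre-signing check: is the secret key stored anywhere under the
# project directory? Purely path-based; symlinks are not resolved.
function key_inside_project(key_path, project_dir)
    rel = relpath(abspath(key_path), abspath(project_dir))
    return first(splitpath(rel)) != ".."
end

# Refuse to continue if the key lives under the project folder.
function check_signing_key(key_path, project_dir)
    key_inside_project(key_path, project_dir) &&
        error("signing key is stored under the project folder; refusing to sign")
    return true
end

key_inside_project("/home/a/proj/keys/pkg.sec", "/home/a/proj")  # true
key_inside_project("/home/a/keys/pkg.sec", "/home/a/proj")       # false
```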

I’m not convinced the effort would be justified until it is observed/demonstrated the risk is real.

Like all security/assurance measures, they can be circumvented - but you’d have to hack the Pkg code… Ideally the Pkg signing/packaging logic would only be available when Pkg itself is installed as a signed package, and the installed code verified before a signed package is created. Again I think we are getting ahead of ourselves, but this is do-able.

Again, is this worth the effort until it is a known issue that people are hacking Pkg to make it save the signing key inside the repository and distribute it?

[*] NOTE: It would still be possible for B to distribute their changes if A does the following non-trivial setup and non-casual handover:

A sets up a persistent VM that is used to sign the package, and stores the signing keys outside the package repository.
A hands over to B the persistent VM as well as the package repository.

As I said every proposal will be vulnerable to the criticism “But this can be circumvented”.

foobar_lv2 · December 5, 2018, 9:55am

The signing keys will live in some file. OpenSSH is a good role-model for how this should work (passphrases, human-readable file format, easy migration and backup, etc). No VM involved.

For obvious reasons, we would expect packages to be signed with a package-specific key; potentially the key is also author-specific (but not all packages by $author use the same key). Experience shows that private credentials show up on public github all the time. In the NPM backdoor example, we would have expected author A to hand over the signing keys to author B, along with github permissions.

Experience also shows that private keys get lost all the time (failed backups, etc). Do we want packages to be dead (replaced by fork) whenever this happens, with no way of human intervention to fix this error? Such an event has downstream consequences. Baking things into a protocol without human override is always a risky move.

StefanKarpinski · December 5, 2018, 6:00pm

I have started Pkg: attack vectors to discuss package ecosystem attack vectors. Please only post attack vectors, however, not general discussion or spitballing of ideas.

non-Jedi · January 3, 2019, 3:23pm

On the subject of the general problem of code being reviewed and establishing a web of trust or similar for reviewed code and whether it’s malicious, I recently ran across crev. Quoting from the README:

Using crev you can generate cryptographically signed artifacts ( Proofs ). Proofs can share details of code reviews (whole releases, parts of the code), or specifying trust (or mistrust) into reviews of other people.

The system is designed to be language-agnostic. Might be worth investigating some way of nicely integrating with Julia packaging. The big thing to do other than general tooling for reviewing is probably creating some mechanism by which users can attach code reviews to packages in the package registry (General and any others) without the creator’s consent.

StefanKarpinski · January 3, 2019, 4:04pm

It’s a very interesting project and similar to what I had in mind. It appears to be in its early stages, but collaboration with the Rust community might well be fruitful. Certainly worth looking into.

mauro3 · January 3, 2019, 4:39pm

There is also a blog post on crev, “cargo-crev and Rust 2019 fearless code reuse” by Dawid Ciężarkiewicz aka `dpc`, giving a short introduction on how it works.
