Trusted libraries/packages - "package managers are evil"

We implicitly trust all packages we use, for security (and bugs).

I realized maybe we shouldn’t and packages should be untrusted by default.

How could you do it? I see two ways.

You could do:

using trusted <package>

using untrusted <package>

for that same package; without specifying, the default would be trusted (the status quo for now), and maybe later untrusted. Untrusted would mean the code runs in a sandbox, i.e. via DistributedNext.jl I guess, where it has no file access, no network access, and likely no ccall capability. But the code doesn't need to be bounds-checked there, in that subprocess. And the API for calling the package's methods stays the same.
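A sketch of what this might look like (the `trusted`/`untrusted` keywords and the sandboxing behavior are hypothetical, nothing Julia supports today):

```julia
using trusted LinearAlgebra   # runs in-process, full capabilities (status quo)
using untrusted JSON          # hypothetical: loaded into a sandboxed subprocess
                              # with no file, network, or ccall access

# The call site looks identical either way; only where the code runs differs:
data = JSON.parse("{\"a\": 1}")
```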

For your main file, i.e. your application code (maybe a GUI application with your passwords, your bitcoin keys, or something similarly sensitive), everything stays in the main Julia process, and your untrusted libraries/packages can never see it unless you send it to them, and why would you?

npm had yet another security issue recently, worse than before (I've not looked much into it), but it got me looking at this (or actually the YouTube conversation linked from there):

The Odin language developer claims package managers are evil, that we're automating dependency hell. So if/since Julia has the best package manager, are we the most evil, or somehow better…?

Erlang is probably better at separation, and so is Elixir, which is based on it (the Elixir developer was in that conversation). I think Erlang doesn't use [Unix] processes but basically implements something similar, likely enforcing bounds checks. That would also be one way to do this.

So far I’ve really liked that Julia isn’t a VM/emulated or JITed like Java (and calling Python is in-process), but even Java trusts all its dependencies. And we trust e.g. the full Python package ecosystem too with PythonCall.jl. That might be a good package to start with untrusted by default.

2 Likes

See e.g. How to sandbox Julia code or RFC / Discussion: Security and Julia · Issue #9744 · JuliaLang/julia · GitHub

TLDR — sandboxing Julia packages is not really on the horizon.

Their argument is basically not to have dependencies that you don’t copy and distribute (“vendor”) yourself. Good luck with that. (Of course, this can be practical for a narrow set of applications.)

4 Likes

I like your link and text here:

Even Java gave up on sandboxing because it proved unworkable in practice to close all the security holes. JavaScript is only able to maintain a reasonable level of sandbox security because there is a huge industry devoted to continually vetting and patching its implementations.

I think they failed since it's one large sandbox. I mean the top process wouldn't be sandboxed (or you could, with Docker or snap or something, sandbox it too); for now I'm thinking of dependencies only, mostly for the developer's sanity, not end-user safety (or rather, the end user would still trust the main developer, just not all the others).

If you use the OS protection, i.e. processes with disabled file descriptors, then there's not that much to implement in Julia, only reuse of what the OS provides. [When the article from 2014 was written, sandboxed processes were not really a thing (now available on Linux, and I think Windows too; also macOS?), nor do I think web browsers used OS subprocesses for safety. So they were trying to secure their sandbox in-process in the web browser, which is much harder, without relying on page faults.]

Disabling ccall isn't possible right now, but is very hypothetically possible. Neither is disabling file system/network access (only with outside tools/help from the OS). But it would be hugely breaking unless opted into, which is why I'm not asking for it for the main Julia process/script. I just think very many dependencies would be ok with it. We can already prevent out-of-bounds access (with bounds checking forced on), but that security breaks as soon as you do ccall…
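For the bounds-checking part, this already exists as a process-wide flag. A small sketch (assumes `julia` is on PATH): a child Julia process started with `--check-bounds=yes` ignores `@inbounds` annotations, so an out-of-bounds read reliably throws instead of being undefined behavior.

```julia
# Code to run in a child process: an @inbounds out-of-bounds read.
code = """
f(v) = @inbounds v[10]          # normally unchecked under @inbounds
try
    f([1, 2, 3])
    print("no error")
catch e
    print(typeof(e))            # BoundsError when checks are forced on
end
"""
# --check-bounds=yes overrides @inbounds for the whole child process.
out = read(`julia --check-bounds=yes -e $code`, String)
```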

I know you can sandbox right now; I'm mostly thinking that syntax and semantics for using it would make it easier for developers, and also controllable. You have a tree of dependencies, so there are some design issues, like whether you would have a tree of processes or reuse one for all of them. And what happens if they print to stdout (even just for println debugging)? To start, that could simply be disabled as a possibility, and then just abort…

Just spitballing here, but if more of Julia could be compiled to WASM, then maybe "untrusted" dependencies could run in a WASM runtime with custom System Interfaces that allow specified side effects, but not others.

4 Likes

Or perhaps RISC-V or eBPF. Another approach might plug into Julia's compiler, so use Julia IR instead of some standard representation :man_shrugging:

I don’t know enough about the IR to know how easy it would be to comprehensively check for all network interactions, as an example. But if that’s feasible, then it would definitely be better than introducing another runtime to manage it!

One way to implement this would be a curated package registry where code has undergone a review. It slightly reminds me of the JuliaPro offering by JuliaHub.

Who do you trust at the end of the day?

3 Likes

Me neither, but that’s not quite what I had in mind. I think in principle it’s possible to use the Julia compiler as basically an abstract interpretation framework. So basically what I’m (ignorantly) proposing is just creating something like a DSL for operating system interaction that would share some of the same compiler code.

However, eBPF (ignorantly) seems like the better approach, because it’s more standard and well-understood/researched. So run an eBPF virtual machine in userspace (not supposed to be Linux-specific).

1 Like

I don’t see it flying. Even if code reviewed, it needs to be continuous, and is brittle. Even if you make a seemingly small change, use a similar but different dependency (same or different API, or any code change), say any of your dependencies switch from JSON.jl to JSON3.jl, then you would need to review JSON3 and all its dependencies, you always need to review the whole tree of your dependencies, NOT allow them calling untrusted or using ccall (currently not possible, if allowed you need to inspect arbitrary machine code binary or at least e.g. C code e.g. a dependency of Python if you use with Julia or for any jll package), and it needs only one bug or intentional attack in any dependency making your code/app unsafe, and potentially steals your bitcoin keys or passwords… And we do not want to enforce out-of-bounds checking, because it’s faster to not check and very often done selectively.

A lot of code isn’t speed-critical, and calling such dependencies out-of process would be good and safer. I think also in many cases faster, if you allow no out-of-bounds checking there, compared to enforcing it in full for your whole app.

I don’t think eBPF virtual machine or anything potentially slow needs to be used, the subprocess is fast, it’s only passing data to from it potentially slow. For a top process, e.g. GUI app, there may not be a lot of data flowing from it.

One possibility, but I think not needed. If you want a sandbox anyway, then it may as well be a different process (and maybe you were thinking that anyway), but it can simply be x86 or whatever.

Erlang has something similar, but fault tolerance is their main thing, and they do NOT use [Unix] processes, which are too heavy for them. I'm not sure we would need a tree of processes; maybe one main process, and one for all the untrusted dependencies combined. Trusted packages/modules could share the top process.

The subprocess could even run on a different machine for those very paranoid.

The OS is supposed to provide safety, barriers between processes, and that may even be a false sense of security.

Meltdown and Spectre exploit critical vulnerabilities in modern processors. These hardware vulnerabilities allow programs to steal data which is currently processed on the computer. While programs are typically not permitted to read data from other programs, a malicious program can exploit Meltdown and Spectre to get hold of secrets stored in the memory of other running programs.

You have to be very paranoid to worry about this, and I think there's no good solution, except running the process on a different computer. But at least even on the same computer, one process can't modify the other. Reading memory across processes is already very bad, but the process boundary raises the bar a lot; I'm not sure how many know how to exploit this or how often it's done (nobody knows if it ever has happened).

2 Likes

Excuse my ignorance, but can the OS start a process without access to the network? If it can, how would that work and still allow you to execute the untrusted dependency on another node?

It’s possible, see B.

Good question. My first idea was to block all file I/O in the subprocess, and with it networking (which is seemingly how it's implemented; see what info I got from an AI):

  • System Calls: A process makes a system call (e.g., socket(), send(), recv()) to request network services from the kernel.

  • Socket Abstraction: The kernel creates a socket, which acts as an endpoint for network communication. This socket is represented by a file descriptor.
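That socket-is-a-file-descriptor point is visible from Julia directly (a sketch, assuming the Linux ABI constants AF_INET=2, SOCK_STREAM=1; on other OSes the values differ):

```julia
# Reaching socket(2) from Julia goes through ccall, which is why disabling
# ccall (plus Base's own wrappers) would cut a process off from the network.
fd = ccall(:socket, Cint, (Cint, Cint, Cint), 2, 1, 0)
# The kernel hands back a plain file descriptor (or -1 on failure),
# exactly the kind of resource a kernel-level sandbox would deny.
fd ≥ 0 && ccall(:close, Cint, (Cint,), fd)
```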

A.
If the process is on another machine, then yes, that would be problematic. Even if you can't get the kernel to block socket() etc., to call it from a user process you need to go through the libc library (or bypass it and call the kernel directly with a syscall). Regular code can't do either if ccall is disabled.

If you have no way of generating arbitrary assembly/machine code, or LLVM [intrinsics] to do that, you can't implement ccall (even if the process is otherwise allowed to make the call) or a syscall. You can usually read code pages but not write them; and you can't write code and then jump to it, since those pages would need to be made executable. Also, it's rather tricky to jump to an arbitrary code address even if you could, but that would also be blocked; it isn't necessary anyway.

You would only have access to the APIs Julia has (in Base), and it could call the approved socket for you (and initialize it for you).

B.
If you want the process on another machine, with no file or network capability, and the kernel itself can block that access (I'm assuming there's such a possibility, on Linux at least), it's still possible!

Your master process A calls process B as an intermediary on the other machine, and B talks to process C on that same machine, which otherwise has no I/O at all.

But you need to implement that talking between B and C, and it can use shared memory (and/or IPC; I'm not sure, I think IPC may always be implemented that way) since they run on the same machine. Processes have distinct non-overlapping address spaces, and the hardware (and OS) enforces that. Shared memory is an exception you can set up, and you implement this talking with it.
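Same-machine shared memory between two Julia processes can be sketched with the stdlib (here the "B" and "C" roles are just the master and one local worker; real isolation of C would need OS help on top):

```julia
using Distributed, SharedArrays

addprocs(1)                       # one extra OS process on this machine
@everywhere using SharedArrays

buf = SharedVector{Int}(4)        # one memory segment mapped into both processes
w = first(workers())
remotecall_fetch(v -> (v .= 1:4; nothing), w, buf)  # the worker writes...
# ...and the master sees the result without copying over a pipe or socket.
```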

One other thing:
One problem is that different processes feel memory pressure differently. On the same machine you want them to feel it across processes (ideally; e.g. JavaCall.jl already works despite the JVM you call being in a different process, without the best support for this, i.e. what I describe next), and it's actually possible with MemBalancer (very cool):

this new heap limit lead to big reductions in memory usage for Chrome’s V8 JavaScript runtime.

Currently being implemented by firefox, chromium, mmtk, and julia

Well, it was implemented in Julia, then reverted; I expect we will use it again.

I’m doubt if MemBalancer works across machines, nor that it needs to, since they have distinct physical RAM, possibly unequal amount on the machines and that’s ok, seemingly not helpful at all to feel the pressure on the other machine. But note at least you could run out on either machine.

First impressions of that blogpost: I don't think everyone copying exact versions of dependencies into their projects is actually practical. For example, if I need to copy two packages that each copied different versions of a third dependency, am I just supposed to guess a compatible version on my own, or somehow isolate the two versions of that third dependency? How is that any better than letting a package manager figure out a probably-compatible version for me before I test?

It however does seem correct to point out that using a package manager does practically force users to just trust a bunch of downloaded content that would execute practically arbitrary code during building or usage. Even after Artifacts started handling binary dependencies from other languages, Julia still does a bit of the former with automatic precompilation. Is it possible for Pkg to figure out versions of packages added to an environment without downloading the contents, or just the Project.toml files? Then we could just go down the list of the Manifest.toml file and validate dependencies one by one if we really want to.

The comment on updates also makes me consider the assumption that minor revisions and especially patches are compatible. That’s the goal, but in practice, package developers can introduce a bug that their tests miss until a dependent finds it the hard way, and it’s often very hard for a dependent to track down the cause, especially if they’re not a developer. A user could revert to previous manifests (unless malware did something worse) or just never update a satisfactory environment, but is there anything a package developer could do to guard against dependency updates or is that just a necessary risk? I imagine that if the dependencies of a package were all locked down to exact versions, it would just be too easily incompatible with other packages again.
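One existing guard along these lines: a package developer can bound dependency versions via `[compat]` entries in Project.toml, so the resolver never auto-installs a release outside those bounds (package names and versions below are hypothetical):

```toml
# Project.toml of a package: the resolver only picks versions inside these
# bounds, so a breaking new release is never pulled in automatically.
[compat]
JSON = "0.21"      # caret semantics: allows 0.21.x but not 0.22
julia = "1.6"
```

This doesn't protect against a buggy or malicious *patch* release within the allowed range, which is the scenario described above.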

1 Like

I don’t think that the premise of that conversation (rant?) applies to most people in the Julia community. Consider eg

we currently use SDL2 for our windowing stuff at work, and we have found a huge amount of bugs and we hate it to the point that I/we will probably write our own window and input handling system from scratch for each Operating System we target.

So they found bugs and they are unhappy. Yes, all practical software has bugs, but the important part is what happens when you find a bug.

Practically, when I find a bug in a key Julia package (like an optimization library, an AD framework, or tooling like Revise.jl), I get an almost instantaneous reply from the maintainer. If I need help isolating an MWE, I get handholding. If the bug is fixable, it is usually fixed rather quickly, though sometimes you have to wait for another package or Julia to catch up. In that case I can roll back, pin, write my own workaround, go back to a last known good state, etc.

In the meantime, the package ecosystem delivers updates for me without manual intervention. I find that great. In scientific code I usually unit test a lot of things for correctness (e.g. invariants of various computations, comparison to models with a known closed form, etc.), and I only commit the changes in the Manifest.toml if tests pass. 99% of the time this means bugfixes, performance improvements, and my life being made easier.

I don’t want to get into “vendoring” my own AD system or CSV reader, thank you so much. That would be a full-time job. I want to benefit from cooperation with other people who share some of my goals and are similarly willing to contribute their time and knowledge. Julia’s package system is basically a thin veneer on social collaboration, and you would have to pry it from my cold, dead fingers.

7 Likes