Trusted libraries/packages - "package managers are evil"

We implicitly trust all packages we use, for security (and bugs).

I've come to think maybe we shouldn't, and that packages should be untrusted by default.

How could you do it? I see two ways.

you could do:

using trusted <package>

using untrusted <package>

for that same package, and without specifying, the default would be trusted, the status quo for now; maybe later the default could flip to untrusted. Untrusted would mean the code runs in a sandbox, i.e. in a subprocess via DistributedNext.jl I guess, where it has no file access or network access, and likely no ccall capability. Bounds checking wouldn't need to be enforced there, in that subprocess. And the API to call the package's methods stays the same.
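The `using trusted`/`using untrusted` syntax above is hypothetical, but the underlying mechanism can be sketched today with the stdlib Distributed (DistributedNext.jl mirrors its API). A minimal sketch, where `parse_untrusted` is a stand-in for a package function; real sandboxing of the worker (no file/network access) would additionally need OS-level help, e.g. launching it under a restricted user or namespace:

```julia
using Distributed

# Spawn one worker process to host the "untrusted" code.
pid = first(addprocs(1))

# Define (in a real setup: load the package) only on the worker,
# never in the main process. Stand-in for a package function:
@everywhere [pid] parse_untrusted(s) = length(s)

# The call site looks like an ordinary function call; arguments and
# results cross the process boundary by serialization, so the worker
# never sees the rest of the main process's memory.
result = remotecall_fetch(parse_untrusted, pid, "hello")
@assert result == 5

rmprocs(pid)
```

The key property is the last point: the worker only ever receives what is explicitly sent to it.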

For your main file, your application code (maybe a GUI application with your passwords, your bitcoin keys, or something else sensitive), everything stays in the main Julia process. Your untrusted libraries/packages can never see it unless you send it to them, and why would you?

npm had yet another security issue recently, worse than before (I've not looked into it much), but it got me looking at this (or actually the YouTube conversation linked from there):

The Odin language developer claims package managers are evil, that we're automating dependency hell. So if/since Julia has the best package manager, are we the most evil, or somehow better…?

Erlang is probably better with separation, and so is Elixir, which is based on it (the Elixir developer was there in the conversation). I think Erlang doesn't use [Unix] processes, but basically implements something similar, likely enforcing bounds checks. That would also be one way to do this.

So far I've really liked that Julia isn't VM-based/emulated like Java (and that calling Python is in-process), but even Java trusts all its dependencies. And we trust e.g. the full Python package ecosystem too with PythonCall.jl. That might be a good package to start with as untrusted by default.

2 Likes

See e.g. How to sandbox Julia code or RFC / Discussion: Security and Julia · Issue #9744 · JuliaLang/julia · GitHub

TL;DR: sandboxing Julia packages is not really on the horizon.

Their argument is basically not to have dependencies that you don’t copy and distribute (“vendor”) yourself. Good luck with that. (Of course, this can be practical for a narrow set of applications.)

3 Likes

I like your link and text here:

Even Java gave up on sandboxing because it proved unworkable in practice to close all the security holes. JavaScript is only able to maintain a reasonable level of sandbox security because there is a huge industry devoted to continually vetting and patching its implementations.

I think they failed because it's one large sandbox. I mean the top process wouldn't be sandboxed (or you could sandbox it too, with Docker, snap, or something similar); I'm thinking of dependencies only for now, mostly for the developer's sanity, not end-user safety (or rather, the end user would still trust the main developer, just not all the others).

If you use the OS protection, i.e. processes with disabled file descriptors, then there's not much to implement in Julia; you only reuse what the OS provides. [When the article from 2014 was written, sandboxed processes were not a thing (they're now available in Linux, and I think Windows too, maybe also macOS?), nor do I think web browsers used OS subprocesses for safety. So they were trying to secure the sandbox in-process in the web browser, which is much harder, without relying on page faults.]

Disabling ccall isn't possible right now, but is very hypothetically possible. Nor is disabling file system/network access (only with outside tools/help from the OS). It would be hugely breaking unless opted into, which is why I don't ask for it in the main Julia process/script; I just think very many dependencies would be fine with it. We can already force bounds checking on (or off), but that security breaks as soon as you do a ccall…
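On the bounds-checking point: checks can be elided locally with `@inbounds`, and the `--check-bounds=yes`/`no` command-line flags override all such annotations process-wide, which is what a sandboxed subprocess could rely on. A minimal illustration:

```julia
# `@inbounds` asks the compiler to elide bounds checks inside the block.
# Starting Julia with `--check-bounds=yes` re-enables them everywhere
# (overriding `@inbounds`); `--check-bounds=no` removes them everywhere,
# which is safe to allow only inside an OS-isolated subprocess.
function unsafe_sum(v::Vector{Float64})
    s = 0.0
    @inbounds for i in eachindex(v)
        s += v[i]
    end
    return s
end

@assert unsafe_sum([1.0, 2.0, 3.0]) == 6.0
```

Under either flag the function returns the same result here; the flags only change whether out-of-range accesses would trap or be undefined behavior.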

I know you can sandbox right now; I'm mostly thinking that syntax and semantics for using it would make it easier for developers, and also controllable. You have a tree of dependencies, so there are some open questions, like whether you would have a tree of processes or reuse one for all of them. And what happens if they print to stdout (even just for println debugging)? To start, that could simply be disabled as a possibility, then just abort…

Just spitballing here, but if more of Julia could be compiled to WASM, then maybe "untrusted" dependencies could run in a WASM runtime with custom system interfaces that allow specified side effects, but not others.

4 Likes

Or perhaps RISC-V or eBPF. Another approach might be to plug into Julia's compiler, so use Julia IR instead of some standard representation :man_shrugging:

I don't know enough about the IR to know how easy it would be to comprehensively check for all network interactions, as one example. But if that's feasible, then it would definitely be better than introducing another runtime to manage it!

One way to implement this would be a curated package registry where code has undergone a review. It slightly reminds me of the JuliaPro offering by JuliaHub.

Who do you trust at the end of the day?

2 Likes

Me neither, but that's not quite what I had in mind. I think in principle it's possible to use the Julia compiler as basically an abstract interpretation framework. So what I'm (ignorantly) proposing is creating something like a DSL for operating-system interaction that would share some of the same compiler code.

However, eBPF (again, ignorantly) seems like the better approach, because it's more standard and better understood/researched. So run an eBPF virtual machine in userspace (it's not supposed to be Linux-specific).

1 Like

I don't see it flying. Even if code is reviewed, the review needs to be continuous, and it's brittle. Even a seemingly small change breaks it, like switching to a similar but different dependency (same or different API, or any code change at all): if any of your dependencies switches from JSON.jl to JSON3.jl, you would need to review JSON3 and all its dependencies. You always need to review the whole tree of your dependencies, and NOT allow them to call untrusted code or use ccall (currently not possible; if allowed, you'd need to inspect arbitrary machine-code binaries, or at least e.g. C code, say a dependency of Python if you use it from Julia, or any jll package). And it takes only one bug or intentional attack in any dependency to make your code/app unsafe and potentially steal your bitcoin keys or passwords… We also do not want to enforce out-of-bounds checking everywhere, because it's faster not to check, and checking is very often done selectively.

A lot of code isn't speed-critical, and calling such dependencies out-of-process would be good and safer. I think in many cases it would also be faster, if you allow no bounds checking there, compared to enforcing it in full for your whole app.

I don't think an eBPF virtual machine or anything potentially slow needs to be used; the subprocess itself is fast, it's only passing data to and from it that is potentially slow. For a top process, e.g. a GUI app, there may not be a lot of data flowing.
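To make that cost concrete: the per-call overhead of an out-of-process dependency is essentially serialization of arguments and results across the boundary (Distributed uses this same stdlib machinery under the hood). A rough sketch:

```julia
using Serialization

# What "sending" an argument to the subprocess costs:
buf = IOBuffer()
serialize(buf, rand(1_000))

# What "receiving" it on the other side costs:
seekstart(buf)
x = deserialize(buf)

@assert x isa Vector{Float64} && length(x) == 1_000
```

So a call that passes a few kilobytes is cheap, while shipping large arrays back and forth on every call is where the approach would hurt.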

One possibility, but I think not needed. If you want a sandbox anyway, then it may as well be a different process (and maybe you were thinking that anyway), and it can simply be x86 or whatever.

Erlang has something similar, but fault tolerance is their main thing, and they do NOT use [Unix] processes, considering them too heavy. I'm not sure we would need a tree of processes; maybe one main process and one for the rest combined, i.e. all the untrusted dependencies. Trusted packages/modules could share the top process.

The subprocess could even run on a different machine for those very paranoid.

The OS is supposed to provide safety, barriers between processes, and even that may be a false sense of security:

Meltdown and Spectre exploit critical vulnerabilities in modern processors. These hardware vulnerabilities allow programs to steal data which is currently processed on the computer. While programs are typically not permitted to read data from other programs, a malicious program can exploit Meltdown and Spectre to get hold of secrets stored in the memory of other running programs.

You have to be very paranoid to worry about this, and I think there's no good solution except running the process on a different computer. But at least, even on the same computer, it can't modify the top process or vice versa; reading memory across processes is already very bad, but isolation still raises the bar a lot. I'm not sure how many know how to exploit this or how often it's done (nobody knows if it ever has been).

1 Like