Why is static compilation difficult

I’d like to understand the static compilation problem better.

I know that a lot of folks crave static compilation because they have constraints (embedded etc.), and that it would help to build adoption as a replacement to C-for-Python, so I understand part of the why.

But I don’t quite get the details, this isn’t my field at all.

I see three points:

  • If we don’t compile anything at runtime, we don’t need the compiler (smaller executable, no JIT time)
  • If we don’t allocate anything at runtime, we don’t need the GC (smaller executable)
  • If there’s no dynamic dispatch, that’s another chunk of the runtime that doesn’t need to be linked.

Did I miss anything from the runtime? Incidentally, how large are these three components, in MBs?

So from my limited experience, I see PackageCompiler and StaticCompiler at two extreme points: one bundles the full runtime, the other ditches the runtime completely and fails to compile normal julia code. Why isn’t there a middleground? I assume the compiler is the biggest chunk, so couldn’t you get rid of it by compiling generic (@nospecialize) versions of the type-unstable code? Couldn’t you replace the super-optimized GC with something less performant but smaller?

Is it that nobody has gotten around to it? I get that it’s really challenging work. Or is there no value because anyway people who want small executables need really small executables?

6 Likes

The state of julia talk from this year’s Julia Con talks a bit about this.
See here https://www.youtube.com/watch?v=2D8oRtDJEeg, around 42:50.

4 Likes

Is a good thread for some previous discussion.

The big parts you’ve missed are “task management” and “exception handling”, as well as I/O. All of these are handled by the runtime as well, and without that, there’s no threading, there’s no printing (sort of), there’s no REPL, there’s no strings/arrays (due to allocations).

The truth of the matter is - the runtime does SO MUCH work for you in the background and enables so many different things that running without one is basically only possible in extremely limited circumstances. One such examples are the prototypical usecases in StaticCompiler, another is microcontrollers (where I am currently working on getting a tiny minimal “runtime”, consisting of nothing more than an allocator, to at least be able to have some global variables and an array). The reason there’s basically no middle ground is because the various features the runtime provides are all depending on each other - you need allocations for everything, in some form or another, to be able to even build a string you’d use in an error message. You need strings and arrays to be able to spawn a new task,and you need to be able to throw exceptions in case the task couldn’t be spawned.

And this doesn’t even get into what you need for dynamic dispatch! In general, that requires the compiler in its full capacity (though you could compile only a subset and fatally crash if you’d have to compile something new).

So in the end, either you’re programming in a very restrictive environment already (embedded) and can afford to implement things “just right”, or you’re going to need large parts of a runtime, in order to be able to use existing libraries more or less as-is in your static binary.

I’d love to also link my talk from JuliaCon about compiling some Julia code to an Arduino (which covers some of the hard parts of the extreme embedded side), but it seems something has gone wrong in the video processing pipeline and it’s not yet uploaded as a standalone video. If you want to look for it, it’s in the recording of the second day of 26-100, somewhere around the 5:30 mark if I’m not mistaken. I’ll edit the video in here once it’s standalone.

EDIT: It’s uploaded now!


The general TL;DR though is “this is just really big and complicated work, that runs into assumptions that were made in the past and edge cases of the current pipeline that are not easy to solve”.

9 Likes

I’m also not a compiler expert at all (I did try to go through the Julia codebase to absolutely no success so far), but the impression I got is that there could be a middle ground: a modularized compiler that does tree-shaking to eliminate unused code… but it is just a LOT of hard work to get something like that done.

One of the reasons is that the barrier to entry for new compiler contributors is just too high, as the Julia codebase is already very large and complex. Moreover, as the compiler has been heavily optimized for compilation speed, it’s also even hard just to read the compiler’s code.

To put it in a perhaps funny way: Julia has solved the two-language problem for its users, but it created the 4-language problem for it’s compiler developers (some portions of Julia are written in a custom version of Lisp, others in “core Julia”, others in C++, and the rest in ordinary Julia).

There are two ways forward which might eventually converge to the ideal solution: trim down stuff from PackageCompiler and/or enlarge StaticCompiler. For example, I believe there is work underway to remove the JIT from PackageCompiler when there is no need for dynamic dispatch. And in the StaticCompiler side, I believe an interesting approach could be linking to libjulia, so you do have the functionality provided by runtime available but as a dependency instead of in the executable, besides what Sukera has just said of adding just an allocator.

5 Likes

Multiple-dispatch itself is a runtime feature. Method selection and dispatch happens in the runtime, but specialization needs the compiler.

One neat project would be to allow the interpreter to execute foreign-calls. Right now ccall and Co need the compiler, but one could use something like libffi and have the interpreter support it as well.

Then there is the funny aspect that parts of the Julia compiler is implemented in Julia, and I suspect people want to be able to execute lightweight code without LLVM, and for that the Julia based compiler + improving the interpreter should suffice.

4 Likes

Well yes and no - you can have static method selection in a fixed world during compilation, at which point the runtime that ends up being used during execution may not necessarily need to be able to do multiple dispatch itself. In fact, that is exactly what ends up happening in StaticCompiler.jl/AVRCompiler.jl today!

For static compilation and ccall, all we’d need is a linker, not necessarily a compiler. AFAIK, the ccalls already end up as extern calls in the LLVM IR (or at least, they could be, in a static compilation context).

I think “static compilation” exactly means “not interpreted”. Also, “static compilation” (to me) means “no more compilation after this point”, hence the “static” part.

3 Likes

For me static compilation also means: “No more compilation after this point”, but how do you handle dynamic dispatch, and maybe even eval in such a system. This is where the fallback requires the use of the interpreter (or a runtime error).

For static compilation and ccall, all we’d need is a linker, not necessarily a compiler. AFAIK, the ccalls already end up as extern calls in the LLVM IR (or at least, they could be

That’s harder than you think in particular if you want to support cross-compilation, right now codegen generates code for the host ABI, which will lead to fun crashes once you want to execute the code on your cross-compilation target (I had a fun project once were we cross-compile Julia from x86 to aarch64).

Julia also supports richer ccall semantics than most current linker support, which means if you want to implement them you will need to use dlopen/dlsym (which is what the Julia runtime does and we do compile trampolines of that form as well).

For me this conversation is about how can we move the state of the art forward. How can we support more of the language in static compilation and that means that the extraction of static call-graphs from dynamic ones should not be the limiting factor. My goals would be first GC, then task runtime + exceptions, then support for dynamic dispatch through falling back to the interpreter (and maybe have a world where this generates a performance remark so that you can fix it).

In fact we are almost there. Use Julia with a hot pkgimage cache and julia --compile=no and you should only fail when you encounter a foreign-function call (or worse a llvmcall).

8 Likes

Right - from my POV, it’s fine to say “no eval” there, and hence no interpreter (or compiler) is needed. A static binary is not supposed to change, after all :slight_smile: The static part in my view exactly means having a static set of code, and not a dynamic one that eval would bring in.

Yes, cross compilation is one of the biggest challenges there, but it’s also something that can be done incrementally and doesn’t need to all be done at once. For example, getting things like Only decide the bitsize of an integer literal when interpreting, not when parsing · JuliaLang/julia · Discussion #49252 · GitHub done is something small-ish that gets us closer to proper cross compilation, without disrupting existing code, while serving as a test bed for what kinds of problems/changes are required.

That’s also something that can be done incrementally though, no? For example, getting a barebones “static LLVM module with extern symbols” is simpler conceptually than getting the full Julia semantics right away. I.e., as I understand it, the Julia semantics of ccall are a superset of regular linker semantics, at least in the context of static compilation without eval.

I agree with that, though I wouldn’t say that dynamic dispatch by falling back to the interpreter should be the next step after runtime + exceptions. There’s an intermediate step of “no eval, but dynamic dispatch of a static set of known methods” that I think would be enough for lots of usecases (this synergizes with past requests for a “compile all the things” mode, where the compiler just doesn’t give up in inference/errors out on dynamic dispatch that requires an interpreter/compiler). The classic solution to that is of course a vtable with fixed contents - at least for my microcontroller usecase, that would be plenty.

Yes, any dynamically encountered llvmcall necessarily has to error when no compilation can be done at runtime. I think that’s an acceptable tradeoff though, and something that ought to be discoverable via static analysis (which precludes dispatches on Any of course - so the desire for no unlimited dynamic dispatches combined with "no eval is the foundation upon which good static analysis in Julia can be done).


One slightly tangential thing I’d like to mention is that I think the AbstractInterpreter framework in Julia and how tightly it’s bound to the compilation process, as well as how exposed it is to a regular package, is a very big boon. There’s a lot of potential here for all kinds of static analysis using it, which would potentially enable lots of cool formal verification work on regular Julia code. This, paired with static cross compilation, would be a gamechanger!

5 Likes