Roadmap for small binaries

if you’re looking for something cool and fun to play with, and learn a bit about static compilation, dig in!

Probably asked before, but is there a table laying out parts of Julia’s runtime and how much RAM and executable size they contribute? Would help if we had a common reference frame for what we’d trim for smaller binaries and for what purposes (for example, you would sacrifice a lot more things for embedded systems than for mobile apps), that seems to be getting a tad muddled in this thread.

And, as highlighted in the blog viralinstruction, the demand for reduced memory overhead doesn’t exclude desktop applications either despite having several GB to spare:

In fact, even for desktop-level applications, consuming 150 MB on the Julia runtime is pushing it. Think of all the hate Electron gets for wasting resources. Every Julia program is in the same ballpark as Electron in this regard. A command-line calculator written in Julia consumes more memory than the 2003 video game Command & Conquer: Generals .

2 Likes

This can be calculated roughly by inspecting the size of indiviual object files. libjulia-internal.so is smaller than 20 MB so this is the upper bound. Minimum runtime requirements include builtin-types (array and string mostly)/gc/threading/error handling/io. I believe it’s about 7~10MB in total (at least the last time I checked the repo). Type calculation/method dispatch/interpreter can be discarded if we don’t query type hierarchy/method reflection (I think it currently should be disallowed for static compilation anyway). This contributes another several MB. Deserialization codes can always be fully discarded as we don’t need that.

Anyway, compared to LLVM’s 80 MB, these 10MB or 20MB are not that important…

2 Likes

But union splitting is a kind of dynamic dispatch. For example Wikipedia defines dynamic dispatch as “the process of selecting which implementation of a polymorphic operation (method or function) to call at run time”. The implementation of union splitting fits this perfectly.

I think it’s safe to say that “No dynamic dispatch (at runtime) == No missing method”.

I disagree, even if we forget about union splitting. Let’s say the program is calling a method based on a type-unstable value. Without union splitting, the program (or the runtime) needs to search the method tables to find the right method to call. Maybe there’s no such method (the “missing method” case). But maybe there are appropriate methods, in which case the program/runtime must choose the right one according to the rules. The whole process is much slower than union splitting, but it doesn’t require LLVM if an appropriate method is found.

All I’m saying is that it’s possible to have dynamic dispatch (even outside of union splitting) without missing methods. Maybe the situation is bad currently in the sense that this rarely happens (maybe your rule is correct 99% of the time), but there is no reason in principle why this couldn’t be improved.

Yeah it seems we have different definitions so we misunderstand each other. But I don’t understand your question… For me static compilation means that the program can run without compiling anything at runtime. Type stable means that a method always returns the same type, or (depending on the context) that the value is always the same type and the compiler can prove it. As explained above, I think that static compilation and type instabilities can coexist to some extent.

I don’t know, it’s just a hunch.

I agree that probably most code currently cannot be compiled statically without missing methods. I’m just optimistic that the situation could improve quickly.

To be clear, I’m not holding my breath thinking that in two years every project will be statically compilable. But I expect that it will become easy, for example, to solve an optimization problem in Julia using standard packages and distribute the solution as a small shared library. That would already be a big win!

3 Likes

For the purpose of small binaries, runtime dynamic dispatch is (roughly) defined as whether there is a jl_apply_generic in the binary product. I don’t need to read these definitions on wikipedia or POPL/PLDI’s articles to know this. Because implementation of jl_apply_generic will significantly increase size of binary products so it must be prohibited (and once it’s prohibited, we could get small binaries). The goal of static checker should be to define a model such that when the user’s codes satify such model then it guarantees the absence of jl_apply_generic (A C-style type checker with special variable rules satisfy this criteria).

How do you store your method table and how will it impact the size of final products? Tree shaking? But for a language with multiple dispatch it never works because conservative estimation of method call relationship will blow up and include almost every method eventually.

It’s possible only when we don’t care about size of binary product. Again, this is how I implement my static compiler prototype (mix of interpreter and static binary codes). It works precisely in the way you describe. But there are two problems:

  1. You need to load Base and stdlib and other modules, size of which is non-trivial (unless you pack them into something like a virtual machine). Basically every julia libraries use them. Your strategy works perfectly fine for baremodule and Core (small scaled codes), which is how Julia boostraps itself where this’s no redundant dependency. But for Base and stdlib?

  2. it can’t stably produce small binaries because when I tested it on real libraries, it encountered pure interpreter mode quite quickly because many Julia libraries are of bad types so it basically never works on >10k examples in real world. Of course, you can do method tracing to decrease the possibility. But it never works on large-scale codebases.

No. There’s no misunderstanding. The problem is that your choices of definitions are too broad and vague, from which an implementation can’t be derived. Again, look at some non-trivial realworld examples instead of these made-up union-splitting examples. That’s why I said the definition of that paper doesn’t provide a useful guidance for a static type checker. You can define arbitrary concepts in any way you want based on your need. You can have two different and dedicated concepts for type stability and static compilation model for the sake of theoretical research. But the realworld implementation limits the application of the definitions.

For example, type stability can be defined as:

But to ensure termination, realworld implementation of union splitting stops when we have a large union (instead of going on indefinitely). Then it’s possible that a “type stable” code in this sense generates dynamic dispatch (and ruins small binaries). We then have to admit :

But to what extent? What model do you use? I am asking you for a precise definition. Users need to know this otherwise they don’t know how to fix their codes if there’s compilation error.

So basically this is all your imagination and expectation. None of them is achieved yet. Your statements (and @Mason 's example) are quite misleading and even self-contradictory for those readers who are not experienced in these PL/compiler stuffs. If you aren’t sure about whether Julia’s user groups can accept static compilation, how can you even draw the conclusion that “the situation could improve quickly”/“there are a tons of situation that…”? Also, if it’s that easy, why it’s still not implemented in the core language? Instead, we have a lot of XXXCompiler.jl libraries. There are more essential difficulties than simply invoking Core.Compiler LLVM and linker to assemble binary products. Do you even test these PoCs on large codebases?

7 Likes

I’m interested to hear more (or see code) on your static compiler prototype.

1 Like

Super interesting discussion.
How does the static compiler prototype compare to pkgimages we have since 1.9?
Is it the same essentially? Does pkgimages also rely on interpretation?

Best,
Oliver

Yeah. The mechanism is basically the same: cache relocatable binary produces to reduce JIT latency.

I believe interpretation is only used for toplevel code if you have LLVM (recent Julia supports mixed compilation mode so this may not hold). Pkgimages are orthogonal to interpreter. But without LLVM, Julia has to run in pure interpreter mode. Pure interpretation is seldomly used in Julia because it’s horribly slow. Still, Julia needs interpretation to boostrap itself (which is a complicated story itself).

Anyway, let’s not get lost in the woods of these technical details (LLVM/interpreter), otherwise you will quickly fall into a rabbit hole of obscure glossaries and messy implementation zoos.

We can start by asking what prevents Julia from supporting small binaries. There are multiple potential answers to this question. For example, LLVM’s JIT support is bad so we need to wait for upstream patch. But imo the root problem is that there’s no cheap, correct and coherent static compilation model for multiple dispatch. So we need to prohibit multiple dispatch, which significantly violates Julia’s semantics.

Now someone may say that, this is totally debatable. Isn’t union splitting a perfect counter-example to my argument? Julia converts bad codes into good codes for me so I don’t need to worry about the implementation details. And multiple dispatch is eliminated magically! It becomes cheap again! To generate small binaries, it basically means the static compiler needs to precisely find out the (unique) runtime type of a value, then emitting devirtualized method call for it. Even if you can do union splitting, what about where/Any? But still, someone can argue that:

Wait for a minute! How can you draw a conclusion that where/And can’t be implemented. It seems that several ideas exist to handle where/Any, like whole program analysis/compilation, tree shaking, (and add your favorite PL papers here). Yeah, they are complicated. But that’s because we currently lack people to experiment with them so let’s wait for some clever people to implement them.

Then it comes to the coherence part of my argument. To make this static compiler useful, we need to define a model so that Julia users can reason it easily. Note that you don’t need to do this for type inferencer. Julia’s type inferencer has more freedom because it’s allowed to infer imprecisely - for example, it can infer every variable as Any, thanks to Julia’s subtyping-based type system, but it can’t be compiled statically without use of dynamic dispatch. This requirement further renders every complicated whole-program-analysis based method unusable - because it will take a long time for the compiler to draw the conclusion whether a code can be compiled statically. For large codebases we will encounter slow compilation problem quickly (So no fancy PL stuffs here). Plus, error messages are hard to read. Of course, we can restrict ourselves so “there are a tons of good situations that…”. But doing so is no different than breaking Julia’s semantics if it’s added into the core language.

What’s worse, it’s also suspectable that current situation can be improved greatly. If you try to use JET.jl on existing codebases (especially those written by Julia newbies), you may find a lot of trivial type unstability if you are a Julia expert. But it doesn’t mean that they can be improved easily, possibly the opposite - it proves that it’s hard for Julia users to manage types. JET.jl doesn’t correct codes automatically, and we have only a few Julia expert while there are thousands of packages. Apparently only contributors of these packages can fix these problems by themselves. But then people need to learn a systematic way to do this, which violates Julia’s semantics again…

Therefore, usage of producing small binaries will be limited in Julia, not because it’s useless itself, but it conflicts seriously with Julia’s design goal.

5 Likes

That’s exactly right.

I’m being hopeful…

Certainly making small binaries will require some discipline, coders will have to respect additional constraints. You seem to see that as a violation of the language semantics. For me it’s just another set of constraints. We have coding style guides, CI testing for documentation, etc. This is another set of opt-in constraints for people with specific goals. I think it’s fine to start small, make a few packages compatible, add CI testing to make sure they stay compatible, and grow from there…

1 Like

There are some parts of the Julia ecosystem that are amenable to static compilation. The SciML work is one I’ve been experimenting with for static compilation. These packages work well with static compilation because they parameterize structs like crazy and pay attention to type stability.

3 Likes

To me at least, it’s perfectly acceptable not to have all of Julia statically compilable (I think that’s way out of scope). Personally, I’d be fine with banning runtime eval, or in other words, the statically compiled binary should be locked to a world age (which exact world age is a bit tricky though, as I think the world age can advance if you specialize when union splitting).

Personally, it’s completely acceptable to disallow have type unstable code (by your definition) that can’t be union split and actually forces a dispatch. Whether that union splitting is then implemented through inlining or virtual method tables is an implementation detail :person_shrugging:

I agree that it’s going a bit against the design goals, so I guess “want static compilation? Make your code completely type stable” is a fair compromise. Well, at least for my purposes :slight_smile:

EDIT: The one worry I have/share is binary size :thinking:

12 Likes

It doesn’t even need to be that stark. It can just be “want statically compiled small binaries? Make your code completely type stable and non-allocating”

All we’d really have to do is detect if the compiler and runtime are needed, and if it’s needed, then link to them (hopefully not silently / automatically, but instead an error telling the user they need to opt into it.)

Non-allocating in some cases is not trivial. For example most of Sockets allocates and trying to make them non allocating you have to write it yourself from ground up and it will not look anything like Julia code. If we had some interference for custom allocators that would have made it very easy, though I have been told that is pretty hard to implement.

1 Like

So what? Then you’re going to have to pay the cost of linking to the runtime. Note that the runtime isn’t even that big.

1 Like
1 Like

Yeah that’s the practical way forward for the average Julia use case.

I would like to understand this better. Is there an aspect of linking the runtime that is particularly difficult at the moment?

Overall, I’m mostly fine with the stock Julia install (runtime and compiler) existing somewhere on the system. A lot of people basically accept that situation in the case of the Java Development Kit (JDK).

I just do not want to have to create a new system image for every application. I think what I am describing is essentially an executable pkgimage.

2 Likes

No it’s not very hard, it’s what the pkgimage system already does, and I made a very goofy and amateurish version of that in StaticCompiler.jl but it required a julia session to load the code and perform the linking. In principal, it would not be so hard to do the linking, it just requires someone to actually do it.

I could be wrong, but could we not compile inefficient compiled code for abstract types, e.g. Number? Compiling for a concrete type is only needed for maximal performance, no?