Roadmap for small binaries

It just occurred to me that vtables make it trivial to completely eliminate a class from compiled code if that class isn’t used anywhere. Thanks for helping me connect the dots!

I’m curious whether this is a fundamental limitation of having parametric polymorphism and multiple dispatch in the same language. To avoid cluttering this thread, I created Reconciling dynamic multimethod dispatch, parametric types and static compilation in case anyone has thoughts or insights.

Do you anticipate creating a small writeup to describe how this project works once it’s ready? Sounds exciting!

3 Likes

Definitely. There will be an article explaining a working AOT compiler.

10 Likes

One big problem on the roadmap to small binaries is that while statically typed languages’ type inference is normally rule-based, Julia’s type inference is heuristic-based. This means that simply carving out a static subset where all the code has to be type-inferred might mean that functions break every now and again as the inference heuristics shift. I want to add this point as something not frequently talked about.
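As a concrete illustration (a minimal sketch, not tied to any particular compiler): Julia’s inference follows recursion only up to tunable limits and widens the result when a type keeps growing, so whether code like this counts as “fully inferred” depends on heuristics rather than on a rule a static subset could guarantee:

# A recursion where the *type* grows on every call: x, (x,), ((x,),), ...
nest(x, n) = n == 0 ? x : nest((x,), n - 1)

# Inference cannot bound the growing tuple type, so a widening heuristic
# kicks in and the inferred return type collapses to Any:
Base.return_types(nest, Tuple{Int,Int})  # typically [Any]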

Heuristics are rules, just lots of ’em mushed together.

This isn’t really correct. Most static languages have heuristics in their inference (since Turing-complete subtyping is very common). It’s definitely possible to confuse Java’s subtype inference, for example.

7 Likes

I’d like to share something interesting :slight_smile:


Recently, our AOT compiler (made by my team) has nearly achieved full support for type-stable Julia code, including exception handling, multiple dispatch, and arbitrarily complex constant data.

Something interesting is that our generated native code greatly outperforms vanilla Julia on try-catch-finally statements, while in general our AOT Julia is slightly slower than the vanilla one.
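For a sense of what “try-catch-heavy” means here, a workload of roughly this shape (entirely hypothetical, not the benchmark shown below) is where lowering try/catch to the target language’s native exception handling could plausibly beat vanilla Julia’s setjmp/longjmp-based handlers:

# Hypothetical exception-heavy loop; names and the iteration count are made up.
function main()
    acc = 0
    for i in 1:100_000
        try
            isodd(i) && throw(ArgumentError("odd"))
            acc += 1
        catch
            acc += 2
        end
    end
    return acc
end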

48 Likes

Thank you for sharing.

Could you run @time main() twice to see the effect of in-memory compilation caching?

I would also be interested in seeing the difference in the LLVM IR or native code generation.
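On the vanilla side, that comparison is easy to start from the REPL (a sketch; the AOT side would need whatever dump facility that compiler provides):

using InteractiveUtils  # loaded by default in the REPL

@code_llvm debuginfo=:none main()    # the LLVM IR for main()
@code_native debuginfo=:none main()  # the native code it becomes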

1 Like

Sure, and I find the cost is stable.

julia> @time main()
149927
  0.485954 seconds (49.94 k allocations: 780.336 KiB)

julia> @time main()
149861
  0.485176 seconds (49.87 k allocations: 779.305 KiB)

julia> @time main()
149820
  0.484067 seconds (49.83 k allocations: 778.664 KiB)

I’m not sure; in fact, I did nothing special here. Upsilon/PhiC nodes are directly translated into some C-family language, and the try-catch construct of the target language is leveraged to implement things. However, the encoding for Upsilon/PhiC nodes might be quite different from the one that Julia uses:

block_t _B;  // stack of entered catch-clause targets

handle_error:
try {
    // Dispatch on the innermost entered block: 0 means "start of the
    // function"; other values are catch-clause jump targets recorded
    // by _block_push below.
    switch (_block_target(_B)) {
        case 0:
            goto _L1;
        case catch_clause_jump_target1:
            goto _Li;
        ...
        case catch_clause_jump_target2:
            goto _Lk;
    }

    // normal code gen

    // Expr(:enter, jump_target_i) =>
    _block_push(_B, jump_target_i);
}
catch (exception e) {
    // Wrap the foreign exception, then re-enter the dispatcher, which
    // now jumps to the catch clause recorded by the matching :enter.
    _exception_wrap_and_push(e);
    goto handle_error;
}
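For reference, the Julia-side construct this encodes is visible from the REPL; lowering a try/catch shows the :enter/:leave expressions that the _block_push/goto dispatch above emulates:

julia> Meta.@lower try
           error("boom")
       catch e
           e
       end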

I don’t have a definite answer; otherwise I wouldn’t find this so interesting. One thought could be our simplification of stack traces. So far our AOT Julia creates more restricted stack traces, and a more complete stack trace is available only when the .dwo/.pdb files created by the AOT compiler are kept alongside the generated shared libraries/executables.

1 Like

This will likely end up in an announcement post anyway if it isn’t closed-source, but can I ask why this compiler diverges from the mainstream compiler rather than repurposing it with additional optimizations, assuming no further code is evaluated? Or does it not diverge, and is it slightly slower in general for different reasons?

4 Likes

So while there isn’t a clear roadmap at the moment, it is a goal of the Julia team to have small binaries (it’s the main thing I’ll be working on in the near future). The vision is to have something like juliac that takes in a module or a script (this isn’t super clear just yet, but it also isn’t super important; it just needs to be something with well-defined entry points, i.e. a main function or a shared-library API) and generates a binary from that.

I believe that from there we will have three different options to generate the code: one equivalent to what PackageCompiler.jl does right now, one equivalent to StaticCompiler.jl, and the one that’s currently missing, which will allow using the runtime but will require a well-defined call graph (known targets for any possible dynamic dispatch) so that we know what code to compile and, more importantly, what to throw away, so that the final binary isn’t too big.
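To make “well-defined entry points” concrete, a compilable module might look roughly like this (a sketch only; the juliac interface isn’t settled, and the @ccallable shape is just an assumption borrowed from how PackageCompiler-style apps expose entry points):

module MyApp  # hypothetical application module

# A C-callable entry point: a juliac-style tool could compile everything
# reachable from here and throw the rest of the code away.
Base.@ccallable function main()::Cint
    println("hello from a small binary")
    return 0
end

end # module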

55 Likes

That’s great news. Thanks for sharing.

As @ChrisRackauckas said in the Julia subreddit, there is also a plan that includes a static subset of Julia. Are there ideas about somewhat more explicit memory management? For example, we can already use Bumper.jl with StaticCompiler.jl, which is a great combination, but there isn’t an interface for using Bumper with most of the language (e.g. Sockets). Or is this line of thinking not in the current plans?
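For reference, the Bumper.jl pattern mentioned above looks roughly like this (a minimal sketch; the buffer size and names are made up):

using Bumper

# Fill and sum a scratch buffer without GC allocation: @alloc returns a
# bump-allocated array that is reclaimed when the @no_escape block exits.
function scratch_sum(n::Int)
    s = 0.0
    @no_escape begin
        tmp = @alloc(Float64, n)
        tmp .= 1.0
        s = sum(tmp)
    end
    return s
end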

For now the plan is to keep using and improving the GC.

7 Likes

Occasionally, what I want along these lines is just a launcher: an executable that switches to a specific environment and then runs a main function with a full Julia install. On some platforms this is easy enough to do with a shell script or perhaps a small C program. A cross-platform way to do this entirely in Julia, without depending on a shell or a C compiler, would be nice.
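A rough sketch of such a launcher written in Julia itself might be (the paths and project layout are hypothetical, and this still assumes a full Julia install on the machine):

# launcher.jl: activate the app's environment and run its entry script.
const APP_DIR = joinpath(homedir(), ".myapp")       # made-up app location
julia = joinpath(Sys.BINDIR, Base.julia_exename())  # the running julia binary
run(`$julia --project=$APP_DIR $(joinpath(APP_DIR, "main.jl"))`)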

4 Likes

You can do that with juliaup (the portable version) and a .bat file; I already do it. For Linux/Apple you’ll need a bash script.
The .bat file downloads the portable version of juliaup from GitHub, sets up some variables in PATH, has juliaup install Julia, then launches Julia and runs some script, and removes everything at the end. That being said, I’m not sure what you mean by environment in this context. If it’s a Julia environment, then that is also easy.

(Anyway I think this is off topic)

I’m really excited for the AOT compiler.

3 Likes

Yes, though that’s a completely different issue. It’s more a question of defining and better supporting what a Julia app is. We definitely want that as well; there’s a GitHub issue for it somewhere.

5 Likes

In the video the example is x + y, and while that’s only an example, I do feel that math-heavy code comes up most often when we talk about compiling things. To me it seems that the bigger “market” for binaries is string- and IO-heavy code, because that covers all the general-purpose tools (formatters, linters, web servers) that you want to deploy somewhere remotely and spin up quickly. Strings, GC and allocations are not at all a focus of GPUCompiler, I’d say, and StaticCompiler also doesn’t really allow them for that reason. But I think the community infrastructure would gain much more from work on that than from packaging up purely numeric code. At least I assume so, because purely numeric code doesn’t seem like the kind of workload where you lack the option of running a normal Julia session, unless maybe you’re targeting embedded devices.
So what is the optimum we can reach while still including all the String, GC and related functionality, just without dynamic compilation?

5 Likes

Isn’t the case for purely numeric code replacing C/C++/Fortran in producing binary numeric libraries, e.g. ones that can be used by R and Python packages, embedded, etc.? There’s a huge potential “market” there, building directly on Julia’s current strengths.

19 Likes

I’m not a numerics person, so my view is heavily skewed of course :slight_smile: For me, solving the cases where you currently can’t use Julia at all, only because it doesn’t produce small binaries with quick startup times, is more exciting than calling into numerical libraries from Python or R, even though solving that is also useful.

5 Likes