Roadmap for small binaries

To handle error cases, types more complicated than Float64 are sometimes used: findfirst returns a Union type (an index or nothing), while other functions like sqrt throw exceptions on error. If you're only using really basic types, I'm wondering how you deal with error cases.
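
For illustration, the two styles side by side:

# findfirst encodes failure in the return type:
idx = findfirst(isodd, [2, 4, 6])   # nothing: no odd element here
idx === nothing || println(idx)      # caller must handle both cases

# sqrt signals failure with an exception instead:
sqrt(-1.0)                           # throws DomainError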

“oh crap, better fix that” :joy:

it's basically a big Monte Carlo sim/model for definitelynotdegenerate sports betting, so while the logic is a little complex, it's almost entirely "basic" arithmetic and control flow on floats

1 Like

Most of this discussion goes way over my head and I’m not really in the market for small binaries, but I can answer this: I’d love to be able to have a compile-time guarantee of no runtime dispatch for subsets of my code. Basically, I’d like to slap a label/macro on a method implementation to express that anytime a methodinstance is compiled, the resulting code should be free of runtime dispatch, not just within the method’s own body but also within callees, recursively all the way down; otherwise, an error should be thrown at compile time. (Small union splitting is fine, that’s just a runtime branch like any other, and exceptions are also OK.) This would be a lot nicer than the current game of chess we’re all playing against the compiler, assisted only by @code_warntype, @inferred, and various tooling packages with their inherent limitations.
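
As a strawman, something like this (the @nodispatch macro is hypothetical; it does not exist today):

# Hypothetical macro, not part of Julia or any package mentioned here.
@nodispatch function simulate(xs::Vector{Float64})
    s = 0.0
    for x in xs
        s += sin(x)  # every callee, recursively, must also be dispatch-free
    end
    return s
end
# Compiling simulate(::Vector{Float64}) would throw a compile-time error
# if any call in the generated code falls back to runtime dispatch.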

Will something like this ever exist, and could it be a step forward on the road to small binaries?

8 Likes

Doesn’t JET.@report_opt do that already?

1 Like

JET prints a report for a specific call signature. That’s very useful, but it’s not the same. It puts the onus on me to ask for a new report every time I make a change to the implementation and every time I consider applying the method to new argument types. A compile-time error would prevent accidental regressions.

EDIT: I see that JET.@test_opt can be used in the test suite to avoid regressions. I should clearly start using JET more. However, it is still limited to specific call signatures that you have to pick when writing the test.
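
For reference, the test-suite usage looks something like this (the call signature is one you pick yourself, which is exactly the limitation):

using Test, JET

@testset "optimization checks" begin
    # Fails the test if JET detects runtime dispatch anywhere in this call tree:
    @test_opt sum(abs2, rand(100))
end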

2 Likes

You can kind of get that already by trying to run your hot code on the GPU. I have a few packages whose internals I know must be 100% type stable because of this.

We just need that without the GPU. A macro like you suggested would be perfect.

(JET is great, but I really want the unstable code to not run at all, rather than having an external check)

7 Likes

I noticed that my detailed comments get basically zero likes, while many other comments get more simply by expressing hope and expectation. I don't think I have made any professional mistakes in my comments, though they are not 100% original.

It really hurts. I won't waste my time saying things people don't want to hear.

5 Likes

@ChenNingCong I liked your comments; they're not at all a waste of time. They were basically the largest change to my expectations for static compilation this year.

But I would guess many people don't understand them; I wouldn't have a few years ago. You're pitching to fairly seasoned developers only.

(Also, needing a change to Julia's semantics is basically not-going-to-happen, so it's kind of depressing to read. Not your fault at all, that's just how it is.)

7 Likes

@anon56330260 I also appreciate all the detail you are providing, though I do not 100% understand it all.

I wouldn't put too much stock in the like numbers :slight_smile:
Sometimes the most heavily liked comments are stupid one-liners, uncorrelated with quality of content.

2 Likes

One interpretation of likes is “I agree”, and though I appreciate the information and explanations you are providing, I am not able to “agree” with them, because I am not fully able to understand them.

For example, my previous post getting some likes probably means “yes, I also wonder about this”, not “great post, better than the other post over there.”

Please keep up the work of explaining and arguing :+1:

16 Likes

Seconded, the comments + a bit of follow-up research made a number of things click for me wrt static compilation.

Given where and how this discussion has gone, I wonder if it shouldn't diverge into at least two separate threads. One would focus on "PackageCompiler++", i.e. trying to reduce binary sizes as much as possible given the existing state of the language; topics such as stdlib excision and sysimage slimming would likely fit under here. The other would be a more open-ended but more technical discussion on the language side, about getting binaries as small as possible without being constrained by the current tech we have in Julia land.

In both cases, it would be nice if we could collect resources on how other languages tackle this problem. For example, I think native executable generation for .NET/JVM uses similar ideas to PackageCompiler and also tends to produce large binaries. But I've also seen articles about reducing binary size for those compilers, so maybe they've found a way to do tree shaking or other methods despite also having to deal with dynamic dispatch. Likewise, Dylan exists and is compiled despite having proper multiple dispatch. What do Dylan implementations do to create binaries and keep them small, if anything? To be clear though, I think these resources should be shared only after the discussion is made more focused by splitting/forking.

5 Likes

Can Julia be distributed like the Java JRE, with the runtime installed separately, so that app sizes can be smaller?

4 Likes

That is also the first thing I was wondering about. It is strange that nobody pointed it out until now. If I remember correctly, MATLAB apps also need some pre-installed runtime support, which is large.

2 Likes

MATLAB_Runtime_R2023b_Update_2_win64.zip == 4.3GiB

It looks big, but it's much smaller compared to the size of the installer (18 GiB for R2021a_win64.iso).

Performance Considerations and MATLAB Runtime

Since MATLAB® Runtime provides full support for the MATLAB language, including the Java® programming language, starting a compiled application takes approximately the same amount of time as starting MATLAB. The amount of resources consumed by the MATLAB Runtime is necessary in order to retain the power and functionality of a full version of MATLAB.

MATLAB Compiler SDK™ was designed to work with a large range of applications that use the MATLAB programming language. Because of this, run-time libraries are large.

The size of the MATLAB Runtime is fixed, independent of the complexity of your scripts.
There is also no increase in running speed, so it's more like MATLAB without the code editor (main GUI), documentation, compiler, …

1 Like

What would you cut from the current Julia install to make a JLRE, Julia Runtime Environment?

My sense is that the JLRE is basically a Julia install as we know it now. In the future, we may only want the juliax driver and maybe not the juliac driver for the “package” compiler.

1 Like

I guess we could start by shipping a PackageCompiler.jl-generated sys.so and a few artifacts gzipped (a typical project goes from 300 MB to 60 MB gzipped), which is a medium size to deploy, with the other standard Julia libs as the runtime. (Or just a juliaup-installed Julia; does PackageCompiler.jl provide a way to link against those instead of copying them in?) I don't know whether you can split more things out of sys.so into the runtime. At least it provides a way to avoid duplicated libs if you work on the same version of Julia and have a lot of built applications to deploy.
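
For what it's worth, the sysimage half of that workflow is roughly this today (the package name is a placeholder):

using PackageCompiler

# Bake the project's packages into a custom system image; the resulting
# sys.so could be shipped once and reused by every app built against
# the same Julia version.
create_sysimage([:MyPackage]; sysimage_path="sys.so")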

@DNF captured my experience: your posts are so far beyond my ability to judge right and wrong that I don't feel right "liking", which I interpret as meaning "I agree." However, I always love to see your name pop up, because I learn so much from your posts.

4 Likes

What do you mean? The Julia interpreter works for all code, so why is it a problem? I believe you could very well have packages precompiled, use the interpreter for the rest, skip LLVM, and it would just work and be small. You would just have to minimize the amount of interpreted code, for speed. It would work either way, so what do you mean by "stably"? Just too much code that needs the interpreter?

1 Like

Basically, (single) dynamic dispatch itself can be statically compiled with "virtual tables" in the .NET CLR/JVM; this can be theoretically much slower than Julia's multiple dispatch, but it does not suffer at all from re-compilation and runtime-compilation latency.
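
A toy model of what vtable dispatch looks like, sketched here in Julia purely for illustration (the struct and slot layout are made up):

# Each concrete type carries a fixed table of function "pointers".
struct VTable
    area::Function
    perimeter::Function
end

struct Obj
    vtable::VTable   # set once, at construction
    data::Float64
end

const circle_vtable = VTable(o -> pi * o.data^2, o -> 2 * pi * o.data)
const square_vtable = VTable(o -> o.data^2,      o -> 4 * o.data)

# A "virtual call" is one load plus one indirect call; no search happens
# at runtime, which is why it compiles statically so easily.
area(o::Obj) = o.vtable.area(o)

area(Obj(circle_vtable, 1.0))  # ≈ 3.14159
area(Obj(square_vtable, 2.0))  # 4.0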

Dylan's handling does not really work for Julia, because Dylan lacks support for parametric polymorphism (which can be roughly regarded as "generics", though that is not precise). Parametric polymorphism is fully supported in Julia.

You can ignore those horrible concepts and just have a look at the following code:

f(x::Vector{T}) where T = "parametric"   # matches any Vector{T}
f(x::AbstractVector)    = "abstract"     # matches any AbstractVector
f(x::Vector{Int})       = "concrete"     # most specific match for f([1, 2])

This code demonstrates why we cannot have runtime multiple dispatch as fast as that in some Lisp dialects. Besides, multiple dispatch without parametric types can be easily expressed by dictionary lookup:

f(x::VectorInt32) = 1   # VectorInt32 is a plain concrete type here, not Vector{Int32}
f(x::VectorInt64) = 2

#  f(x) --> method_table[(typeof(x),)](x)
#
# Things can be a bit more difficult when supporting subclassing,
# but still far easier than Julia's case.
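
A runnable Julia sketch of that dictionary-based scheme (just a toy; the real thing would live inside the language runtime):

# Method table keyed by the tuple of argument types: dispatch is one lookup.
const method_table = Dict{Tuple{DataType}, Function}(
    (Int32,) => x -> 1,
    (Int64,) => x -> 2,
)

lookup_and_call(x) = method_table[(typeof(x),)](x)

lookup_and_call(Int32(0))  # 1
lookup_and_call(0)         # 2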

Actually, parametric polymorphism is powerful, and widely recognized and used in industrial languages (including nearly all C-family languages and static FP languages). However, it creates problems such as finding "principal types", finding "most specific types", or finding the most appropriate method. All of this "finding X" is necessarily slow at runtime and cannot be optimized away without magic like a quantum computer.

As a result, a "generic" Julia function can have an infinite number of valid compiled (and optimized) specializations, and heavily dynamic Julia code cannot compile/optimize them all in advance. So far, Julia usually performs a fresh compilation when it needs a new specialization of a "generic" function; this guarantees fast execution the second time, but produces severe "time-to-first-plot" latency in some cases.
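
Concretely, one method can spawn arbitrarily many specializations:

g(x) = x + x        # a single method...

g(1)                # ...compiles a specialization for Int,
g(1.0)              # another for Float64,
g([1, 2])           # another for Vector{Int},
g(1//2)             # another for Rational{Int}, and so on without bound.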

That’s the whole story.

To solve these issues, the company I'm working for is now concentrating on generating small binaries (both standalone executables and dynamic libraries) for Julia codebases that are "reasonably static". The project made progress during mid-2023 and might be available for public use in early 2024. Our work is based on code_typed in a frozen world age.

30 Likes

This is the runtime without code generation, about 10-20 MB.

Just to correct a misunderstanding that I saw a few times in this thread: this is pure C/C++ code, including a flisp runtime, so even C/C++ does not automatically give small binaries. But the flisp runtime (~2 MB of that) is not necessary for static compilation, and most of this file (~10 MB) is debug information created by the C compiler used. The actual runtime itself is on par with the size of the C runtime that is installed on most systems to be able to execute any C code (libc-2.31.so is 2 MB on my system, for glibc).

Secondly, there was a claim that Julia's dynamic dispatch is assumed to be slow because of the lack of some feature or other. The speed of dynamic dispatch is rather remarkable if you actually measure it. In the benchmarks below, calling through the non-const global i forces a dynamic dispatch on every call, whereas identity(1) is resolved statically; the speed of GC allocation, which can also be estimated here, is quite a lot faster than manual memory management with malloc/free:

julia> using BenchmarkTools; i = identity; @btime i(1)
  23.433 ns (0 allocations: 0 bytes)

julia> using BenchmarkTools; i = identity; @btime identity(1)
  1.369 ns (0 allocations: 0 bytes)

julia> using BenchmarkTools; i = identity; @btime GC.safepoint()
  2.459 ns (0 allocations: 0 bytes)

julia> using BenchmarkTools; i = x -> Some{Int}(x); @btime i(1)
  18.255 ns (1 allocation: 16 bytes)

julia> using BenchmarkTools; i = x -> identity(x); @btime i(1)
  14.708 ns (0 allocations: 0 bytes)

julia> using BenchmarkTools; i = identity; @btime @ccall free((@ccall malloc(1::Csize_t)::Ptr{Cvoid})::Ptr{Cvoid})::Cvoid
  8.728 ns (0 allocations: 0 bytes)

julia> using BenchmarkTools; i = identity; @btime @ccall malloc(1::Csize_t)::Ptr{Cvoid}
  18.589 ns (0 allocations: 0 bytes)

julia> using BenchmarkTools; i = identity; @btime @ccall free(C_NULL::Ptr{Cvoid})::Cvoid
  4.089 ns (0 allocations: 0 bytes)

6 Likes