Roadmap for small binaries

This depends on the definition of “small”. Such compiled code would need dynamic dispatch, which might require the Julia runtime. When some people say “small” they mean self-contained, without linking any Julia runtime. If we can link the runtime, then this is possible as far as I understand.

We really need better language to describe the different kinds of binaries that people want.

3 Likes

I think people are forgetting another part of the story here: the size of the precompiled cache files. They are (relatively speaking) huge files. Makie’s is 130 MB. GMT’s (the one that bothers me more for direct reasons) is > 100 MB on Julia nightly. Why are those sooo big?

2 Likes

That sounds very restrictive. C++ allows you to have type instabilities in a controlled way (runtime dispatch with inheritance and virtual functions), allocations with RAII, and exceptions (also not supported by StaticCompiler.jl).

How large is the part of the runtime dedicated to dynamic dispatch? (Assuming LLVM is not needed because all methods have been precompiled.) Some languages with dynamic dispatch are implemented with a tiny amount of code, e.g. FemtoLisp. Obviously Julia’s multiple dispatch rules are more complex, but is the codebase for dispatch really that big?

1 Like

To play devil’s advocate, why does it matter? I mean, really, Julia’s compiled code size doesn’t affect me at all. It seems like there are tons of other more exciting stuff that Julia devs could do (and are doing) than optimizing storage costs of compiled code.

Isn’t this a thread about small binaries?

7 Likes
3 Likes

This is the runtime without code generation.

$ du -hcs ~/.julia/juliaup/julia-1.9.3+0.x86.linux.gnu/lib/julia/libjulia-internal.so.1.9
11M	~/.julia/juliaup/julia-1.9.3+0.x86.linux.gnu/lib/julia/libjulia-internal.so.1.9
11M	total

$ du -hcs ~/.julia/juliaup/julia-1.10.0-beta2+0.x64.linux.gnu/lib/julia/libjulia-internal.so.1.10.0
13M	~/.julia/juliaup/julia-1.10.0-beta2+0.x64.linux.gnu/lib/julia/libjulia-internal.so.1.10.0
13M	total
1 Like

And same thing with performance. Do we really care about an extra half a second here and there? Must we always be in such a hurry, when perhaps a leisurely and relaxed program pace is at least as satisfying?

4 Likes

It doesn’t work like this. For small binaries, you need to eliminate the runtime by discarding all the method tables and unused binary code, especially in Base and the stdlibs, as they are rather large codebases. If you compile with Number, then you have to save every potentially reachable method table for runtime (so that you can look methods up in them), which totally ruins the goal of small binaries.

You may ask “why is there unused binary code?” The answer is that Julia has generics and specializations. Precompilation saves more binary code than any single user needs. For example, one user calls f(::Vector{Int}) and another calls f(::Vector{Float64}). A library author can include both f(::Vector{Int}) and f(::Vector{Float64}) in the precompiled code, but each of the two users needs only one copy from the cache.
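A minimal sketch of how this happens (f here is a hypothetical library function): an author can force both specializations into the precompile cache via `precompile`, even though any single downstream user exercises only one of them.

```julia
f(v) = sum(v)   # one generic method in the library...

# ...but the author caches two specializations, roughly doubling the
# stored native code; each user only ever runs one of them:
precompile(f, (Vector{Int},))
precompile(f, (Vector{Float64},))

f([1, 2, 3])        # the Int user's copy
f([1.0, 2.0, 3.0])  # a different user's Float64 copy
```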

A common technique called “tree shaking” can be used to discard unused binaries. It works by figuring out which methods are unused (unreachable) and then deleting them. This requires the code to be statically typed; otherwise you quickly lose precision in the call graph and eventually include every method (so tree shaking has no effect at all).
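A rough illustration of the precision problem, using Base’s own reflection (a sketch of the idea, not how a real tree shaker is implemented): a call site with concrete argument types resolves to exactly one method, while an abstract-typed call site keeps a large candidate set alive.

```julia
# concrete argument types: the call site resolves to a single target method
m = which(+, (Int, Int))
println(m isa Method)     # a tree shaker could keep just this one

# abstract argument types: many candidate methods match and must all be kept
ms = methods(+, (Number, Number))
println(length(ms) > 1)   # the reachable set balloons
```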

I will emphasize once more: for small binaries there is little room for any fancy PL/compiler stuff. So for practical purposes, please stop thinking about the possibility of runtime multiple dispatch/union splitting/whole-program analysis (tree shaking). Let’s simply stick to type-stable code (in a restricted sense).

You are 100% right. That’s why I said:

The adoption of such a model is an essential difficulty. A restrictive compilation model is unattractive in every way. So even though I use an equally restrictive model for my TypeChecker.jl, I really hate it and know it must be improved. But I can’t improve it by myself, because it requires modifying the core language (like adding OOP), and I have no say over that.

That’s why you can see a lot of people trying to defend the possibility of using multiple dispatch at runtime (which is simply a waste of time in my opinion, at least for the purpose of small binaries), to make the model look less restrictive. Without modifying the core language we can only have these bad workarounds, and none of them is satisfying.

6 Likes

What do you mean by “adding OOP”? You mean inheritance from concrete types? How would that help with binary size?

We need alternatives to runtime multiple dispatch because of its non-trivial implementation costs; otherwise we can only have a restrictive static compilation model. So OOP is unavoidable (or at least some kind of function dispatch table/function pointers). If you don’t offer an alternative, people will opt for runtime multiple dispatch, which renders their code non-compilable.

To reiterate my point:

Please read all the comments in this thread carefully to see why it’s so hard to add small binaries to Julia. It’s much harder than you may have thought to rebut my argument.

We just need to “close” or “seal” methods at some point so that we have a finite method table to dispatch upon. Perhaps we could even reduce the size of a particular dynamic method dispatch with limited knowledge of the possible types.
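A hand-rolled sketch of what “sealing” could buy (Shape, Circle, Square, and area are hypothetical names): once the set of subtypes is known to be closed, dynamic dispatch collapses to a finite, statically compilable branch — essentially manual union splitting over a frozen method table.

```julia
abstract type Shape end
struct Circle <: Shape; r::Float64 end
struct Square <: Shape; s::Float64 end

area(c::Circle) = π * c.r^2
area(s::Square) = s.s^2

# with the method table "sealed", dispatch becomes a finite branch
# instead of a runtime method-table lookup:
area_sealed(x::Shape) =
    x isa Circle ? area(x) :
    x isa Square ? area(x) :
    error("unreachable: Shape is sealed")

area_sealed(Square(2.0))  # 4.0
```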

3 Likes

I kind of get this point, but I don’t see why you’d want non-abstract inheritance specifically. I think you’d be restricted by the static compilation requirements just the same, even if Julia got non-abstract inheritance. Julia without run-time dispatch is about as powerful as C++ without run-time dispatch, with or without inheritance, I think?

In any case, it’s curious to me that inheritance is still widely used in the industry, considering that it’s long been recognized as leading to bad design (“prefer composition over inheritance”). The “gang of four” Design Patterns book, which used to be somewhat of an OOP bible AFAIK, espouses this view, too.

Yeah, some kind of function pointer type (which would not be parameterized by the function type, to facilitate type-stable access to collections of function pointers) would clearly be useful for code that has to be statically compiled.
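Something close to this already exists via Base’s `@cfunction` (FunctionWrappers.jl takes a similar approach at a higher level). A sketch, with add1 and dbl as hypothetical functions: the pointers are type-erased `Ptr{Cvoid}`, so a homogeneous, type-stable table of them is possible, at the cost of fixing one concrete signature.

```julia
add1(x::Cdouble) = x + 1.0
dbl(x::Cdouble)  = 2.0 * x

# type-erased function pointers: both entries have the same type, Ptr{Cvoid},
# so the collection is type-stable regardless of which function it holds
ops = [@cfunction(add1, Cdouble, (Cdouble,)),
       @cfunction(dbl,  Cdouble, (Cdouble,))]

# calling through the erased pointer requires restating the concrete signature
call(p::Ptr{Cvoid}, x::Float64) = ccall(p, Cdouble, (Cdouble,), x)

call(ops[1], 3.0)  # 4.0
call(ops[2], 3.0)  # 6.0
```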

You can first try this on Base. I don’t buy it unless you have successfully reproduced a PoC of “tree shaking with limited knowledge of the possible types” on a large project, reliably. As I have said, you are wasting your time considering all these (either broken or overly restrictive) variants of runtime multiple dispatch.

Firstly, Julia’s subtyping system is rather complicated and actively abused in practice (Any/bounds/where/Vararg/NamedTuple). Even simple questions about this system, like the intersection of two types, become too hard to calculate. I really doubt your limited knowledge can yield anything useful, as you can lose precision in some commonly used methods like iterate and +. Secondly, this is still whole-program analysis. I have said many times in this thread that it’s impractical. Also, I really can’t see the difference between a “closed” method table and union splitting. As long as you have abstract types, it loses precision quite quickly.

Do Julia users really want to use “Julia/C++ without run-time dispatch”?

Then you and the OOP bible are wrong. The pattern in practice is “use inheritance” instead of “use composition”. That’s it. Of course, it’s your freedom to prefer composition over inheritance. But then it’s also other people’s freedom to write “anti-pattern” code with inheritance. Please calibrate your view against real-world practice instead of opinions in a book.

Much like C, but this doesn’t quite work with Julia’s subtyping system, as Julia has no void* (unless you use Ptr{Nothing}, but then it will not look like Julia anymore). You can’t create a pointer to f(x::AbstractType)::y; you can only create one for f(x::ConcreteType)::y. And without void*, pointers for different concrete types will have different types, unless you want to introduce variance on arrow types.
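For completeness, the void*-style erasure the runtime uses internally is reachable from Julia too (Box here is a hypothetical name); this is a sketch of the boxed-pointer view, not something statically compiled code would normally do:

```julia
mutable struct Box
    x::Int
end

b = Box(42)
GC.@preserve b begin
    p = pointer_from_objref(b)        # Ptr{Nothing} — the "void*" view of the box
    b2 = unsafe_pointer_to_objref(p)  # recovered as `Any`; unsafe without @preserve
    println((b2::Box).x)              # 42
end
```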

5 Likes

Without commenting on the rest of the points, I thought Any (or jl_value_t* as represented behind the scenes) fills the niche of a void* equivalent? This is based on it being possible to call Julia functions from C if all arguments are boxed to jl_value_t* beforehand, and the existence of a formal (though not well documented) calling convention in Calling Conventions · The Julia Language. I agree that this doesn’t really solve the problem the way people are looking for because it degrades all non-static dispatches to the OO language equivalent of Object ret = func(Object[] args...), nor does it solve the tree-shaking problem.

You are right: Any == jl_value_t* == void*, but that’s an implementation detail. Any is formally treated at the type level (in a type inferencer). Its layout is unknown (not necessarily a pointer), and the type inferencer has the freedom to process it differently. Consider this snippet:

x = Ref{Any}(1)
v = Vector{Any}()
push!(v, x[])

Is this type stable? We can implement it with no dynamic method call, as long as we treat Any as a void pointer (and inline push!). But Julia’s inferencer may decide that it should specialize the push! call and generate a jl_apply_generic call for push! instead. As long as we use the built-in type inferencer to generate typed IR, using Any is unreliable, and we must use Ptr{Nothing} to explicitly express that we want a boxed generic pointer.
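One way to see that Any lives at the type level rather than as a raw pointer (f here is a hypothetical wrapper, and Base.return_types is an internal, unexported reflection helper): inference runs over `Any` like over any other type and still produces a concrete container type, while the decision to specialize or box the `push!` call stays with the compiler.

```julia
f(v::Vector{Any}, r::Ref{Any}) = push!(v, r[])

# inference over `Any` arguments still yields a concrete return type;
# how the push! call is lowered (specialized vs. jl_apply_generic) is
# an implementation choice, invisible at this level
Base.return_types(f, (Vector{Any}, Ref{Any}))  # [Vector{Any}]
```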

Yeah. These are basically problems of Julia’s type system (the compiler frontend). I don’t really think we can magically solve them without significantly changing Julia’s semantics. One thing I have painfully learned from my prototype implementation is that we must do the right thing at the right place. It’s just not the static compiler’s and linker’s responsibility to patch over or solve these problems. Why? Because it’s super hard to debug and correct them at that stage; just think of those unreadable linker errors. No one will maintain such a complicated and highly coupled implementation.

7 Likes

For me, absolutely.

I would be fascinated to hear the other side of this discussion. Instead of ‘how difficult is it to compile unstable code’, the question is ‘how difficult will it be to compile completely type stable code, with no dynamic dispatch/union splitting/Any/abstract types etc?’ How much of the language could reasonably be supported by a static compiler in the not-too-distant future? (By ‘type stable’ I mean whatever the compiler is able to figure out statically).

(If this has been discussed further up, I would appreciate a link :slight_smile: )
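For what it’s worth, that statically checkable subset can be probed today with Test.@inferred, which throws when inference cannot pin down a concrete return type (f and g below are hypothetical examples, not a definition of the compilable subset):

```julia
using Test

f(x::Float64) = x > 0 ? sqrt(x) : 0.0   # concrete everywhere: Float64 in, Float64 out
g(x::Float64) = x > 0 ? sqrt(x) : 0     # unstable: returns Union{Float64, Int}

@inferred f(4.0)   # passes — squarely inside the type-stable subset
try
    @inferred g(4.0)
catch
    println("g is not type stable")
end
```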

14 Likes

pretty sure I could use almost my entire project with just Float64 and Vector{Float64} :eyes:

1 Like

How do you do error paths in your project?

what do you mean?