About static compilation and static analysis

martin.d.maas · September 19, 2022, 8:30pm

Hi everyone,

I just wanted to share some thoughts about the state of Julia 1.X, and what I see as the most important missing features of the language. I intend to make this a “no-rush feature request” for Julia 2.0, rather than a “complaint post”, so please bear with me.

There is no doubt that Julia is an interactive, high-level language that can attain the performance of statically compiled low-level languages. This already represents tremendous progress in language/compiler design, and it has made me really enthusiastic about Julia.

However, I also realized that Julia still can’t obtain the same portability that statically compiled languages enjoy. For example, things like Python interoperability are harder to obtain with Julia than with C++. Additionally, other nice things to look forward to, like compiling to WebAssembly, are nearly impossible in Julia for the moment, let alone running on low-memory or embedded devices. That is to say, Julia (at least in its 1.X version) isn’t the optimal language in which to write a universal library that can run anywhere.

I am aware of the experimental project StaticCompiler.jl, which makes me really enthusiastic that all this could be possible in the future. It is nevertheless at a very early stage of development.

Correct me if I’m wrong, but it looks like StaticCompiler.jl is headed to eventually be able to support a dialect of Julia 1.X, and not the full language.

I also remember reading that StaticCompiler.jl was one approach, while there was another possibility of making libjulia more modular or more portable, I don’t really remember. It would be nice to hear some comments about this. Will the progress on StaticCompiler.jl go hand-in-hand with making the runtime more modular? For example, could we expect to have threads, or at least be able to call C from statically compiled Julia, when something changes in relation to the runtime?

Static compilation is also related to the interface design problem that was at the core of Yuri’s criticism. That criticism was centered around the fact that some of the tools available in static languages are missing in Julia.

What do I think? Well, maybe writing a Julia package should impose some additional restrictions on a developer, like being forced to define an interface, while leaving all the freedom of a fully dynamic and interactive language at the top level.

Of course, this could hurt Julia’s composability. However, in my case, that would be a small price to pay in order to get things like seamless Python/C++ interoperability and more safety checks.

From the point of view of a computational researcher, Julia 1.X has solved the two-language problem, that is clear to me. Before Julia, one couldn’t even write a paper in my field without resorting to low-level languages (or hiding the running times). However, when the time comes to consolidate what you’ve accomplished into a universal library that is both great for users and able to run anywhere, apparently we still have to rely on a monstrous thing like C++. It would be nice if we didn’t.

Anyway, thanks for reading, and I appreciate any comments.

johnmyleswhite · September 19, 2022, 8:56pm

What are the breaking changes needed to achieve your goals? If you don’t have to break things, you can avoid Julia 2.0 and make everyone’s life a lot better.

martin.d.maas · September 19, 2022, 9:19pm

Yeah, I don’t really know for sure if making Julia “slightly more static” or at least more compatible with static compilation would necessarily need breaking changes. It is possible that with a more modular runtime we could at least write new code that can be statically compiled (for example, avoiding the use of the GC while somehow being able to use threads, etc).

In the case of placing some limitations on package authors (like forcing them to declare an interface), that would break most packages.

My point is that even if there is a price to pay in terms of freedom or breaking changes, having some of the features that only static languages seem to enjoy would be worth it anyway.

davidanthoff · September 19, 2022, 9:48pm

Here is my take on this: for some projects I’m working on in Julia I would love if the language was more statically typed and if I had the benefits of that in terms of tooling and error reporting by the “compiler”. For other projects it is exactly the opposite, having the constraints of a statically typed language would be horrible.

I’ve wondered for a long time whether one could completely square these two needs with a linter that has an optional “strict” or “typed” mode. So, no changes to the language at all, but a linter mode that a) marks a lot of the dynamic things one can do in Julia right now as an error and b) on the flip side lights up lots of cool new editor features that can only be reliably implemented in more statically typed languages. But the actual code that one writes in this “strict” or “typed” mode is just normal Julia code that happens to not use some of the wilder dynamic features.

I could then just decide whether I want to turn that mode on for say a given package, or not for others.

Full disclosure: I haven’t really thought this through. So, this might just not work, but I do think it would be a very interesting experiment that one could pursue entirely separately from any Julia 2.0 discussion.

Palli · September 19, 2022, 10:33pm

Julia doesn’t need 2.0 to compile to binaries, since it’s already possible.

As you may know, PackageCompiler.jl makes it possible (and while I’ve not used it recently, I understand it has improved a lot in recent years, or perhaps only since the 2.0 version from November, which came out after I last used it).

You may want to look into non-default options with that package, and new non-default options in Julia 1.8:

New option --strip-metadata to remove docstrings, source location information, and local variable names when building a system image (#42513).

New option --strip-ir to remove the compiler’s IR (intermediate representation) of source code when building a system image. The resulting image will only work if --compile=all is used, or if all needed code is precompiled (#42925).

Poor support for static compilation

PackageCompiler should support all features, e.g. threading and GC, and work on Windows. If you can do without those, you can also look at:

Tools to enable StaticCompiler.jl-based static compilation of Julia code (or more accurately, a subset of Julia which we might call “unsafe Julia”) to standalone native binaries by avoiding GC allocations and llvmcall-ing all the things!

martin.d.maas · September 19, 2022, 10:50pm

Hi Palli,

StaticCompiler.jl is only compatible with a small subset of Julia as of now. Check out their docs. For example, it is not compatible with anything that contains ccall.

As for PackageCompiler, I simply couldn’t get it to work for several test cases (beyond trivial ones). I will try again to check if the recent improvements have fixed the problems I encountered, but one of them was extremely long compile times to the point I think my old laptop crashed, so I’m reluctant to try again.

Btw, for static compilation I actually meant StaticCompiler.jl. It is indeed the case that it can’t support all of Julia’s features, so major changes should be needed at some level (not sure which).

martin.d.maas · September 19, 2022, 11:23pm

A bit more background on StaticCompiler.jl.

From what I know, the way to continue adding features to it hinges on Mixtape.jl, which regrettably is far from being a core part of Julia, and got broken with Julia 1.8. So not only does StaticCompiler.jl not support all of Julia, there are problems the other way around as well.

So even contributing to StaticCompiler.jl looks like something very hard to do.

ToucheSir · September 20, 2022, 12:04am

Thankfully that is not the case. The MixTape PR you’re referencing was just an exploration of how to create plugins for StaticCompiler. That said, I do believe folks are waiting for some compilation-related changes in Base to improve the flexibility of StaticCompiler (mostly how much of the language + stdlib is supported).

On PackageCompiler, the challenges I see talked about on these forums don’t always seem to be the same as the bugs reported in the issue tracker. Is there some unseen barrier to entry for opening issues there (e.g. lots of people compiling proprietary programs)?

martin.d.maas · September 20, 2022, 12:33am

Yes, I also remember this discussion. If anybody familiar with this matter could drop us a line I would be very grateful.

Lol, good point, I’m also guilty of this.

I think I might have something of an explanation. There is a very well-known fact: compiling anything with PackageCompiler takes several minutes even for a hello world application, as all of Julia has to be included. So it is no surprise that as soon as something doesn’t work as expected, people get frustrated and have close to nothing to report.

(In my case I remember failing to compile Makie while trying to follow the instructions given at the latest JuliaCon, for example, so it was probably something silly like not having the proper version of Makie. I actually can’t even remember if I finally got it right, but I do remember that the experience was indeed frustrating.)

Oscar_Smith · September 20, 2022, 1:28am

Yeah. There is an unfinished PR that speeds up PackageCompiler by like 2x, but no one has had time to finish it yet.

Benny · September 20, 2022, 2:03am

Someone help me out here, what sort of features would static-Julia have to give up?

GC was mentioned, but there are many statically typed, AOT-compiled languages with GC, though there are other reasons to eschew GC. Off the top of my head, the one thing to give up is the JIT compiler intervening when dynamic dispatch runs into new call signatures. That’s not to say we can’t have any dynamic dispatch at all, but more care would have to be put into proving there is a fixed number of call signatures.
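To make the point about call signatures concrete, here is a minimal sketch in Julia. The function `describe` and the variable names are hypothetical, made up purely for illustration. With an `Any`-typed container the compiler cannot know ahead of time which method each call will hit, whereas a small `Union` element type keeps the set of possible call signatures closed.

```julia
# Hypothetical methods, defined only for this illustration.
describe(x::Int) = "integer"
describe(x::Float64) = "float"
describe(x) = "something else"

# Element type is Any: each call to `describe` is resolved at run time,
# and a previously unseen type would need a freshly compiled specialization.
open_set = Any[1, 2.0, "three"]
open_labels = [describe(x) for x in open_set]

# Element type is Union{Int, Float64}: only two call signatures are possible,
# so both could in principle be compiled ahead of time.
closed_set = Union{Int, Float64}[1, 2.0]
closed_labels = [describe(x) for x in closed_set]
```

Both loops use dynamic dispatch in the source-level sense; the difference is only whether the set of signatures can be bounded without running the program.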

As for static-like AOT compilation with no execution, my current understanding is that precompile statements and successful type inference of Julian generic methods should suffice for the vast majority of cases. The one case I can think of where more annotations could help is captured variables in closures; it’s a really tough ask to infer the type of a variable shared by two methods when only one method has been called. However, we can already achieve that inference with explicit type annotations, whether on the captured variable or by refactoring to callable types. I’m not sure how close static analysis can get; the type inference part of compilation is needed to get the type information that would be written into statically typed source code.
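The captured-variable case can be sketched as follows. All names here (`make_counter_boxed`, `make_counter_annotated`, `Counter`) are hypothetical and exist only for illustration; this is one way to write the three variants mentioned above, not a recommendation from the Julia developers.

```julia
# Reassigning a captured variable makes Julia box it, so the compiler
# generally cannot infer a concrete type for `n` inside the closure.
function make_counter_boxed()
    n = 0
    increment() = (n += 1)
    return increment
end

# Same closure, with an explicit type annotation on the captured variable.
function make_counter_annotated()
    n::Int = 0
    increment() = (n += 1)
    return increment
end

# Refactored into a callable type: the field type is declared up front.
mutable struct Counter
    n::Int
end
(c::Counter)() = (c.n += 1)

boxed = make_counter_boxed()
annotated = make_counter_annotated()
callable = Counter(0)
```

All three behave the same when called. Running something like `@code_warntype boxed()` and `@code_warntype callable()` in the REPL is one way to compare what inference concluded for each variant.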

So I can’t really think of many changes, let alone ones that warrant a major version revision.

martin.d.maas · September 20, 2022, 3:39pm

Are you sure about this? The only GC language I’m vaguely familiar with is Java, and you need to carry a runtime around. Actually, the way they made StaticCompiler work is by locally disabling the GC… So I thought this was necessary. But maybe you are right, and the problem is that the GC just gets in the way.

In this case, a fully-featured Static-Julia would be theoretically possible… but it must be extremely hard to obtain in practice.

I just examined the issue tracker more closely. There is this issue about LoopVectorization causing trouble for PackageCompiler. This is funny because LoopVectorization works with StaticCompiler.

So apparently, the state of the art is that none of the approaches for static compilation supports the full language yet.

Ok, I guess I’ll try to contribute to my preferred approach (StaticCompiler) as soon as those improvements in Base kick in.

Thanks everyone for the insight!

Sukera · September 20, 2022, 3:56pm

“using a runtime” and “static compilation” are orthogonal concepts - for example, Go is statically compiled but also uses a runtime.

martin.d.maas · September 20, 2022, 4:53pm

Thanks.

I also found more answers on StackOverflow. For example, Eiffel has GC without a VM.

martin.d.maas · September 20, 2022, 5:25pm

There is also somebody actually trying to get the GC to work in StaticCompiler…

gbaraldi · September 20, 2022, 7:52pm

Go has GC, if you were curious. And I guess languages that use reference counting sort of have GCs too (Swift), even if they look quite different.

martin.d.maas · September 20, 2022, 8:27pm

It looks like I just referenced your work.

I was basing my views of StaticCompiler on what StaticTools says it is doing: bypassing the GC to enable static compilation. But it seems that you have another very interesting approach.

Anyway, I really don’t know how to contribute here, as I have most of the time worked with compiled languages like C or Fortran where we have to manually allocate and deallocate memory, and obviously I don’t know much about GC.

I believe once these kinds of bottlenecks with StaticCompiler are overcome, I could jump back and help with all of the things that could be enabled by this, like automating the generation of Python bindings for Julia libraries, or C++ integration.

I believe this could be transformational for the Julia ecosystem, as if we target Python as a frontend, we could be writing libraries for thousands and thousands of users, some of whom might become interested in Julia.

gbaraldi · September 20, 2022, 8:34pm

The issues I encountered there related to the GC aren’t about language semantics. It’s basically an implementation issue: there is some data we need that is usually stored in the sysimage, and StaticCompiler does not put it inside the binary. The issue is figuring out what that data is and how to store it in the binary.

Barget · September 27, 2022, 8:49am

I’d like to take the opportunity of this topic to share a thought that keeps wandering in my mind recently.

Considering that all code in Julia is eventually compiled & executed (correct me if this premise is wrong), couldn’t we just take advantage of launching a set of unit tests to get a compiled version of all the methods we need, and extract them in the form of a static library? (by exploring all the function calls in the assembly code, identifying all the dependencies, etc.)

I’ve got the feeling I’m missing some key concept related to GC and runtime, though …
So I guess the question is more: why is this approach not that easy? Thanks for your insights.

aplavin · September 27, 2022, 9:26am

I think this can fundamentally work for applications, while libraries are more complex. It typically doesn’t make sense to list all possible types: even the very simple sum(first, [(a=1,), (a=2,)]) vs sum(first, [(b=1,), (b=2,)]) call sum and first with different types.
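A quick sketch of the two calls, written with well-formed one-element named tuples, shows why they count as different call signatures. The variable names are made up for illustration.

```julia
xs = [(a=1,), (a=2,)]
ys = [(b=1,), (b=2,)]

# The same generic call gives the same result for both inputs...
total_a = sum(first, xs)
total_b = sum(first, ys)

# ...but the field name is part of the NamedTuple type, so the element
# types differ and `sum` and `first` are specialized separately for each.
same_type = eltype(xs) == eltype(ys)
```

Since a library author cannot enumerate every field name a user might choose, a test suite alone would not cover all the specializations a library may need.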
