Building a PC optimized for "time to first plot"

This really does not ring true. There is basically zero technical or cultural difference between the system image that ships prebuilt with Julia and a custom sysimage you make yourself. If a Julia library works reliably when Julia is started with its default sysimage, it will work reliably with a custom sysimage. You need a truly weird or broken library for it not to work with a custom sysimage. There is nothing special about how the base libraries are written that makes them work with the default sysimage.

Edit: the only caveat I can think of is that you need to rebuild sysimages when you update libraries.

1 Like

Yes. Technically, the only way I know to break it is to have hardcoded absolute references to binaries. That does not include JLLs, since the BinaryBuilder system is entirely sysimage-safe. But if you sideload some weird binary and then `const` the reference in the package, then the system image (which is relocatable) will not find the binaries when you relocate it. This used to be an issue when shipping GR with Pumas, but it has of course been fixed since 2020, when GR was changed to JLLs.

So yeah, you can break it with some not too weird things, but generally those are things you shouldn’t be doing in a library anyways since it would have other issues.
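A minimal sketch of the anti-pattern described above, with invented module and library names (nothing here is from an actual package): baking an absolute path into a `const` at build time versus recomputing it at load time in `__init__`, which is what keeps a relocated sysimage working.

```julia
module BadWrapper
# BAD: the absolute path is baked in when the sysimage is built.
# After the sysimage is copied to another machine, this path is stale.
const LIBFOO = "/home/builder/.local/lib/libfoo.so"
end

module GoodWrapper
# GOOD: leave the binding empty and fill it in at load time, so every
# `using GoodWrapper` recomputes the path on the current machine.
const LIBFOO = Ref{String}()
function __init__()
    LIBFOO[] = joinpath(homedir(), ".local", "lib", "libfoo.so")
end
end
```

JLL packages generated by BinaryBuilder follow the second pattern automatically, which is why they are sysimage-safe without any targeted work.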

For reference, some complicated wrapper packages like Sundials.jl became sysimage-friendly with zero targeted work, just by using BinaryBuilder as documented, so I don’t think it’s unreasonable to say you don’t have to go out of your way to support it.

(This excludes the system-image-too-big bug, which is just a bug and should go away soon.)

1 Like

In theory there are no technical problems, and you can always find plenty of workarounds. For example, if something is non-relocatable (like Regex and BigFloat), we can add a constructor hook to rebuild values at load time instead of hard-coding them in the compiler. Still, you need sufficient tooling to rule out all the strange things; otherwise people will be disappointed if they only get a failure after waiting a long time.
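The "constructor hook" idea can be sketched in plain Julia (module and constant names are invented for illustration): instead of serializing a possibly non-relocatable value, store its plain-data recipe and rebuild the value when the module is loaded.

```julia
module RebuildDemo
# The recipe (a plain string) is always trivially serializable.
const PATTERN_SRC = "\\d+"
# The possibly non-relocatable value is rebuilt fresh at load time.
const PATTERN = Ref{Regex}()
__init__() = (PATTERN[] = Regex(PATTERN_SRC))
end
```

The point being made is that, today, each package has to remember to do this by hand in `__init__`; a compiler-level hook would apply it systematically.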

This problem is partially mitigated by precompilation, since the introduction of precompilation already breaks the pure semantics of Julia. So if your package cannot be precompiled, I guess it’s already classified as “working unreliably”, and things then become easier for the system image.

But anyway, you do point out the essential problem of the whole-system image: rebuilding can never be incremental or separate, since the just-works logic doesn’t naturally extend to more complicated situations like method or module redefinition. And a rebuild will happen every time you change your package environment. So either the current way of building system images will be replaced by a better approach integrated into the language, or we have to suffer the inconvenience forever.

I am surprised you are saying this as if there is a gotcha in the claim. I would have thought that “package has broken precompilation” is certainly on everyone’s list of what makes a package “broken and/or unreliable”. This is probably the root of our misunderstanding.

You are now also raising an unrelated point: that rebuilding is not incremental/separate. Sure, it is annoying that you need to rebuild your sysimage, but that seems pretty normal: I recompile my C/C++/Rust code when I change it, for instance. But things look even better for Julia’s future: that type of recompilation should not be necessary for much longer, as some developers (e.g. Keno) have prototypes for incremental sysimage (re)builds and others (e.g. Tim and Valentin) have prototypes that cache precompiled code much better even without a sysimage.

2 Likes

No, no. I would not be surprised that you broadly accept this claim, in order to rule out those “strange packages” and pave the way for static compilation. But people still need to fight precompilation problems, as some packages cease to precompile, or segfault, even though they work perfectly fine without precompilation. And don’t forget that precompilation is a rather opaque and only recently documented feature.

I have also made a prototype of incremental compilation in Julia: BuildSystem.

And I have only one question: why do you believe that any of these solutions can address the latency problems, and maybe even achieve something better than static languages, which are well tested and deployed at large scale? Have you tried any of the prototypes and actually observed a huge speedup, or did you reach that conclusion from theoretical reasoning?

2 Likes

Oh, that BuildSystem is really cool! I remember when it was announced!

I am out of my depth to give any theoretical reasoning; I have just observed my use cases going from a latency of 2 minutes (3 years ago) to a latency of <1 second (today). The only change is that today I have a workflow that overlaps with the style of some C/C++/Rust codebases I have worked on, where you do need to do explicit compilation (while NOT losing the dynamic nature of Julia, as anything new can still be compiled after you launch).
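The explicit-compilation workflow described here is typically done with PackageCompiler.jl; a rough sketch, where the package list and file names are examples rather than anything from this thread:

```julia
# Build a custom sysimage once (slow, like a C/C++/Rust build step);
# the precompile_execution_file is a script exercising a typical workload.
using PackageCompiler
create_sysimage([:Plots];
                sysimage_path = "sys_plots.so",
                precompile_execution_file = "precompile_plots.jl")
# Afterwards, launch Julia with the cached code:
#     julia --sysimage sys_plots.so
```

Anything not covered by the sysimage is still compiled on demand after launch, which is how the dynamic nature of Julia is preserved.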

From my (outsider, not-particularly-well-informed) point of view, the reason this was so difficult was because Julia needed (1) multimethods and (2) aggressive just-ahead-of-time compilation and devirtualization which means that you either (A) need to compile your code for an exponentially large number of type combinations or (B) need to have incredibly smart compilation caches or (C) have a simple compilation cache that suffers from severe TTFX. I think today we are seeing the ecosystem truly move from C to B (A was never an option). And B is possible thanks to an immense amount of work from Tim, Valentin, Keno, and at least a handful more people whose names I do not know (it is fascinating to be a fly on the wall on their public pull requests and see the work they are doing). And I would say the BuildSystem you mentioned is definitely part of that work.

3 Likes

A minor comment/addition: I added SnoopPrecompile to one of my packages for the first time the other day and did exactly that: I naively put some standard workflow into the snoop-precompile block. In particular, one of the functions (a main function of the package) does a lot of printing, and another function only works on Linux (where it is an important function). This had the effect that

  • if the package precompilation happened on using MyPackage (and not as part of ] instantiate/] add or ] precompile, which is arguably the more common case), the user would now see lots of printing due to SnoopPrecompile running my “standard workflow”. I think this is very unfortunate and annoying.
  • the package wouldn’t even precompile on Windows/macOS anymore, because the snoop-precompile block would try to run my “standard workflow”, which included the function that only works on Linux. (I have now added an @static if Sys.islinux() to fix it.)

The point I want to make is that snoop-precompilation actually runs the “standard workflow”, and this can have consequences one didn’t have to care about with standard precompilation (i.e. before/without SnoopPrecompile). Otherwise it was great and reduced TTFX (although for me, in this package, it was more of a test than anything else).
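Both issues above can be handled inside the workload block itself; a hedged sketch using SnoopPrecompile's documented macros, with the package data and function calls as placeholders (not the actual package from this post):

```julia
using SnoopPrecompile

@precompile_setup begin
    data = rand(10)  # placeholder setup data, not precompiled itself
    @precompile_all_calls begin
        # Silence the chatty "standard workflow" so `using MyPackage`
        # stays quiet when precompilation happens on first load.
        redirect_stdout(devnull) do
            println("summary: ", sum(data))  # stands in for the noisy main function
        end
        # Only exercise the Linux-only code path on Linux, so the
        # package still precompiles on Windows/macOS.
        @static if Sys.islinux()
            # linux_only_function(data)  # placeholder
        end
    end
end
```

The `@static` guard is resolved at precompile time, so the Linux-only call is simply absent from the block on other platforms.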

4 Likes

Hi, I just opened an issue in SnoopPrecompile. For me it makes absolutely no difference, other than taking longer to precompile.

I might be screwing up somewhere, but this is my experience so far.

If you time it, are you LLVM-time dominated? Right now SnoopPrecompile only handles the Julia inference time; the LLVM time is the system image’s job, and the coming changes are meant to address that. But if you use the SnoopCompile analysis to tell you the division there, it will let you know how much to expect (as shown in the blog post). Things that use a lot of ForwardDiff and StaticArrays tend to be more LLVM-time dominated.

1 Like

I don’t think that’s the goal. If Julia’s compile times are just “good enough”, then it’s the dream language. It doesn’t need to be a static language on static-language terms, but it should cache enough to really “feel like Python”. Then “being close enough to C in speed” is killer. From my tests and timing, even just caching what the system image can catch today is more than enough to achieve that.

3 Likes

If I understand your question correctly, no, it’s not LLVM-time dominated but inference dominated. See the SnoopPrecompile issue where I added more info, namely why I say the lag is almost all due to inference.

Just double-checking: when you say almost no invalidations, do all that exist have small numbers of children and avoid hitting any of the major methods? The number of invalidations doesn’t matter as much as what gets invalidated (often a strong sign is that something has a ton of children and is invalidating the whole world). In a flamegraph, does it look like there is one major call taking all of the time? That is usually a sign to look for its subcalls in the invalidation results.
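The checks being suggested here can be sketched as follows (assuming the SnoopCompileCore, SnoopCompile, and ProfileView packages are installed; Plots is just the example workload):

```julia
# Record with the lightweight SnoopCompileCore first, analyze afterwards.
using SnoopCompileCore
invs = @snoopr using Plots            # invalidations caused by loading the package
tinf = @snoopi_deep plot(rand(5, 2))  # inference timing for the first call

using SnoopCompile, ProfileView
trees = invalidation_trees(invs)      # sorted so the methods with the most
                                      # invalidated children come last
fg = flamegraph(tinf)
ProfileView.view(fg)                  # one dominant bar => inspect its subcalls
```

Cross-referencing a dominant bar in the flamegraph against the entries of `invalidation_trees` is the "look for its subcalls in the invalidation results" step.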

julia> tinf = @snoopi_deep plot(rand(5,2))
InferenceTimingNode: 7.986517/8.332188 on Core.Compiler.Timings.ROOT() with 10 direct children

And this is the flame graph

And also, thanks for looking into this.

Okay, that’s an odd one. It almost seems like the time isn’t attributed to any of the calls but also isn’t LLVM time? That might be one for Tim.

Well, this depends on the dreams one has. Add reasonably small executables and I am with you, though I would still generally prefer a statically compiled small executable.

Did you try printing to devnull? That should exercise the code paths (at least partially?), without side effects.

No, I haven’t (yet); I had exactly the same idea though. But my point wasn’t that one can’t “fix” it, but that one has to “fix” it in the first place.

For the specific case of printing, maybe it would even be worth considering changing this on the SnoopPrecompile side, i.e. redirecting all printing to devnull or similar. (I can’t imagine a scenario where one wants print output during precompilation.)
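The devnull idea needs nothing beyond Base; a minimal sketch of what such a redirect looks like, with a placeholder workload standing in for the precompile block:

```julia
# Run the noisy workload but discard its output; the printing code paths
# are still executed (and hence still compiled), just invisibly.
captured = redirect_stdout(devnull) do
    println("lots of workflow output")  # exercised, but not shown
    42                                  # the workload's return value still comes back
end
```

Wrapping the user's `@precompile_all_calls` body in such a redirect on the SnoopPrecompile side would silence printing without the package author having to do anything.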

4 Likes

It would be highly ironic if you had to buy a superfast computer for the high-performance language Julia because it’s slow at compiling!

With my trick for faster recompilation, I seem to have converted one Common Lisp user to Julia: not-common-lisp-to-julia.org · GitHub

You can have 20 KB Julia executables, but those are limited (e.g. no GC); with Python, most or all of the important limitations are lifted (StaticCompiler.jl doesn’t yet support Windows, so I’m not sure this works there, even if you use Python for the Windows compatibility):
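A very rough sketch of the small-executable path mentioned above, based on the documented StaticCompiler.jl usage (function and output names are examples); note the limitations: no GC, no dynamic dispatch, and the workload must stick to GC-free constructs from StaticTools:

```julia
using StaticCompiler, StaticTools

# The entry point takes argc/argv like a C main; c"..." builds a
# StaticTools string that needs no garbage collector.
function hello(argc::Int, argv::Ptr{Ptr{UInt8}})
    println(c"hello from a tiny Julia executable")
    return 0
end

# Emits a standalone native executable named "hello" in the current directory.
compile_executable(hello, (Int, Ptr{Ptr{UInt8}}), "./")
```

This is what makes executables in the tens-of-kilobytes range possible, at the cost of most of Julia's dynamic features.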

I hope my PlotlyLight.jl suggestion helped. Any plotting library (and more code generally) is likely excluded with that approach, though; but if you do the plotting from the Python side, then I guess something similar is available.

This is new to me. When you say “executable”, do you include libraries under this term? I think of deployable, small, self-sufficient executables. I believe there is something under development (or perhaps better, under experimental investigation), and I hope (= strongly believe) that it will come some day for the Julia language, with some constraints needed, of course.