When should a package move to the standard library or system image? StaticArrays, what is it?

I know, I know about things moving out of Julia to packages, within reason. I am one of the people who heavily advocated for that back when Julia was batteries included. But let’s ground the discussion with some specifics. I was profiling a bunch of compile times and to my dismay I found the following:

julia> @time using StaticArrays
  0.436497 seconds (2.09 M allocations: 155.908 MiB, 0.51% compilation time)

Note that this occurs in a brand new REPL in Julia v1.7 for a package that has almost 2000 dependents. This is a package that almost all users have in a dependency tree somewhere, and by having it there they instantly incur a noticeable ~0.5 second delay. It only has standard library dependencies, it has no use of Requires.jl to blame, it just is.

Yes, the standard library is where packages go to slow down development, but is that really a bad case for StaticArrays in its current form? It’s already well-past its major development bump. It sees far less development than something like Base.LinearAlgebra. A good number of the PRs are just fixes for Julia version updates anyways since StaticArrays digs a bit lower than most packages should, so it would be nice if it was tracked by the Base CI.

Why is StaticArrays not in the Base system image, cutting out that load time for every user of 2,000 of the most common packages? If it is worthy of inclusion into the standard system image, where does it stop? ForwardDiff.jl has similar properties, though the usage numbers are not quite as high and it only adds 0.2 seconds:

julia> @time using ForwardDiff
  0.613780 seconds (2.66 M allocations: 189.512 MiB, 1.81% compilation time)

At what point is something standard enough that it should be part of the standard system image so that we can cut out its startup time? As some libraries become more used than some portions of the standard library, “never” seems like too strict of a rule.

34 Likes

If I understand correctly, the argument here is not that these packages actually need to be stdlibs, but rather that their startup time is large enough that it would be convenient if users could easily load a system image that includes those packages. Is that roughly accurate?

Instead of adding more stdlibs, I would suggest implementing the following two features:

  1. Easy distribution of sysimages, perhaps via the existing Pkg server infrastructure. Similar to how it is easy for a user to download pre-built binary artifacts just by installing a JLL package, maybe we can make it easy for users to download pre-built sysimages that are specific to their platform.
  2. Ability to “combine” multiple sysimages. E.g. if I download one sysimage for StaticArrays.jl, a second sysimage for ForwardDiff.jl, and a third sysimage for Plots.jl, it would be nice if I could somehow load all three sysimages at the same time in the same Julia session. I don’t know enough about sysimages to know how feasible this would be from a technical point of view.

If we had those two features, then the workflow I envision is that when a user installs StaticArrays.jl, a platform-specific sysimage that contains StaticArrays.jl is automatically downloaded from the Pkg servers. And then, the next time that the user starts up Julia, that sysimage is automatically loaded, along with the default sysimage (that is shipped with Julia) as well as any other sysimages the user has chosen to enable.
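For comparison, a non-default sysimage currently has to be selected explicitly at startup with the `--sysimage` (`-J`) command-line flag. A minimal sketch of building such a launch command from Julia follows; the helper name `julia_with_sysimage` and the file name are placeholders for illustration, not an existing API:

```julia
# Hypothetical helper: build the command that starts Julia with a custom sysimage.
# `--sysimage` is an existing command-line flag; the path used below is a placeholder.
function julia_with_sysimage(sysimage_path::AbstractString; args::Vector{String}=String[])
    julia = joinpath(Sys.BINDIR, Base.julia_exename())
    return Cmd([julia, "--sysimage", String(sysimage_path), args...])
end

cmd = julia_with_sysimage("sysimage_staticarrays.so"; args=["--startup-file=no"])
# run(cmd) would start a session on top of that sysimage, provided the file exists
```

The automatic loading described above would amount to Julia choosing this flag on the user's behalf.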

16 Likes

I am divided on this one. Part of me always felt that StaticArrays.jl should be in Base because it is used everywhere in the scientific domain (they are the tuples that work with linear algebra). The other part of me resonates with what @dilumaluthge proposed, i.e. a better ecosystem for creating and consuming sysimages from project environments.

If we all had a mechanism to activate a project and quickly build a sysimage in package mode, that would solve most concerns people have about Julia out there:

] activate .
] sysimage

It solves the TTFP issue and gives users the freedom to customize sysimages locally instead of in a global repository of sysimages.

11 Likes

I like this workflow; it’s just two commands, and it is very straightforward.

Of course, one tricky part is that the user will need to have PackageCompiler.jl installed locally in order to be able to build sysimages. I don’t think we need to ship PackageCompiler with Julia by default. Instead, the first time that the user runs ] sysimage, if PackageCompiler is not available, I think we can just install PackageCompiler into a scratchspace or something like that.

The other tricky part is that the user needs to have a C compiler available locally. But again, we could just download that on-demand the first time the user runs ] sysimage.

Oh, and we also probably want to make ] sysimage a no-op if nothing has changed in the manifest since the last time the sysimage was built.
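The no-op check could be sketched roughly as follows, assuming the build simply records a hash of Manifest.toml next to the project. The function names and the stamp file name are made up for illustration; this is not how Pkg or PackageCompiler behave today:

```julia
using SHA  # standard library

# Hypothetical sketch: decide whether a sysimage rebuild is needed by comparing
# the current hash of Manifest.toml with the hash stored at the last build.
manifest_hash(project_dir) =
    bytes2hex(sha256(read(joinpath(project_dir, "Manifest.toml"))))

function needs_rebuild(project_dir; stamp=joinpath(project_dir, ".sysimage_manifest_hash"))
    isfile(stamp) || return true                      # never built before
    return read(stamp, String) != manifest_hash(project_dir)
end

# Call this after a successful build to remember which manifest it was built from.
function record_build(project_dir; stamp=joinpath(project_dir, ".sysimage_manifest_hash"))
    write(stamp, manifest_hash(project_dir))
    return nothing
end
```

A real implementation would probably also need to key on the Julia version and platform, since a sysimage is specific to both.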

9 Likes

One of Julia’s advantages is that it empowers package developers to have the same abilities as the core developers, as much as possible. (e.g. + is just a regular function, etc.)
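As a small illustration of that point, here is a sketch with a made-up type `Point2` that extends + the same way Base does for its own types:

```julia
# `+` is an ordinary generic function, so a user-defined type can add methods
# to it just like Base does. `Point2` is a made-up example type.
struct Point2
    x::Float64
    y::Float64
end

Base.:+(a::Point2, b::Point2) = Point2(a.x + b.x, a.y + b.y)

p = Point2(1.0, 2.0) + Point2(0.5, 0.5)   # Point2(1.5, 2.5)
```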

This is a good test case: can we enable third-party modules to offer as little using-latency as stdlib modules do, with as little hassle to users as stdlib modules do?

4 Likes

While adding precompile runscripts to this would be good, it seems to do the job for the simple libraries which have good precompilation. As a test, I’ve been looking at using DifferentialEquations times. Pull Request #835 on SciML/DifferentialEquations.jl (“Remove some background packages to greatly improve startup time”, by ChrisRackauckas) cuts it down, but it is still rather high. But when I build the simplest system image:

using PackageCompiler
create_sysimage(["StaticArrays","ForwardDiff"],sysimage_path="sysimage2.so")

then I see it cut out that second:

# Before PR
julia> @time using DifferentialEquations
  8.694322 seconds (24.77 M allocations: 1.736 GiB, 7.27% gc time, 17.92% compilation time)

# After PR
julia> @time using DifferentialEquations
  5.761738 seconds (18.00 M allocations: 1.327 GiB, 4.54% gc time, 10.38% compilation time)

# After PR + sysimage
julia> @time using DifferentialEquations
  4.565475 seconds (14.80 M allocations: 1.108 GiB, 6.05% gc time, 8.73% compilation time)

So yes, DifferentialEquations.jl has other work to do (specifically due to the JuliaSIMD stack), but you can see that there’s about 1 second of using time that is added to the about one thousand libraries using just two rather core packages, StaticArrays.jl and ForwardDiff.jl. No library which uses a few standard pieces will get below 1 second using times without some fundamental change to those libraries or to the way we are shipping system images. This is why libraries that seem small still have noticeable latency.

Anyways, I’ll jump out again and let people debate the right solution, but I think real-world numbers always help highlight true problems. Making sysimage usage part of standard Pkg usage seems a little cumbersome to me: I think most users won’t know about it, and so the newbies who are still evaluating Julia will be the ones that pay the biggest price. I would like to see the default somehow get better, like a “scientific Julia” binary on the website or something. I can’t say my fix ideas are any good.

19 Likes

I love the idea. Just as Conda has a mini version and a full version, maybe Julia can provide specialized versions dedicated to scientific computing or data processing.

5 Likes

I think this conda-like solution has been discussed before, and one of the major drawbacks that was raised at that time was the fact that it doesn’t help with reproducible science. People would have to specify a Julia version + a specific sysimage downloaded from a URL somewhere and things could easily get worse than they are today.

The solution with JIT compiled sysimages on the user side in package mode seems much more flexible and is also a first step towards a more integrated experience with PackageCompiler.jl. My impression is that not many people use the technology because it is not readily available.

So if no one is working privately on a major solution to TTFP, then making PackageCompiler.jl accessible via a Pkg.jl command like sysimage is a good first step. People can then evolve that and make it more automatic in the future with more aggressive caching, etc. Beginners shouldn’t need to run the command manually forever.

8 Likes

Then we can ask the same question about the packages used by the VS Code plugin: if we move StaticArrays into the stdlib, why not add those as well to improve VS Code’s latency?

Another problem is that I think compilation can only become slower in future versions of Julia as the compiler evolves and adds more and more optimization passes. This is only a temporary solution and doesn’t work at scale. The ideal solution is moving to separate compilation, but it really requires some non-trivial effort…

But I personally think JuliaInterpreter and Revise should be added to stdlib. They are tightly coupled to Julia’s internals and they actually should be considered as part of the compiler.

2 Likes

Ideally the sysimage could be built reproducibly, so that anybody could generate the same sysimage from the specified input package versions/platform/arch.

I like the idea of adding StaticArrays to the stdlib. It is a simple and pragmatic solution. Sure, sysimage features that solve the problem more generally would be amazing. But I suspect that is nontrivial and will not happen any time soon. So until we have these features, shipping StaticArrays with julia is the way to go in my opinion.

7 Likes

I agree, adding StaticArrays.jl to the standard lib is the most realistic solution here. Better integration of PackageCompiler.jl seems nice, but we are not there yet.

We waste a lot of time waiting for StaticArrays.jl compilation, and working around including it in our small libraries.

It also means choices like “what object do we use to represent point data” are needlessly complicated.

13 Likes

I think that adding StaticArrays to the stdlibs would fix one small thing. It affects a lot of stuff, but it isn’t the underlying problem. Code caching has to get better; I’m not sure what the solution to that is. Maybe the sysimages stuff can help.

9 Likes

Here’s hoping https://github.com/JuliaLang/julia/pull/42016 helps with code caching :)

5 Likes

I couldn’t agree more that StaticArrays.jl should be part of the standard library. For me the compile time improvement isn’t even a necessary argument. The fact that static arrays are such a fundamental part of scientific computing is already good enough. From my experience in various different fields of physics (climate physics, condensed matter, nonlinear dynamics), static arrays are actually just as fundamentally important as linear algebra.

8 Likes

Just a note that within VS Code we essentially have the ] sysimage command already: there is a command that the Julia extension provides to compile a custom sysimage for the currently active project, and then when there is one of those and you start a REPL it automatically loads it. I haven’t used it in a while since compiling sysimages is so slow and it never really was a convenient workflow for me, though… Also, in particular, having the automatic loading of such custom sysimages in Julia itself so that it also works in a standalone REPL would be great. I know that there were various proposals floating around at various times, but not sure where any of that went… Also, didn’t @Keno work on some infrastructure in Julia itself that was meant to make sysimage stuff easier down the road?

14 Likes

I totally agree, but consider it an argument to drop linear algebra from stdlib in the future (2.0?). Lots of programming language applications don’t need linear algebra, so why always bring all those fancy multiplications and decompositions? Even better if A * B and exp(A) threw an error for matrices without explicit using LinearAlgebra.

4 Likes

I am not a fan of StaticArrays’ implementation approach, and would want to see things reimplemented with loops instead of unrolling, and then better compiler support for the cases where this doesn’t work out well.

Loops are much easier to analyze and parallelize than straight-line code (including SIMD), so it’s an odd quirk of the compiler that unrolling everything sometimes/often produces faster code.
Unrolling everything does produce slower-to-compile code, as well as performance cliffs.
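To make the two styles concrete, here is a small sketch of a dot product over fixed-size tuples written both ways. Neither function is taken from StaticArrays; the names are made up, and both assume tuples of length at least 1:

```julia
# Loop version: one small method body; the compiler decides whether to unroll.
function dot_loop(a::NTuple{N,T}, b::NTuple{N,T}) where {N,T}
    s = zero(T)
    for i in 1:N
        s += a[i] * b[i]
    end
    return s
end

# Unrolled version: a generated function emits straight-line code, so the
# amount of code handed to the compiler grows with the length N.
@generated function dot_unrolled(a::NTuple{N,T}, b::NTuple{N,T}) where {N,T}
    terms = [:(a[$i] * b[$i]) for i in 1:N]
    return Expr(:call, :+, terms...)
end

dot_loop((1, 2, 3), (4, 5, 6))       # 32
dot_unrolled((1, 2, 3), (4, 5, 6))   # 32
```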

EDIT:
I recall @kristoffer.carlsson having a prototype for benchmarking, but couldn’t find it. Might be useful for benchmarking.
I’ve been starting to look into LLVM, so might be a fun project to start looking for some of the regressions, like the unnecessary copying of data when converting stack allocated MArrays into SArrays.

22 Likes

There is a solution that is not discussed yet here: make packages in the sysimage updatable without re-compiling sysimage. If we can update StaticArrays after julia is released, we don’t need to worry about losing the opportunity to improve it. Furthermore, we can treat StaticArrays as non-@stdlib this way since we can stop shipping sysimage without StaticArrays and avoid breaking the stability guarantee.

Straightforward support for this may require tweaking the module loader. But my hunch is that it is much less intrusive than, say, making the custom system image workflow seamless (which would require touching the julia runtime; it’d be great but a challenging task) as Dilum suggested.

In fact, I think there’s a way to make it work even without touching the module loader. The idea is to have (say) StaticArraysImpl package in sysimage and re-export it from an external (not in sysimage) StaticArrays. The workflow may look like this:

  1. Suppose that StaticArrays’ master branch is at X.Y.Z-DEV (unreleased) when we feature freeze julia.
  2. We release StaticArrays X.Y.Z that re-exports StaticArraysImpl and then release X.Y.(Z+1) that does not depend on StaticArraysImpl (i.e., just like current StaticArrays).
  3. When releasing julia, include StaticArraysImpl with the content identical to StaticArrays X.Y.(Z+1) but the top-level module renamed to StaticArraysImpl from StaticArrays.

This way, as long as you use StaticArrays X.Y.Z, it is almost as fast as if it were in the sysimage. You can also upgrade or downgrade StaticArrays without re-compiling sysimage.
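The re-export in step 2 could look roughly like the following sketch. Toy modules stand in for the real packages, and every name here is hypothetical; the point is only that the external module can be a thin shell around the in-sysimage one:

```julia
# Stand-in for the copy of the package that would live inside the sysimage.
module StaticArraysImplDemo
export tuple_norm2
tuple_norm2(t::NTuple{N,T}) where {N,T} = sum(abs2, t)
end

# Stand-in for the external package: it has no code of its own and only
# re-exports whatever the implementation module exports.
module StaticArraysDemo
using ..StaticArraysImplDemo
for name in names(StaticArraysImplDemo)
    name === nameof(StaticArraysImplDemo) && continue  # skip the module's own name
    Core.eval(@__MODULE__, Expr(:export, name))
end
end

StaticArraysDemo.tuple_norm2((3, 4))   # 25
```

Loading the shell module is then cheap because all the real code is already in the sysimage.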

We probably still need to tweak Pkg anyway to favor StaticArrays X.Y.Z over other versions. I’m not familiar with Pkg’s resolver but maybe the easiest approach is to automatically pin StaticArrays to X.Y.Z whenever it is installed.

(I’m not suggesting that this is better than tweaking the module loader. The main purpose is to show that it is possible to make in-sysimage package updatable.)

I think an updatable in-sysimage package is a practical approach that does not sacrifice future possible improvements. As Chris Rackauckas says, it is great if we can help non-expert users automatically. This approach does not need anything from the users. At the same time, we can drop StaticArrays from the official julia release any time once Chris Elrod (or someone else) comes up with a better static array package.

This approach also can be applied to other packages. We can be more flexible at choosing which package should go into the system image since the set of these packages can be changed between minor releases.

12 Likes

StaticArrays used to load quickly but as more and more stuff got put into it, it got slower and slower. 5 arg mul! support really slowed it down IIRC. But that seems like something that should be brought up with the development of StaticArrays.jl and not an argument for putting something as a stdlib. It sets a weird precedent in that you make something slow and therefore it should be in the sysimage.

Also, I don’t think the load time of StaticArrays is something inherent to the library but more a consequence of the implementation.

A StaticArrays implementation in Base should not need to be more than a few hundred lines max, defining the very basic operations and then everything should work generically, just like with other abstract arrays. Some compiler optimizations to avoid intermediate mutable static arrays to build up the result might be needed. The current StaticArrays pretty much has a duplicate implementation of all the functions it supports, which isn’t really maintainable. There are many cases where the API slightly differs in what it supports due to that.
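As a rough sketch of what “defining the very basic operations” can mean, here is a made-up `TinySVec` type that implements only the minimal AbstractArray interface and relies on generic fallbacks for the rest. It is an illustration under that assumption, not a proposal for the actual Base design:

```julia
using LinearAlgebra  # standard library, used only to show generic code working

# Hypothetical minimal static vector: a tuple wrapped in an AbstractVector.
# Only `size` and `getindex` are defined; everything else is generic fallback code.
struct TinySVec{N,T} <: AbstractVector{T}
    data::NTuple{N,T}
end

Base.size(::TinySVec{N}) where {N} = (N,)
Base.getindex(v::TinySVec, i::Int) = v.data[i]

v = TinySVec((1.0, 2.0, 3.0))
sum(v)        # generic reduction
dot(v, v)     # generic LinearAlgebra method
v .+ v        # works, but the result is a heap-allocated Vector, not a TinySVec
```

The last line shows where the generic path falls short: the result type is lost, which is the kind of gap the compiler optimizations mentioned above would have to close.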

Yes, I can clean it up a bit and put it up. It’s not very complete but it was quite easy to find some cases where Julia could probably do better from it.

28 Likes