Fastest Julia startup challenge/smallest sysimage

@vchuravy, or anyone else who would like to help with the challenge.

  • sys.so shrinks from 173M to 96M
  • With julia --startup-file=no startup time improves to 77ms from 127ms on my machine

That’s 39% faster startup, with a 44% smaller sysimage, so to a first approximation, the startup time of Julia scales linearly with the sysimage size.
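
For anyone wanting to reproduce the numbers, here’s roughly how I time bare startup from Julia itself (a crude sketch; a dedicated tool like hyperfine would be more rigorous):

    # Crude startup measurement (assumes `julia` is on PATH); take the
    # minimum of several runs to reduce noise, after one warm-up run.
    run(`julia --startup-file=no -e ""`)  # warm the page cache
    ts = [(@elapsed run(`julia --startup-file=no -e ""`)) for _ in 1:10]
    println("startup: ", round(minimum(ts) * 1000; digits=1), " ms")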

With that speedup I think it would really help Julia take the top spot in the Debian benchmarks game. My only requirement is that the sysimage/julia be usable for that. For now, I’m ok with simply dropping LinearAlgebra etc.

Since this is on the milestone for 1.10 only, I’m curious whether a sysimage made for it, or for some specific version, would work on older Julia (or even newer). Ideally I’d want a sysimage that works for 1.9 (the release seems around the corner), or even 1.8.

About:

Right now REPL is the only sysimage included […]

  • The current strategy of pre-loading stdlibs increases loading time to ~3s most of that time is spent in method_table_insert

I suppose that’s a typo for “only stdlib included”, and ~3s seems like a lot, so might that be a typo too, for “~3ms”?

To make the sysimage smaller, REPL needs to go (but I’m also ok with just getting that sysimage in my hands to test as is). What’s already dropped is e.g. Printf, which is used by some benchmark code, so I kind of want it back in…, though all such code could be rewritten to use println. Random is also out; you might want it in (benchmark code doesn’t need it, however, since it insists on its own RNG routine). And Unicode is out, which I think is ok, so it’s usable as is.
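
For the Printf case, the rewrite I have in mind is just a Base-only equivalent (the formatting is only approximate, e.g. trailing zeros differ):

    # Base-only replacement for @printf("%.2f\n", x); no Printf stdlib needed.
    x = 3.14159
    println(round(x; digits=2))   # prints 3.14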

Downloads is out, and LibGit2, so I’m not sure MbedTLS or any other security (crypto) code is in, in a usable state. I’m fine with that and actually want it all out long-term. I’m not sure how much stripping out .sos helps start-up speed, but if anyone has a good idea, or actual profiling showing what the startup time is actually spent on, that would be great too.
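
One crude starting point for that (it only shows what gets compiled during startup, not a full time profile):

    julia --startup-file=no --trace-compile=stderr -e ""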

Once you remove the REPL stdlib, it should automatically fall back to the simpler REPL in Base. And once you remove Base, it should fall back to the REPL implemented in the C launcher.

I think removing Base would go a long way toward speeding up startup time and shrinking the sysimage.
You would then want to implement a few operations in a little Preamble.jl-like file, so that things like iteration and getproperty work, but I think you could still get the benchmark game to run.
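
To see which Base functions a given piece of syntax lowers to — i.e. what such a Preamble.jl would minimally have to define — Meta.lower shows it directly:

    # Surface syntax lowers to ordinary Base calls; a Base-less preamble
    # would have to provide these:
    Meta.lower(Main, :(for x in xs; x; end))  # calls Base.iterate(xs[, state])
    Meta.lower(Main, :(obj.field))            # calls Base.getproperty(obj, :field)
    Meta.lower(Main, :(a[i]))                 # calls Base.getindex(a, i)
    Meta.lower(Main, :(a[i] = v))             # calls Base.setindex!(a, v, i)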

I honestly can’t tell if it’s worth it; as a reference point, the startup time goes from 11x Python’s to 7x Python’s.

As for the sysimage, I think the people who care about this the most work on embedded systems? Advancing AOT would be the real solution; I’m not sure how many problems fall into the gap that just happens to be solved by a 44% smaller sysimage.

Right, for embedded and some other code, but I still also want Julia to be usable for scripting. It really is, and I’m not sure why startup is so very much slower than for Python (or Perl). The absolute minimum that needs to happen before you start parsing your actual script code shouldn’t be too different.

I suppose that as you parse and compile the first bit of code, a fair bit of the standard library comes into play, and it comes from the sysimage, which needs to be parsed and fully compiled (I think it may already be fully precompiled in the sysimage on master).

The aim is faster startup, more than just a smaller sysimage; I just thought the latter would be a big part of the former. We are already competing against C, C++ and Rust, not just Python, which we usually beat in benchmarks already. The startup alone isn’t too worrying vs Python, since we make up for it on sufficiently long-running/real-world code.

I did tests a few weeks back and found that DaemonMode.jl did almost 2x better than PackageCompiler.jl at reducing the latency of a script I was running multiple times, despite needing to send the script over a socket, and despite the fact that I attempted to compile the script itself when package-compiling it.
PackageCompiler also takes a very long time to build the executable, vs just starting DaemonMode.
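
For reference, the workflow I used is the one from DaemonMode’s README, as far as I remember:

    # One terminal: keep a warm Julia process around
    julia --startup-file=no -e 'using DaemonMode; serve()'

    # Other terminals: run scripts against it instead of a fresh julia each time
    julia --startup-file=no -e 'using DaemonMode; runargs()' myscript.jl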

I haven’t committed to actually starting a Julia server, but I have thought about writing a systemd config file to start it automatically so I can do scripting with Julia.
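
Something like this user unit is what I had in mind (untested sketch; the file name and julia path are just examples):

    # ~/.config/systemd/user/julia-daemon.service
    [Unit]
    Description=DaemonMode.jl server for low-latency Julia scripting

    [Service]
    ExecStart=/usr/bin/julia --startup-file=no -e 'using DaemonMode; serve()'
    Restart=on-failure

    [Install]
    WantedBy=default.target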

Anyway, my point here is that improvements like that would also help the PackageCompiler approach vs DaemonMode, as well as bolster the argument vs Python & Co.

EDIT:
There are some important applications that would love a 44% smaller sysimage.
Some people want to collect a huge number of executables; for them it makes a huge difference, and helps make the Julia pill easier to swallow vs something like C.
I still think it’d be cool to get StaticCompiler.jl working well enough for them, but those more familiar with the issue think it’s the wrong approach.

If I ever get a genie to grant me a wish, I would go back in time and make sure we have a LinearAlgebra.Matrix type distinct from Base.Array{T,2} (and similar for Vector). The fact that we conflate the two makes a few things a little weird. But to the conversation at hand, it also means that we basically always have to have LinearAlgebra loaded, if only for the purpose of reserving Base.:*, Base.:/, and Base.:\ (and a few others I’m missing?) on arrays without resorting to piracy. The current design decision means we can never excise LinearAlgebra (one of the heaviest standard libraries, as far as I understand) or even have it unloaded on the v1.x timescale.

Marginally less breaking (but still breaking) would be to add placeholder methods that throw errors insisting you import/using LinearAlgebra for linear-algebraic operations. Then we could keep BLAS etc. unloaded at startup. LinearAlgebra would commit some pretty vicious piracy on import, however, and we’d only have the flimsy excuse that it’s a standard library and we’re sorry it ended up this way.
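
Sketched with stand-in modules (MiniBase/MiniLinearAlgebra are hypothetical names, to avoid actually pirating Base here):

    module MiniBase
    # Placeholder: reserves the operation without implementing it.
    matmul(A::AbstractMatrix, B::AbstractMatrix) =
        error("matrix multiplication requires `using MiniLinearAlgebra`")
    end

    module MiniLinearAlgebra
    import ..MiniBase
    # The "piracy" on load: overwrite the placeholder with a real method.
    MiniBase.matmul(A::AbstractMatrix, B::AbstractMatrix) =
        [sum(A[i, k] * B[k, j] for k in axes(A, 2)) for i in axes(A, 1), j in axes(B, 2)]
    end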

I realize we have 2D+ arrays, e.g. for that, but many languages don’t have them (nor do the benchmarks I have in mind need them). When I say I want LinearAlgebra out, I mean that as a proof of concept. Who knows, maybe we want to change some things for Julia 2.0. For many who need 3D arrays or a similar structure, possibly a DataFrame is the right abstraction anyway.

I’ve never used the operator \ so I wouldn’t miss it that much. :slight_smile: I’m not even sure it’s strictly needed; isn’t A \ B = B / A (and only for matrices)? Most languages do without it; what do they do instead?

I might want to keep ND arrays, and just hcat, vcat, and slicing, but for now even just 1D arrays plus vcat would make sense for it.

I’m not wishing we’d remove Base.Array and its support for arbitrary numbers of dimensions. I also wouldn’t propose to remove array functions like concatenation, indexing, or sorting (although sorting could go to a separable standard library, too). Rather, I’m wishing we’d stop treating AbstractArrays as vectors and matrices in the linear algebraic sense. For those behaviors, I’d want them wrapped in a LinearAlgebra-owned subtype of AbstractArray. Then, we wouldn’t need Base to own stuff like matrix multiplication and LinearAlgebra could be separated. But this amount of disruption could only be considered at v2.0 and even then could still be a tough sell.
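
A toy version of that wrapper, to make the ownership point concrete (LinAlgMatrix is a hypothetical name, and the multiply just delegates for illustration):

    # A LinearAlgebra-owned wrapper: linear-algebraic behavior lives on this
    # type instead of on Base.Array itself.
    struct LinAlgMatrix{T} <: AbstractMatrix{T}
        data::Matrix{T}
    end
    Base.size(M::LinAlgMatrix) = size(M.data)
    Base.getindex(M::LinAlgMatrix, i::Int, j::Int) = M.data[i, j]
    # `*` on the wrapper is owned by the wrapper's package, not by Base:
    Base.:*(A::LinAlgMatrix, B::LinAlgMatrix) = LinAlgMatrix(A.data * B.data)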

In linear algebra, \ is usually more common than /. X = A \ Y is used to solve for X such that A * X == Y. X = Y / A is used to solve X * A == Y, which appears less frequently due to prevailing conventions. In any case, in linear algebra it holds that X \ Y == (Y' / X')', so one is easily done in terms of the other. But it’s convenient to have both in any system with non-commutative multiplication (ie, when A * B != B * A).
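
Concretely, a runnable check of both the “solve” reading and the identity:

    using LinearAlgebra  # where the dense \ and / methods live today

    A = [2.0 1.0; 1.0 3.0]
    B = [1.0 2.0; 3.0 4.0]

    X = A \ B                    # solves A * X == B
    @assert A * X ≈ B
    @assert A \ B ≈ (B' / A')'   # the identity X \ Y == (Y' / X')'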

Since DaemonMode.jl is run via julia --startup-file=no -e 'using DaemonMode; runargs()', and all this julia process needs to do is load DaemonMode, would it not be even faster to run julia --sysimage /path/to/daemon-mode-sysimage.so --startup-file=no -e 'using DaemonMode; runargs()', where daemon-mode-sysimage could be the really stripped-down sysimage we’re talking about in this thread, but with DaemonMode added in?

I imagine this would give the best (currently) possible performance in general.
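
Building that combined image could look something like this with PackageCompiler (untested sketch; the output path is just an example):

    using PackageCompiler

    # Bake DaemonMode into a custom sysimage; combine with whatever
    # stdlib stripping this thread ends up producing.
    create_sysimage([:DaemonMode]; sysimage_path="daemon-mode-sysimage.so")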