(Unofficial) Julia 1.9 for lower latency (startup)

The timing for (not unofficial) 1.9 master shows 7.6% faster startup than for Julia 1.7.0:

$ hyperfine '~/Downloads/julia-a60c76ea57/bin/julia -e ""'
Benchmark 1: ~/Downloads/julia-a60c76ea57/bin/julia -e ""
  Time (mean ± σ):     191.2 ms ±  14.4 ms    [User: 179.9 ms, System: 303.4 ms]
  Range (min … max):   168.6 ms … 213.0 ms    14 runs

$ hyperfine 'julia -e ""'
Benchmark 1: julia -e ""
  Time (mean ± σ):     206.0 ms ±  29.6 ms    [User: 165.5 ms, System: 122.3 ms]
  Range (min … max):   182.4 ms … 278.2 ms    10 runs

Since the “System” time is more than the total wall time, it implies that (2?) threads were used. Note that with the `time` measurements below I get very different numbers for “sys”, so I assume hyperfine reports summed, or perhaps average, per-thread time, here presumably for 2 threads.
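As a sanity check (a sketch, assuming `julia` is on the PATH): the Julia-level thread count can be printed directly, though note the runtime also starts internal OS threads (e.g. the signal-handling thread) beyond this count, which may be what inflates the “System” column:

```shell
# Default Julia-level thread count (usually 1 unless -t/--threads is set);
# hyperfine's "System" time can additionally include time spent in the
# runtime's internal OS threads.
julia -e 'println(Threads.nthreads())'
```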

How could the startup time be reduced further? I was thinking of compiling my own Julia (already done, not shown here) and throwing out as much as possible: anything not needed, such as LinearAlgebra, maybe Threads, and basically everything in Base that Julia itself doesn’t use.

Note that some stdlibs, e.g. Statistics and DelimitedFiles, have already been moved out of the sysimage in Julia 1.9, or so I thought. Whatever the reason(s) for the speedup, that and/or something else, the sysimage is actually larger:

232731608 jún 25 17:00 sys.so
32830120 jún 25 16:48 libopenblas64_.0.3.20.so

vs in 1.7.0:
199483960 nóv 30 2021 sys.so
31736520 nóv 30 2021 libopenblas64_.0.3.13.so

Those are the big-ticket items: reduce sys.so, or eliminate parts of it entirely, e.g. LinearAlgebra/libopenblas64. I’ve yet to profile anything (I recall from a JuliaLang issue that it’s been done). Can anyone tell me where to look in the code for removing e.g. that .so, recommend the best profiling tools, or point to that forgotten issue?

A. I’m thinking of doing this unofficial (breaking) Julia 2.0, not as a hostile takeover, but to explore how much can and should be taken out while still being useful for scripts and benchmarks such as the Debian Benchmarks Game (some programs there require threads, at least one needs GMP/BigInt, but none need LinearAlgebra).

B. I’m also considering implementing some of the changes from the 2.0 milestone (any ideas?), some of which seem sensible, at least if faster, and also removing Dict from Base, i.e. changing to a better (for Julia) unexported version. I suspect Base only needs small Dicts, not a scalable Dict implementation.

$ time julia --startup-file=no -O0 -e "println(\"Hello world\")"
Hello world

real	0m0,216s
user	0m0,194s
sys	0m0,069s

$ time ~/Downloads/julia-a60c76ea57/bin/julia --startup-file=no -O0 -e "println(\"Hello world\")"
Hello world

real	0m0,192s
user	0m0,160s
sys	0m0,156s
$ hyperfine '~/Downloads/julia-a60c76ea57/bin/julia --startup-file=no -O0 -e "println(\"Hello world\")"'
Benchmark 1: ~/Downloads/julia-a60c76ea57/bin/julia --startup-file=no -O0 -e "println(\"Hello world\")"
  Time (mean ± σ):     190.1 ms ±  14.6 ms    [User: 177.9 ms, System: 291.8 ms]
  Range (min … max):   170.7 ms … 208.6 ms    14 runs

$ hyperfine 'julia --startup-file=no -O0 -e "println(\"Hello world\")"'
Benchmark 1: julia --startup-file=no -O0 -e "println(\"Hello world\")"
  Time (mean ± σ):     213.0 ms ±  17.8 ms    [User: 183.2 ms, System: 140.9 ms]
  Range (min … max):   184.9 ms … 236.4 ms    12 runs

Check this PR:


The sharpest constraint we have is that we cannot remove any stdlibs that are direct or indirect dependencies of Pkg. (Because otherwise, you cannot install any external packages.)

@DilumAluthge, why is that? I want to excise as much as possible (even Pkg). It seems that a stdlib may only depend on other stdlibs? Is that restriction really needed?

For (very simple) scripts (just as an experiment), I do not need Pkg (or the REPL). I WOULD still like to have Pkg available, just as an ordinary package. That seems hypothetically possible, except you would have the problem of installing it first… If it came with Julia (in the “mere aggregation” sense), it seems like I should be able to use it by somehow pointing to it.

You list 30 dependencies of Pkg (actually 29…) that could all go (plus even LinearAlgebra), in particular (so as not to have to handle security issues in Julia) LibCURL_jll, LibGit2, LibSSH2_jll, MbedTLS_jll, MozillaCACerts_jll (and also Sockets?), with the exception of (probably) Unicode (and maybe Random, because of Threads, which I’m conflicted about dropping).

This was discussed here (but I don’t want to discuss it there, since it’s not only about DelimitedFiles):

Recipe for moving thing out while retaining history: JuliaAI/MLJOpenML.jl#1

Do you think getting any or all of those stdlibs out will help with startup time? Or do they have NO impact until actually used? If they do slow startup down, is that because of their precompiled code in the sysimage (which could be gotten rid of while still keeping the stdlibs)?
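One way to check empirically (a sketch; `Statistics` is just an example stdlib, and `julia` and `hyperfine` are assumed to be on the PATH) is to compare a bare startup against one that actually loads the stdlib:

```shell
# If unused stdlibs in the sysimage cost nothing until loaded, the first
# timing should be unaffected by their presence; the difference to the
# second shows the extra cost of actually loading one.
hyperfine 'julia --startup-file=no -e ""' \
          'julia --startup-file=no -e "using Statistics"'
```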

“-10% … -19% with PGO+LTO” is nice (I will use it when the PR is merged), in addition to the 7.6% speedup already, but I’m aiming for something closer to the 98% startup-time reduction you get by using Perl.

I know getting that far is unrealistic, but I want to know where the extra time is spent, when actually not compiling ANY code:

$ time ~/julia-1-9-DEV-a60c76ea57/bin/julia --startup-file=no --compile=min -e ""

vs.

$ time ~/perl -e ""

A lot of it is loading code from the sysimage.

Right, thanks; that’s why I want to radically reduce it. It’s just unclear to me whether the stdlibs go there. I think not, except that some of them are precompiled into it. E.g. openblas[64] is just a machine-code binary, but the wrapper code that exists to use it goes in there. Do you have any idea what the largest single factor might be?

Another workaround, and perhaps WHY loading the sysimage is slow, is that it’s not fully compiled to machine code (I think, or at least historically; the same goes for packages). If that is changing (there’s already a PR that stores machine code with it?), then it’s unclear why it isn’t much faster, if not instant:

For .so files, isn’t using them essentially memory-mapping, i.e. very quick, so that I don’t need to worry about the size until I do something beyond the bare minimum I need to do first?

Julia now builds a multi-microarchitecture sysimage (3 x86 targets):

Probably 4 separate x86 microarchitecture builds would be leaner (at least on disk):

  • x86-64-v1
  • x86-64-v2
  • x86-64-v3 (AVX2)
  • x86-64-v4 (AVX512)
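When building Julia from source, the set of targets baked into the sysimage can be narrowed with the `JULIA_CPU_TARGET` make/environment variable (a sketch; whether a given target name such as `x86-64-v3` is accepted depends on the bundled LLVM version):

```shell
# Build with a single x86-64 microarchitecture target instead of the
# default multi-target sysimage: smaller sys.so, less portable binary.
make JULIA_CPU_TARGET="x86-64-v3"
```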

How about a ramdisk? :-) But seriously, and this is slightly off topic, so I hope you don’t mind: do you know the current state of affairs re BLAS settings for Julia 1.8 and even more up-to-date versions? I recall reading about some significant planned changes (link); however, when I skimmed the RC notes I couldn’t find anything on this topic, hence the question.

I’m not up-to-speed on BLAS, and not looking into it much since I want to drop it (i.e. OpenBLAS; @Elrod has a substitute).

@ScottPJones may not be active here anymore, but he made a “Julia-lite” in 2015 (the jlite branch; he also has e.g. a more recent lite branch from 2016, and his master is non-lite) that wasn’t used much as far as I know. Ahead of his time…

He did drop LinearAlgebra/BLAS, from the sysimage at least. At first glance at his base/exports.jl it looked like he dropped e.g. Dates too, but then I saw he had just rearranged things, and LinAlg is still there.

So I’m looking at doing a more recent version of this (and maybe compiling his branch, and also timing some other old versions):

Sounds like a great idea re the OpenBLAS substitute, and thanks for the info. As for OpenBLAS itself, I’ll try to check on GitHub.

Building a sysimage without any stdlibs is very easy with PackageCompiler. Just use the filter_stdlibs argument and an empty project.
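A minimal sketch of that recipe (assuming PackageCompiler is installed and reachable from an otherwise empty project; the paths and the final test command are placeholders):

```shell
# Build a sysimage with all stdlibs filtered out, then start Julia with it.
julia --project=/tmp/empty -e 'using PackageCompiler;
    create_sysimage(; sysimage_path="sys_min.so", filter_stdlibs=true)'
julia --sysimage=sys_min.so -e 'println("it works")'
```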


I had removed more than the things that were later moved out to stdlibs.
In particular, I removed most anything that required extra libraries (besides LLVM, of course).
BigInt and BigFloat, for example.
Regex support could really be moved out as well; while there are a few uses of regexes in Base Julia, they are generally very simple patterns that would actually be more efficient written without regexes.

I hadn’t done any more on this because, after most of the “kitchen sink” items were moved out to stdlibs, it became a lot easier to deal with Julia on things like the Raspberry Pi (also, faster, more powerful Pis came out 🙂).
