Slow Julia startup time after sysimage creation (and an unbelievable observation!)

Hey Julianers,

I found this:

➜  julia-awesomeness git:(master) ✗ time julia -e ""         
julia -e ""  0,10s user 0,09s system 181% cpu 0,110 total

I built a sysimage and got these numbers:

➜  julia-awesomeness git:(master) ✗ time julia -e ""                      
julia -e ""  4,84s user 0,67s system 117% cpu 4,684 total

Can someone help me build a sysimage that has the same startup time as the original “julia”?
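( For context, a minimal PackageCompiler.jl sysimage build looks roughly like this sketch; the package list is only an example, not my exact set: )

```julia
# Rough sketch of a sysimage build with PackageCompiler.jl ( v1.x era );
# the packages here are illustrative only.
using PackageCompiler

create_sysimage([:Plots, :OhMyREPL];
                sysimage_path = "sys_custom.so")

# then start Julia with the custom image:
#   julia --sysimage sys_custom.so
```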

I think a solution to this could be a really big deal for anyone who uses Julia or is coming from Python.

imho: more modules ( 152 vs. 33 ) probably means more time?

I don’t know:

  • your Julia environment ( 1.4? 1.5? )
    • 1.5 is better
  • your number of Base.loaded_modules

so I tried to replicate your times … but your script didn’t work for me …

So with your preferred packages, but without pinned package versions ( see my Dockerfile ),
I see 152 modules:

root@88ce0d0df7d7:/# time julia -E "println(length(Base.loaded_modules));VERSION"

real	0m3.464s
user	0m3.504s
sys	0m0.938s

The default ( 33 modules ) time, tested on docker run -it --rm julia:1.5 bash:

root@5bf6a36abe91:/# time julia -E "println(length(Base.loaded_modules));VERSION"

real	0m0.528s
user	0m0.783s
sys	0m0.535s

The reference Julia 1.4 result is higher ( ~4.363s ) … so v1.5 is better:

root@b2262a156039:/# time julia -E "println(length(Base.loaded_modules));VERSION"

real	0m4.363s
user	0m4.346s
sys	0m0.936s

As far as I know, reducing latency is a high priority for the core developers … you can find more info in this video


Thank you for the test command.
My version was 1.5.0-rc2. The test:

➜  ~ time julia -E 'print("Julia $(VERSION) with $(length(Base.loaded_modules)) module");""' 
Julia 1.5.0-rc2.0 with 152 module""
julia -E   7,89s user 0,72s system 110% cpu 7,795 total

I reran the whole create_sysimage and now my timings are:

➜  ~ time julia -E 'print("Julia $(VERSION) with $(length(Base.loaded_modules)) module");""'  
Julia 1.5.0-rc2.0 with 152 module""
julia -E   1,53s user 0,66s system 161% cpu 1,357 total

This is crazy! :open_mouth:
What is going on?

One of my friends also tried with other versions of Julia (1.4.2 and 1.5.0-rc1) and it works! We have to build the sysimage twice, and then it decreases the startup time by 5-8x!!! :open_mouth:

Does any master have an idea about this?

ouch :thinking:, I can’t replicate this in my Docker environments.
I get nearly the same results ( ~same times ):

Would you please re-create your method in a Dockerfile?

Can you send me an example Dockerfile and the steps you use, so I can modify my workflow to reproduce this case?

jpct15base simple package recompile


real	0m0.474s
user	0m0.820s
sys	0m0.685s

jpct15t1 trace + create_sysimage


real	0m3.523s
user	0m3.618s
sys	0m0.942s

jpct15t2 trace + create_sysimage(traced.jl) + trace2 + create_sysimage2(traced2.jl)


real	0m3.479s
user	0m3.560s
sys	0m0.952s

jpct15c2 trace + 2x create_sysimage(traced.jl)


real	0m3.485s
user	0m3.566s
sys	0m0.882s

Is there a difference when starting Julia with --trace-compile=stderr using the two different sysimages? I’m thinking maybe the first one has some unresolved invalidations in it that cause recompilation. But that’s just a guess.

Also, if possible, could you test on Julia master?
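A concrete way to check the invalidation guess ( the sysimage file names below are made up ):

```shell
# anything printed by --trace-compile is compiled at runtime,
# i.e. NOT already baked into the sysimage
julia --sysimage sys_first.so  --trace-compile=stderr -e "" 2> first.log
julia --sysimage sys_second.so --trace-compile=stderr -e "" 2> second.log
diff first.log second.log
```

If the first image triggers recompilation, first.log should contain many more precompile statements than second.log.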

it is not easy …
as I see, some packages ( Plots ) are not compatible with 1.6.0-DEV.572 (2020-08-02) yet …
so we need to find compatible packages first …

@v1.6) pkg> precompile
Precompiling project...
[ Info: Precompiling Plots [91a5bcdd-55d7-5caf-9e0b-520d859cae80]
ERROR: LoadError: InitError: could not load library "/root/.julia/artifacts/08ab4b2de494e1d61e581ab1fbd610b0b4d1876f/lib/" cannot open shared object file: No such file or directory
  [1] dlopen(s::String, flags::UInt32; throw_error::Bool)
    @ Base.Libc.Libdl ./libdl.jl:114

Ok, I have to admit I am not good at Docker management. I hit an error and didn’t have time to solve it:
ERROR: LoadError: InitError: could not load library "/root/.julia/artifacts/08ab4b2de494e1d61e581ab1fbd610b0b4d1876f/lib/"

I think you did it right.
I tested it on 2 computers with different setups, and I had to build twice to get this extreme speedup. Essentially I am just running this file twice:

I hope you can achieve the same!

I can say that after the first build I had a 4.5-5.0s startup time, and after the second build I have 0.9s, which is totally acceptable.

I will try to build the master, but I have to work now. Maybe tomorrow!

Thank you for the tips.

I have added --trace-compile=stderr to my test script …
The result:

Interesting … a lot of CUDA-related precompile statements …

Do you also replace the default sysimage when running it the first time? So the second time you run PackageCompiler, it runs with the new sysimage?
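For reference, if I remember the PackageCompiler v1.x API correctly, the two modes look roughly like this sketch; which one is used decides what the second build starts from:

```julia
using PackageCompiler

# standalone image: the default sysimage stays untouched,
# so a second build still starts from the stock image
create_sysimage([:OhMyREPL]; sysimage_path = "sys_custom.so")

# replacing the default: every later Julia start ( including the
# next PackageCompiler run ) already uses the new image
create_sysimage([:OhMyREPL]; replace_default = true)
```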

I have updated my test script, and now I am testing this command:

   docker run -it --rm $1   bash -c 'time julia \
              --trace-compile=stderr \
              -E "using \
                    Revise, JuliaFormatter, \
                    LanguageServer, \
                    PackageCompiler, \
                    EllipsisNotation, Parameters, JSON2, \
                    Distributions, \
                    OhMyREPL, \
                    Plots, \
                    Flux, Zygote; \
                   println(length(Base.loaded_modules));VERSION" \
              '

real 2m20.329s ( only precompile )
real 0m3.822s ( precompile + trace + create_sysimage )
real 0m3.610s ( precompile + 2x ( trace + create_sysimage ) ) ( a little faster! )
real 0m3.711s ( precompile + trace + 2x ( create_sysimage ) )

2nd run ( cat | grep real ):

real time
real 2m22.098s
real 0m3.637s
real 0m3.554s
real 0m3.558s

3rd run:

real time
real 2m26.261s
real 0m3.753s
real 0m3.695s
real 0m3.662s

As I see it, it is a little faster after the second sysimage creation ( rows 3 and 4 ).

On the other hand:
original ~140s vs. sysimage ~4s
imho ~35x faster in this extreme case!

my test environment:

  • ThinkPad T480s i5; Ubuntu 20.04; Turbo Boost OFF ( -> 1.6 GHz )

Yes, yes. That could be the key. But you can run the same file I run:

And remember, you can always restore the original state:

using PackageCompiler
restore_default_sysimage()  # puts back the stock sysimage ( PackageCompiler v1.x )

Some of my friends also tried my script, so I can tell you the effect is independent of the Julia version and the hardware.

I really suggest everyone run that script twice in a row and just check it out; it will be damn fast.

@ImreSamu did you try the script I linked and it didn’t bring the speedup?

As I see it now, the precompile.jl does exist

  • but it is empty … is that correct?

I have added 4 extra test cases … all 8 Dockerfiles are uploaded.

my result:

real 1m59.897s ( only with precompile )
real 0m3.510s ( trace + sysimage )
real 0m3.469s
real 0m3.442s

with sysimage_awesome.jl
real 0m10.778s ( run 1x sysimage_awesome )
real 0m3.481s ( run 2x sysimage_awesome )
real 0m3.444s ( run 3x sysimage_awesome )
real 0m3.452s ( run 4x sysimage_awesome )

The sysimage_awesome first start time is high ( 10.7s )!
And the sysimage_awesome second run is similar to my original start times ( ~3.5s ).

Strange … but I don’t know the reason why.

In the official doc there is a --trace-compile part:

The first goal is to have Julia emit the functions it compiles when running OhMyREPL. To this end, start Julia with the --trace-compile=ohmyrepl_precompile.jl flag.

but this part is missing from your script …

  • there is no --trace-compile=precompile.jl
  • and the precompile.jl is empty
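For completeness, the workflow the quoted doc describes is roughly this ( file names as in the docs’ OhMyREPL example ):

```julia
# step 1 ( shell ): record compiled methods while exercising the package
#   julia --trace-compile=ohmyrepl_precompile.jl -e 'using OhMyREPL'
#
# step 2: feed the recorded statements into the sysimage build
using PackageCompiler

create_sysimage([:OhMyREPL];
                sysimage_path = "sys_ohmyrepl.so",
                precompile_statements_file = "ohmyrepl_precompile.jl")
```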

Thank you for pointing that out!

Thank you for the tests! As I see it, going from 10.7s to 3.48s is about a 3x speed increase, which may actually mean it is working in your case too!
I measured my speedups on:

AMD Ryzen Threadripper 1950X, 16 cores / 32 threads
128 GB DDR4, 4-channel
3200 MBit/s 1 TB SSD
Ubuntu 20.04

Your speedup is from 10.8s to 3.5s ( ~3x ).
Mine is from 7.8s to 1.3s ( ~6x ), which is roughly twice the relative speedup and could perhaps be explained by the hardware difference, I guess.

If I test the simple 'time julia -e ""' command ( as you suggested in the first post ),
then the real times are:

with tracing:
real 0m0.239s ( only with precompile )
real 0m2.476s ( trace + sysimage )
real 0m2.438s
real 0m2.413s
with sysimage_awesome:
real 0m9.644s ( run 1x sysimage_awesome )
real 0m2.457s ( run 2x sysimage_awesome )
real 0m2.427s ( run 3x sysimage_awesome )
real 0m2.430s ( run 4x sysimage_awesome )

so 9.6s/2.4s = 4x ( and yours 7.8/1.3 = 6x )