Hey Julianers,
I found this:
➜ julia-awesomeness git:(master) ✗ time julia -e ""
julia -e "" 0,10s user 0,09s system 181% cpu 0,110 total
I built a sysimage and got these numbers:
➜ julia-awesomeness git:(master) ✗ time julia -e ""
julia -e "" 4,84s user 0,67s system 117% cpu 4,684 total
Can someone help me build a sysimage that has the same startup time as the original “julia”?
I think a solution to this could be a really big deal for anyone who uses Julia or is coming from Python.
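For context, the build boils down to something like this (a simplified sketch - my real package list is much longer; the full script is linked further down in the thread):

```julia
using PackageCompiler

# bake the heavy packages into a custom sysimage
# (package list abbreviated here)
create_sysimage([:OhMyREPL, :Plots, :Flux];
                sysimage_path="sys_awesome.so")
```

After that I start Julia with `julia -Jsys_awesome.so`.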
imho: probably more modules ( 152 vs. 33 ) … more time??
I don’t know:
- your Julia version ( 1.4? 1.5? )
- your number of Base.loaded_modules
so I tried to replicate your times … but your script did not work for me …
So with your preferred packages - without pinned package versions ( see my Dockerfile ) …
I see 152 modules:
root@88ce0d0df7d7:/# time julia -E "println(length(Base.loaded_modules));VERSION"
152
v"1.5.0-rc2.0"
real 0m3.464s
user 0m3.504s
sys 0m0.938s
The default 33-module time, tested with docker run -it --rm julia:1.5 bash:
root@5bf6a36abe91:/# time julia -E "println(length(Base.loaded_modules));VERSION"
33
v"1.5.0-rc2.0"
real 0m0.528s
user 0m0.783s
sys 0m0.535s
The reference Julia 1.4 result is higher ( ~4.363s ) … so v1.5 is better.
root@b2262a156039:/# time julia -E "println(length(Base.loaded_modules));VERSION"
152
v"1.4.2"
real 0m4.363s
user 0m4.346s
sys 0m0.936s
As far as I know, latency is a high priority … you can find more info in this video
Thank you for the test command.
My version was 1.5.0-rc2. The test:
➜ ~ time julia -E 'print("Julia $(VERSION) with $(length(Base.loaded_modules)) module");""'
Julia 1.5.0-rc2.0 with 152 module""
julia -E 7,89s user 0,72s system 110% cpu 7,795 total
I reran the whole create_sysimage and now my timings are:
➜ ~ time julia -E 'print("Julia $(VERSION) with $(length(Base.loaded_modules)) module");""'
Julia 1.5.0-rc2.0 with 152 module""
julia -E 1,53s user 0,66s system 161% cpu 1,357 total
This is crazy!
What is going on?
One of my friends also tried it with other versions of Julia (1.4.2 and 1.5.0-rc1) and it works! We have to build the sysimage twice, and then it decreases the startup time by 5-8x!!!
Does any master have any idea about this?
Ouch, I can’t replicate this in my Docker environments.
I get nearly the same results ( ~ the same time ).
Would you please re-create your method in a Dockerfile?
Can you send me an example Dockerfile and the steps you use, so I can modify the workflow to reproduce this case?
jpct15base simple package recompile
33
v"1.5.0-rc2.0"
real 0m0.474s
user 0m0.820s
sys 0m0.685s
jpct15t1 trace + create_sysimage
152
v"1.5.0-rc2.0"
real 0m3.523s
user 0m3.618s
sys 0m0.942s
jpct15t2 trace + create_sysimage(traced.jl) + trace2 + create_sysimage2(traced2.jl)
152
v"1.5.0-rc2.0"
real 0m3.479s
user 0m3.560s
sys 0m0.952s
jpct15c2 trace + 2x create_sysimage(traced.jl)
152
v"1.5.0-rc2.0"
real 0m3.485s
user 0m3.566s
sys 0m0.882s
Is there a difference when starting Julia with --trace-compile=stderr
when using the two different sysimages? I’m thinking maybe the first one has some unresolved invalidations in it that cause recompilation. But that’s just a guess.
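Something along these lines would show it (the sysimage file names here are just placeholders):

```
julia -Jsys_build1.so --trace-compile=stderr -e 'using Plots' 2> trace1.log
julia -Jsys_build2.so --trace-compile=stderr -e 'using Plots' 2> trace2.log
wc -l trace1.log trace2.log   # more emitted precompile lines => more runtime recompilation
```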
Also, if possible, could you test on Julia master?
It is not easy …
As I see, some packages ( Plots ) are not compatible with 1.6.0-DEV.572 (2020-08-02) yet …
so we need to find compatible packages …
@v1.6) pkg> precompile
Precompiling project...
[ Info: Precompiling Plots [91a5bcdd-55d7-5caf-9e0b-520d859cae80]
ERROR: LoadError: InitError: could not load library "/root/.julia/artifacts/08ab4b2de494e1d61e581ab1fbd610b0b4d1876f/lib/libavdevice.so"
libx264.so.157: cannot open shared object file: No such file or directory
Stacktrace:
[1] dlopen(s::String, flags::UInt32; throw_error::Bool)
@ Base.Libc.Libdl ./libdl.jl:114
Ok, I had to realise I am not good at Docker management. I had an error and didn’t have time to solve it.
ERROR: LoadError: InitError: could not load library "/root/.julia/artifacts/08ab4b2de494e1d61e581ab1fbd610b0b4d1876f/lib/libavdevice.so"
I think you did it right.
I tested it on 2 computers with different setups, and I had to build twice to get this extreme speedup. Essentially I am just running this file twice:
https://github.com/Cvikli/julia-awesomeness/blob/master/sysimage_awesome.jl
I hope you can achieve the same!
I can say that with the first build I had a 4.5-5.0 second startup time, and after the second build I have 0.9s, which is totally acceptable.
I will try to build master, but I have to work now. Maybe tomorrow!
Thank you for the tips.
I have added --trace-compile=stderr to my test script …
The result: interesting … a lot of CUDA-related precompile statements …
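For anyone following along, the trace output is just a stream of precompile statements - lines like these (illustrative examples, not my actual log):

```julia
precompile(Tuple{typeof(Base.vect), Int64, Vararg{Int64}})
precompile(Tuple{typeof(Base.getindex), Dict{String, Any}, String})
```

Each line is a method signature Julia had to compile at runtime; such a file can be fed back into create_sysimage via the precompile_statements_file keyword.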
Do you also replace the default sysimage when running it the first time? So the second time you run PackageCompiler, it runs with the new sysimage?
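In PackageCompiler 1.x terms I mean roughly this (a sketch, not necessarily your exact calls):

```julia
using PackageCompiler

# first build: overwrite the default sysimage in place
create_sysimage([:Plots]; replace_default=true)

# a second build now starts from the already-replaced default sysimage,
# which could explain why the two runs behave differently
create_sysimage([:Plots]; replace_default=true)
```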
I have updated my test script - see output.md.
And now I am testing this command:
docker run -it --rm $1 bash -c 'time julia \
--trace-compile=stderr \
-E "using \
Revise, JuliaFormatter, \
LanguageServer, \
PackageCompiler, \
EllipsisNotation, Parameters, JSON2, \
Distributions, \
OhMyREPL, \
Plots, \
Flux, Zygote; \
println(length(Base.loaded_modules));VERSION" \
' >> output.md
real 2m20.329s ( only precompile )
real 0m3.822s ( precompile + trace + create_sysimage )
real 0m3.610s ( precompile + 2x( trace + create_sysimage ) ) - a little faster!
real 0m3.711s ( precompile + trace + 2x( create_sysimage ) )
2nd. run ( cat output.md | grep real ):
real 2m22.098s
real 0m3.637s
real 0m3.554s
real 0m3.558s
3rd. run:
real 2m26.261s
real 0m3.753s
real 0m3.695s
real 0m3.662s
As I see … it is a little faster after the second sysimage creation ( rows 3 and 4 ).
On the other hand …
original: ~140s vs. sysimage: ~4s
imho ~35x faster in this extreme case!!!
My test environment:
- Thinkpad T480s i5; Ubuntu 20.04; Turbo Boost OFF → 1.6 GHz
Yes-yes. That could be the key. But you can run the same file I run:
https://github.com/Cvikli/julia-awesomeness/blob/master/sysimage_awesome.jl
And remember, you can always restore the default state:
using PackageCompiler
restore_default_sysimage()
My script was tried by some of my friends, so I can tell you it is independent of the Julia version and the hardware.
I really suggest everyone run that script twice in a row and just check it out; it will be damn fast.
@ImreSamu did you try the script I linked and it didn’t bring the speedup?
As I see, the precompile.jl now exists - but it is empty … is that correct?
I have added 4 extra test cases … all Dockerfiles ( 8 ) + test.sh uploaded.
my result:
reference
real 1m59.897s ( only with precompile )
real 0m3.510s ( trace + sysimage )
real 0m3.469s
real 0m3.442s
with sysimage_awesome.jl
real 0m10.778s ( run 1x sysimage_awesome )
real 0m3.481s ( run 2x sysimage_awesome )
real 0m3.444s ( run 3x sysimage_awesome )
real 0m3.452s ( run 4x sysimage_awesome )
The sysimage_awesome first start time is high ( 10.7s )!
And the sysimage_awesome second run is similar to my original start times ( ~3.5s ).
Strange … but I don’t know the “Why?”
In the official doc ( https://julialang.github.io/PackageCompiler.jl/dev/examples/ohmyrepl/ ) there is a --trace-compile part:
“The first goal is to have Julia emit the functions it compiles when running OhMyREPL. To this end, start Julia with the --trace-compile=ohmyrepl_precompile.jl flag.”
But this part is missing from your script …
- there is no --trace-compile=precompile.jl
- and the precompile.jl is empty
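For reference, the two-step workflow from that docs page is roughly: record the statements with julia --trace-compile=ohmyrepl_precompile.jl -e 'using OhMyREPL', then bake them in (a sketch based on the linked example):

```julia
using PackageCompiler

# feed the recorded precompile statements into the sysimage build
create_sysimage(:OhMyREPL;
                sysimage_path="sys_ohmyrepl.so",
                precompile_statements_file="ohmyrepl_precompile.jl")
```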
Thank you for pointing that out!
Thank you for the tests. As I see, going from 10.7s to 3.48s is about a 3x speed increase, which may actually mean it is working in your case too!
I measure my speedups on a:
AMD® Ryzen Threadripper 1950X 16-core processor × 32
128GB DDR4 4 channel
3200MBit/s 1TB SSD
Ubuntu 20.04.
Your speedup is from 10.8s to 3.5s.
Mine is from 7.8s to 1.3s, which is about 1.5 times more speedup and maybe could be associated with the hardware difference, I guess.
IF I test the simple 'time julia -e ""' command ( as you suggested in the first post ),
THEN the real times are:
real - with tracing
real 0m0.239s ( only with precompile )
real 0m2.476s ( trace + sysimage )
real 0m2.438s
real 0m2.413s
real - with sysimage_awesome
real 0m9.644s ( run 1x sysimage_awesome )
real 0m2.457s ( run 2x sysimage_awesome )
real 0m2.427s ( run 3x sysimage_awesome )
real 0m2.430s ( run 4x sysimage_awesome )
so 9.6s/2.4s = 4x ( and yours 7.8/1.3 = 6x )