Parallel loading of packages, to help e.g. Queryverse (VegaLite), time-to-first plot

First I want to say, time-to-first-plot isn’t really an issue (I consider it a solved problem if you know what you’re doing and use the right packages/options), e.g. it takes about a second on Julia 1.6 (you can expect this speed soon, on default options in 1.6):

$ julia -O1 --compile=min -q
julia> @time using VegaLite
  0.902355 seconds (1.20 M allocations: 83.260 MiB)

julia> @time @vlplot(
               x={field="a", type="ordinal"},
               y={field="b", type="quantitative"}
       )
  0.260332 seconds (355.97 k allocations: 21.581 MiB, 12.28% gc time)

This is straight from the docs, and I just added @time (and started Julia differently).

I’ve been working on, or should I say looking into, faster loading of packages, and there are still problems when you have lots of dependencies (although the time has been cut in half). Which brings me to the meta-package (or other similar packages) for VegaLite, which e.g. also gets you the slow-loading VegaDatasets. People may have been conditioned to use that “get me the kitchen sink” Queryverse metapackage and so think time-to-first-plot is slow, and it’s worth exploring what can be done.

It’s actually rather ok (compared to 30 sec on 1.4.0), if you’re ok with non-default options:

$ julia -O1 --compile=min
julia> @time using Queryverse
  3.108088 seconds (4.24 M allocations: 288.382 MiB, 2.35% gc time)

but on default:
julia> @time using Queryverse
 10.728247 seconds (15.29 M allocations: 922.528 MiB, 3.43% gc time)

I opened an issue about parallel loading, but it was closed as not specific enough. Here are more details on what I have in mind. Loading those dependencies on default settings seems very fast, except my implementation of the idea doesn’t work [EDIT: My code works now, and I’ve replaced the example here, see also my other post below.]

$ time julia parallel_test.jl  # amended Queryverse.jl code

real	0m0,337s  # these numbers are not valid, see in post further down
user	0m0,663s
sys	0m0,435s

$ cat parallel_test.jl
using Reexport


t1 = @async @eval @reexport using DataValues
t2 = @async @eval import IterableTables
t3 = @async @eval using Query
t4 = @async @eval using DataTables
t5 = @async @eval using DataFrames
t6 = @async @eval @reexport using FileIO
t7 = @async @eval @reexport using ExcelFiles
t8 = @async @eval @reexport using StatFiles
t9 = @async @eval @reexport using CSVFiles
t10 = @async @eval @reexport using FeatherFiles
t11 = @async @eval @reexport using ParquetFiles
t12 = @async @eval @reexport using VegaLite
t13 = @async @eval @reexport using DataVoyager

# Here I would rather want to do wait(t1, t2, t3, ... t13), and in general to fix the boilerplate to: using p1, p2...
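
The wait(t1, t2, ... t13) wish in that last comment could be sketched roughly like this: collect the tasks in a vector and wait on each one. This is only a sketch under assumptions: the standard-library modules Dates, Random and Printf are stand-ins for the Queryverse dependencies, and Core.eval with an explicitly built `using` expression is used instead of @eval, because @async gives `$` its own meaning.

```julia
# Stand-in package names (assumption): standard-library modules,
# not the real Queryverse dependencies.
pkgs = [:Dates, :Random, :Printf]

# One task per package. Expr(:using, Expr(:., pkg)) is the expression
# form of `using <pkg>`; it is evaluated in the current module.
tasks = map(pkgs) do pkg
    @async Core.eval(@__MODULE__, Expr(:using, Expr(:., pkg)))
end

# Equivalent of the wished-for wait(t1, t2, ...): block until every task is done.
foreach(wait, tasks)
```

Note that these are tasks on a single thread, so this only shows how to wait on all of them; it says nothing about whether loading gets faster.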

Now, the two specific questions:

1) What’s wrong with the code/idea, and 2) since it runs without an error, what does eval do? Do the modules load, just in some other namespace?

I don’t know whether this approach is sound, but I think you’d rather want something along the lines of:

@sync begin
    @async eval(:(using Pkg1))
    @async eval(:(using Pkg2))
end

@reexport using Pkg1

EDIT: some quick tests (on my system, v1.5.1) seem to indicate that this is not faster than the plain series of using statements.
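
For the "some other namespace" question, a small check along these lines may help. It is a sketch, with the standard-library modules Base64 and Printf standing in for Pkg1 and Pkg2 (an assumption, chosen so it runs without installing anything). @eval evaluates its expression in the global scope of the module where it is written, so the names should end up bound in that module once the tasks have finished.

```julia
# Base64 and Printf are stand-ins for Pkg1 and Pkg2 (assumption).
@sync begin
    @async @eval using Base64
    @async @eval using Printf
end

# @sync returns only after both tasks have finished, so the modules
# are now bound in the current module, not in some hidden namespace.
loaded = isdefined(@__MODULE__, :Base64) && isdefined(@__MODULE__, :Printf)
println("both modules visible in the current module: ", loaded)
```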

Thanks for answering. I’ve gotten async to work, but only by disabling precompilation; the code seems to work, and I can get it to load faster (2x as fast):

$ julia -O0 --compile=min -q
julia> @time using Queryverse
[ Info: Precompiling Queryverse [612083be-0b0f-5412-89c1-4e7c75506a58]
[ Info: Skipping precompilation since __precompile__(false). Importing Queryverse [612083be-0b0f-5412-89c1-4e7c75506a58].
  5.555800 seconds (8.37 M allocations: 525.525 MiB, 5.04% gc time)

on default:

julia> @time using Queryverse
[ Info: Precompiling Queryverse [612083be-0b0f-5412-89c1-4e7c75506a58]
[ Info: Skipping precompilation since __precompile__(false). Importing Queryverse [612083be-0b0f-5412-89c1-4e7c75506a58].
 14.761669 seconds (18.15 M allocations: 1.059 GiB, 2.98% gc time)

Yes, the latter is slower than the original 10.7 sec. on defaults (and while the former is faster, it’s not fully comparable), but that 10.7 sec. was for already precompiled code, so the comparison isn’t fair. I’m only really making the original compilation faster; with the code without my changes I get:

julia> @time using Queryverse
[ Info: Precompiling Queryverse [612083be-0b0f-5412-89c1-4e7c75506a58]
 19.157164 seconds (15.50 M allocations: 936.122 MiB, 1.94% gc time)

So I’m “solving” a separate (already annoying) problem.

Without disabling precompilation, which is what I really want, I get:

julia> @time using Queryverse
[ Info: Precompiling Queryverse [612083be-0b0f-5412-89c1-4e7c75506a58]
fatal: error thrown and no exception handler available.
ErrorException("Task cannot be serialized")
jl_error at /buildworker/worker/package_linux64/build/src/rtutils.c:41
jl_serialize_value_ at /buildworker/worker/package_linux64/build/src/dump.c:675
jl_serialize_value_ at /buildworker/worker/package_linux64/build/src/julia.h:1001 [inlined]
jl_serialize_module at /buildworker/worker/package_linux64/build/src/dump.c:361 [inlined]
jl_serialize_value_ at /buildworker/worker/package_linux64/build/src/dump.c:672
jl_serialize_value_ at /buildworker/worker/package_linux64/build/src/dump.c:505
jl_serialize_value_ at /buildworker/worker/package_linux64/build/src/dump.c:384 [inlined]
jl_save_incremental at /buildworker/worker/package_linux64/build/src/dump.c:2135
jl_write_compiler_output at /buildworker/worker/package_linux64/build/src/precompile.c:61
jl_atexit_hook at /buildworker/worker/package_linux64/build/src/init.c:218
main at /buildworker/worker/package_linux64/build/ui/repl.c:228
__libc_start_main at /lib/x86_64-linux-gnu/ (unknown line)
_start at /home/pharaldsson_sym/julia-1.6-latest-f047d7ffc7/bin/julia (unknown line)
ERROR: Failed to precompile Queryverse [612083be-0b0f-5412-89c1-4e7c75506a58] to /home/pharaldsson_sym/.julia/compiled/v1.6/Queryverse/hLJnW_09XvJ.ji.
 [1] error(s::String)
   @ Base ./error.jl:33
 [2] compilecache(pkg::Base.PkgId, path::String)
   @ Base ./loading.jl:1351
 [3] _require(pkg::Base.PkgId)
   @ Base ./loading.jl:1031
 [4] require(uuidkey::Base.PkgId)
   @ Base ./loading.jl:929
 [5] require(into::Module, mod::Symbol)
   @ Base ./loading.jl:924
 [6] top-level scope
   @ timing.jl:174

If loading packages in parallel were as easy as throwing an @async on there, it’s quite likely that it would already have been done, no?


I’m not saying it’s easy, I spent a ton of time investigating, and I found a way to load Queryverse “instantaneously”, at least for interactive work:

$ julia -t2
julia> @time t = Threads.@spawn @eval using Queryverse
  0.011742 seconds (6.31 k allocations: 425.034 KiB)
Task (runnable) @0x00007f4b91ee5430

It matters to use -t2 or higher (or you get the ca. 10 sec wait), and then if you’re worried about race conditions:

julia> wait(t) # before some plot/using that package if you worry, BUT also may be needed otherwise:

Seemingly I can do a lot before the wait (or if I skip it), but if you’re unlucky you’ll get:

julia> @time t = Threads.@spawn @eval using Queryverse
0.000039 seconds (112 allocations: 5.750 KiB)
Task (failed) @0x00007fc8d3bb4fe0
concurrency violation detected
  [1] error(s::String)
    @ Base ./error.jl:33
  [2] concurrency_violation()
    @ Base ./condition.jl:8
  [3] assert_havelock
    @ ./condition.jl:25 [inlined]
  [4] assert_havelock
    @ ./condition.jl:48 [inlined]
  [5] assert_havelock
    @ ./condition.jl:72 [inlined]
  [6] wait(c::Condition)
    @ Base ./condition.jl:102
  [7] _require(pkg::Base.PkgId, cache::Base.TOMLCache)
    @ Base ./loading.jl:878
  [8] require(uuidkey::Base.PkgId, cache::Base.TOMLCache)
    @ Base ./loading.jl:818
  [9] require(into::Module, mod::Symbol)
    @ Base ./loading.jl:813
 [10] eval
    @ ./boot.jl:344 [inlined]
 [11] (::var"#3#4")()
    @ Main ./threadingconstructs.jl:169

You can’t e.g. do this for two packages (without wait in between) at a time:

julia> @time t = Threads.@spawn @eval using Queryverse; @time t2 = Threads.@spawn @eval using Plots;
  0.010398 seconds (6.31 k allocations: 425.034 KiB)
  double free or corruption (out)

I seemingly got a 5x speedup for parallel loading of JLLs, but that was an illusion…

You are just creating a task that you are not waiting for, so the timing prints immediately. Again, if you think that the trick to loading packages instantly is just spawning using in a separate thread, then you are underestimating the Julia developers. If you want to run things multithreaded, you need to make sure that the code you run is thread safe. This code is not, which is why you get the errors.
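
To make the timing point concrete, here is a minimal sketch. The function name spawn_vs_wait is made up for illustration, and sleep stands in for a slow `using`. It measures how long creating the task takes and how long it takes until the task has actually finished.

```julia
# Hypothetical demo: sleep(work_seconds) stands in for a slow `using SomePackage`.
function spawn_vs_wait(work_seconds)
    t0 = time()
    task = @async sleep(work_seconds)  # returns at once; the work has not finished
    spawn_time = time() - t0           # roughly what `@time t = @async ...` reports
    wait(task)                         # blocks until the work is done
    total_time = time() - t0           # the time you actually pay
    return spawn_time, total_time
end

spawn_time, total_time = spawn_vs_wait(0.5)
println("creating the task: ", round(spawn_time, digits=3), " s")
println("task finished:     ", round(total_time, digits=3), " s")
```

The first number is tiny regardless of how much work the task does; only the second reflects the real cost.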