First I want to say that time-to-first-plot isn’t really an issue; I consider it a solved problem if you know what you’re doing and use the right packages/options, e.g. about a second on Julia 1.6 (and you can expect this speed soon with default options in 1.6):
$ julia -O1 --compile=min -q
julia> @time using VegaLite
0.902355 seconds (1.20 M allocations: 83.260 MiB)
julia> @time @vlplot(
           data={
               values=[
                   {a="A",b=28},{a="B",b=55},{a="C",b=43},
                   {a="D",b=91},{a="E",b=81},{a="F",b=53},
                   {a="G",b=19},{a="H",b=87},{a="I",b=52}
               ]
           },
           mark="bar",
           encoding={
               x={field="a", type="ordinal"},
               y={field="b", type="quantitative"}
           }
       )
0.260332 seconds (355.97 k allocations: 21.581 MiB, 12.28% gc time)
This is straight from the docs; I just added @time (and started Julia with different options).
I’ve been working on, or should I say looking into, faster loading of packages, and while load time has been cut in half, there are still problems when you have lots of dependencies. Which brings me to the meta-package (and other similar packages) for VegaLite, which e.g. gets you the slow-loading VegaDatasets too. People may have been conditioned to use that “get me the kitchen sink” Queryverse metapackage, thinking time-to-first-plot is slow, and it’s worth exploring what can be done.
It’s actually rather OK now (compared to 30 seconds on 1.4.0), if you’re fine with non-default options:
$ julia -O1 --compile=min
julia> @time using Queryverse
3.108088 seconds (4.24 M allocations: 288.382 MiB, 2.35% gc time)
but on default:
julia> @time using Queryverse
10.728247 seconds (15.29 M allocations: 922.528 MiB, 3.43% gc time)
I opened an issue about parallel loading, but it was closed as not specific enough. Here are more details on what I have in mind. Loading those dependencies on default settings seems very fast, except my implementation of the idea doesn’t work. [EDIT: My code works now, and I’ve replaced the example here; see also my other post below.]
$ time julia parallel_test.jl # amended Queryverse.jl code
real 0m0,337s # these numbers are not valid, see in post further down
user 0m0,663s
sys 0m0,435s
$ cat parallel_test.jl
using Reexport
__precompile__(false)
t1 = @async @eval @reexport using DataValues
t2 = @async @eval import IterableTables
t3 = @async @eval using Query
t4 = @async @eval using DataTables
t5 = @async @eval using DataFrames
t6 = @async @eval @reexport using FileIO
t7 = @async @eval @reexport using ExcelFiles
t8 = @async @eval @reexport using StatFiles
t9 = @async @eval @reexport using CSVFiles
t10 = @async @eval @reexport using FeatherFiles
t11 = @async @eval @reexport using ParquetFiles
t12 = @async @eval @reexport using VegaLite
t13 = @async @eval @reexport using DataVoyager
# Here I would rather want to do wait(t1, t2, t3, ... t13), and in general reduce the boilerplate to: using p1, p2...
wait(t1)
wait(t2)
wait(t3)
wait(t4)
wait(t5)
wait(t6)
wait(t7)
wait(t8)
wait(t9)
wait(t10)
wait(t11)
wait(t12)
wait(t13)
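The task/wait boilerplate above could be collapsed into a loop over package names. A minimal sketch of that idea (using stdlib packages so it runs without installing anything, and dropping @reexport for brevity; substitute the actual Queryverse dependency list in practice):

```julia
# Load a list of packages in async tasks, then wait for all of them.
pkgs = [:Dates, :Printf, :Statistics]  # stdlib stand-ins for the real deps

tasks = Task[]
for pkg in pkgs
    t = @async @eval using $pkg   # @eval interpolates the package name
    push!(tasks, t)
end
foreach(wait, tasks)   # wait() rethrows the error of any failed task
```

Note that wait() propagates a failed task’s exception, so a package that errors during loading would still surface here.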
Now, the two specific questions: 1) what’s wrong with the code/idea, and 2) since it runs, what does @eval do here? Do the modules load, just in some other namespace, since I get no error?
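On question 2, one way to probe where the modules end up is to check Main after the task completes (a minimal standalone check using the stdlib Dates package):

```julia
t = @async @eval using Dates
wait(t)
# @eval evaluates in the module where the macro call appears,
# which for a top-level script is Main, so the binding
# should show up there:
println(isdefined(Main, :Dates))
```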