I spend a lot of time staring at `[ Info: Precompiling X` messages. AFAICT this process is completely single-threaded. This seems like a bit of a shame. I spent all this money on a nice beefy machine with a bunch of cores, but I’m waiting on Julia running on a single core. I guess my question is two-fold: Why is this not parallelized? And are there other parallelization opportunities here besides precompilation, e.g. JIT compilation, etc.?
> I spend a lot of time staring at `[ Info: Precompiling X` messages.
I can hazard a guess. Issues like cache invalidation make this essentially single-threaded only; e.g., on Windows you can only install one program at a time.
Like, say PkgA defines `Base.+(a::SomeType, b::SomeType)` and PkgB defines `Base.+(c::SomeOtherType, d::SomeOtherType)`; then the function `+`, which does the dispatching, needs to know about both. So you can’t do it in a multithreaded way trivially.
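A minimal sketch of that situation (hypothetical module and type names, standing in for real packages): both modules add methods to the single global method table of `+`, so dispatch has to see the merged result of everything every loaded package defined.

```julia
# Two toy "packages", each extending Base.:+ for its own type.
module PkgA
    struct SomeType
        x::Int
    end
    Base.:+(a::SomeType, b::SomeType) = SomeType(a.x + b.x)
end

module PkgB
    struct SomeOtherType
        x::Float64
    end
    Base.:+(a::SomeOtherType, b::SomeOtherType) = SomeOtherType(a.x + b.x)
end

# The one shared method table for + must know about both extensions:
(PkgA.SomeType(1) + PkgA.SomeType(2)).x              # 3
(PkgB.SomeOtherType(1.0) + PkgB.SomeOtherType(2.0)).x  # 3.0
```

Because both packages mutate the same global table, their compiled results can’t simply be produced in isolation and pasted together.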
> this is why Julia is very good at parallel computation

Being good at parallel computation at runtime is a different thing from using multiple threads in the compiler itself.
Sure, and fast-at-runtime is certainly a plus, but if I have to wait 5 minutes just to get a very simple script to start running, Julia becomes a lot less attractive as a tool.
Precompilation only happens once per package update, no?
In general, @Samuel_Ainsworth, I don’t see any reason to try to argue why X is wrong or Y is not 100% correct, etc.
Sometimes it’s best to acknowledge imperfections, and also to acknowledge that there is work underway to make those imperfections better.
Julia, like many things, is just a tool. No need to insist it’s the right tool and the best at everything.
Sure, and I only bring this up in the hopes of improving Julia as a whole. I posted initially out of genuine curiosity for why this is not parallelized at the moment, and hope that it can be parallelized in the future.
https://julialang.org/blog/2020/08/invalidations/ gives a pretty deep view into the problem of invalidations. If packages can invalidate each other’s compiled code, then you can’t trivially precompile all of the pieces at the same time, because the dispatches of the whole can be combinations of the parts. That said, there are multiple things going on.
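A minimal, self-contained sketch of the invalidation mechanism that blog post covers (toy functions, not from the post): compiled code for `f` bakes in what inference knew about `g`, and defining a new, more specific method of `g` afterwards throws that compiled code away.

```julia
g(x) = 1
f(x) = g(x) + 1

f(0)            # compiles f(::Int) against the current g; returns 2

g(x::Int) = 2   # a new, more specific method: compiled code that assumed
                # g(::Int) == 1 is now invalid and gets discarded

f(0)            # f is recompiled against the new method table; returns 3
```

This is why precompiled caches of separate packages can’t just be combined blindly: loading one package can invalidate code another package already compiled.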
First, (pre)compilation is just getting faster, especially in the next Julia release. This plot from the Plots.jl issue tracker really speaks for itself:
(TTFP is total time to first plot)
I’m also being fed information like a 3x decrease in `using DifferentialEquations` time.
But secondly, and back to your point here, the LLVM ORCJIT allows for multithreading its operations. That has not really mattered as much before because most of the time for the very long cases was spent in Julia’s type inference itself, so this would’ve only sped up a minority of the cost. But I’m sure that this thread (pun intended) will get picked up again as LLVM time starts to become more and more important.
Lastly, I also wouldn’t think it would be crazy to include some more big stable packages in the stdlib and the standard system image, like Requires, MacroTools, ForwardDiff, etc.
Yes, it is: the precompilation is serial (though there’s a loophole to allow precompiling in parallel, see below), and I’ve been looking into the loading part: even with precompilation done, it is also serial, meaning single-threaded.
I’m personally less worried about the precompilation part, as it only happens once; I think more about the general loading phase, and note that it also involves some compilation (precompilation is only partial).
If you read that other thread of mine to the end: no, I’m not “underestimating” the Julia developers; this is just a difficult problem, and other stuff may have had priority. I want to say @kristoffer.carlsson (and the other Julia developers) are doing a great job, also with answering.
The problem, as I see it, is that in Julia:

```julia
using A
using B
```

is not, in general, the same (it could have side effects, and there are other issues I’m not getting into here) as:

```julia
using B
using A
```

let alone doing both at the same time; Julia’s semantics currently disallow that.

[I’ve shown in another thread that the time for doing the above can differ, so it’s hard to know the best order of `using` packages, with n! possible orderings.]
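The order-dependence can be seen with a tiny sketch (toy modules standing in for packages; everything here is illustrative): top-level code in a module body runs at load time, so the order of loading is observable program state.

```julia
# Shared global state that both "packages" touch at load time.
order = String[]

module A
    push!(Main.order, "A")   # stands in for A's load-time side effects
end

module B
    push!(Main.order, "B")   # stands in for B's load-time side effects
end

order   # ["A", "B"]; evaluating B before A would give ["B", "A"]
```

Since the runtime can’t know in advance whether such side effects exist, it has to preserve the serial, in-order illusion.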
I did a test to make sure precompilation CAN happen in parallel, as I thought (given separate Julia processes/terminals). This might at least be helpful to know if you anticipate precompiling big packages like the ones below.
You can obviously download separate packages in parallel, in separate terminals, and then also precompile them in parallel:
```julia
julia> @time using Plots
[ Info: Precompiling Plots [91a5bcdd-55d7-5caf-9e0b-520d859cae80]
134.201107 seconds (6.86 M allocations: 478.701 MiB, 0.29% gc time)
```
in the other terminal:
```julia
julia> @time using Gtk
[ Info: Precompiling Gtk [4c0ca9eb-093a-5379-98c5-f87ac0bbbf44]
Gtk-Message: 14:08:49.532: Failed to load module "canberra-gtk-module"
Gtk-Message: 14:08:49.532: Failed to load module "canberra-gtk-module"
108.263449 seconds (3.77 M allocations: 245.317 MiB, 0.08% gc time)
```
and I confirmed that doing both in the same terminal/Julia process takes the combined time, not only 134 seconds.
So why is this? While Julia is great for parallel programming, by default it has serial semantics, even for `using` (and it needs to preserve that illusion, at least). `using A, B` has exactly the same semantics as `using A; using B`, i.e. it also disallows parallel semantics, though maybe that could be relaxed. You could also foresee the Julia runtime looking ahead, seeing the `using B` in the latter case, and starting on whatever it could in theory parallelize, e.g. B’s compilation, while e.g. waiting for A to precompile.
What I said was that if you think

```julia
@sync begin
    @async using A
    @async using B
    @async using C
end
```

is all that is needed to load packages in parallel (which looks to me like what you are doing in that thread), then you are indeed underestimating.
> Package precompilation would be more amenable to parallelization where the leafs could probably be done in parallel.
I assume what Kristoffer means is: the leaf packages in the dependency tree. Those packages, at least in Gtk’s case, are numerous, and many (though not all) are JLL packages (and while they do have some Julia code, it’s short/trivial for each; the JLLs are actually already-compiled binaries). I’m not sure if he means limited to JLLs, or only leafs. I don’t know enough about the implementation of Julia, or exactly why there seemed to be no overhead in my test of parallel precompilation. It seems to me that maybe precompilation of all code could be done in parallel (implemented eventually), while a speedup for the JLLs only would be a great start.
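As a hedged sketch of what “leafs” could mean concretely: packages in the current environment with no dependencies of their own are the natural candidates to precompile in parallel first. `Pkg.dependencies()` is the documented Pkg API; the leaf-finding itself is just my illustration.

```julia
using Pkg

# Dict{UUID, PackageInfo} describing the active environment's dependency graph.
deps = Pkg.dependencies()

# Leaf packages: those whose own dependency list is empty. These could be
# precompiled concurrently without waiting on anything else.
leaves = sort!([info.name for (uuid, info) in deps if isempty(info.dependencies)])
```

In an environment like Gtk’s, many of these leaves would be JLL packages, i.e. wrappers around already-compiled binaries.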
- Direct Dependencies (16)
- Binary Artifacts (7)
- Indirect Dependencies (75), almost all of them JLLs