I tried those things, but mostly they didn’t seem to help. To simplify, I switched to executing `julia` from `bash`, removing VSCode and its debugger/language server as possible culprits. I even tried quitting VSCode completely in case it was writing to or locking files. I do start `Revise` in my startup file.
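For context, my startup file loads Revise with the usual defensive pattern (this is a sketch of that pattern, not a verbatim copy of my file):

```julia
# ~/.julia/config/startup.jl -- load Revise if available, but don't
# abort REPL startup if it fails for some reason.
try
    using Revise
catch e
    @warn "Revise failed to load" exception = e
end
```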
Good News
- Using threads produced no errors about unrecognized packages, though it also produced no sign that more than one thread was active.
- Using `@everywhere` (thanks for the tip) I was able to get all the processes into the same environment using

@everywhere import Pkg
@everywhere Pkg.activate(".")
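An alternative I have seen suggested (an assumption on my part, not something I have verified here) is to hand the project to the workers when they are created, so the `@everywhere Pkg.activate` step isn’t needed:

```julia
using Distributed

# Start 4 workers that inherit the current project environment.
# `exeflags` passes extra command-line flags to each worker process.
addprocs(4; exeflags = "--project=.")

# Confirm every worker sees the same active project:
@everywhere println(Base.active_project())
```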
Unchanged Bad News
- Cloning the project into a fresh directory and starting with `julia -t4` didn’t seem to induce much parallelism, any increase in CPU use beyond one core, or any speedup.
(@v1.8) pkg> activate .
Activating project at `~/Documents/BP-2`
julia> import Pkg
julia> @time Pkg.precompile()
┌ Warning: The active manifest file is missing a julia version entry. Dependencies may have been resolved with a different julia version.
└ @ ~/Documents/BP-2/Manifest.toml:0
Precompiling project...
1 dependency successfully precompiled in 408 seconds. 123 already precompiled.
410.341454 seconds (2.77 M allocations: 177.222 MiB, 0.04% gc time, 0.16% compilation time)
So one package was precompiled, but it took 410 seconds!? Maybe the work is hidden in the threads?
Since packages are managed as a shared pool (I think, even when one is using a custom environment), it’s unsurprising that most of the work was already done: I had already used these packages on the same machine. But if the work really was all done, why did it take so long?
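One thing I may try, based on the Manifest warning above (this is a guess, not a confirmed fix): re-resolving the environment under the current Julia, which rewrites `Manifest.toml` and should add the missing julia version entry:

```julia
import Pkg
Pkg.activate(".")
# Re-resolve dependencies under the running Julia version; this rewrites
# Manifest.toml, which should silence the "missing julia version entry"
# warning and may avoid spurious re-precompilation.
Pkg.resolve()
Pkg.precompile()
```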
- Starting `julia` with threads produced no speedup and no increase in CPU use beyond 12% when `include`’ing my test file, which defines one type and one function, and calls it. The function generates 12 rows of random data, but the include takes over a minute:
@time include("src/maker.jl")
# small table output omitted
73.580062 seconds (33.69 M allocations: 2.036 GiB, 4.34% gc time, 42.84% compilation time: 22% of which was recompilation)
# but if I repeat the only action line that is not printout
@time data = maker()
0.000539 seconds (153 allocations: 9.477 KiB)
While this is not the same as what happens when I debug in VSCode, the ~75 s delay seems similar. And it happens every time I debug, even if I don’t change the code. It also happens when I run the code without debugging in VSCode, just as it happens every time I start the REPL and `include` the file. The time clearly isn’t going into actually executing the `maker()` function.
The results of `@time` with `-t4` are about the same as when `julia` starts with no options.
It seems odd that julia is spending any time compiling code that has already been compiled, and it also raises the question of where the 53% of the time that is neither compilation nor GC is going.
It’s hard to develop code when every debugging cycle takes more than a minute to get started.
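One workaround I’m considering for the per-session latency (a sketch assuming PackageCompiler.jl; the package names are the ones from my environment listed below) is baking the heavy dependencies into a custom system image so they don’t pay load and compile costs every session:

```julia
using PackageCompiler

# Build a system image containing the slow-to-load dependencies.
# Afterwards start Julia with:  julia -J sys_msep.so --project=.
create_sysimage(
    [:DataFrames, :Distributions, :MixedModels, :StatsFuns];
    sysimage_path = "sys_msep.so",
)
```

The trade-off, as I understand it, is that the sysimage must be rebuilt whenever those packages are updated.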
- Using `julia -p4` and the `@everywhere` code above did produce activity on more than one CPU for a little while at the start. But then it seemed to go back to one core, and the whole thing was even slower than the threaded version:
125.444737 seconds (34.02 M allocations: 2.052 GiB, 2.46% gc time, 26.13% compilation time: 25% of which was recompilation)
I’m not sure whether these times, like the 125 seconds just above, are wall-clock times, the sum of CPU time across all threads/processes, or something else; the help isn’t clear to me. But, as I sat in front of the terminal, they seemed plausible as wall-clock times.
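To settle the wall-clock question, I could bracket the call with `time()` (Base’s wall clock, in seconds) and compare; a sketch I have not run yet:

```julia
t0 = time()                                  # wall-clock seconds
elapsed = @elapsed include("src/maker.jl")   # the number @time also reports
wall = time() - t0

# If @elapsed were summing CPU time across threads, `elapsed` would
# exceed `wall`; if the two agree, it is wall-clock time.
println("wall = $wall s, @elapsed = $elapsed s")
```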
Small code, big dependencies?
The list of packages explicitly added to the environment is not large:
(MSEP) pkg> st
Project MSEP v0.1.0
Status `~/Documents/BP-MSEP/Project.toml`
[a93c6f00] DataFrames v1.3.5
[31c24e10] Distributions v0.25.71
[442a2c76] FastGaussQuadrature v0.4.9
[c91e804a] Gadfly v1.3.4
[ff71e718] MixedModels v4.7.1 `~/.julia/dev/MixedModels`
[86f7a689] NamedArrays v0.9.6
[1fd47b50] QuadGK v2.5.0
[4c63d2b9] StatsFuns v1.0.1
[37e2e46d] LinearAlgebra
However, there are a lot of packages, ~120, when dependencies are included.
The test file only uses some of them:
using DataFrames
using Distributions
using MixedModels
using StatsFuns
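To see which of these imports dominates the startup cost, I could time each `using` separately in a fresh `julia --project=.` session (a sketch; the package list matches the test file above):

```julia
# Run in a fresh session so nothing is already loaded; each @eval
# expands to a top-level `using <Pkg>` statement.
for pkg in (:DataFrames, :Distributions, :MixedModels, :StatsFuns)
    t = @elapsed @eval using $pkg
    println(rpad(pkg, 15), round(t; digits = 2), " s")
end
```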