@jewh just FYI, this precise issue is why I ended up writing the fix in JuliaLang/julia Pull Request #44671 (“avoid using `@sync_add` on remotecalls”). Precompilation helps a bit, but you’ll still spend a terrible amount of time just initializing stuff in an almost-serial way (see the rough benchmark in the PR).
The fix (currently in 1.8.0-beta3, as Mose pointed out) won’t let you dodge the package loading and precompilation time, but at least everything is going to run fully in parallel, and you can very easily improve things further with “normal” use of PackageCompiler.jl.
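For the PackageCompiler.jl part, a minimal sketch of what I mean by “normal” use might look like the following. The package name `DataFrames`, the file names `warmup.jl` and `sys_workers.so`, and the worker count are placeholders I made up for illustration; substitute whatever your workers actually load.

```julia
using PackageCompiler, Distributed

# Hypothetical names: replace with your own packages and paths.
# The listed packages must be installed in the active project.
sysimage = abspath("sys_workers.so")

# Bake the packages (and the code exercised by warmup.jl) into a custom sysimage,
# so workers do not have to load and compile them at startup.
create_sysimage(
    [:DataFrames];
    sysimage_path = sysimage,
    precompile_execution_file = "warmup.jl",
)

# Start the workers with the custom sysimage.
addprocs(4; exeflags = "--sysimage=$sysimage")
```

The sysimage has to be visible at the same path on every node that runs a worker, so on a cluster it usually belongs on a shared filesystem.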
Also, why not use ClusterManagers.jl and `addprocs_slurm`? IIRC `julia -p81 ...` will spawn all processes on a single node. (What I usually do is documented e.g. here, with the sbatch script here.)
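Roughly, the ClusterManagers.jl route could be sketched like this. It assumes the script is started as plain `julia script.jl` (no `-p`) inside a Slurm allocation, e.g. from an sbatch script, so that `SLURM_NTASKS` is set; the hostname check at the end is only there to show where the workers landed.

```julia
using Distributed, ClusterManagers

# Assumption: we are inside a Slurm job, so SLURM_NTASKS holds the number
# of tasks requested in the sbatch script.
n_workers = parse(Int, ENV["SLURM_NTASKS"])

# Let Slurm place the workers across the allocated nodes,
# instead of starting them all locally with `julia -p`.
addprocs_slurm(n_workers)

# Sanity check: list the distinct hosts the workers are running on.
hosts = [remotecall_fetch(gethostname, w) for w in workers()]
println(unique(hosts))
```

If the allocation spans several nodes, the printed list should contain more than one hostname.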