I’m fairly new to Julia, and not overly experienced with scaling to huge numbers of processes, so forgive me if I’m missing something obvious.
Alright, to be honest, I actually used a couple thousand CPU-hours to run `using DoubleFloats`, but it looks like `1+2` would have done the same. From my testing, the time taken to call `@everywhere <basically any fast piece of code>` scales linearly with the number of processes (at least the first time it’s called), and the constant in front of that scaling is roughly 1 second per process, even for something as simple as `@everywhere 1+2`.

But that linear wall-clock scaling means the CPU-time spent on this call scales quadratically with the number of processes. Now, ~1 second may not sound like much, but it easily reaches ~1 hour of wall-clock time with a few thousand processes, which is thousands of CPU-hours.
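Something along these lines reproduces the effect on a single node (the worker counts are arbitrary, and this is a minimal sketch rather than the exact script from my runs):

```julia
using Distributed

# Arbitrary worker counts; the point is just to watch the per-process cost add up.
for n in (7, 14, 28, 56)
    ps = addprocs(n)                   # fresh workers for each measurement
    t = @elapsed @everywhere 1 + 2     # the first @everywhere on new workers is the slow one
    println("$(length(ps)) workers: first @everywhere took $(round(t; digits=2)) s")
    rmprocs(ps)                        # tear the workers down before the next batch
end
```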
Obviously I run into this mostly with `@everywhere using WhateverPackage`, where the package can be something as simple as `Base`. But it doesn’t matter whether `DEPOT_PATH` points to a local SSD for each worker or to some slow home directory.
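For concreteness, the local-SSD case looked roughly like this (the `/scratch` path is just a placeholder for whatever node-local storage is available):

```julia
using Distributed

# Placeholder path for node-local SSD storage; adjust to your system.
ENV["JULIA_DEPOT_PATH"] = "/scratch/$(ENV["USER"])/julia_depot"

# Locally spawned workers inherit the master's environment and thus the depot;
# SSH workers can be given it via the `env` keyword of `addprocs` instead.
addprocs(56)

@everywhere @show first(DEPOT_PATH)   # each worker should report the node-local depot
```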
I didn’t realize this was going to happen, saw that my code worked nicely on a few dozen processes, and threw it onto a few thousand processes (several dozen nodes with 56 cores each, and one process per core). That took about an hour to get past the first `@everywhere using DoubleFloats`, thereby wasting a couple thousand CPU-hours on the first occurrence of `@everywhere`!!! I can’t really afford to waste this much compute time on each run, so this makes Julia unusable for what I need here. (Making up the difference with threading isn’t really in the cards.)
Am I missing something? Can I prime my workers to know how to use `@everywhere`, or distribute this preparation so that it doesn’t scale quadratically? Can I somehow use `pmap` and friends without `@everywhere using SomePackage`? Or is this just somewhere Julia won’t be able to reach?