I’m fairly new to julia, and not overly experienced with scaling to huge numbers of processes, so forgive me if I’m missing something obvious.
From my testing, the time taken to call `@everywhere <basically any fast piece of code>` scales linearly with the number of processes (at least the first time it's called), and the constant in front of that scaling is ~1 second, even for something as simple as `1+2`. But that linear wallclock scaling means that the cpu-time spent on this call scales quadratically. Now, that ~1 second may not sound like much, but it can easily get to ~1 hour (wallclock) with a few thousand processes, which amounts to thousands of cpu-hours.
Obviously I run into this mostly with `@everywhere using WhateverPackage`, where the package can be something as simple as `Base`. (Alright, to be honest, I actually used a couple thousand cpu-hours to run `using DoubleFloats`, but it looks like `1+2` would have done the same.) And it doesn't matter whether `DEPOT_PATH` points to a local SSD for each worker or some slow home directory.
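For concreteness, here's roughly how I measured it. This is a sketch for a single machine, so the absolute timings will differ from a real cluster, and the worker counts are just examples:

```julia
# Time the first @everywhere call on freshly launched workers,
# for increasing worker counts.
using Distributed

for n in (2, 4, 8)
    ps = addprocs(n)                # launch n fresh worker processes
    t = @elapsed @everywhere 1 + 2  # the first @everywhere these workers see
    println("workers = $n: first @everywhere took $(round(t; digits = 2)) s")
    rmprocs(ps)                     # tear down before the next round
end
```

Repeating the `@everywhere` on the same workers is fast; it's only the first call on each fresh set of workers that carries the ~1 second/process cost.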
I didn't realize this was going to happen, saw that my code worked nicely on a few dozen processes, and threw it onto a few thousand processes (several dozen nodes with 56 cores each, and one process per core). That run took about an hour to get past the first `@everywhere using DoubleFloats`, thereby wasting a couple thousand cpu-hours on the first occurrence of `@everywhere`! I can't really afford to waste this much compute time on each run, so this makes julia unusable for what I need here. (Making up the difference with threading isn't really in the cards.)
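For context, the job itself was just the standard Distributed pattern, along these lines. This is a sketch: in reality the workers came from a cluster manager rather than a bare `addprocs`, and the real work wasn't a toy squaring:

```julia
using Distributed
addprocs(56)                     # stand-in; the real job had 56 per node across dozens of nodes
@everywhere using DoubleFloats   # the line that cost ~1 hour at a few thousand workers

# Toy stand-in for the real work.
results = pmap(x -> Double64(x)^2, 1:10_000)
```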
Am I missing something? Can I prime my workers to handle `@everywhere` cheaply, or distribute this preparation so that the cost doesn't grow quadratically? Can I somehow use `pmap` and friends without `@everywhere using SomePackage`? Or is this just somewhere julia won't be able to reach?