Overhead in passing data to worker processes

I have several large objects that need to be passed to worker processes. Currently, I am just using a simple @parallel for loop without any reducer. Since the objects are quite large, it takes 5 seconds before a worker process actually starts doing the work.

However, it appears that each worker process is taking this hit sequentially. So, the first worker takes 5 seconds, then the second worker takes another 5 seconds, and so on. Although I have many worker processes (24), the later ones take progressively longer to even start doing work.

Is there any way to avoid that? Perhaps using something other than @parallel? I’ll work out an MWE if needed. Thanks.

Any chance you can construct the objects, or load the necessary data, on those workers (i.e., rather than on the main process)?
That’d be the simplest solution, if possible.
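A minimal sketch of that idea, assuming Julia 1.x (where @parallel has been renamed @distributed) and workers started with `julia -p N` or `addprocs`. The names `CACHE` and `get_big` and the toy data are hypothetical stand-ins for the real objects.

```julia
using Distributed

# Assumption: workers already exist (julia -p N, or addprocs(N) before this point).
# With no workers, everything simply runs on the main process.

@everywhere begin
    # Per-process cache; each process fills its own copy on first use.
    const CACHE = Dict{Symbol,Any}()
    # Hypothetical stand-in for an expensive construction or file load.
    get_big() = get!(() -> collect(1.0:1000.0), CACHE, :big)
end

# The loop body refers only to the function get_big, so the large data
# is built where it is used instead of being serialized from the main process.
total = @distributed (+) for i in 1:100
    get_big()[i]
end
```

Because the data sits behind a function rather than being captured by the loop body, only the small loop indices travel to the workers.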

If you aren’t on a distributed memory system, you could use threads instead.
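A rough sketch of the threaded variant, assuming Julia was started with several threads (for example via the `JULIA_NUM_THREADS` environment variable). The array `big` is a hypothetical stand-in for the large objects; threads share it in memory, so nothing is copied.

```julia
using Base.Threads

# Hypothetical large read-only object, shared by all threads without copying.
big = collect(1.0:1_000_000.0)
results = zeros(100)

@threads for i in 1:100
    # Each iteration writes only to its own slot, so no locking is needed here.
    results[i] = big[i] * 2
end
```

This only stays safe while iterations do not write to shared state; anything else needs locks or per-thread buffers.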

Let me try to use JLD2 to save to an SSD and ask the workers to load the data in parallel. I’m unsure if my code is thread-safe… can try that next. Thanks for the ideas.

JLD2+SSD worked well.
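For readers who want to reproduce the pattern, here is a sketch of the save-then-load-on-each-worker approach. It swaps in the `Serialization` standard library for JLD2 so that it runs without extra packages. The names `LOADED`, `load_big!`, and `big_length`, plus the temporary path, are hypothetical, and it assumes every process can read the same file path.

```julia
using Distributed
@everywhere using Serialization

# Hypothetical large object; the thread's actual run used JLD2 files on an SSD.
big = collect(1.0:1000.0)
path = joinpath(mktempdir(), "big.bin")
serialize(path, big)

@everywhere begin
    const LOADED = Ref{Any}(nothing)
    # Each process reads the file itself, so the loads can overlap.
    load_big!(p) = (LOADED[] = deserialize(p); nothing)
    big_length() = length(LOADED[])
end

# Only the short path string is sent to each process, not the data.
@everywhere load_big!($path)

# Confirm that every process now holds its own copy.
lengths = [remotecall_fetch(big_length, p) for p in procs()]
```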

Multi-threading failed in BLAS. I have 96 threads running on a server with 144 vCPUs and tons of memory.

Error thrown in threaded loop on thread 47: Base.KeyError(key=1111826)
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.

signal (11): Segmentation fault
while loading no file, in expression starting on line 0

signal (11): Segmentation fault
while loading no file, in expression starting on line 0
unknown function (ip: 0x7fa6871343a8)
dgemv_t_HASWELL at /opt/julia/bin/../lib/julia/libopenblas64_.so (unknown line)

signal (11): Segmentation fault
while loading no file, in expression starting on line 0
unknown function (ip: 0x7fa6871343a8)
unknown function (ip: 0x7fa685bbe23b)
dgemv_t_HASWELL at /opt/julia/bin/../lib/julia/libopenblas64_.so (unknown line)
unknown function (ip: 0x7fa6871343a8)
exec_blas at /opt/julia/bin/../lib/julia/libopenblas64_.so (unknown line)
unknown function (ip: 0x7fa685bbe23b)

Google shows me this issue. I guess you have more than 128 vCPUs.

A grown-up will be along in a minute, but surely this limit should be configurable by an environment variable?
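On the configuration question, one knob that does exist is the number of threads OpenBLAS itself uses. It can be set from Julia with `BLAS.set_num_threads`, or through the `OPENBLAS_NUM_THREADS` environment variable before startup. The sketch below keeps BLAS single-threaded so that all parallelism comes from Julia's own threads. Whether this avoids the crash above is an assumption, not something confirmed in this thread, and it does not change any limit compiled into the library.

```julia
using LinearAlgebra

# Assumption: with many Julia threads calling BLAS at once, keeping
# OpenBLAS single-threaded avoids stacking BLAS threads on top of them.
BLAS.set_num_threads(1)

A = rand(200, 200)
x = rand(200)

# Transposed matrix-vector product, the same kind of operation as the
# dgemv call visible in the stack trace.
y = A' * x
```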