I am new to shared-memory parallelism and need some pointers on moving forward with a project.
The setup is a high-performance cluster with 18 nodes and 32 cores per node with hyperthreading enabled (32 physical cores, 64 logical cores with hyperthreading). It uses Slurm for queueing and workload management. To make use of the cluster I use ClusterManagers.jl, which makes my job as easy as addprocs(# of processors).
Suppose I use pmap(x -> simulation(x), 1:500) to launch 500 independent simulations (embarrassingly parallel) on 500 logical cores, each simulation doing its own thing. Inside my computationally expensive simulation function, I avoid println, logging statements, progress printing, and data processing so as not to slow the function down.
My original plan was to pass simulation a callback function that sends messages to the main process (pid = 1), which can then print them or write them to a file. This is similar to how PmapProgressMeter worked, but it seems this package is not maintained anymore. Another idea is to use RemoteChannels. Both these ideas are fine if the callback function is simply printing to the screen, but any type of file IO would kill performance. Imagine 500 worker processes each calling back to ask the main process to write something to a file.
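To make the RemoteChannel idea concrete, here is a rough sketch of what I have in mind. It is only a sketch under assumptions: two local workers stand in for the Slurm workers, the body of simulation is a placeholder, and the nsteps keyword, the (id, step) message format, and the (-1, 0) stop sentinel are all made up for illustration.

```julia
using Distributed

# Assumption: two local workers stand in for the cluster workers that
# ClusterManagers.jl would normally add through Slurm.
addprocs(2)

# Hypothetical stand-in for the real simulation. After each step it sends a
# small (id, step) message to the main process instead of doing any IO itself.
@everywhere function simulation(x, progress; nsteps = 3)
    for step in 1:nsteps
        sleep(0.01)                   # placeholder for the real work
        put!(progress, (x, step))     # report progress to pid 1
    end
    return x^2
end

# Channel that lives on the main process (pid 1); workers get a reference.
progress = RemoteChannel(() -> Channel{Tuple{Int,Int}}(1024), 1)

# A single consumer task on the main process is the only place that prints
# (or would write to a file).
received = Tuple{Int,Int}[]
consumer = @async begin
    while true
        msg = take!(progress)
        msg[1] == -1 && break         # made-up sentinel: everything finished
        push!(received, msg)
        println("simulation ", msg[1], " finished step ", msg[2])
    end
end

results = pmap(x -> simulation(x, progress), 1:4)
put!(progress, (-1, 0))               # tell the consumer to stop
wait(consumer)
```

This keeps all output in one task on the main process, but every message still goes through pid 1, which is exactly the bottleneck I am worried about once file IO is involved.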
Since hyperthreading is enabled, can I use the @threads macro on each logical core that pmap has distributed the function to, so that the extra thread can perform the printing, logging, and progress reporting at the simulation level? For example, I would like to ask the thread (associated with the logical core) to compute how much time is left on that simulation or do some data processing. Another idea is to print certain information about the simulation as it chugs along instead of waiting for it to finish. Yet another idea is to introduce some sort of interrupt that the thread listens for in case I want to kill a particular simulation.
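Here is roughly what I imagine the per-worker thread looking like. This is an untested guess at the structure, not something I know to be the right approach: it assumes each worker can be started with two threads by passing "--threads=2" through the exeflags keyword of addprocs (I have not checked how this interacts with ClusterManagers.jl and Slurm), it uses Threads.@spawn for the monitor task rather than @threads, and the simulation body, the nsteps keyword, and the returned fields are placeholders.

```julia
using Distributed

# Assumption: each worker process is started with 2 threads.
addprocs(2; exeflags = "--threads=2")

# Hypothetical simulation with a side task that watches its progress.
@everywhere function simulation(x; nsteps = 50)
    step = Threads.Atomic{Int}(0)        # progress counter shared with the monitor
    done = Threads.Atomic{Bool}(false)   # flag that tells the monitor to stop
    samples = Int[]                      # progress values the monitor observed

    # Monitor task: with 2 threads it can run alongside the main loop. This is
    # where logging or time-remaining estimates would go.
    monitor = Threads.@spawn begin
        while !done[]
            push!(samples, step[])
            sleep(0.005)
        end
    end

    acc = 0.0
    for i in 1:nsteps
        acc += sum(sqrt, 1:10_000)       # placeholder for the real work
        Threads.atomic_add!(step, 1)
    end

    done[] = true
    wait(monitor)
    return (x = x, steps = step[], nsamples = length(samples))
end

results = pmap(simulation, 1:4)
```

A second atomic flag read inside the main loop could presumably serve as the "kill this simulation" signal, but I do not know how to set it from the main process.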
If someone can provide an MWE for me to get started with, it would be appreciated. I can deal with the use cases and implementation of my ideas, but I can’t figure out how to enable threads per simulation, per core.
Thanks