In my project, I am running large agent-based models on a cluster. My typical workflow is a function `main` that runs a complete, independent simulation (each one can take up to 10 minutes!). I usually call `pmap` after `addprocs`, i.e.
results = pmap(x -> main(x), 1:100)
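A self-contained sketch of this workflow, assuming `main` takes a single integer (for example a seed) and using 4 local workers in place of the 50 on the cluster. The body of `main` is a hypothetical placeholder for a real simulation. `main` has to be defined on every worker, which `@everywhere` takes care of, and `pmap(main, 1:100)` is equivalent to the anonymous-function form above:

```julia
using Distributed

addprocs(4)  # on the cluster this would be addprocs(50)

# Define `main` on the head node and on every worker.
# Hypothetical placeholder for a full agent-based simulation.
@everywhere function main(seed)
    total = 0
    for step in 1:1_000
        total += (seed * step) % 7
    end
    return total
end

# Each call runs on whichever worker is free; results come back in input order.
results = pmap(main, 1:100)
```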
The problem I typically run into is that my simulations write things to file. If I have 50 processes running (i.e. `addprocs(50)`) and they are all trying to write to file, it slows down my computation tremendously.
To avoid this, I no longer write to file. Instead, I keep all the results in memory until `pmap` returns, and then process the returned data. This usually works fine, except that I sometimes run out of memory (50 simulations running simultaneously can eat a lot of memory).
My idea is:
It would be great if, every time an event/message happens in any of the 50 simulations, this information were sent to the head node through some sort of queue. The head node sits idle while the simulations are running, so if events/messages arrive one at a time, it could process them and even write them to file periodically. As an added benefit, I could also have some sort of dashboard showing live data from my simulations.
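A minimal sketch of this idea, assuming `Distributed`'s `RemoteChannel` is used as the queue: the channel lives on the head node (process 1), each worker `put!`s small event tuples into it, and an `@async` task on the head node drains it and appends to a CSV file, flushing every 100 events. The event layout `(id, step, value)`, the file name `events.csv`, the capacity of 1,000, and the toy `main(id, events)` are all hypothetical choices for illustration. Because the channel is bounded, a full queue makes workers wait in `put!` rather than piling data up in memory.

```julia
using Distributed
addprocs(4)

# Bounded queue owned by process 1 (the head node); workers can put! into it.
events = RemoteChannel(() -> Channel{Any}(1_000), 1)

# Hypothetical simulation that emits one event per step.
@everywhere function main(id, events)
    value = 0.0
    for step in 1:10
        value += rand()
        put!(events, (id, step, value))   # ships the event to process 1
    end
    return value
end

# Pass the channel explicitly so the closure sent to workers captures it.
run_all(events, n) = pmap(id -> main(id, events), 1:n)

logfile = "events.csv"

# Consumer task on the head node: write events until the `nothing` sentinel.
consumer = @async begin
    n = 0
    open(logfile, "w") do io
        while true
            ev = take!(events)
            ev === nothing && break
            id, step, value = ev
            println(io, id, ",", step, ",", value)
            n += 1
            n % 100 == 0 && flush(io)    # periodic flush to disk
        end
    end
    n  # number of events written
end

results = run_all(events, 8)
put!(events, nothing)        # no more events are coming
nwritten = fetch(consumer)
```

The sentinel is safe here because each remote `put!` has completed by the time `pmap` returns, so every event is already in the channel ahead of `nothing`. The same consumer task is also a natural place to update live statistics for a dashboard.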
Is there any low-overhead message-passing functionality in Julia? Is `RemoteChannel` what I am looking for? What about a “callback” function from each simulation (I’ve used this method before to create a progress bar of sorts)?
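A callback running on a worker cannot modify head-node memory directly, but it can close over a `RemoteChannel` so that each call ships a message to process 1. A minimal sketch, assuming a hypothetical `main(id; callback)` that invokes `callback(id, step)` after each of its 5 steps; the head node just prints a running progress count:

```julia
using Distributed
addprocs(4)

# Hypothetical simulation that reports progress through a callback.
@everywhere function main(id; callback = (id, step) -> nothing)
    for step in 1:5
        sleep(0.01)              # stand-in for real work
        callback(id, step)
    end
    return id
end

function run_with_progress(n)
    progress = RemoteChannel(() -> Channel{Tuple{Int,Int}}(100), 1)
    done = Ref(0)
    # Head-node task: consume exactly 5n progress messages.
    printer = @async while done[] < 5n
        take!(progress)
        done[] += 1
        print("\rsteps completed: ", done[], "/", 5n)
    end
    # The callback runs on the worker; put! delivers the message to process 1.
    results = pmap(id -> main(id; callback = (i, s) -> put!(progress, (i, s))), 1:n)
    wait(printer)
    println()
    return results, done[]
end

results, steps = run_with_progress(8)
```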