How to handle and store large amounts of (distributed) generated data?


A lot of (Monte Carlo) simulations can be run simultaneously and independently on many nodes of an HPC cluster, each generating a large solution for later analysis. However, a straightforward `pmap` will try to collect one giant output that likely won't fit in memory on any single node, and then just crash the system.
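One way around this is to have each worker write its own result to disk and return only a small summary, so `pmap` never collects the giant output. A minimal sketch, where `run_simulation` and the output directory are hypothetical placeholders for your own code:

```julia
# Sketch: each worker serializes its (large) solution to its own file and
# returns only a lightweight summary; pmap then collects small objects.
using Distributed, Serialization

function run_and_save(i, outdir)
    sol = run_simulation(i)               # hypothetical: your Monte Carlo run
    path = joinpath(outdir, "sol_$(i).jls")
    serialize(path, sol)                  # full solution goes to disk
    # return only a small summary, not the solution itself
    return (i = i, path = path, mean = sum(sol) / length(sol))
end

# pmap now gathers only lightweight named tuples:
# summaries = pmap(i -> run_and_save(i, "/scratch/mc_runs"), 1:10_000)
```

The summaries (file paths plus whatever statistics you need) are small enough to collect on the driver, and the full solutions can be loaded back lazily per file during analysis.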

Are there distributed databases I could write the information to instead, or could I somehow serialize the types separately and concatenate them into one big data file?

Related to this post is another post discussing what to actually save:

and the DiffEq issue monitoring the updates:


In the past I have used DistributedArrays (you can distribute more than arrays) and then written the data to a series of files on a distributed filesystem (Lustre). In my use cases it was most likely that I would eventually want to read the data back in distributed form as well.
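The per-rank file pattern is simple to sketch: each worker saves its local chunk under a name derived from its worker id, and the same scheme lets it read its own chunk back later. The naming convention here is an assumption, not part of DistributedArrays itself:

```julia
# Sketch: one file per worker, named by worker id, so writing and the later
# read-in both stay distributed. Naming scheme is an assumption.
using Distributed, Serialization

function save_local_chunk(chunk, outdir)
    path = joinpath(outdir, "chunk_worker$(myid()).jls")
    serialize(path, chunk)
    return path
end

function load_local_chunk(outdir)
    return deserialize(joinpath(outdir, "chunk_worker$(myid()).jls"))
end
```

With a DArray you would call `save_local_chunk(localpart(d), outdir)` on each worker; on read-in, each worker reconstructs only its own part.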

There is an MPI extension to HDF5 (parallel HDF5) that might be usable, but I have never used it from within Julia.
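For reference, a heavily hedged sketch of what parallel HDF5 looks like from Julia via HDF5.jl and MPI.jl. This requires an MPI-enabled HDF5 build and must be launched under `mpiexec`; the exact API should be checked against the HDF5.jl documentation. The dataset name and sizes below are illustrative assumptions:

```julia
# Hedged sketch (untested here): each MPI rank writes its disjoint slab of
# one shared HDF5 file. Requires HDF5 built with MPI support; run via mpiexec.
using MPI, HDF5

MPI.Init()
comm   = MPI.COMM_WORLD
rank   = MPI.Comm_rank(comm)
nranks = MPI.Comm_size(comm)

nlocal = 1000                      # rows produced by this rank (assumption)
data   = rand(nlocal)              # stand-in for a simulation result

# Open one shared file collectively; every rank participates.
h5open("results.h5", "w", comm) do file
    dset = create_dataset(file, "solutions", datatype(Float64),
                          dataspace((nlocal * nranks,)))
    # Each rank writes its own disjoint range of the global dataset.
    dset[rank * nlocal + 1 : (rank + 1) * nlocal] = data
end

MPI.Finalize()
```

The key point is that all ranks write into a single file concurrently, with the MPI-IO layer coordinating the access.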


If you know how much data you'll be writing ahead of time, pre-creating the files and doing `pwrite`s into them generally works just fine.
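A portable sketch of the pre-create-then-offset-write pattern, using plain `seek` + `write` with one file handle per process (which, with disjoint ranges, gives the same effect as `pwrite` without any `ccall`):

```julia
# Sketch: pre-create a file at its final size, then let each writer update
# its own disjoint byte range. truncate extends the file zero-filled.
function preallocate(path, total_bytes)
    open(path, "w") do io
        truncate(io, total_bytes)   # file now has its final size
    end
end

function write_at(path, offset, data::Vector{UInt8})
    open(path, "r+") do io          # open existing file for update
        seek(io, offset)            # position at this writer's slot
        write(io, data)
    end
end
```

Each process (or task) calls `write_at` with its own non-overlapping offset, so no coordination beyond the initial size calculation is needed.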


What’s a `pwrite`? A quick Google search (SO) says it’s a file write at an explicit offset that doesn’t move the shared file position, so it can be done concurrently?

But I don’t see a `pwrite` in Julia, just in the Linux man pages. Is there a standard way to do this in Julia?


Yes, you basically give it an offset, and as long as you’re doing stripe-aligned writes (writes in multiples of the filesystem block size) and only write a given stripe from one process at a time, it’ll generally be pretty high performance. I just checked whether we have wrapped `pwrite` in filesystem.jl, but it doesn’t seem like it. For a quick and dirty solution see, which is the same idea but for reads. At some point we should wrap it in Base.
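In the meantime, a quick-and-dirty wrapper over the POSIX syscall via `ccall` might look like this. This is a sketch for POSIX systems only, not an official Base API, and it assumes a 64-bit `off_t`:

```julia
# Sketch: wrap POSIX pwrite(2) with ccall. POSIX only; bypasses the IOStream
# buffer, so only use it on handles you aren't also writing to buffered.
function pwrite(io::IOStream, data::Vector{UInt8}, offset::Integer)
    n = ccall(:pwrite, Cssize_t, (Cint, Ptr{UInt8}, Csize_t, Int64),
              fd(io), data, length(data), offset)
    n == length(data) || error("pwrite failed: ", Libc.strerror(Libc.errno()))
    return Int(n)
end
```

Because `pwrite` takes the offset explicitly and never touches the shared file position, many tasks can safely write disjoint stripes through handles to the same file.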