How to handle and store large amounts of (distributed) generated data?

Yes, you basically give it an offset, and as long as you're doing stripe-aligned writes (writes in multiples of the file system block size) and only write a given stripe from one process at a time, it'll generally perform pretty well. I just checked whether we have wrapped pwrite in filesystem.jl, and it doesn't look like we have. For a quick and dirty solution, see https://github.com/jeff-regier/Celeste.jl/blob/master/src/SDSSIO.jl#L736-L782, which is the same idea but for reads. At some point we should wrap it in Base.
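To make the offset idea concrete, here is a rough sketch of the access pattern, written in Python only because its `os.pwrite` exposes the same POSIX call directly. This is not the Celeste.jl code and not a Julia API. The names `BLOCK`, `write_stripe` and `demo` are made up for illustration, and the 4096-byte stripe size is an assumption; you would need to look up the real value for your file system. A loop stands in for the separate processes here; in a real job each process would open the file itself and write only the stripes it owns.

```python
import os
import tempfile

# Assumed stripe size for illustration; query the real file system for the true value.
BLOCK = 4096


def write_stripe(fd, rank, payload):
    """Write exactly one stripe at the offset owned by `rank` (hypothetical helper)."""
    if len(payload) != BLOCK:
        raise ValueError("payload must be exactly one stripe long")
    offset = rank * BLOCK  # stripe-aligned: always a multiple of BLOCK
    written = 0
    # pwrite may write fewer bytes than requested, so loop until the stripe is complete.
    while written < len(payload):
        written += os.pwrite(fd, payload[written:], offset + written)
    return offset


def demo(nranks=4):
    """Simulate `nranks` writers, each owning one stripe, and return the file contents."""
    path = os.path.join(tempfile.mkdtemp(), "out.bin")
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
    try:
        # Write in reverse order to show that the explicit offset, not the
        # call order or a shared file position, decides where the data lands.
        for rank in reversed(range(nranks)):
            write_stripe(fd, rank, bytes([rank]) * BLOCK)
    finally:
        os.close(fd)
    with open(path, "rb") as f:
        return f.read()
```

Because every writer passes its own offset, nobody needs to seek or coordinate a shared file position; the only rule is that no two writers touch the same stripe at the same time.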
