Save data to same files from independent, parallel processes

Problem

I have a Julia script that computes two numbers, num1 and num2. The computation isn’t deterministic, so the numbers are different each time I run the script. I want to run this script many times, save these numbers, and afterwards do statistics on the collected num1s and num2s. Each run takes some time, so I will run the script in parallel on a computer cluster, with each process running on its own core.

Here’s my naive idea of simply appending all the numbers (within Julia) to two files, one for num1 and one for num2:

# main.jl

num1 = rand()  # placeholder: some number, different for each run
num2 = rand()  # placeholder: some other number, also different for each run

# append this run's data to the shared output files
open("num1s.txt", "a") do file1
    println(file1, num1)
end

open("num2s.txt", "a") do file2
    println(file2, num2)
end

But what happens if two independent processes try to append to my files at the same time? I’m worried that this isn’t safe and that conflicts will cause some runs to fail. Is there a better way to save Julia outputs from independent processes?

Additional information

I usually submit such jobs to the cluster using GNU parallel, with a (simplified) job script that looks something like this on a single 24-core node:

#!/bin/bash
#PBS job parameters go here

# run main.jl 24 times in parallel on a 24-core node
parallel julia main.jl ::: {01..24}

One possible solution would be to edit my Julia script to save the numbers of each run to individual files, then add the necessary commands to this bash script to collate the data into two files (num1s.txt and num2s.txt). However, I was hoping there might be an elegant solution from within Julia.
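A minimal sketch of the per-run-file idea on the Julia side, assuming GNU parallel passes the job index (01 to 24) as the script's first command-line argument, which Julia exposes as ARGS[1]. The file name results_<id>.txt and the tab-separated layout are hypothetical choices for illustration, and rand() stands in for the real computation.

```julia
# main.jl (sketch): each run writes only to its own file, so runs never share a file.
# Assumption: GNU parallel passes the job index (e.g. "01") as the first argument;
# fall back to the process ID when the script is run by hand without arguments.
run_id = isempty(ARGS) ? string(getpid()) : ARGS[1]

num1 = rand()  # placeholder for the real, non-deterministic computation
num2 = rand()  # placeholder for the real, non-deterministic computation

# Hypothetical naming scheme: results_<run_id>.txt holding "num1<TAB>num2" on one line.
outfile = "results_$(run_id).txt"
open(outfile, "w") do io
    println(io, num1, '\t', num2)
end
```

Because every job gets a distinct index from `parallel julia main.jl ::: {01..24}`, no two processes ever open the same output file.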


Yes, it is not safe for multiple processes to append to the same file at the same time using the regular open function with the "a" flag. I think the most cross-platform approach is to have each process write to its own file and then combine the results at the end. There are also libraries like SQLite.jl you can try, but if your cluster uses a network filesystem, SQLite might be subtly broken. Ref: Atomic Commit In SQLite
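The combine step can also stay in Julia. Below is a sketch of a separate script run once after all jobs have finished (for instance as a final `julia collate.jl` line in the job script). The script name collate.jl, the function collate_results, and the per-run file format (results_<id>.txt containing num1 and num2 separated by a tab on one line) are assumptions for illustration, not an existing tool.

```julia
# collate.jl (sketch): merge per-run result files into num1s.txt and num2s.txt.
# Assumption: each run wrote one file "results_<id>.txt" containing "num1<TAB>num2".

"""
    collate_results(dir) -> (num1s, num2s)

Read every per-run result file in `dir` and write the combined columns to
`num1s.txt` and `num2s.txt` in the same directory.
"""
function collate_results(dir::AbstractString)
    # readdir returns bare file names; keep only the per-run result files.
    files = sort(filter(f -> startswith(f, "results_") && endswith(f, ".txt"), readdir(dir)))
    num1s = Float64[]
    num2s = Float64[]
    for f in files
        fields = split(readline(joinpath(dir, f)), '\t')
        push!(num1s, parse(Float64, fields[1]))
        push!(num2s, parse(Float64, fields[2]))
    end
    # Only this single process writes the combined files, so there is no contention.
    open(joinpath(dir, "num1s.txt"), "w") do io
        foreach(x -> println(io, x), num1s)
    end
    open(joinpath(dir, "num2s.txt"), "w") do io
        foreach(x -> println(io, x), num2s)
    end
    return num1s, num2s
end

# Usage demo in a throwaway directory with three fake runs.
demo_dir = mktempdir()
for (id, (a, b)) in zip(["01", "02", "03"], [(1.5, 2.5), (3.0, 4.0), (5.25, 6.75)])
    write(joinpath(demo_dir, "results_$(id).txt"), "$(a)\t$(b)\n")
end
num1s, num2s = collate_results(demo_dir)
println(num1s)  # prints [1.5, 3.0, 5.25]
```

Since the combined files are written by one process after all runs are done, the concurrent-append problem does not arise, and the per-run files can be kept or deleted afterwards.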