Save data to same files from independent, parallel processes

J-J · November 20, 2024, 11:45pm

Problem

I have a Julia script that computes two numbers, num1 and num2. The computation isn’t deterministic, so the computed numbers are different each time I run the script. I want to run this script many times, save these numbers, and afterwards do statistics on the set of num1s and on the set of num2s. Each run takes some time, so I will run the script in parallel on a computer cluster, with each process running on its own core.

Here’s my naive idea of simply appending all the numbers (within Julia) to two files, one for num1 and one for num2:

# main.jl

num1 = rand()  # placeholder for some number, different for each run
num2 = rand()  # placeholder for some other number, also different for each run

# append the data from this run to my output files
open("num1s.txt", "a") do file1
    println(file1, num1)
end

open("num2s.txt", "a") do file2
    println(file2, num2)
end

But what happens if two independent processes try to append to my files at the same time? I’m worried that this isn’t safe and that conflicts will cause some runs to fail. Do you know of a better way to save Julia outputs from independent processes?

Additional information

I usually submit such jobs to the cluster with GNU parallel, using a (simplified) job script that looks something like this for a single 24-core node:

#!/bin/bash
#PBS job parameters go here

# run main.jl 24 times in parallel on a 24-core node
parallel julia main.jl ::: {01..24}

I imagine a possible solution is to edit my Julia script to save my numbers to individual files, then add the necessary incantations to this bash script to collate the data into two files (num1s.txt and num2s.txt), but I was hoping there might be an elegant solution from within Julia.

nhz2 · November 21, 2024, 4:05pm

Yes, it is not safe for multiple processes to append to the same file at the same time using the regular open function with the "a" flag. I think the most cross-platform thing to do is to have each process write to its own file, and then combine the results at the end. There are also libraries like SQLite.jl you can try, but if your cluster is using a network filesystem, SQLite might be subtly broken. Ref: Atomic Commit In SQLite
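
For illustration, here is a minimal sketch of the one-file-per-process approach done entirely in Julia. The helper names save_result and collate, the results directory, and the name_runid.txt file pattern are all assumptions made for this example, not something from the original script.

```julia
# Sketch only: one output file per run, merged after all runs have finished.
# Function names, directory layout and file-name pattern are hypothetical.

# Write a single value to its own file, e.g. results/num1_07.txt.
# No other process writes to this file, so no locking is needed.
function save_result(dir, name, run_id, value)
    mkpath(dir)  # creates the directory if needed, no error if it exists
    open(joinpath(dir, "$(name)_$(run_id).txt"), "w") do io
        println(io, value)
    end
end

# Concatenate every per-run file for `name` into one combined file.
# Returns the number of per-run files that were merged.
function collate(dir, name, outfile)
    prefix = name * "_"
    parts = sort(filter(f -> startswith(f, prefix) && endswith(f, ".txt"),
                        readdir(dir)))
    open(outfile, "w") do out
        for f in parts
            write(out, read(joinpath(dir, f)))
        end
    end
    return length(parts)
end
```

With the job script above, GNU parallel runs julia main.jl 01, julia main.jl 02, and so on, so inside main.jl the run label is available as ARGS[1] and each run could call save_result("results", "num1", ARGS[1], num1), and likewise for num2. Once parallel has returned, a single Julia process could call collate("results", "num1", "num1s.txt") and collate("results", "num2", "num2s.txt"). The merge has to happen only after every run has finished, otherwise partially written files may be picked up.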
