Updating an XLSX file simultaneously from more than one Julia script

I have a Julia script which does some calculations and, at the end, writes the output to an Excel file using the following function:

```julia
using XLSX

function populate_spreadsheet(sol, instance_number, sheet_number, stat::Symbol)
    # Open the workbook in read-write mode; the do-block saves and closes it on exit.
    XLSX.openxlsx("test.xlsx", mode = "rw") do xf
        sheet = xf[sheet_number]
        # Each statistic goes into its own column, one row per instance.
        if stat == :obj
            sheet[string("A", instance_number)] = sol.objective
        elseif stat == :gap
            sheet[string("B", instance_number)] = sol.gap
        elseif stat == :time
            sheet[string("C", instance_number)] = sol.solvetime
        end
    end
end
```

This works perfectly fine when I run a single instance of my script (say, for `instance_number = 1`); however, I start to run into problems when I run multiple instances of the script at once. I am using an HPC cluster and submit an array of jobs with a Slurm script, for example for `instance_number` 1 to 40. The errors are not consistent; the two most common are `unable to parse XML file` and `AssertionError: Couldn't find xl/sharedStrings.xml`. I suspect this is because the independent jobs are trying to open and write to the file at the same time. Is there a way to overcome this problem? Any help is appreciated. Thanks!

Writing to the same file on disk with multiple processes is always tricky. Can’t you write to multiple files in parallel and then add a final post-processing step with the main process that simply merges the results?


Yes, that’s definitely one way I was considering: generate one XLSX file per job and then run a script to merge all the data into a single XLSX file. However, I want to avoid generating so many extra files (I have a lot more parallel jobs, somewhere around 200), and 200 XLSX files doesn’t sound like the optimal way to go about this. :sweat_smile:

If you want to stick to job arrays, I would write out one file per job to a temporary directory, then combine them at the end with a dependent job. 200 XLSX files is hardly a sin compared to a lot of the hacks required to get things to cooperate on HPC systems. There are more complicated ways to do this with tools like Dagger.jl or parallel HDF5, if you don’t mind switching file formats and using MPI.
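A minimal sketch of what that merge step could look like, assuming each job writes its results to a file named `results_<instance_number>.xlsx` with the objective, gap, and solve time in cells A1:C1 (the file names, directory, and cell layout here are illustrative, not the poster's actual scheme):

```julia
using XLSX

# Combine per-job result files into one workbook.
# Assumes job i wrote results_i.xlsx with objective, gap, and
# solve time in cells A1, B1, and C1.
function merge_results(njobs::Int; dir = "results", out = "combined.xlsx")
    XLSX.openxlsx(out, mode = "w") do xf
        sheet = xf[1]
        for i in 1:njobs
            job_file = joinpath(dir, "results_$i.xlsx")
            isfile(job_file) || continue  # skip jobs that haven't finished
            job = XLSX.readxlsx(job_file)
            row = job[1]  # first (and only) sheet of the per-job file
            sheet["A$i"] = row["A1"]
            sheet["B$i"] = row["B1"]
            sheet["C$i"] = row["C1"]
        end
    end
end

merge_results(200)
```

Running this as a dependent job (so it only starts once the whole array has finished) keeps the merge step out of the hot path entirely.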


OK. I guess then I would do it the easy, more intuitive way of generating one file per job and then merging them. Thanks! :slight_smile:

For reference, you can use Pidfile (available as the Pidfile.jl package, and bundled with recent Julia versions as `FileWatching.Pidfile`) to manage exclusive access to files.
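A sketch of how that could look here, wrapping the original function from the question with `mkpidlock` in its do-block form so only one job touches the workbook at a time (the lock-file name and the `stale_age` value are illustrative choices):

```julia
using Pidfile  # or `using FileWatching.Pidfile` on recent Julia versions

function populate_spreadsheet_locked(sol, instance_number, sheet_number, stat::Symbol)
    # Block until this process holds the lock; other jobs wait their turn.
    # `stale_age` lets a crashed job's leftover lock be reclaimed after ~60 s.
    mkpidlock("test.xlsx.pid"; stale_age = 60) do
        populate_spreadsheet(sol, instance_number, sheet_number, stat)
    end
end
```

Two caveats: 200 jobs serializing on one XLSX file can become a bottleneck, and pid-file locks may behave unreliably on some networked filesystems, so the one-file-per-job approach above may still be the more robust option.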
