A problem: writing results to a file in parallel

I’m a newbie in julia language, I’ve written a parallel code using Distributed, and I want to write the intermediate results of the program’s calculations to a file, this is my code.

    using Distributed
    # Set up parallel workers
    addprocs(4)
    @everywhere begin
        function my_function(n)
            # Compute the result
            result=n;
            # Open the file in append mode
            f = open("output.txt", "a")
            # # Write the result to the file
            println(f, "Result: ", result)
            close(f)
            # open("output.txt","a",lock=true) do io
            #     write(io,"Result: "*string(result))
            #     write(io,"\r\n")
            # end
            return result
        end
    end
    @distributed for i in 1:10000
        result = my_function(i)
        println("Result for iteration $i: $result")
    end

I want get output.txt as follow:
Result: 1
Result: 2501
Result: 2
Result: 3
Result: 4
Result: 5
Result: 6
Result: 7
Result: 8

but I got:
Result: 2566
Result: 85
Result: 2567
Result: 86
Result: 2568
Result: 87
Result: 2569
Result: 88
0
Result: 2571
Result: 89
Result: 90
2
Result: 2573
Result: 7501
Result: 105

Result: 2587
Result: 107

I can`t understand.

You are creating a race condition when multiple processes try to write to the same file simultaneously and end up overwriting one another.

You’ll either need to

  • use a lock file to control access (so that only one process writes to at once), e.g. using Pidfile
  • use a a different file format that supports parallel I/O (e.g. parallel HDF5, though that requires you to use MPI).
  • send all of the data you want to output to a single process, and write from that.

The third option is usually the simplest and most robust.

2 Likes

Thank you very much, I will try the third option, the first two were a bit difficult for me.
In the documentation, I see this in the function open: “The lock keyword argument controls whether the operation will be locked to ensure secure multithreaded access.” But it doesn’t seem to work. Thanks again.

Given that your output is pretty simple, I think a fourth way to do this would be with Logging and then use LoggingExtras.jl to pipe them to a file.

@zhpf0530 I hope this is said in a nice manner. You are doing distributed processing. Multithreaded applies to running on a single server.

@johnh Thanks. Multithreading is really good for running on a single server. I tried it, and the result was the same, and the output file was also messy. Following Steward’s suggestion, I now use channels to log the data, and then a separate process writes the data.

I think there are two mistakes regarding the use of lock=true for open in the code above (which is the default by the way):

  • it’s meant for multithreading rather than multiprocess
  • it’s meant for multiple accesses to the io value returned by open

The second is the biggest mistake: if you do io = open(..., lock=true) in different threads then each thread will use its own io object with its own lock, so the locks are useless. Instead, you should call open only once, and share the io between threads.

1 Like

You’re not doing multi-threaded access. You’re doing multi-process access. Google processes vs threads.