Write to text file while using threads

Hello everyone,

I’m trying to use the Threads module.
I have long-running simulations that I want to run in parallel, saving the data to a single file while the simulations are running.

Here is what I have done so far:

using Base.Threads: @threads, @spawn, @sync
using DelimitedFiles

a = zeros(10,5)

@sync for j in 1:5
    @spawn for i in 1:10
        a[i,j] = j*i
    end
    sleep(2)
    writedlm("test_mt_save.txt", a[:,1:j])
end

which gives the following output in “test_mt_save.txt”

1.0	2.0	3.0	4.0	5.0
2.0	4.0	6.0	8.0	10.0
3.0	6.0	9.0	12.0	15.0
4.0	8.0	12.0	16.0	20.0
5.0	10.0	15.0	20.0	25.0
6.0	12.0	18.0	24.0	30.0
7.0	14.0	21.0	28.0	35.0
8.0	16.0	24.0	32.0	40.0
9.0	18.0	27.0	36.0	45.0
10.0	20.0	30.0	40.0	50.0

So far so good.
When I replace the simple i*j calculation with something that runs much longer, things go wrong.

I just get

0.0	0.0	0.0	0.0	0.0	
0.0	0.0	0.0	0.0	0.0	
0.0	0.0	0.0	0.0	0.0	
0.0	0.0	0.0	0.0	0.0	
0.0	0.0	0.0	0.0	0.0	
0.0	0.0	0.0	0.0	0.0	
0.0	0.0	0.0	0.0	0.0	
0.0	0.0	0.0	0.0	0.0	
0.0	0.0	0.0	0.0	0.0	
0.0	0.0	0.0	0.0	0.0	

Nothing is updated…

Any help would be very much appreciated.
Thanks again.

Olivier

Your @sync comes too late: it only waits at the end of the outer loop, after every @spawn. If 2 seconds is not enough for your function, the write will happen before a is updated.

Hi, thanks.

The sleep was just for this MWE. If I remove the sleep and replace i*j with

sum(sum(rand(5000,5000)*rand(5000,5000)))

I get this

3.125722834150277e10	0.0	0.0	0.0	0.0
3.125004375236692e10	0.0	0.0	0.0	0.0
3.1252838191133118e10	0.0	0.0	0.0	0.0
0.0	0.0	0.0	0.0	0.0
0.0	0.0	0.0	0.0	0.0
0.0	0.0	0.0	0.0	0.0
0.0	0.0	0.0	0.0	0.0
0.0	0.0	0.0	0.0	0.0
0.0	0.0	0.0	0.0	0.0
0.0	0.0	0.0	0.0	0.0

So something is written, but it is not yet what I want.

The fix is not to remove the sleep; removing it makes things worse.

You should move the @sync to the inner loop.
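
A minimal sketch of what moving the @sync inward could look like, using the cheap j*i calculation from the first post. The `outfile` path is a hypothetical temporary location chosen for illustration, not the file name from the question:

```julia
using Base.Threads: @spawn
using DelimitedFiles

a = zeros(10, 5)
# Hypothetical output path, used only for this sketch
outfile = joinpath(mktempdir(), "test_mt_save.txt")

for j in 1:5
    # @sync now wraps only the work spawned for column j,
    # so the loop body blocks here until that task has finished
    @sync @spawn for i in 1:10
        a[i, j] = j * i
    end
    # Runs only after column j has been fully computed
    writedlm(outfile, a[:, 1:j])
end
```

With a single task per column, this version computes the columns one after another; spawning one task per element inside the @sync would bring the parallelism back.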

ok, indeed that worked!

Thanks a lot.
But honestly I have no clue why this works…

@spawn creates a task that then starts running on an available thread, but while it’s running the main thread keeps running code that follows until told to wait, e.g., via @sync. To illustrate:

@sync begin
    @spawn begin
        sleep(2)
        println("Task done!")
    end
    println("Waiting for task to complete...") # Executes while spawned task is `sleep`ing
end
println("Done waiting!") # Executes after spawned task completes
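
As a complementary sketch, the same behaviour can be seen without @sync: @spawn returns a Task object, and calling fetch on it blocks until the task has finished and then returns its value. The names `t` and `result` are only illustrative:

```julia
using Base.Threads: @spawn

# @spawn returns immediately with a Task object
t = @spawn begin
    sleep(0.5)
    21 * 2
end

println("Main task keeps going while the spawned task runs...")

# fetch blocks until the task has finished, then returns its value
result = fetch(t)
println("Task returned ", result)
```

Here fetch (or wait, if the value is not needed) plays the role that the end of the @sync block plays above.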

very nice illustration!

But then, why does Jeff Bezanson’s example from his Julia Computing seminar on multithreading (minute 15:18) look like this?

function escapetime(z; maxiter = 80)
    c = z
    for n in 1:maxiter
        if abs(z) > 2
            return n-1
        end
        z = z^2 + c
    end
    return maxiter
end

function mandelbrot(; width = 80, height = 20, maxiter = 80)
    out = zeros(Int, height, width)
    real = range(-2.0, 0.5, length = width)
    imag = range(-1.0, 1.0, length = height)
    @sync for x in 1:width
        @spawn for y in 1:height
            z = real[x] + imag[y]*im
            out[y,x] = escapetime(z, maxiter = maxiter)
        end
    end
    return out
end

Here clearly, the @sync macro is on the outer loop.
Thanks again!

But there the `return out` line is outside of @sync, whereas you were writing to the file inside @sync, which was too early.

    @sync for x in 1:width
        @spawn for y in 1:height
            z = real[x] + imag[y]*im
            out[y,x] = escapetime(z, maxiter = maxiter)
        end
        # Code written here will run concurrently with spawned code
    end
    # Code written here will run after spawned code finishes
    return out

Thanks a lot to all of you for taking the time to explain those concepts.
I’ll need a bit more time to grasp all of the details but you put me on the right track!

I think I finally found the solution to my problem.
I used the following code in order to write to file correctly while still being able to use multithreading:

using Base.Threads: @threads, @spawn, @sync
using DelimitedFiles

a = zeros(10,5)

for j in 1:5
    @sync for i in 1:10
        @spawn begin
            a[i,j] = sum(rand(5000,5000)*rand(5000,5000))
        end
    end
    writedlm("test_mt_save.txt", a[:,1:j])
end

This way, a new task is spawned for each value of i. By putting @sync in front of the inner loop, I can be sure that all tasks for column j finish before the file is written.
It seems to work.
Please correct me if I wrote something which is not correct.

Thanks again,
Olivier

Edit: I edited the code following @StevenWhitaker's comment that sum(sum(matrix)) == sum(matrix).

By the way, sum(sum(x)) == sum(x) when x is an array of numbers; sum (without the dims keyword argument) sums all the values of the input collection, not just the values along one dimension (like MATLAB does).
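
A small sketch to make this concrete, using a made-up 2×2 matrix (the variable names are only illustrative):

```julia
x = [1 2; 3 4]

total    = sum(x)            # sums every element of the matrix
col_sums = sum(x, dims = 1)  # 1×2 matrix holding the sum of each column
row_sums = sum(x, dims = 2)  # 2×1 matrix holding the sum of each row

println(total)          # 10
println(sum(col_sums))  # 10, same as sum(x)
```

So a per-dimension reduction has to be requested explicitly with dims; without it, a single sum call already reduces over everything.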


Aha! Thanks for pointing that out!

I edited the post.
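
For the original goal of saving results while other simulations are still running, one possible variation is to let every column run in its own task and serialize only the file write with a lock. This is a rough sketch under assumptions, not something tested on the real simulation: `file_lock` and `outfile` are hypothetical names, and the cheap `i * j` stands in for the long calculation.

```julia
using Base.Threads: @spawn
using DelimitedFiles

a = zeros(10, 5)
outfile = joinpath(mktempdir(), "test_mt_save.txt")  # hypothetical path
file_lock = ReentrantLock()                          # hypothetical name

@sync for j in 1:5
    @spawn begin
        # Compute the whole column in a task-local vector first
        col = [Float64(i * j) for i in 1:10]  # stand-in for the long simulation
        # Only one task at a time may update `a` and rewrite the file
        lock(file_lock) do
            a[:, j] = col
            writedlm(outfile, a)
        end
    end
end
```

Columns that have not finished yet show up as zeros in the file, and the last task to take the lock writes the complete matrix. Because `a` is only read and written while the lock is held, the tasks do not race on it.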