What is the mechanics of `copy` in the thread-parallization by `Threads.@spawn`? Or: Problems with copy arrays in thread parallelism

draftman9 · October 8, 2023, 3:51pm

I hope to implement a thread-parallel sample, a common example in solving time-dependent evolution problems (e.g. main_serial in the code.

Generate a matrix arr_task over time. This matrix is updated over loop/time and the matrix arr_task generated by the current loop is dependent on the result of arr_task generated by the last loop/time, so it must be generated serially. Then pass arr_task into process!() to be processed, and the result of the process!() is stored in matrix arr _stored. These two steps give the result of processing the arr_task at each time.

The main_serial in the code is a serial program. For the sake of parallelism I defined a function main_parallel_copy where I tried to use copy to pass arr_task into process!() for processing based on multithreads. However the result is different from the serial program, it seems that there is a data race (presumed based on the output after I run it). With the help of julia Chinese group, I assigned arr_task to arr_temp: arr_temp = copy(arr_task), then passed arr_temp into process!() as shown in the function main_parallel_assign.

Running environment:

win10: Microsoft Windows [Version 10.0.17763.4252]
julia Version 1.8.5
How to run: julia -t4 test.jl
This is the code from the test.jl script:

function main_serial()
    arr_stored = zeros(Int, 4)
    arr_task = zeros(Int, 1)
    for time in 1:4
        # get a task, should be serial
        arr_task[1] = time + arr_task[1]
        sleep(1)

        # handle the task
        process!(arr_stored, arr_task, time)
    end
    println("serial : $arr_stored")
end

function main_parallel_copy()
    arr_stored = zeros(Int, 4)
    arr_task = zeros(Int, 1)
    @sync for time in 1:4
        # get a task, should be serial
        # println("copy!")
        arr_task[1] = time + arr_task[1]
        sleep(1)

        # handle the task, but parallel on threads
        Threads.@spawn process!(arr_stored, copy(arr_task), time)
    end
    println("parallel copy : $arr_stored")
end

function main_parallel_assign()
    arr_stored = zeros(Int, 4)
    arr_task = zeros(Int, 1)
    @sync for time in 1:4
        # get a task, should be serial
        arr_task[1] = time + arr_task[1]
        sleep(1)
        arr_temp = copy(arr_task)

        # handle the task, but parallel on threads
        Threads.@spawn process!(arr_stored, arr_temp, time)
    end
    println("parallel assign : $arr_stored")
end

function process!(stored, task, t)
    # time of processing
    @time begin a = rand(100,100)
        [exp(a) for i in 1:100]
    end
    stored[t] = task[1]
end

@time main_serial()
println()
@time main_parallel_copy()
println()
@time main_parallel_assign()

These are the output:

  2.040268 seconds (1.50 k allocations: 46.069 MiB, 2.20% gc time)
  2.136096 seconds (1.50 k allocations: 46.069 MiB, 1.82% gc time)
  2.072491 seconds (1.50 k allocations: 46.069 MiB, 1.06% gc time)
  2.073747 seconds (1.50 k allocations: 46.069 MiB, 0.64% gc time)
serial : [1, 3, 6, 10]
 12.372814 seconds (6.39 k allocations: 184.297 MiB, 0.96% gc time, 0.32% compilation time)

  2.692697 seconds (2.70 k allocations: 82.098 MiB, 1.35% gc time)
  2.884239 seconds (3.36 k allocations: 101.799 MiB, 1.26% gc time)
  2.676654 seconds (2.80 k allocations: 84.476 MiB, 1.36% gc time)
  2.124010 seconds (1.52 k allocations: 46.069 MiB, 1.87% gc time)
parallel copy : [3, 6, 10, 10]
  7.806365 seconds (7.28 k allocations: 184.345 MiB, 0.98% gc time, 0.26% compilation time)

  2.763133 seconds (2.64 k allocations: 80.335 MiB, 2.10% gc time)
  3.105436 seconds (3.35 k allocations: 101.645 MiB, 1.87% gc time)
  2.889880 seconds (2.78 k allocations: 83.787 MiB, 3.14% gc time)
  2.118461 seconds (1.52 k allocations: 46.069 MiB, 0.87% gc time)
parallel assign : [1, 3, 6, 10]
  8.015358 seconds (6.95 k allocations: 184.329 MiB, 1.36% gc time, 0.12% compilation time)

My question are:

why is it that after copying an array into a function , the array can still be updated externally? See main_parallel_copy.
How else can I parallelize this type of loop based on threads?
In julia threads parallelization, is there any option to set variable attribute like “firstprivate” in openmp?

bertschi · October 8, 2023, 4:40pm

As copy is happening in the spawned thread, it’s executing to late, i.e., the loop has already advanced to the sleep in the next iteration when the thread has started up. (Sleeping before the update to arr_task will give the right result – but does not actually fix it as the race condition is still there if for some reason the thread starts up very slowly).
Make sure the copy is done before passing the data into the thread, i.e., main_parallel_assign is a correct solution.
That I don’t know.

Mutable data and threading is always fun (or a recipe for disaster depending on your point of view).

Topic		Replies	Views
Strange behavior parallel for General Usage	1	353	December 9, 2016
Questions on parallel programming and Threads.@threads vs @parallel with SharedArray Julia at Scale	7	3742	December 26, 2017
Assignment not effective in `@spawn` in function General Usage multithreading	4	407	December 26, 2020
Mutable object in parallel loop General Usage	1	476	October 21, 2017
How does a variable get copied to another process? General Usage parallel	4	1341	August 7, 2017

What is the mechanics of `copy` in the thread-parallization by `Threads.@spawn`? Or: Problems with copy arrays in thread parallelism

Related topics