What's the most efficient way to sum the results of multithreading?

Hi,

I need to sum the outputs of each thread. Is there a standard/efficient way to do this?
For example:

X = Matrix{Float64}(undef, d, N)
Threads.@threads for k = 1:N
    X[:, k] .= some_calculations()
end
sum!(zeros(d), X)   # row sums of X (length-d vector)

Preferably I would avoid allocating X altogether, because I only need the row sums.
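In other words, I'd like to avoid hand-rolling something like the following (just a rough sketch; some_calculations() is a placeholder, assumed to return a length-d Vector{Float64}):

using Base.Threads

function rowsum_manual(d, N)
    # one partial row-sum per chunk of iterations, combined at the end
    chunks = Iterators.partition(1:N, cld(N, nthreads()))
    tasks = map(chunks) do ks
        @spawn begin
            acc = zeros(d)
            for _ in ks
                acc .+= some_calculations()
            end
            acc
        end
    end
    return sum(fetch, tasks)   # elementwise sum of the partial results
end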
Sorry if this is a stupid question. Any help would be much appreciated!

I think the easiest way is to use ThreadsX.mapreduce: https://github.com/tkf/ThreadsX.jl
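For your example, something along these lines should work (a rough sketch; some_calculations() is your placeholder, assumed to return a length-d vector):

using ThreadsX

# parallel sum of the per-iteration vectors = the row sums, no X needed
rowsums = ThreadsX.sum(k -> some_calculations(), 1:N)

ThreadsX.mapreduce(k -> some_calculations(), +, 1:N) should be equivalent.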


ThreadsX.jl and the other JuliaFolds-related packages do not have an out-of-the-box multi-dimensional reduction API. This is mainly because Julia already has a rich set of DSL packages, such as Tullio.jl, TensorOperations.jl, and LoopVectorization.jl, that support it.
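For example, if you do materialize the matrix X from your post, Tullio.jl can express the row sum directly (a sketch; Tullio multi-threads large arrays automatically, and loading LoopVectorization.jl speeds up the generated loops):

using Tullio, LoopVectorization

# y[i] = sum over k of X[i, k], i.e. the row sums
@tullio y[i] := X[i, k]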

Having said that, you can use the Broadcasting transducer to construct a custom multi-dimensional reduction:

julia> using Folds, Transducers, Statistics

julia> randn(2, 3) |> eachcol |> Map(x -> x ./ mean(x)) |> Broadcasting() |> Folds.sum
2-element Vector{Float64}:
 -3.962371501856117
  9.962371501856117

Or you can simply loop over the input the other way around:

julia> Folds.collect(sum(xs) for xs in eachcol(randn(2, 3)))
3-element Vector{Float64}:
 -1.9645612346392887
  0.16303379291104828
 -0.22374280710495784

Hi, thank you for your reply. LoopVectorization works for me.
I also tried the Broadcasting transducer, but it was not as fast as ThreadsX, even though it allocates less. Could you shed some light on this?

julia> @btime ThreadsX.sum(x -> (sleep(0.5); x .* transpose(x)), eachcol(randn(1000, 50)))  
  2.106 s (524 allocations: 755.72 MiB)

julia> @btime randn(1000, 50) |> eachcol |> Map(x -> (sleep(0.5); x .* transpose(x))) |> Broadcasting() |> Folds.sum
  12.717 s (358 allocations: 397.12 MiB)

Thank you very much!