I am adding a 2D array multiplied by a constant to each 2D slice of a 3D array as part of a larger code. The function simply loops over all the indices to perform the calculation. To increase the speed of the computation, I added multithreading to the loop with the @threads
macro, and it works fine. However, I also tried to implement the loop with CartesianRange
. It works fine without multithreading, but gives an erroneous result when combined with @threads
. I would like to know if there is something wrong with my code or if it is a bug (I know multithreading support is still experimental). A simplified version of my code is given below.
I can perform the calculation by explicitly looping over all the indices using test_ind!
.
A = Array{Float64}(M, N, L)
B = Array{Float64}(M, N)
C = similar(A)
function test_ind!(C, A, B)
for k = 1:size(C)[3], j = 1:size(C)[2], i = 1:size(C)[1]
C[i,j,k] = A[i,j,k] + k*B[i,j]
end
end
M
, N
, and L
are the array dimensions. For larger arrays, it is faster to use multiple threads which are easily added with the @threads
macro as shown in test_ind_multi!
below.
function test_ind_multi!(C, A, B)
Threads.@threads for k = 1:size(C)[3]
for j = 1:size(C)[2], i = 1:size(C)[1]
C[i,j,k] = A[i,j,k] + k*B[i,j]
end
end
end
Running a simple test confirms test_ind!
and test_ind_multi!
give the same result.
A = rand(4, 4, 3)
B = rand(4, 4)
C1 = similar(A)
C2 = similar(A)
test_ind!(C1, A, B)
test_ind_multi!(C2, A, B)
isapprox(C1, C2) # true
The loop in test_ind!
can also be written using CartesianRange
as shown below.
function test_cart!(C, A, B)
for ind in CartesianRange(size(C))
C[ind] = A[ind] + ind.I[3]*B[ind.I[1],ind.I[2]]
end
end
Testing like before confirms the result is the same as test_ind!
. Again I would like to add multithreading in the same manner.
function test_cart_multi!(C, A, B)
Threads.@threads for ind in CartesianRange(size(C))
C[ind] = A[ind] + ind.I[3]*B[ind.I[1],ind.I[2]]
end
end
However, test_cart_multi!
does not return the same result as the other three functions.