I am adding a 2D array multiplied by a constant to each 2D slice of a 3D array as part of a larger code. The function simply loops over all the indices to perform the calculation. To increase the speed of the computation, I added multithreading to the loop with the `@threads` macro, and it works fine. However, I also tried to implement the loop with `CartesianRange`. It works fine without multithreading, but gives an erroneous result when combined with `@threads`. I would like to know if there is something wrong with my code or if it is a bug (I know multithreading support is still experimental). A simplified version of my code is given below.

I can perform the calculation by explicitly looping over all the indices using `test_ind!`.

``````A = Array{Float64}(M, N, L)
B = Array{Float64}(M, N)
C = similar(A)

function test_ind!(C, A, B)
for k = 1:size(C)[3], j = 1:size(C)[2], i = 1:size(C)[1]
C[i,j,k] = A[i,j,k] + k*B[i,j]
end
end
``````

`M`, `N`, and `L` are the array dimensions. For larger arrays, it is faster to use multiple threads which are easily added with the `@threads` macro as shown in `test_ind_multi!` below.

``````function test_ind_multi!(C, A, B)
for j = 1:size(C)[2], i = 1:size(C)[1]
C[i,j,k] = A[i,j,k] + k*B[i,j]
end
end
end
``````

Running a simple test confirms `test_ind!` and `test_ind_multi!` give the same result.

``````A = rand(4, 4, 3)
B = rand(4, 4)
C1 = similar(A)
C2 = similar(A)
test_ind!(C1, A, B)
test_ind_multi!(C2, A, B)
isapprox(C1, C2) # true
``````

The loop in `test_ind!` can also be written using `CartesianRange` as shown below.

``````function test_cart!(C, A, B)
for ind in CartesianRange(size(C))
C[ind] = A[ind] + ind.I[3]*B[ind.I[1],ind.I[2]]
end
end
``````

Testing like before confirms the result is the same as `test_ind!`. Again I would like to add multithreading in the same manner.

``````function test_cart_multi!(C, A, B)
However, `test_cart_multi!` does not return the same result as the other three functions.
The current version of `@threads` only works with `for` loops where the iteration is directly convertible into linear indexing. (Iterators with state obviously need specialized logic.) There are already some GitHub issues related to this, but it seems that public improvements to multithreading have been postponed until after v0.6.