Multithreading with CartesianRange bug?

I am adding a 2D array multiplied by a constant to each 2D slice of a 3D array as part of a larger code. The function simply loops over all the indices to perform the calculation. To increase the speed of the computation, I added multithreading to the loop with the @threads macro, and it works fine. However, I also tried to implement the loop with CartesianRange. It works fine without multithreading, but gives an erroneous result when combined with @threads. I would like to know if there is something wrong with my code or if it is a bug (I know multithreading support is still experimental). A simplified version of my code is given below.

I can perform the calculation by explicitly looping over all the indices using test_ind!.

A = Array{Float64}(M, N, L)
B = Array{Float64}(M, N)
C = similar(A)

function test_ind!(C, A, B)
    for k = 1:size(C)[3], j = 1:size(C)[2], i = 1:size(C)[1]
        C[i,j,k] = A[i,j,k] + k*B[i,j]
    end
end

M, N, and L are the array dimensions. For larger arrays, it is faster to use multiple threads which are easily added with the @threads macro as shown in test_ind_multi! below.

function test_ind_multi!(C, A, B)
    Threads.@threads for k = 1:size(C)[3]
        for j = 1:size(C)[2], i = 1:size(C)[1]
            C[i,j,k] = A[i,j,k] + k*B[i,j]
        end
    end
end

Running a simple test confirms test_ind! and test_ind_multi! give the same result.

A = rand(4, 4, 3)
B = rand(4, 4)
C1 = similar(A)
C2 = similar(A)
test_ind!(C1, A, B)
test_ind_multi!(C2, A, B)
isapprox(C1, C2) # true

The loop in test_ind! can also be written using CartesianRange as shown below.

function test_cart!(C, A, B)
    for ind in CartesianRange(size(C))
        C[ind] = A[ind] + ind.I[3]*B[ind.I[1],ind.I[2]]
    end
end

Testing like before confirms the result is the same as test_ind!. Again I would like to add multithreading in the same manner.

function test_cart_multi!(C, A, B)
    Threads.@threads for ind in CartesianRange(size(C))
        C[ind] = A[ind] + ind.I[3]*B[ind.I[1],ind.I[2]]
    end
end

However, test_cart_multi! does not return the same result as the other three functions.

The current version of @threads only works with for loops where the iteration is directly convertible into linear indexing. (Iterators with state obviously need specialized logic.) There are already some GitHub issues related to this, but it seems that public improvements to multithreading have been postponed until after v0.6.