Hi,
v[i] *= M[i,j]
is not an atomic operation, so this is expected. For simplicity, imagine you only have two threads, and M is just 1x4: [1 0 0 1]
Since you split the work in threads by columns, thread 1 will take care of the first two columns [1 0] and the second thread of the other columns [0 1].
Assume each thread is working with their first element. Each one sees that v[1] is not zero and they compute the value v[1]*M[i,j]. For the first thread this ill be 1 and for the second thread it will be 0, but since the operation is not atomic, now the final value of v[1] could be 0 or 1, depending on who gets faster to the store operation.
Compare these two functions (modified from your code to just work on vectors, and working with addition instead). The first one, following your approach exhibits the same problem. Using mask_t (with atomic_add!), the problem goes away:
using Base.Threads
using Random
function mask!(v, M)
@threads for i ∈ 1:length(M)
v += M[i]
end
v
end
function mask_t!(v, M)
r = Atomic{eltype(v)}(one(eltype(v)))
@threads for i ∈ 1:length(M)
atomic_add!(r,M[i])
end
r[]
end
function test(seed=999, n=10^6)
println("Number of threads: ", nthreads())
v = one(Int)
Random.seed!(seed)
M = rand(0:1, n)
println(sum(mask!(v, M)))
println(sum(mask_t!(v, M)))
end
julia> test()
Number of threads: 4
126877
500543
julia> test()
Number of threads: 4
126268
500543
julia> test()
Number of threads: 4
126882
500543
Cheers,
AdV