# How to do chunk matrix/vector multiplication with @threads

I implemented `c = Ab` using multithreading naively by computing row-wise dot products. I did so on `BitArrays` because it is the closest thing to the actual data structure in my problem.

However, when I tried to switch the row-wise inner products to submatrix/vector operations, my updates to the final vector stops happening, and I’m not sure why. I suspect it has to do with atomic operations, but there is so little documentation on this I’m still not sure what to do.

Below I paste my code:

``````function naive_mul!(
c :: AbstractVector,
A :: AbstractMatrix,
b :: AbstractVector
)
for j in eachindex(b)
c[j] = dot(A[j, :], b)
end
end

c :: AbstractVector,
A :: AbstractMatrix,
b :: AbstractVector
)
c[j] = dot(A[j, :], b)
end
end
``````

On 8 threads, the multiplication is much faster:

``````using BenchmarkTools
a = bitrand(1000, 1000)
b = rand(1000)
c = zeros(1000)

julia> @btime naive_mul!(c, a, b)
2.231 ms (1000 allocations: 46.88 KiB)

528.598 μs (515 allocations: 22.83 KiB)
``````

But when I tried to do the same using “chunks” of the original matrix instead of row-by-row dot product, somehow I don’t get anything:

``````#function to assign the chunk size
function delegate(X::AbstractArray)
splits = [round(Int, s) for s in linspace(0.0, size(X,1), nchunks+1)]
return splits
end

c::AbstractVector,
a::AbstractMatrix,
b::AbstractVector,
splits::Vector{Int64}
)
region = splits[i]+1:splits[i+1]
A_mul_B!(c[region], view(a, region, :), b)
end
end
``````

Split up the 1000x1000 matrix into flat evenly spaced chunks that begins on:

``````julia> splits = delegate(a)
9-element Array{Int64,1}:
0
125
250
375
500
625
750
875
1000
``````

Why does this not correctly assign `c`??

``````a = bitrand(1000, 1000)
b = rand(1000)
c = zeros(1000)

julia> c
1000-element Array{Float64,1}:
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
⋮
``````

It was because `c[region]` actually creates a new vector that will store the result. Since this vector is not bound to the original vector `c`, it gets lost and hence the original vector does not get updated. To fix this, simply use `view()` on c and everything should work:

``````using BenchmarkTools
a = bitrand(1000, 1000)
b = rand(1000)
c = zeros(1000)

function delegate(X::AbstractArray)
splits = [round(Int, s) for s in linspace(0.0, size(X,1), nchunks+1)]
return splits
end

c::AbstractVector,
a::AbstractMatrix,
b::AbstractVector,
splits::Vector{Int64}
)