Hello,

I’m measuring the latency of `mul!(C, A, B)`, where `C` and `B` are dense and `A` is sparse, and I’m seeing something surprising: the latency isn’t linear in the number of operations required. Does anyone know why this is?

The number of columns of `A` is `m=1000`, the number of columns of `B` is `k=3`, and I’m varying the number of rows of `A`, denoted by `n`. I’m on an Intel system and I’m including `MKLSparse` (but I see the same behaviour regardless of whether I use `MKLSparse`).
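
In case it helps anyone reproduce this, a quick way to check which `mul!` method actually gets dispatched with and without `MKLSparse` loaded (just an illustrative diagnostic; the matrix sizes here are arbitrary):

```
using SparseArrays, LinearAlgebra, InteractiveUtils
# using MKLSparse  # uncomment to let the MKL methods take over

A = sprand(1000, 1000, 0.05)
B = randn(1000, 3)
C = zeros(1000, 3)
@which mul!(C, A, B)  # reports which module's method is dispatched
```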

My experiment looks like the following code, except that it’s in a function, I’m running it many times, and I discard the first sample (since it includes compilation time):

```
using SparseArrays, LinearAlgebra  # plus `using MKLSparse` when testing MKL

A = sprand(n, m, 0.05)
B = randn(m, k)
C = zeros(n, k)
latency_mul = @elapsed mul!(C, A, B)
latency_mymul = @elapsed mymul!(C, A, B)  # mymul! defined below
```
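
Wrapped up, the harness looks roughly like this (a sketch of what I described above; the function name and the `nsamples` count are just illustrative):

```
using SparseArrays, LinearAlgebra

# Sketch of the benchmark harness: build random inputs of the given size,
# time mul! repeatedly, and discard the first (compilation-affected) sample.
function measure_mul(n; m = 1000, k = 3, nsamples = 100)
    A = sprand(n, m, 0.05)
    B = randn(m, k)
    C = zeros(n, k)
    samples = [(@elapsed mul!(C, A, B)) for _ in 1:(nsamples + 1)]
    minimum(samples[2:end])  # discard the first sample
end
```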

Here’s the definition of `mymul!`:

```
function mymul!(C::Matrix, A::SparseMatrixCSC, B::Matrix)
    @boundscheck size(C, 1) == size(A, 1) || throw(DimensionMismatch("C has dimensions $(size(C)), but A has dimensions $(size(A))"))
    @boundscheck size(C, 2) == size(B, 2) || throw(DimensionMismatch("C has dimensions $(size(C)), but B has dimensions $(size(B))"))
    @boundscheck size(A, 2) == size(B, 1) || throw(DimensionMismatch("A has dimensions $(size(A)), but B has dimensions $(size(B))"))
    n, m = size(A) # n rows, m columns, matching the notation above
    k = size(B, 2)
    rows = rowvals(A)
    vals = nonzeros(A)
    C .= 0
    for col = 1:m # column of A
        for i in nzrange(A, col)
            @inbounds row = rows[i] # row of A
            @inbounds val = vals[i] # A[row, col]
            @simd for j = 1:k # column indices of B
                @inbounds C[row, j] += val * B[col, j]
            end
        end
    end
    C
end
```

This is the latency I’m seeing. I’d have expected the latency to increase linearly with the number of rows, which is what I’m seeing for `mymul!` (roughly, for large enough matrices). However, for `mul!` I see something else. The dashed lines are quadratic functions fit to the data. When I run the same experiment with dense matrices, the latency does look linear in the number of rows, so it’s got something to do with the sparse matrix multiplication.