Hi all, I noticed the following performance discrepancy:
julia> using BenchmarkTools
julia> const A = sprand(500, 500, 0.01);
julia> const B = sprand(500, 500, 0.01);
julia> const C = full(A*B);
julia> @benchmark A*B
BenchmarkTools.Trial:
memory estimate: 425.88 KiB
allocs estimate: 1495
--------------
minimum time: 400.802 μs (0.00% GC)
median time: 408.625 μs (0.00% GC)
mean time: 426.060 μs (3.16% GC)
maximum time: 1.485 ms (56.57% GC)
--------------
samples: 10000
evals/sample: 1
julia> @benchmark A_mul_B!(C, A, B)
BenchmarkTools.Trial:
memory estimate: 336 bytes
allocs estimate: 6
--------------
minimum time: 182.815 ms (0.00% GC)
median time: 182.990 ms (0.00% GC)
mean time: 182.990 ms (0.00% GC)
maximum time: 183.201 ms (0.00% GC)
--------------
samples: 28
evals/sample: 1
The product of two sparse matrices, when allocating a brand-new sparse result, is roughly 450× faster (about 0.4 ms vs 183 ms) than storing the product in a pre-allocated dense matrix, which surely should not be the case. I suspect that A_mul_B! with a dense destination is falling back to a generic matrix-multiply implementation instead of a sparse-aware kernel?
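In the meantime, one way to keep the pre-allocated dense destination while still using the fast sparse-sparse kernel is to form the sparse product first and then scatter its nonzeros into C. A minimal sketch, written in modern Julia (1.x) spelling where `sprand` lives in SparseArrays and `A_mul_B!` became `mul!`; the helper name `sparse_mul_into!` is hypothetical:

```julia
using SparseArrays

# Hypothetical workaround: do the multiply sparse*sparse (fast CSC kernel),
# then write the result into the preallocated dense matrix in one pass.
function sparse_mul_into!(C::Matrix, A::SparseMatrixCSC, B::SparseMatrixCSC)
    P = A * B                      # sparse-sparse product, fast path
    fill!(C, zero(eltype(C)))      # clear the dense destination
    rows = rowvals(P)
    vals = nonzeros(P)
    for j in 1:size(P, 2)          # scatter nonzeros column by column
        for k in nzrange(P, j)
            C[rows[k], j] = vals[k]
        end
    end
    return C
end

A = sprand(500, 500, 0.01)
B = sprand(500, 500, 0.01)
C = zeros(500, 500)
sparse_mul_into!(C, A, B)
```

This still allocates the intermediate sparse product P, so it is not a true in-place multiply, but it avoids the slow generic fallback while reusing C's storage for the dense result.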