Performance discrepancy in sparse matrix product

Hi all, I noticed the following performance discrepancy:

julia> using BenchmarkTools

julia> const A = sprand(500, 500, 0.01);

julia> const B = sprand(500, 500, 0.01);

julia> const C = full(A*B);

julia> @benchmark A*B
BenchmarkTools.Trial: 
  memory estimate:  425.88 KiB
  allocs estimate:  1495
  --------------
  minimum time:     400.802 μs (0.00% GC)
  median time:      408.625 μs (0.00% GC)
  mean time:        426.060 μs (3.16% GC)
  maximum time:     1.485 ms (56.57% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> @benchmark A_mul_B!(C, A, B)
BenchmarkTools.Trial: 
  memory estimate:  336 bytes
  allocs estimate:  6
  --------------
  minimum time:     182.815 ms (0.00% GC)
  median time:      182.990 ms (0.00% GC)
  mean time:        182.990 ms (0.00% GC)
  maximum time:     183.201 ms (0.00% GC)
  --------------
  samples:          28
  evals/sample:     1

The product of two sparse matrices, when a new sparse matrix is allocated for the result, is more than two orders of magnitude faster (≈400 μs vs ≈183 ms) than when the product is stored into a pre-allocated dense matrix, which surely should not be the case. I suspect A_mul_B! is falling back to a generic implementation?

Indeed it is:

julia> @which A_mul_B!(C, A, B)
A_mul_B!(C::AbstractArray{T,2} where T, A::Union{AbstractArray{T,1}, AbstractArray{T,2}} where T, B::Union{AbstractArray{T,1}, AbstractArray{T,2}} where T) in Base.LinAlg at linalg/matmul.jl:178
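For context: the generic fallback ends up computing the entries via scalar indexing into A and B, and every scalar getindex on a SparseMatrixCSC has to search within the stored entries of a column, so the sparsity brings no benefit at all. The actual generic kernel is tiled and more involved, but its cost model is essentially that of this naive loop (an illustration only, not the code at matmul.jl:178):

function naive_matmul!(C, A, B)
    m, n = size(C)
    kmax = size(A, 2)
    fill!(C, zero(eltype(C)))
    for j in 1:n, i in 1:m
        s = zero(eltype(C))
        for k in 1:kmax
            # each A[i,k] / B[k,j] on a SparseMatrixCSC is a search
            # within the stored entries of the corresponding column
            s += A[i, k] * B[k, j]
        end
        C[i, j] = s
    end
    return C
end

That is O(m·n·k) getindex calls regardless of how sparse the inputs are, which matches the ~450× slowdown you measured.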

Ah, I didn’t know about the @which macro before. Does a specialized version exist, or would that be a good pull request?

No specialised version seems to exist. You are welcome to open a pull request, though note that in the development version of Julia some functions have been renamed (e.g. A_mul_B! is now mul!) and sparse arrays have been moved to the standard library:
https://github.com/JuliaLang/julia/blob/master/stdlib/SparseArrays/src/linalg.jl

Also see https://github.com/JuliaLang/julia/pull/24045, which might be a good starting point to continue from.
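In case it helps to get started: a specialized method can walk the CSC storage directly, accumulating each stored column segment of A scaled by the matching stored entry of B into the dense output. A minimal sketch for 0.6, using the actual SparseMatrixCSC fields (colptr, rowval, nzval); the name spmatmul_dense! is made up for illustration, and on the development version this would instead be a mul! method in the SparseArrays stdlib:

function spmatmul_dense!(C::StridedMatrix, A::SparseMatrixCSC, B::SparseMatrixCSC)
    A.n == B.m || throw(DimensionMismatch("A has $(A.n) columns, B has $(B.m) rows"))
    size(C) == (A.m, B.n) || throw(DimensionMismatch("C has the wrong size"))
    fill!(C, zero(eltype(C)))
    @inbounds for j in 1:B.n                      # column j of B and of C
        for kp in B.colptr[j]:(B.colptr[j+1] - 1)
            k = B.rowval[kp]                      # B[k,j] is stored
            bkj = B.nzval[kp]
            for ip in A.colptr[k]:(A.colptr[k+1] - 1)
                C[A.rowval[ip], j] += A.nzval[ip] * bkj   # C[:,j] += A[:,k]*B[k,j]
            end
        end
    end
    return C
end

This touches only the stored entries, so the cost is proportional to the number of actual multiplications rather than to m·n·k.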