Right multiplication of Diagonal matrix


In Julia 0.7

using SparseArrays
n = 1000
@time Diagonal(1:n) * sparse(1:n, n:-1:1, 1:n)
@time sparse(1:n, n:-1:1, 1:n) * Diagonal(1:n)
  0.002262 seconds (42 allocations: 110.125 KiB)
  0.893608 seconds (39 allocations: 80.578 KiB)

Right multiplication is much slower than left multiplication. The reason is, in stdlib/LinearAlgebra/src/diagonal.jl, we have

mul!(out::AbstractMatrix, A::Diagonal, in::AbstractMatrix) = out .= A.diag .* in

but we have no mul!(out::AbstractMatrix, in::AbstractMatrix, A::Diagonal). Therefore, the right multiplication actually uses the more general method mul!(C::AbstractMatrix, A::AbstractVecOrMat, B::AbstractVecOrMat) in stdlib/LinearAlgebra/src/matmul.jl, which is not optimized for Diagonal matrix multiplication.

On the other hand, we do have optimized code for right multiplication method rmul!(A::AbstractMatrix, D::Diagonal) in stdlib/LinearAlgebra/src/diagonal.jl

using SparseArrays

n = 1000
@time rmul!(sparse(1:n, n:-1:1, 1:n), Diagonal(1:n));
@time sparse(1:n, n:-1:1, 1:n) * Diagonal(1:n);

  0.000061 seconds (39 allocations: 120.109 KiB)
  0.908803 seconds (39 allocations: 80.578 KiB)

My questions:
Should we reuse rmul! and lmul! for mul!?
Should we also have mul!(out::AbstractMatrix, in::AbstractMatrix, A::Diagonal)?