In Julia 0.7:

```
using LinearAlgebra, SparseArrays
n = 1000
@time Diagonal(1:n) * sparse(1:n, n:-1:1, 1:n);
@time sparse(1:n, n:-1:1, 1:n) * Diagonal(1:n);
# 0.002262 seconds (42 allocations: 110.125 KiB)
# 0.893608 seconds (39 allocations: 80.578 KiB)
```
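
For reference, left multiplication by a `Diagonal` scales rows and right multiplication scales columns, so both should only touch each stored entry once. A small sanity check:

```julia
using LinearAlgebra, SparseArrays

D = Diagonal([2, 3])
S = sparse([1 1; 1 1])

D * S  # scales rows:    [2 2; 3 3]
S * D  # scales columns: [2 3; 2 3]
```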

Right multiplication is much slower than left multiplication. The reason is that in `stdlib/LinearAlgebra/src/diagonal.jl` we have

```
mul!(out::AbstractMatrix, A::Diagonal, in::AbstractMatrix) = out .= A.diag .* in
```

but we have no `mul!(out::AbstractMatrix, in::AbstractMatrix, A::Diagonal)`. Right multiplication therefore falls back to the generic method `mul!(C::AbstractMatrix, A::AbstractVecOrMat, B::AbstractVecOrMat)` in `stdlib/LinearAlgebra/src/matmul.jl`, which is not optimized for `Diagonal` matrix multiplication.
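
A minimal sketch of what the missing specialization could look like, simply mirroring the existing left-multiplication method (column `j` is scaled by `A.diag[j]`, so the diagonal is broadcast as a row vector):

```julia
using LinearAlgebra
import LinearAlgebra: mul!

# Sketch of the missing right-multiplication method:
# column j of `in` is scaled by A.diag[j].
mul!(out::AbstractMatrix, in::AbstractMatrix, A::Diagonal) =
    out .= in .* transpose(A.diag)
```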

On the other hand, we do have optimized code for right multiplication via `rmul!(A::AbstractMatrix, D::Diagonal)` in `stdlib/LinearAlgebra/src/diagonal.jl`:
```
using LinearAlgebra, SparseArrays
n = 1000
@time rmul!(sparse(1:n, n:-1:1, 1:n), Diagonal(1:n));
@time sparse(1:n, n:-1:1, 1:n) * Diagonal(1:n);
# 0.000061 seconds (39 allocations: 120.109 KiB)
# 0.908803 seconds (39 allocations: 80.578 KiB)
```
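
Note that `rmul!` mutates its first argument in place, which is part of why it is so cheap; any reuse inside the out-of-place `mul!` would first need to copy into the output buffer. A small demonstration:

```julia
using LinearAlgebra, SparseArrays

S = sparse([1 0; 0 1])
rmul!(S, Diagonal([2, 3]))  # scales S's columns in place
S                           # now [2 0; 0 3]
```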

My questions:

1. Should we reuse `rmul!` and `lmul!` for `mul!`?
2. Should we also have `mul!(out::AbstractMatrix, in::AbstractMatrix, A::Diagonal)`?
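
For the first question, one possible sketch (not a tested proposal) is to copy the non-diagonal argument into the output buffer and then delegate to the existing in-place kernels, assuming `copyto!` handles the conversion into `out`:

```julia
using LinearAlgebra
import LinearAlgebra: mul!

# Sketch: reuse the optimized in-place kernels for the out-of-place API.
mul!(out::AbstractMatrix, A::Diagonal, in::AbstractMatrix) =
    lmul!(A, copyto!(out, in))
mul!(out::AbstractMatrix, in::AbstractMatrix, A::Diagonal) =
    rmul!(copyto!(out, in), A)
```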