Possible speedup in matrix-diagonal products?

Right-multiplication by a Diagonal matrix effectively scales the columns of a matrix. Currently, the default matrix-diagonal product appears to use copy_similar:

(*)(A::AbstractMatrix, D::Diagonal) =
    rmul!(copy_similar(A, promote_op(*, eltype(A), eltype(D.diag))), D)

which seems a bit wasteful. Wouldn't an uninitialized matrix be a better destination here? I compared a naive implementation of this product, with and without bounds checking, and the present implementation turns out to be slower than even the bounds-checked version.

julia> A = ones(3000, 2000); D = Diagonal(ones(2000));

julia> @btime $A * $D;
  42.105 ms (4 allocations: 45.78 MiB)

julia> Base.@propagate_inbounds function muldiag(A::Matrix{T}, D::Diagonal{R}) where {R,T}
           @assert size(A, 2) == size(D, 1) "sizes are incompatible"
           C = similar(A, promote_type(T, R)) # promote the element types
           d = diag(D)
           for (j, dj) in enumerate(d), i in axes(C, 1)
               C[i, j] = dj * A[i, j]
           end
           return C
       end
muldiag (generic function with 1 method)

julia> @btime muldiag($A, $D);
  36.357 ms (3 allocations: 45.79 MiB)

julia> @btime @inbounds muldiag($A, $D);
  28.553 ms (3 allocations: 45.79 MiB)

julia> A * D == muldiag(A, D)
true

Perhaps there are instances where the copy is essential, but it seems to be an expensive operation:

julia> @btime LinearAlgebra.copy_similar($A, Float64);
  34.992 ms (2 allocations: 45.78 MiB)

so it appears that we may be able to do without it after all.
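For what it's worth, a plain broadcast achieves the same column scaling while writing straight into a freshly allocated destination, with no copy of A involved (a sketch; muldiag_bc is just an illustrative name):

```julia
using LinearAlgebra

A = ones(3000, 2000); D = Diagonal(ones(2000))

# Broadcasting A against the diagonal reshaped as a row vector
# computes C[i, j] = A[i, j] * D.diag[j], i.e. it scales column j
# by the j-th diagonal entry -- exactly the matrix-diagonal product.
muldiag_bc(A, D) = A .* permutedims(D.diag)

muldiag_bc(A, D) == A * D   # true
```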

Using mul! instead of rmul! seems to resolve this:

julia> muldiag(A, D) = mul!(similar(A, promote_type(eltype(A), eltype(D))), A, D);

julia> @btime muldiag($A, $D);
  28.734 ms (4 allocations: 45.78 MiB)
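And if the product is computed repeatedly, the three-argument mul! additionally allows reusing a preallocated destination, so each call after the first allocates nothing (a sketch; muldiag! is a hypothetical helper name):

```julia
using LinearAlgebra

A = ones(3000, 2000); D = Diagonal(ones(2000))

# Preallocate the destination once with the promoted element type.
C = similar(A, promote_type(eltype(A), eltype(D)))

# Hypothetical in-place helper: three-argument mul! overwrites C,
# so repeated products reuse the same buffer.
muldiag!(C, A, D) = mul!(C, A, D)

muldiag!(C, A, D) == A * D
```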

I’ve created a PR to add this for matrices of numbers.
