Possible slow fallback of adjoint matrix multiplication for different element types

songxianxu · May 10, 2021, 6:16am

I was diagonalizing some matrices and processing the eigenvectors. I noticed that the script took an unusually long time, and I realized that the adjoint matrix multiplication could be the problem.

Here is the testing script:

using LinearAlgebra, BenchmarkTools, Cthulhu

function adjoint_mul1(dim::Int)
  A = rand(ComplexF64, dim, dim)
  B = rand(Int, dim, dim)
  A' * B
end

function adjoint_mul2(dim::Int)
  A = rand(ComplexF64, dim, dim)
  B = rand(ComplexF64, dim, dim)
  A' * B
end

@btime a = adjoint_mul1(2^11)
@btime b = adjoint_mul2(2^11)

The results are as follows (first line for adjoint_mul1, second for adjoint_mul2):

  12.410 s (6 allocations: 160.00 MiB)
  191.027 ms (6 allocations: 192.00 MiB)

When the matrices grow larger, say 4096×4096, adjoint_mul1() becomes impractically slow (100+ seconds for dim = 4096). Checking the function call tree shows that adjoint_mul1() eventually calls generic_matmatmul!:

# Line 455 in https://github.com/JuliaLang/julia/blob/master/stdlib/LinearAlgebra/src/matmul.jl
@inline function mul!(C::AbstractMatrix, adjA::Adjoint{<:Any,<:AbstractVecOrMat}, B::AbstractVecOrMat,
                 alpha::Number, beta::Number)
    A = adjA.parent
    return generic_matmatmul!(C, 'C', 'N', A, B, MulAddMul(alpha, beta))
end

whereas adjoint_mul2() eventually reaches the BLAS wrappers:

# Line 446 in https://github.com/JuliaLang/julia/blob/master/stdlib/LinearAlgebra/src/matmul.jl
@inline function mul!(C::StridedMatrix{T}, adjA::Adjoint{<:Any,<:StridedVecOrMat{T}}, B::StridedVecOrMat{T},
                 alpha::Number, beta::Number) where {T<:BlasComplex}
    A = adjA.parent
    if A===B
        return herk_wrapper!(C, 'C', A, MulAddMul(alpha, beta))
    else
        return gemm_wrapper!(C, 'C', 'N', A, B, MulAddMul(alpha, beta))
    end
end
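One way to confirm which mul! method each combination of element types selects, without stepping through the call tree, is to ask Julia directly with `which`. This is only a small sketch: the variable names are illustrative, the 4×4 sizes are arbitrary, and the methods reported depend on the Julia version (newer releases of LinearAlgebra have reorganized this dispatch, so both calls may resolve to the same method there).

```julia
using LinearAlgebra

A      = rand(ComplexF64, 4, 4)
B_int  = rand(1:10, 4, 4)          # Int elements, as in adjoint_mul1
B_cplx = rand(ComplexF64, 4, 4)    # ComplexF64 elements, as in adjoint_mul2
C      = zeros(ComplexF64, 4, 4)

# Build the argument-type tuple for the 5-argument mul!(C, A', B, alpha, beta).
sig(args...) = Tuple{map(typeof, args)...}

m_mixed = which(mul!, sig(C, A', B_int,  true, false))
m_same  = which(mul!, sig(C, A', B_cplx, true, false))

println("mixed eltypes: ", m_mixed)
println("same eltypes:  ", m_same)
```

On a Julia version matching the snippets quoted above, I would expect the two printed signatures to differ, with only the second one restricted to `T<:BlasComplex`.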

I wonder whether this is intentional, or whether it should be opened as an issue for better performance. I understand that the problem lies in dispatching to BLAS when the element types differ.

It seems to me that a type conversion step is missing, so a slow fallback is called whenever the element types differ. My personal feeling is that code like A' * B should take care of type promotion itself for generic users (I am not sure exactly what should be done there, perhaps something closer to reinterpretation).
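In the meantime, a possible user-side workaround is to promote both operands to a common element type before multiplying, so that the product may reach the BLAS-backed method. The sketch below is a hedged illustration, not what LinearAlgebra does internally: `adjoint_mul_promoted` is a hypothetical helper name, and the conversion allocates an extra copy of the integer matrix.

```julia
using LinearAlgebra

# Hypothetical helper: convert both matrices to their promoted element type
# so that A' * B sees matching element types.
function adjoint_mul_promoted(A::AbstractMatrix, B::AbstractMatrix)
    T = promote_type(eltype(A), eltype(B))
    Ap = convert(AbstractMatrix{T}, A)   # no copy if A already has eltype T
    Bp = convert(AbstractMatrix{T}, B)   # allocates when eltype(B) != T
    return Ap' * Bp
end

A = rand(ComplexF64, 64, 64)
B = rand(-100:100, 64, 64)               # small integers keep the comparison exact enough

C_ref  = A' * B                          # mixed element types
C_fast = adjoint_mul_promoted(A, B)      # both operands ComplexF64

println(C_fast ≈ C_ref)
```

Whether this is actually faster depends on the Julia version and the matrix size, so it is worth timing both variants with @btime before relying on it.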
