Puzzling performance when multiplying real and complex matrices


#1

Perhaps the following puzzle will ring a bell for someone?

Multiply mul!(H, B, Gamma):

    (eltype(H), eltype(B), eltype(Gamma)) = (Complex{Float64}, Complex{Float64}, Complex{Float64})   

It takes 13.091117 seconds

Multiply mul!(Vn, Gamma, P):

    (eltype(Vn), eltype(Gamma), eltype(P), size(Gamma), size(P)) = (Complex{Float64}, Complex{Float64}, Complex{Float64}, (4436, 4436), (4436, 181))                

It takes 0.643775 seconds

Now we change Gamma to be real, everything else is the same.

      (eltype(H), eltype(B), eltype(Gamma)) = (Complex{Float64}, Complex{Float64}, Float64)         

The timing for the first product decreased with more than a factor of two (4.532288 seconds).

The timing for the second product increased by about a factor of 8:

      (eltype(Vn), eltype(Gamma), eltype(P), size(Gamma), size(P)) = (Complex{Float64}, Float64, Complex{Float64}, (4436, 4436), (4436, 181))       

This takes 4.958194 seconds (6 allocations: 336 bytes)

I could track it down to _generic_matmatmul!. I believe tiled execution is selected. (There are some scary # FIXME: This code is completely invalid!!! warnings around!) But that’s where I got stuck. I don’t see why changing the type of one of the arrays should speed up one product and slow down another one.


#2

Can you provide a MWE that one can copy and run?


#3


Perhaps


#5

These are useful links Kristoffer, thanks. In fact https://github.com/JuliaLang/julia/issues/30811 turned out to describe the reason for the speed up/ slow down problem. The gist: if one has a complex matrix and a real matrix, they will be processed by BLAS if the product is complex*real, otherwise by the generic matmul for real*complex.