# Puzzling performance when multiplying real and complex matrices

Perhaps the following puzzle will ring a bell for someone?

Multiply `mul!(H, B, Gamma)`:

```
(eltype(H), eltype(B), eltype(Gamma)) = (Complex{Float64}, Complex{Float64}, Complex{Float64})
```

It takes 13.091117 seconds.

Multiply `mul!(Vn, Gamma, P)`:

```
(eltype(Vn), eltype(Gamma), eltype(P), size(Gamma), size(P)) = (Complex{Float64}, Complex{Float64}, Complex{Float64}, (4436, 4436), (4436, 181))
```

It takes 0.643775 seconds.

Now we change `Gamma` to be real; everything else stays the same.

```
(eltype(H), eltype(B), eltype(Gamma)) = (Complex{Float64}, Complex{Float64}, Float64)
```

The timing for the first product decreased by more than a factor of two, to 4.532288 seconds.

The timing for the second product increased by about a factor of 8:

```
(eltype(Vn), eltype(Gamma), eltype(P), size(Gamma), size(P)) = (Complex{Float64}, Float64, Complex{Float64}, (4436, 4436), (4436, 181))
```

This takes 4.958194 seconds (6 allocations: 336 bytes).
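A minimal sketch one can copy and run to compare the two cases. The sizes in the post are `(4436, 4436)` and `(4436, 181)`; smaller sizes are assumed here so it finishes quickly, and the exact timings (and whether the real case is slower at all) depend on the Julia version:

```julia
using LinearAlgebra

n, m = 1000, 100
Gamma_c = rand(ComplexF64, n, n)   # complex Gamma
Gamma_r = real.(Gamma_c)           # same values, real eltype
P  = rand(ComplexF64, n, m)
Vn = zeros(ComplexF64, n, m)

# Warm up to exclude compilation, then time each product.
mul!(Vn, Gamma_c, P)
@time mul!(Vn, Gamma_c, P)   # complex*complex: dispatched to BLAS zgemm
mul!(Vn, Gamma_r, P)
@time mul!(Vn, Gamma_r, P)   # real*complex: generic fallback on affected versions
```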

I tracked it down to `_generic_matmatmul!`; I believe the tiled execution path is selected. (There are some scary `# FIXME: This code is completely invalid!!!` comments in there!) But that's where I got stuck. I don't see why changing the type of one of the arrays should speed up one product and slow down the other.
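One way to confirm which method the mixed-eltype product dispatches to is `@which`. A small sketch (the exact method shown is version-dependent; on Julia versions predating the fix, the real-times-complex case lands in the generic matmul rather than a BLAS wrapper):

```julia
using LinearAlgebra

n, m = 200, 50
Gamma = rand(n, n)                # real matrix
P  = rand(ComplexF64, n, m)
Vn = zeros(ComplexF64, n, m)

# Show which mul! method handles each combination.
display(@which mul!(Vn, complex.(Gamma), P))  # complex*complex
display(@which mul!(Vn, Gamma, P))            # real*complex
```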

Can you provide a MWE that one can copy and run?



These are useful links Kristoffer, thanks. In fact, https://github.com/JuliaLang/julia/issues/30811 turned out to describe the reason for the speed-up/slow-down problem. The gist: when one multiplies a complex matrix and a real matrix, the product is handled by BLAS if it is `complex*real`, but falls back to the generic matmul if it is `real*complex`.
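On versions where `real*complex` hits the generic fallback, one workaround is to split the complex matrix into its real and imaginary parts, so both products are real-times-real and go through BLAS `dgemm`. A sketch (variable names are illustrative, not from the original code):

```julia
using LinearAlgebra

n, m = 1000, 100
Gamma = rand(n, n)            # real
P = rand(ComplexF64, n, m)    # complex

# Direct product: real*complex, generic fallback on affected versions.
Vn_direct = Gamma * P

# Workaround: two real*real BLAS products instead.
Vn = Gamma * real(P) .+ im .* (Gamma * imag(P))

Vn ≈ Vn_direct   # the two results agree
```

The extra allocations for `real(P)` and `imag(P)` are usually cheap compared to the generic matmul penalty at these sizes.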
