Perhaps the following puzzle will ring a bell for someone?

Multiply `mul!(H, B, Gamma)`

:

```
(eltype(H), eltype(B), eltype(Gamma)) = (Complex{Float64}, Complex{Float64}, Complex{Float64})
```

It takes 13.091117 seconds

Multiply `mul!(Vn, Gamma, P)`

:

```
(eltype(Vn), eltype(Gamma), eltype(P), size(Gamma), size(P)) = (Complex{Float64}, Complex{Float64}, Complex{Float64}, (4436, 4436), (4436, 181))
```

It takes 0.643775 seconds

Now we change `Gamma`

to be real, everything else is the same.

```
(eltype(H), eltype(B), eltype(Gamma)) = (Complex{Float64}, Complex{Float64}, Float64)
```

The timing for the first product *decreased* with more than a factor of two (4.532288 seconds).

The timing for the second product *increased* by about a factor of 8:

```
(eltype(Vn), eltype(Gamma), eltype(P), size(Gamma), size(P)) = (Complex{Float64}, Float64, Complex{Float64}, (4436, 4436), (4436, 181))
```

This takes 4.958194 seconds (6 allocations: 336 bytes)

I could track it down to `_generic_matmatmul!`

. I believe tiled execution is selected. (There are some scary `# FIXME: This code is completely invalid!!!`

warnings around!) But that’s where I got stuck. I don’t see why changing the type of one of the arrays should speed up one product and slow down another one.