I was running some matrix diagonalization and processing the eigenvectors. I notice that the script takes an unusually long time and I realize that it could be a problem of the adjoint matrix multiplication.

Here is the testing script:

```
using LinearAlgebra, BenchmarkTools, Cthulhu
function adjoint_mul1(dim::Int)
A = rand(ComplexF64, dim, dim)
B = rand(Int, dim, dim)
A' * B
end
function adjoint_mul2(dim::Int)
A = rand(ComplexF64, dim, dim)
B = rand(ComplexF64, dim, dim)
A' * B
end
@btime a = adjoint_mul1(2^11)
@btime b = adjoint_mul2(2^11)
```

The result is as follows:

```
12.410 s (6 allocations: 160.00 MiB)
191.027 ms (6 allocations: 192.00 MiB)
```

When the matrix size grows even larger, say (4096,4096), basically `adjoint_mul1()`

will take forever (well actually 100+ seconds for 4096). By checking the function calltree, it appears that `adjoint_mul1()`

eventually uses `generic_matmatmul!`

```
# Line 455 in https://github.com/JuliaLang/julia/blob/master/stdlib/LinearAlgebra/src/matmul.jl
@inline function mul!(C::AbstractMatrix, adjA::Adjoint{<:Any,<:AbstractVecOrMat}, B::AbstractVecOrMat,
alpha::Number, beta::Number)
A = adjA.parent
return generic_matmatmul!(C, 'C', 'N', A, B, MulAddMul(alpha, beta))
end
```

and `adjoint_mul2()`

eventually uses `blas`

wrapper

```
# Line 446 in https://github.com/JuliaLang/julia/blob/master/stdlib/LinearAlgebra/src/matmul.jl
@inline function mul!(C::StridedMatrix{T}, adjA::Adjoint{<:Any,<:StridedVecOrMat{T}}, B::StridedVecOrMat{T},
alpha::Number, beta::Number) where {T<:BlasComplex}
A = adjA.parent
if A===B
return herk_wrapper!(C, 'C', A, MulAddMul(alpha, beta))
else
return gemm_wrapper!(C, 'C', 'N', A, B, MulAddMul(alpha, beta))
end
end
```

I wonder if this is by purpose or it should be opened as an issue for better performance? I understand that this is a problem of dispatching to BLAS when different types are used.

For me, I feel that it seems a type reinterpretation is missing such that when different types are used, a slow fallback is called. And my personal feeling is that code like `A' * B`

should take care of type promotion (not exactly what should be done there, perhaps more of reinterpretation) itself for generic users.