I understand how I can solve this, but is this the expected default behaviour? Looking at the lowered code, I agree the `!` suffix suggests no copies. Nevertheless, from my "high level code" `X*Y` I already expect memory to be allocated, and I was expecting a BLAS call!
```julia
using BenchmarkTools

N = 1000
X = rand(N, N)
Y = rand(N, N)
println("N = 1000 same type")
@btime X * Y

X = rand(N, N)
Y = rand(Float32, N, N)
println("N = 1000 different type")
@btime X * Y
```
gives me the same allocations in MiB but drastically worse performance:
```
N = 1000 same type
  15.822 ms (2 allocations: 7.63 MiB)
N = 1000 different type
  1.326 s (8 allocations: 7.63 MiB)
```
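For what it's worth, one way to confirm which path the mixed-type case takes is to ask which method dispatch selects (a diagnostic sketch; the exact method printed depends on the Julia version):

```julia
using LinearAlgebra

X = rand(3, 3)
Y = rand(Float32, 3, 3)

# With mixed element types, mul! cannot forward to BLAS.gemm!
# (gemm requires both operands to share a single BLAS element type),
# so the multiply ends up in the generic Julia fallback.
println(@which X * Y)
```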
I would rather have some extra copies and much better speed. I can see this as a personal preference, but at the end of the day I expected a call to a highly optimized BLAS (and therefore speed).
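Concretely, the "extra copy" I would accept is a one-off promotion of the `Float32` matrix before the multiply. A sketch (assuming BenchmarkTools is installed; `$` interpolation avoids benchmarking the globals themselves):

```julia
using BenchmarkTools

N = 1000
X = rand(N, N)
Y = rand(Float32, N, N)

# Promote Y once, then multiply: one extra N×N Float64 copy
# (~7.6 MiB here) in exchange for hitting the BLAS gemm path.
@btime $X * Float64.($Y)
```

On my machine this costs one additional allocation but runs at essentially the same-type speed.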
This was the cause of the huge performance hit reported here:
and I expect it to happen to a lot of people.
If that was the expected behaviour, maybe there should be a line in the performance tips stating "Matrix multiplication between different element types will destroy your performance, although it won't allocate extra memory. This was a design decision."
Arrays are a remarkable part of Julia, and I expect that for a lot of people LinearAlgebra will also be relevant. Since the language clearly advertises speed as a big feature (the comparison between languages in the Julia Micro-Benchmarks does not show memory), I felt the fastest solution should be the default one.
https://docs.julialang.org/en/v1/manual/performance-tips/index.html
Certainly in other frameworks (PyTorch/NumPy/MATLAB) this is not the default behaviour.
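Note also that the result's element type is already promoted in the mixed case, so converting up front would change neither the output type nor the values (`Float32 → Float64` conversion is exact). A quick check:

```julia
X = rand(2, 2)
Y = rand(Float32, 2, 2)
Z = X * Y

# The mixed multiply already returns a Float64 matrix, so an
# internal copy-and-promote strategy would produce the same result.
println(eltype(Z))   # Float64
```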