Drastic performance hit when multiplying matrices of different element types. Does Julia cast internally, like NumPy?

I understand how I can solve this, but is this the expected default behaviour? Looking at the `@code_lowered` output, I agree the `!` suffix suggests no copies are made. Nevertheless, from my "high-level code" `X*Y` I already expect memory to be allocated, and I was expecting a BLAS call!
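For anyone who wants to check what is going on, a minimal sketch (assuming the slowdown comes from dispatch): `@which` reports the method that `X * Y` resolves to, and for mixed element types it is a generic LinearAlgebra fallback rather than a BLAS wrapper.

```julia
using LinearAlgebra

X = rand(4, 4)            # Float64 matrix
Y = rand(Float32, 4, 4)   # Float32 matrix

# @which returns the Method object that this call dispatches to;
# for mixed element types it is the generic fallback, not a BLAS routine.
m = @which X * Y
println(m)
```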

using BenchmarkTools

N = 1000

X = rand(N, N)
Y = rand(N, N)
println("N = 1000 same type")
@btime $X * $Y

X = rand(N, N)
Y = rand(Float32, N, N)
println("N = 1000 different type")
@btime $X * $Y

This gives me the same allocations in MiB but drastically worse performance:

N = 1000 same type
  15.822 ms (2 allocations: 7.63 MiB)
N = 1000 different type
  1.326 s (8 allocations: 7.63 MiB)

I would rather have some extra copies and much better speed. I can see this as a personal preference, but at the end of the day I expected a call to a highly optimized BLAS (and therefore speed).
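As a workaround sketch (assuming, as above, that mixed element types fall back to the generic matmul): promoting one operand so both element types match pays one extra O(N²) copy but restores the O(N³) BLAS path, which dominates for large N.

```julia
N = 1000
X = rand(N, N)            # Float64 matrix
Y = rand(Float32, N, N)   # Float32 matrix

# Promote Y to Float64 with a broadcast conversion; the copy is cheap
# compared to the multiply, and X * Float64.(Y) now hits BLAS dgemm.
Z = X * Float64.(Y)

# Alternatively, demote X if Float32 precision is acceptable
# (this dispatches to BLAS sgemm instead):
Zf = Float32.(X) * Y
```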

This was the cause of the huge performance hit from here:

and I expect it to happen to a lot of people.

If that was the expected behaviour, maybe there should be a line in the performance tips stating "Matrix multiplication with different types will destroy your performance, yet it won't allocate memory. This was a design decision."

Arrays are a remarkable part of Julia, and I expect that LinearAlgebra will also be relevant for a lot of people. Since the language clearly presents speed as a big feature (the cross-language comparison in the Julia Micro-Benchmarks does not show memory), I felt the fastest solution should be the default one.

https://docs.julialang.org/en/v1/manual/performance-tips/index.html

Certainly in other frameworks (PyTorch/NumPy/MATLAB) this is not the default behaviour.