The functions executed for `A*y` are different depending on whether `A` is real or complex. The first code uses BLAS `gemv!` (checked via `@code_native`):

```julia
n = 20000
A = rand(ComplexF64, n, n)
y = rand(Float64, n)
c = Vector{ComplexF64}(undef, n)
mul!(c, A, y)
```

```julia
julia> @btime mul!($c, $A, $y)
  254.250 ms (0 allocations: 0 bytes)
```
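(For context, and if I understand the LinearAlgebra internals correctly: a complex-matrix times real-vector product can be forwarded to real `dgemv` by reinterpreting the complex storage. A rough sketch of that trick, under my own assumptions:)

```julia
using LinearAlgebra

# Sketch of why the complex × real case can hit real BLAS: reinterpret
# the ComplexF64 matrix as a 2n×n Float64 matrix (interleaved
# real/imaginary rows) and the output likewise; the product is then a
# plain real matvec.
n = 4
A = rand(ComplexF64, n, n)
y = rand(Float64, n)
c = Vector{ComplexF64}(undef, n)

Ar = reinterpret(Float64, A)    # 2n×n real view of A
cr = reinterpret(Float64, c)    # length-2n real view of c
mul!(cr, Ar, y)                 # real matvec fills c through the view
c ≈ A * y                       # true
```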
The second code, however, uses `generic_matvecmul!`:

```julia
n = 20000
A = rand(Float64, n, n)
y = rand(ComplexF64, n)
c = Vector{ComplexF64}(undef, n)
mul!(c, A, y)
```

```julia
julia> @btime mul!($c, $A, $y)
  271.984 ms (0 allocations: 0 bytes)

julia> BLAS.set_num_threads(1)

julia> @btime mul!($c, $A, $y)
  276.834 ms (0 allocations: 0 bytes)
```

The performance of the two cases for n = 20000 and 4 threads is still similar. But BLAS is multithreaded, while `generic_matvecmul!` isn't. I cannot check right now how the two cases scale for larger n and more threads.
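Just to illustrate what I mean by multithreading the fallback: a hand-rolled threaded matvec is easy to sketch. This is my own sketch, not how `generic_matvecmul!` is actually implemented, and it assumes Julia was started with e.g. `-t 4`:

```julia
using Base.Threads

# Hand-rolled threaded matvec: disjoint row blocks per thread, columns
# in the inner loop to respect Julia's column-major layout.
function threaded_matvec!(c, A, y)
    m, n = size(A)
    fill!(c, zero(eltype(c)))
    blocks = collect(Iterators.partition(1:m, cld(m, nthreads())))
    @threads for rows in blocks
        @inbounds for j in 1:n
            yj = y[j]
            @simd for i in rows
                c[i] += A[i, j] * yj
            end
        end
    end
    return c
end

# Drop-in for mul!(c, A, y):
# threaded_matvec!(c, A, y)
```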
I am interested in the latter case and wonder why it does not use BLAS and is not multithreaded. I thought `generic_*` functions are fallbacks in Julia, and that one should look for specialized functions when performance matters. Could someone give some insight into this?
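For completeness, the one workaround I can think of (a sketch with made-up names, assuming `A` is real and `y` is complex as above) is to pack the complex vector into an n×2 real matrix, so the product goes through multithreaded BLAS `dgemm`:

```julia
using LinearAlgebra

# Pack real/imaginary parts of y into the columns of a real matrix so
# that the product dispatches to BLAS gemm instead of the generic
# fallback, then reassemble the complex result.
function real_times_complex!(c, A, y)
    n = length(y)
    Y = Matrix{Float64}(undef, n, 2)
    Y[:, 1] .= real.(y)
    Y[:, 2] .= imag.(y)
    C = A * Y                              # BLAS dgemm on real data
    c .= complex.(view(C, :, 1), view(C, :, 2))
    return c
end
```

The two extra copies are O(n), which should be negligible next to the O(n²) product, though this sketch does allocate the temporaries `Y` and `C`.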