I want to perform matrix vector multiplication of size (N^2, M) * (M), and then reshape the output vector of size (N^2) to a (N, N) matrix.
To do this without allocating, I tried using a 1-D view of the matrix.
Calling mul! directly with the view did not allocate, but calling mul! on a view created inside a function does.
Why is this? Is there a way to keep using a view inside the function without allocating?
I would also appreciate suggestions for better ways to achieve my goal: a matrix-vector multiplication followed by reshaping the output vector into a matrix.
using LinearAlgebra   # provides mul!
using BenchmarkTools  # provides @btime
N = 50
M = 100
a1 = zeros(N*N)
a2 = zeros(N, N)
b = rand(N*N, M)
c = rand(M)
function test1!(a1, b, c)
    mul!(a1, b, c)
end
function test2!(a2, b, c)
    mul!((@view a2[:]), b, c)
end
@btime mul!($a1, $b, $c)
@btime mul!($(@view a2[:]), $b, $c)
@btime test1!($a1, $b, $c)
@btime test2!($a2, $b, $c)
Output:
53.083 μs (0 allocations: 0 bytes)
53.100 μs (0 allocations: 0 bytes)
53.023 μs (0 allocations: 0 bytes)
53.280 μs (2 allocations: 80 bytes)
EDIT: Following suggestions from the replies, I tried reshape(a2, :) and vec(a2), but both still allocate.
function test3!(a2, b, c)
    mul!(reshape(a2, :), b, c)
end
function test4!(a2, b, c)
    mul!(vec(a2), b, c)
end
@btime mul!(reshape($a2, :), $b, $c)
@btime mul!(vec($a2), $b, $c)
@btime test3!($a2, $b, $c)
@btime test4!($a2, $b, $c)
Output:
53.223 μs (2 allocations: 80 bytes)
53.178 μs (2 allocations: 80 bytes)
53.303 μs (2 allocations: 80 bytes)
53.144 μs (2 allocations: 80 bytes)
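For reference, here is a minimal sketch of one possible alternative for the goal described above. It assumes the flat vector and the (N, N) matrix can both be created once, up front, so that no wrapper has to be built inside the hot function. reshape on an Array returns an array that shares memory with its argument, so writing into a1 is visible through a2. The helper name mul_flat! is hypothetical and only used for illustration; I have not benchmarked this, so whether it removes the allocation on a given Julia version is something to verify with @btime.

```julia
using LinearAlgebra

N, M = 50, 100
b = rand(N * N, M)
c = rand(M)

# Allocate the flat buffer once, then build the matrix wrapper once, up front.
a1 = zeros(N * N)
a2 = reshape(a1, N, N)   # shares memory with a1; the data is not copied

# Hypothetical helper (not from the original post): writes b * c into the
# flat buffer. No view or reshape is created inside the function.
function mul_flat!(a1, b, c)
    mul!(a1, b, c)
    return nothing
end

mul_flat!(a1, b, c)
# a2 now holds the product b * c laid out as an (N, N) matrix.
```

The design choice here is to move the creation of the reshaped wrapper out of the function entirely, and to pass the flat vector to mul! while reading the result through a2.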