I am writing a performance-critical section of code and would like to avoid all memory allocations there by preallocating caches beforehand. Inside that section I essentially need to multiply sub-arrays in a loop. Is there a way to do these matrix multiplications of sub-arrays without any allocations?
Here is an MWE:
using LinearAlgebra
A = rand(4,10,10)
B = rand(10,10)
C = zeros(4,10,10)
# This is the function I plan to loop over
function foo(A,B,C,dim)
    mul!(C[dim,1:10,1:10], A[dim,1:10,1:10], B)
    nothing
end
foo(A,B,C,1)
@time foo(A,B,C,1)
# 0.000010 seconds (2 allocations: 1.750 KiB)
# This is the function that loops over the first dimension
function wrapper_foo(A,B,C)
    for i in 1:4
        foo(A,B,C,i)
    end
    nothing
end
wrapper_foo(A,B,C)
@time wrapper_foo(A,B,C)
# 0.000037 seconds (12 allocations: 85.500 KiB)
In situations where I don't use sub-arrays, mul!() works without allocating any temporary arrays. Any ideas?
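One direction that might be worth trying is sketched below. Indexing with ranges, as in `C[dim,1:10,1:10]`, creates a copy, so in the MWE `mul!` writes into a temporary rather than into `C`. Views alias the parent array instead. The name `foo_views!` is hypothetical, and whether this version is fully allocation-free is an assumption that likely depends on the Julia version and on the memory layout: slices along the first dimension are not contiguous, so `mul!` may fall back to a generic (non-BLAS) kernel.

```julia
using LinearAlgebra

A = rand(4, 10, 10)
B = rand(10, 10)
C = zeros(4, 10, 10)

# Hypothetical variant of foo: view(...) aliases the parent array instead of
# copying it, so mul! writes its result directly into the slice of C.
function foo_views!(C, A, B, dim)
    mul!(view(C, dim, :, :), view(A, dim, :, :), B)
    return nothing
end

foo_views!(C, A, B, 1)
```

If the layout can be changed, storing the batch index last (for example `rand(10,10,4)` with `view(A, :, :, dim)`) makes each slice contiguous, which may suit `mul!` better. Checking with `@time` or `@allocated` after a warm-up call would show whether any allocations remain.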