I am writing a performance-critical section of code and would like to avoid all memory allocations in it by creating caches beforehand. Within that section I essentially need to multiply sub-arrays in a loop. Is there a way to do these matrix multiplications of sub-arrays without any allocations?
Here is an MWE:
```julia
using LinearAlgebra

A = rand(4, 10, 10)
B = rand(10, 10)
C = zeros(4, 10, 10)

# This is the function I plan to loop over
# (note: C[dim, 1:10, 1:10] is a copy, so the product is not written back into C)
function foo(A, B, C, dim)
    mul!(C[dim, 1:10, 1:10], A[dim, 1:10, 1:10], B)
    nothing
end

foo(A, B, C, 1)
@time foo(A, B, C, 1)
# 0.000010 seconds (2 allocations: 1.750 KiB)

# This is the function that loops over the first dimension
function wrapper_foo(A, B, C)
    for i in 1:4
        foo(A, B, C, i)
    end
    nothing
end

wrapper_foo(A, B, C)
@time wrapper_foo(A, B, C)
# 0.000037 seconds (12 allocations: 85.500 KiB)
```
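A likely source of the allocations in the MWE above is the slicing itself: in Julia, indexing with ranges such as `A[dim, 1:10, 1:10]` builds a new array, whereas `view` (or the `@view` macro) wraps the parent array without copying. The small check below illustrates this; the variable names `Acopy` and `Cview` are only for illustration.

```julia
using LinearAlgebra

A = rand(4, 10, 10)
B = rand(10, 10)
C = zeros(4, 10, 10)

# Indexing with ranges returns a fresh Matrix (a copy of the slice)
Acopy = A[1, 1:10, 1:10]
Acopy[1, 1] = -1.0          # modifying the copy...
@assert A[1, 1, 1] != -1.0  # ...leaves A untouched (rand never returns -1.0)

# mul! into a copied slice writes into that temporary, not into C
mul!(C[1, :, :], A[1, :, :], B)
println(all(iszero, C))     # true: C was never updated

# view shares memory with the parent, so the product lands in C
Cview = view(C, 1, :, :)
mul!(Cview, view(A, 1, :, :), B)
println(C[1, :, :] ≈ A[1, :, :] * B)   # true
```

So besides allocating, the MWE as written also discards its results; views fix both problems at once.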
When I call `mul!` on whole arrays rather than on sub-arrays, it runs without allocating any temporary arrays. Any ideas?
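A sketch of the loop using views, assuming the arrays can be stored with the looped-over index last (`10×10×4` instead of `4×10×10`). Julia arrays are column-major, so `view(A, :, :, i)` is a contiguous block with unit stride along its first dimension and can be passed to BLAS by `mul!`. By contrast, `view(A, i, :, :)` has stride 4 along its first dimension, so `mul!` cannot hand it to BLAS directly and falls back to a generic Julia implementation. The function names `foo_view!` and `wrapper_foo_view!` are hypothetical, and the zero-allocation expectation assumes a recent Julia version (1.5 or later, as far as I know), where creating a non-escaping view does not heap-allocate.

```julia
using LinearAlgebra

# Store the looped-over index last so each slice is contiguous in memory
A = rand(10, 10, 4)
B = rand(10, 10)
C = zeros(10, 10, 4)

# Multiply the i-th slice in place; the views wrap A and C without copying
function foo_view!(C, A, B, i)
    mul!(view(C, :, :, i), view(A, :, :, i), B)
    return nothing
end

# Loop over the last dimension, writing each product into C
function wrapper_foo_view!(C, A, B)
    for i in axes(A, 3)
        foo_view!(C, A, B, i)
    end
    return nothing
end

wrapper_foo_view!(C, A, B)          # first call triggers compilation
@time wrapper_foo_view!(C, A, B)    # later calls should report 0 allocations
```

Writing `@views mul!(C[:, :, i], A[:, :, i], B)` inside `foo_view!` is equivalent shorthand for the explicit `view` calls. If the original `4×10×10` layout has to stay, the same pattern with `view(C, i, :, :)` and `view(A, i, :, :)` avoids the copies, just without the BLAS fast path.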