tl;dr
Reduce the allocations for foo(u, ...): inside the function, given an array u of size m×1, take the first n = d*d*p elements of u (n < m is guaranteed) and reshape them into an array f of d-by-d matrices, accessed as f[j] for j in 1…p. f can be read-only. Is there a minimum number of allocations that cannot be avoided?
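For concreteness, here is a minimal sketch of the layout I mean, with hypothetical sizes d, p, m; f here is a single d×d×p wrapper over the first n elements of u rather than a vector of matrices, and f[:, :, j] (or @view f[:, :, j]) plays the role of f[j]:

d, p, m = 2, 10, 60
n = d * d * p                       # n < m by assumption
u = rand(ComplexF64, m)
f = reshape(view(u, 1:n), d, d, p)  # no data is copied, but the wrapper itself may still allocate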
For example, I need to take, say, 40 elements of an array r of size 60, reshape them into 10 2-by-2 matrices f[...], and do calculations on these matrices later (e.g. the remaining 20 elements of r are used somewhere else). The matrices in real applications are much bigger, so StaticArrays is not such a good fit. Sample code is below; note that f[i] .* i here is a placeholder. In the real code it is more like df[:,:,i] .= c[j,k].*f[k]*f[j] + d[j]*f[j], where c and d are coefficients, and there is another sum over j and k inside the i loop (see the sketch after the sample code).
df = zeros(Complex{Float64}, 2, 2, 10)        # output buffer
r = rand(1, 60) + 1im * rand(1, 60)           # only the first 40 entries are used here
f = [zeros(ComplexF64, 2, 2) for i in 1:10]
function d(df, f, r)
    # the comprehension and the view/reshape wrappers are rebuilt on every call
    @views f .= [reshape(r[(i-1)*4+1:4*i], 2, 2) for i in 1:10]
    for i in 1:10
        df[:, :, i] .= f[i] .* i              # placeholder for the real computation
    end
    return 0
end
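For context, the real update has roughly this shape, reusing df and f from the sample code above; c and dcoef are purely illustrative stand-ins for the actual coefficient arrays (dcoef is renamed so it does not clash with the function name d), and the real index structure is more involved:

c = rand(ComplexF64, 10, 10)      # stand-in coefficients
dcoef = rand(ComplexF64, 10)      # stand-in coefficients
for i in 1:10
    df[:, :, i] .= 0
    for j in 1:10, k in 1:10
        df[:, :, i] .+= c[j, k] .* (f[k] * f[j])   # f[k]*f[j] is a 2-by-2 matrix product
    end
    for j in 1:10
        df[:, :, i] .+= dcoef[j] .* f[j]
    end
end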
@btime shows 925.667 ns (53 allocations: 3.17 KiB); the df[:,:,i] loop does not allocate. This function gets called about 1e5 times, so reducing the allocations would help a lot. Thanks!
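For reference, here is a sketch of one direction, though I am not sure it removes the allocations entirely: the read-only views are built once outside the hot function and reused on every call (d2! is a hypothetical name; this assumes r's buffer is refilled in place between calls rather than rebound to a new array, otherwise fv would have to be rebuilt):

using BenchmarkTools

df = zeros(Complex{Float64}, 2, 2, 10)
r = rand(1, 60) + 1im * rand(1, 60)
# non-copying 2-by-2 views over the first 40 entries of r; each fv[i] aliases r's memory
fv = [reshape(view(r, (i-1)*4+1:4*i), 2, 2) for i in 1:10]

function d2!(df, fv)
    for i in 1:10
        df[:, :, i] .= fv[i] .* i   # placeholder, as in the sample code above
    end
    return 0
end

@btime d2!($df, $fv)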