Hello, I’ve been working on a solver that has a nested loop, and I want to improve its performance. I think I’m making unnecessary allocations; for instance, I have to write a = copy(a) to force the inner loop to update “a”. Any general tips on how to speed up the loop, or on how to avoid copying “a” on every j iteration, would be appreciated.
P.S. I feed the function arrays of length Nx, then I fix the boundary values of “a” at every step. My goal is to modify “a” in the inner loop without losing performance. For reference, I’m working on an Nx = 12000, Nt = 40000 grid.
This is a simplified version of the loop I use in my solver:
function TXloop(a,b,c)
    for j = 2:Nt
        c = b
        b = a
        a = copy(a)
        @avxt for i = 2:Nx-1
            a[i] = b[i] + c[i]
        end
        a[Nx] = c[1]
        a[1] = c[2] - c[1]
        push!(m, a)
    end
end
copy and push! will both allocate, and you can usually avoid this by preallocating your memory: work out how much memory you need first, then create an array of that type and size up front. A crude example:
function TXloop(a,b,c)
    # preallocate arrays at the start (Nt is assumed to be defined globally, as in the question)
    a_buffer = similar(a)
    m = zeros(eltype(a), Nt-1, length(a))
    for j = 2:Nt
        # same values as copy(a), but written into existing memory, so no allocation
        a_buffer .= a
        # calculate something
        # copy into a section of m
        m[j-1, :] .= a_buffer
    end
    return m
end
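Applying the same idea to the loop from the question, here is a minimal sketch of one way it could look. The name txloop_prealloc and the choice to pass Nt as an argument are my own assumptions, and I have swapped @avxt for a plain @inbounds loop so the example needs no packages. Since every index of "a" is overwritten in each step, the copy can be replaced by rotating three buffers:

```julia
# Hypothetical rewrite of the question's loop; the name and signature are illustrative.
function txloop_prealloc(a0, b0, c0, Nt)
    Nx = length(a0)
    # Copy the inputs once so the caller's arrays are left untouched.
    a, b, c = copy(a0), copy(b0), copy(c0)
    # One column per time step, since Julia arrays are column-major.
    m = Matrix{eltype(a0)}(undef, Nx, Nt - 1)
    for j in 2:Nt
        # Rotate the buffers: new c = old b, new b = old a,
        # and the new a reuses the storage of the old c.
        a, b, c = c, a, b
        @inbounds for i in 2:Nx-1
            a[i] = b[i] + c[i]
        end
        a[Nx] = c[1]
        a[1] = c[2] - c[1]
        # Write this step into the preallocated history.
        m[:, j-1] .= a
    end
    return m
end

m = txloop_prealloc([1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5], zeros(4), 5)
```

Note that at Nx = 12000 and Nt = 40000 a full Float64 history is roughly 3.84 GB, so it may be worth storing only the steps you actually need.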
The .= operator is your friend when working with arrays: it assigns element-wise into an existing array, and any dotted operations on the right-hand side are fused into a single loop, so no intermediate arrays are allocated. There’s also no reason you couldn’t use a plain for loop instead.
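To make the fusion point concrete, here is a small sketch comparing a fused broadcast with the equivalent hand-written loop. The function names fused! and looped! are made up for illustration:

```julia
# Fused broadcast: a single pass over the data, written straight into `out`.
function fused!(out, x, y)
    out .= 2 .* x .+ sin.(y)
    return out
end

# The same computation as an explicit loop.
function looped!(out, x, y)
    @inbounds for i in eachindex(out, x, y)
        out[i] = 2 * x[i] + sin(y[i])
    end
    return out
end

x = rand(1000)
y = rand(1000)
out1 = similar(x)
out2 = similar(x)

fused!(out1, x, y)
looped!(out2, x, y)
```

By contrast, writing out = 2 .* x .+ sin.(y) with a plain = would allocate a fresh array and rebind the name. If in doubt, @allocated (or @btime from BenchmarkTools) on a call to the function should show whether anything is being allocated.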
Look for functions with a ! at the end of the name: by convention these mutate one of their arguments (usually the first), which lets you run the operation on preallocated memory.
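A few examples of such functions from Base and the LinearAlgebra standard library, as a rough illustration (the variable names are arbitrary):

```julia
using LinearAlgebra  # provides mul!

src = [3.0, 1.0, 2.0]
dest = similar(src)

copyto!(dest, src)   # in-place counterpart of dest = copy(src)
sort!(dest)          # sorts dest in place; sort(dest) would return a new array
fill!(src, 0.0)      # overwrites every element of src

A = [1.0 2.0; 3.0 4.0]
v = [1.0, 1.0]
y = similar(v)
mul!(y, A, v)        # stores A * v in y instead of allocating a result
```

After this runs, dest is [1.0, 2.0, 3.0], src is all zeros, and y is [3.0, 7.0].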