Nested Loop optimization

Hello, I’ve been working on a solver that has a nested loop, and I want to improve its performance. I think I’m making unnecessary allocations: for instance, I have to write a=copy(a) to force the inner loop to update “a”. Any general tips on how to improve the performance of the loop, or on how to avoid copying “a” on every j iteration, would be appreciated.

P.S. I feed the function arrays of length Nx, then I fix the boundary values of “a” at every step. My goal is to modify “a” in the inner loop without losing performance. For reference, I’m working on an Nx=12000, Nt=40000 grid.

This is a simplified version of the loop I use in my solver:


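using LoopVectorization  # provides @avxt

# Nx, Nt and m are assumed to be globals defined elsewhere in the solver
function TXloop(a, b, c)
    for j = 2:Nt
        c = b
        b = a
        a = copy(a)  # allocates a new array on every time step
        @avxt for i = 2:Nx-1
            a[i] = b[i] + c[i]
        end
        a[Nx] = c[1]
        a[1] = c[2] - c[1]
        push!(m, a)
    end
end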

copy and push! will both allocate, and usually you can avoid this by preallocating your memory. Work out how much memory you need first and create an array of that type. A crude example:

function TXloop(a, b, c)
    # preallocate arrays at the start
    a_buffer = similar(a)
    m = zeros(eltype(a), Nt-1, length(a))
    for j = 2:Nt
        # same as copy, but doesn't allocate
        a_buffer .= a
        # calculate something

        # copy into a section of m
        m[j-1, :] .= a_buffer
    end
    return m
end
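
To make the crude example concrete, here is a minimal sketch of how the loop from the question might look with everything preallocated. It is only one possible arrangement, not the original solver: the name txloop_prealloc is hypothetical, Nt is passed as an argument instead of being read from a global, and the history is stored one step per column because Julia arrays are column-major. The three work buffers are rotated by rebinding names, so no copy is needed inside the time loop. The inner loop is a plain for loop here; the @avxt macro from the question could presumably be put back in front of it.

```julia
# Hypothetical sketch: same update rule as the loop in the question,
# but with all storage allocated once before the time loop.
function txloop_prealloc(a0::AbstractVector, b0::AbstractVector, Nt::Integer)
    Nx = length(a0)
    # One column per time step (columns are contiguous in memory).
    m = Matrix{eltype(a0)}(undef, Nx, Nt - 1)
    # Work buffers; the inputs are copied so the caller's arrays stay untouched.
    a = copy(a0)
    b = copy(b0)
    c = similar(a0)
    for j in 2:Nt
        # Rotate bindings instead of copying data: the old b becomes c,
        # the old a becomes b, and the old c's storage is reused for the new a.
        c, b, a = b, a, c
        @inbounds for i in 2:Nx-1
            a[i] = b[i] + c[i]
        end
        # Boundary values, as in the question.
        a[Nx] = c[1]
        a[1] = c[2] - c[1]
        # In-place copy into the preallocated column.
        m[:, j-1] .= a
    end
    return m
end
```

Every element of the reused buffer is overwritten on each step (interior plus both boundaries), which is why its stale contents do not matter. Note that if the elements are Float64, a full history at Nx=12000 and Nt=40000 is about 12000 × 39999 × 8 bytes, roughly 3.8 GB, so it may be worth storing only every k-th step.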

The .= operator is your friend when working with arrays: it assigns element-wise into the existing array, and any dotted operations on the right-hand side are fused into a single loop, so no temporary arrays are allocated for intermediate results. There’s also no reason you couldn’t use a plain for loop instead.
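
As a small illustration of the point about .= versus a for loop, the sketch below writes the same update both ways. The function names are hypothetical and the formula is just an arbitrary example; both versions write into the existing storage of a instead of creating a new array.

```julia
# Hypothetical example: a[i] = b[i] + 2 * c[i], written two ways.

# Broadcast version: every operation is dotted, so the whole right-hand
# side fuses into one loop that writes directly into `a`.
function update_broadcast!(a, b, c)
    a .= b .+ 2 .* c
    return a
end

# Explicit loop version: same result, also without temporaries.
function update_loop!(a, b, c)
    for i in eachindex(a, b, c)
        a[i] = b[i] + 2 * c[i]
    end
    return a
end

b = collect(1.0:5.0)
c = fill(2.0, 5)
a1 = zeros(5)
a2 = zeros(5)
update_broadcast!(a1, b, c)
update_loop!(a2, b, c)
```

By contrast, writing a = b + 2 * c (no dots) would build new arrays on the right-hand side and rebind the name a to the result, leaving the original array untouched.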

Look for functions with a ! at the end of the name. By convention these mutate one or more of their arguments (usually the first), which lets you operate on preallocated memory.
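
A few functions from Base that follow this naming convention, as a quick sketch (the variable names are arbitrary). Note that map! takes the function first, so the array it mutates is the second argument.

```julia
src = [3.0, 1.0, 2.0]
dest = similar(src)        # uninitialized array with the same type and size

copyto!(dest, src)         # copy the contents of src into dest
sort!(dest)                # sort dest in place
fill!(src, 0.0)            # overwrite every element of src with 0.0
map!(+, src, dest, dest)   # src[i] = dest[i] + dest[i], written into src
```

The non-mutating counterparts (copy, sort, map) return a freshly allocated array instead, which is what you want to avoid inside a hot loop.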

Thanks, this is what I was trying to understand. I’ll implement your tips on my routine and see how it goes.