Optimizing nested for loops and memory allocation in finite differences routine

Hi, guys. I’ve been working in a nonlinear wave equation solver using finite differences. My code works properly but I’d like to improve the performance, im working on a 12000*39000 grid and the average @btime is 15s. I’m pretty sure that I have been allocating memory unnecessarily. Im pretty new at the topic of performance, I’ve already implemented some of the tips on the performance page.

Here’s my code. Any tip on how to improve the performance would be appreciated.

P.D I’ve noticed that commenting the u=copy(u) line makes code10s faster but whenever I do that u stops updating, I’m sure that it’s due to scope issues but I haven’t found a way around.

function NLWaveSolver(I, V, G, L, dt, C, T, DX,bp,np)
   ##########################################
   Nt = Int(round(T / dt))
   t = range(0, Nt * dt, length = Nt + 1)
   dx = dt / C
   Nx = Int(round(2 * L / dx)) + Int(1)
   x = range(-L, L, length = Nx)
   m=Vector{Float64}[]
   sizehint!(m,Nt+1)
   C2 = C^2
   u_1 = zeros(Nx)
   u_2 = zeros(Nx)
   u = zeros(Nx)
###############################################################
######################## FIRST STEPS #########################################
	@avxt for i = 1:Nx
     u_1[i] = I[i]
   end
   push!(m, u_1)
   
	@avxt for i = 2:Nx-1
       u[i] =
         u_1[i] * (1.0 - C2) + dt*V[i] + 0.5 * C2 * (u_1[i+1] + u_1[i-1]) -
         0.5 * dt^2 * G(u_1[i],bp,np)
   end
   u[1] = DX[1]
   u[Nx] = DX[2]
   push!(m, u)
   ###############################################
   ##################LOOPS########################

  function  TXloop(a,b,c)
   for j = 2:Nt
      c = b
      b = a
      a=copy(a)
      @avxt for i = 2:Nx-1
         a[i] =
         2.0 * b[i] * (1.0 - C2) + C2 * (b[i+1] + b[i-1]) - c[i] -
         dt^2 * G(b[i],bp,np)
         end
      a[Nx]=c[Nx]-C*(c[Nx]-c[Nx-1])
      a[1]=c[1]+C*(c[2]-c[1])
      push!(m, a)
    end
   end

   TXloop(u,u_1,u_2)
   
   ##############################################
   ##############################################
   
   xv = range(-L, L, length = length(m[1]))
   tv = range(0.0, T, length = length(m))
   m2 = zeros(length(m[1]), length(m))
   
   for j in eachindex(m)
      @avxt   for i in eachindex(m[1])
            m2[i, j] = m[j][i]
      end
   end
   return m, tv, x, m2
end

One line that is killing your performance is m=[] This creates a Vector{Any} Instead write something like m=Vector{Float64}[] to create a Vector{Vector{Float64}}

1 Like

Thanks for the tip ! . I’m working on a 12000*39000 grid, by doing m=Vector{Float64}[] the average @btime went from 20s to 15s. I’ve also noticed that the nested loop is consuming at least 10s, I might be doing something wrong there (performancewise).

you should split into smaller functions, easier to optimize

1 Like