Implementing finite difference with periodic B.C

I am pretty new to Julia and ultimately I want to transfer my python code for solving a nonlinear PDE into Julia (and see whether I can get the speedup I hope).

The PDE I want to solve is similar to a nonlinear heat equation. In python I have a function which computes dh/dt. I use the method of lines to discretize the vector h(x,t) in spatial domain, and then use a stiff ODE solver scipy.integrate.solve_ivp(… method=‘BDF’) to solve the PDE. I am trying to adopt a similar approach in Julia with the DifferentialEquations.jl package.

The first step I need to do is to figure out a way to implement numerical differentiation using finite difference (I need first and third derivative, with either 2nd or 4th order accuracy) with periodic boundary condition applied to both ends of the vector. I have the following four versions of codes:

function ndiff1(u::AbstractVector, n::Integer, order::Integer, dx::Float64)
    Nx = length(u)
    if n==1
        if order==2
            du = 0.5*(circshift(u,-1) - circshift(u,1))
        elseif order==4
            du = (circshift(u,2) - circshift(u,-2) + 8*(circshift(u,-1) - circshift(u,1)))/12.0
        end
    elseif n==3
       # similar to above with just different stencils
    end
    dudx = du/dx^n
    return dudx
end

function ndiff2(u::AbstractVector, n::Integer, order::Integer, dx::Float64)
    Nx = length(u)
    dudx = similar(u)
    
    if n==1
        if order==2
            for i = 2:Nx-1
                dudx[i] = 0.5*(u[i+1] - u[i-1])/dx
            end
            dudx[1] = 0.5*(u[2]-u[end])/dx
            dudx[end] = 0.5*(u[1]-u[end-1])/dx
            
        elseif order==4
            # similar to above with different stencils
        end
        
    elseif n==3
        # similar to above with different stencils
    end
    return dudx
end


using DiffEqOperators

const periodic_bc = PeriodicBC(Float64)
const D1_O2 = CenteredDifference(1, 2, dx, Nx)*periodic_bc
const D1_O4 = CenteredDifference(1, 4, dx, Nx)*periodic_bc
const D3_O2 = CenteredDifference(3, 2, dx, Nx)*periodic_bc
const D3_O4 = CenteredDifference(3, 2, dx, Nx)*periodic_bc

function ndiff3(u::AbstractVector, n::Integer, order::Integer)
    if n==1
        if order==2
            dudx = D1_O2*u
        elseif order==4
            dudx = D1_O4*u
        end
    elseif n==3
        if order==2
            dudx = D3_O2*u
        elseif order==4
            dudx = D3_O4*u
        end  
    end
    return dudx
end


function ndiff4(u::AbstractVector, n::Integer, order::Integer, dx::Float64)
    Nx = length(u)
    dudx = CenteredDifference(n, order, dx, Nx)*periodic_bc*u
    return dudx
end

When I used @btime to benchmark the above four functions on an array of size 256 (typical for my problem), ndiff3 is a bit faster than ndiff2, which is a bit faster than ndiff1, and ndiff4 is close to an order of magnitude slower than the rest. My questions are:

  1. Are there anything noticeable areas for improvement for ndiff1 and ndiff2? That’s similar to how I implemented the finite difference in python (except I try to ‘vectorize’ it in python using numpy without the for loop for ndiff2) and that’s already close to 2 orders of magnitude of speedup which I’m pretty happy about, but I’m not sure if I there is anything significant I’m missing in Julia.

  2. I try to use the DiffEqOperators package for ndiff3 and ndiff4 so I don’t have to write everything explicitly from scratch myself. However I’m not sure what’s the best way to implement CenteredDifference(n, order, dx, Nx)*periodic_bc. If I place them inside the function as in ndiff4, does Julia need to initialize it every time the function is called (and hence leads to slower speed)? If I place them outside the function as I did for ndiff3, then obviously it’s very inflexible as I will need to define the problem size Nx beforehand.

any comment/help is appreciated. thanks

Others can answer questions about DiffEqOperators better than me, but the goto-method for differential operators on a uniform grid with periodic boundary conditions is FFT, i.e. transform into the spectral domain where the derivative operators are all diagonal (regardless of order), and then transform back. See e.g. these notes by @stevengj.

Of course, the complexity of an FFT-based approach is \mathcal{O}(N\log N) if I remember correctly, whereas a finite-difference operator represented as a banded matrix (or lazily, as you do) should be roughly \mathcal{O}(N), at least for moderate orders of convergence.

You have many alternatives as well:

  • Gridap
  • ApproxFun

On a side note, I will be releasing soon a tutorial for BifurcationKit.jl on the 1d Langmuir equation (no periodic BC).

Thanks for your reply. I have tried using FFT with my Python codes though I don’t see much performance gain overall. Nevertheless my focus right now is learning how to implement codes properly in the Julia way instead of choosing the most algorithm or solver for the specific problem.

my focus right now is learning how to implement codes properly in the Julia way

It depends a lot on the specificities of the problem if you want to make it fast.

You can use sparse arrays (obviously), BandedArrays (very good for 1d), FFT for matrix-free methods in Nd, collocation,…

I understand this, but those considerations are not Julia specific (I need to think about that for my Python implementations as well). I’m referring to those Julia-specific issues which don’t normally appear in Python. For example, in my ndiff4, I have that operator of CenteredDifference(n, order, dx, Nx)*periodic_bc inside the function. This leads to very simple and easy-to-understand code, but it is the closest among all four versions. In ndiff3, I pre-define that operator outside the function as a const global variable (I know the use of global variables are discouraged). That leads to a 10x speedup, but the code is must less general and I have to define that every time in my calculations once I have a different length(u). In the same computing session, I will likely have the same length(u) and order, so I really just need to pre-calculate that differential operator once when those two are known at the start of each session. So how do I achieve similar code reusability as ndiff4 while maintaining the speed performance of ndiff3?

And for the naive and labour-intensive implementation of ndiff1 and ndiff2, are there any obvious mistakes? I just know I need to avoid any red highlights which suggest type instability if I run @code_warntype ndiff1(h,n,order,dx). Any tips on how I can look for potential issues?

Global variables are only bad when they’re not passed as arguments to the function.
This is bad:

dx = 0.1
f(L, u) = 1/dx * L * u

f(D1_04, u)

This is good:

dx = 0.1
f(L, u, dx) = 1/dx * L * u

f(D1_O4, u, dx)

Just pass the differential operator as an argument to your function, or apply it directly:

julia> using BenchmarkTools

julia> @btime ndiff3($u, 1, 4)
  955.556 ns (1 allocation: 2.12 KiB)

julia> @btime $D1_O4 * $u
  913.636 ns (1 allocation: 2.12 KiB)
1 Like

Don’t allocate a new dudx array on every timestep of your heat equation; pre-allocate it and overwrite it in-place on every timestep. (If you have a heat equation, I’m not sure why you are computing dudx at all; don’t you want to compute ∂²u/∂x² in a single step? Do you have an advection term?)

(Your circshift approach is even worse because it allocates multiple new arrays just to get du.)

There are plenty of micro-optimizations you could try as well, of course, e.g. combining 0.5/dx into a precomputed multiplicative factor, trying @simd or @avx, etcetera, but I would think carefully about the overall structure of what you are doing first.

Matrix-free approaches (i.e. direct loops for implementing the stencil), can be more efficient that methods which allocate a sparse matrix. However, if you want to use implicit timestepping here, you’ll probably want to store your stencil in a banded matrix (tridiagonal for order 2) and save the LU or Cholesky factorization.

1 Like

If you type dump(D1_O2) you’ll see that it is actually quite a lightweight object — it doesn’t store a big sparse matrix or anything like that, just a handful of numbers. So it should be cheap to recompute as needed.

That being said, there is an intermediate between declaring a bunch of global variables and reconstructing objects on each call. You can define a data structure that holds all of the information you need for your simulation and pass that around. The Parameters.jl package may be helpful here.

1 Like

Here’s an example showing how to optimize a finite difference code:

And here’s an example showing how to use GPUs in a finite difference code:

You can also use DiffEqOperators to generate the operators:

If you’re lazy, you can write bad code and let ModelingToolkit.jl do some code analysis and code generation to fix it. Here’s a tutorial where it does automated parallelism on a finite difference code and builds a symbolic Jacobian as well:

And following what Steven said, here’s a tutorial that shows how to tell the implicit ODE solver what your sparsity pattern is:

Together those tutorials should be more than enough to piece together a solution.

2 Likes