Implementing finite difference with periodic B.C

yhchang · November 30, 2020, 5:13am

I am pretty new to Julia and ultimately I want to transfer my python code for solving a nonlinear PDE into Julia (and see whether I can get the speedup I hope).

The PDE I want to solve is similar to a nonlinear heat equation. In python I have a function which computes dh/dt. I use the method of lines to discretize the vector h(x,t) in spatial domain, and then use a stiff ODE solver scipy.integrate.solve_ivp(… method=‘BDF’) to solve the PDE. I am trying to adopt a similar approach in Julia with the DifferentialEquations.jl package.

The first step I need to do is to figure out a way to implement numerical differentiation using finite difference (I need first and third derivative, with either 2nd or 4th order accuracy) with periodic boundary condition applied to both ends of the vector. I have the following four versions of codes:

function ndiff1(u::AbstractVector, n::Integer, order::Integer, dx::Float64)
    Nx = length(u)
    if n==1
        if order==2
            du = 0.5*(circshift(u,-1) - circshift(u,1))
        elseif order==4
            du = (circshift(u,2) - circshift(u,-2) + 8*(circshift(u,-1) - circshift(u,1)))/12.0
        end
    elseif n==3
       # similar to above with just different stencils
    end
    dudx = du/dx^n
    return dudx
end

function ndiff2(u::AbstractVector, n::Integer, order::Integer, dx::Float64)
    Nx = length(u)
    dudx = similar(u)
    
    if n==1
        if order==2
            for i = 2:Nx-1
                dudx[i] = 0.5*(u[i+1] - u[i-1])/dx
            end
            dudx[1] = 0.5*(u[2]-u[end])/dx
            dudx[end] = 0.5*(u[1]-u[end-1])/dx
            
        elseif order==4
            # similar to above with different stencils
        end
        
    elseif n==3
        # similar to above with different stencils
    end
    return dudx
end


using DiffEqOperators

const periodic_bc = PeriodicBC(Float64)
const D1_O2 = CenteredDifference(1, 2, dx, Nx)*periodic_bc
const D1_O4 = CenteredDifference(1, 4, dx, Nx)*periodic_bc
const D3_O2 = CenteredDifference(3, 2, dx, Nx)*periodic_bc
const D3_O4 = CenteredDifference(3, 2, dx, Nx)*periodic_bc

function ndiff3(u::AbstractVector, n::Integer, order::Integer)
    if n==1
        if order==2
            dudx = D1_O2*u
        elseif order==4
            dudx = D1_O4*u
        end
    elseif n==3
        if order==2
            dudx = D3_O2*u
        elseif order==4
            dudx = D3_O4*u
        end  
    end
    return dudx
end


function ndiff4(u::AbstractVector, n::Integer, order::Integer, dx::Float64)
    Nx = length(u)
    dudx = CenteredDifference(n, order, dx, Nx)*periodic_bc*u
    return dudx
end

When I used @btime to benchmark the above four functions on an array of size 256 (typical for my problem), ndiff3 is a bit faster than ndiff2, which is a bit faster than ndiff1, and ndiff4 is close to an order of magnitude slower than the rest. My questions are:

Are there anything noticeable areas for improvement for ndiff1 and ndiff2? That’s similar to how I implemented the finite difference in python (except I try to ‘vectorize’ it in python using numpy without the for loop for ndiff2) and that’s already close to 2 orders of magnitude of speedup which I’m pretty happy about, but I’m not sure if I there is anything significant I’m missing in Julia.
I try to use the DiffEqOperators package for ndiff3 and ndiff4 so I don’t have to write everything explicitly from scratch myself. However I’m not sure what’s the best way to implement CenteredDifference(n, order, dx, Nx)*periodic_bc. If I place them inside the function as in ndiff4, does Julia need to initialize it every time the function is called (and hence leads to slower speed)? If I place them outside the function as I did for ndiff3, then obviously it’s very inflexible as I will need to define the problem size Nx beforehand.

any comment/help is appreciated. thanks

jagot · November 30, 2020, 7:40am

Others can answer questions about DiffEqOperators better than me, but the goto-method for differential operators on a uniform grid with periodic boundary conditions is FFT, i.e. transform into the spectral domain where the derivative operators are all diagonal (regardless of order), and then transform back. See e.g. these notes by @stevengj.

Of course, the complexity of an FFT-based approach is \mathcal{O}(N\log N) if I remember correctly, whereas a finite-difference operator represented as a banded matrix (or lazily, as you do) should be roughly \mathcal{O}(N), at least for moderate orders of convergence.

rveltz · November 30, 2020, 8:21am

You have many alternatives as well:

Gridap
ApproxFun

On a side note, I will be releasing soon a tutorial for BifurcationKit.jl on the 1d Langmuir equation (no periodic BC).

yhchang · November 30, 2020, 8:24am

Thanks for your reply. I have tried using FFT with my Python codes though I don’t see much performance gain overall. Nevertheless my focus right now is learning how to implement codes properly in the Julia way instead of choosing the most algorithm or solver for the specific problem.

rveltz · November 30, 2020, 8:28am

my focus right now is learning how to implement codes properly in the Julia way

It depends a lot on the specificities of the problem if you want to make it fast.

You can use sparse arrays (obviously), BandedArrays (very good for 1d), FFT for matrix-free methods in Nd, collocation,…

yhchang · November 30, 2020, 4:40pm

I understand this, but those considerations are not Julia specific (I need to think about that for my Python implementations as well). I’m referring to those Julia-specific issues which don’t normally appear in Python. For example, in my ndiff4, I have that operator of CenteredDifference(n, order, dx, Nx)*periodic_bc inside the function. This leads to very simple and easy-to-understand code, but it is the closest among all four versions. In ndiff3, I pre-define that operator outside the function as a const global variable (I know the use of global variables are discouraged). That leads to a 10x speedup, but the code is must less general and I have to define that every time in my calculations once I have a different length(u). In the same computing session, I will likely have the same length(u) and order, so I really just need to pre-calculate that differential operator once when those two are known at the start of each session. So how do I achieve similar code reusability as ndiff4 while maintaining the speed performance of ndiff3?

And for the naive and labour-intensive implementation of ndiff1 and ndiff2, are there any obvious mistakes? I just know I need to avoid any red highlights which suggest type instability if I run @code_warntype ndiff1(h,n,order,dx). Any tips on how I can look for potential issues?

stillyslalom · November 30, 2020, 5:28pm

Global variables are only bad when they’re not passed as arguments to the function.
This is bad:

dx = 0.1
f(L, u) = 1/dx * L * u

f(D1_04, u)

This is good:

dx = 0.1
f(L, u, dx) = 1/dx * L * u

f(D1_O4, u, dx)

Just pass the differential operator as an argument to your function, or apply it directly:

julia> using BenchmarkTools

julia> @btime ndiff3($u, 1, 4)
  955.556 ns (1 allocation: 2.12 KiB)

julia> @btime $D1_O4 * $u
  913.636 ns (1 allocation: 2.12 KiB)

stevengj · November 30, 2020, 5:35pm

Don’t allocate a new dudx array on every timestep of your heat equation; pre-allocate it and overwrite it in-place on every timestep. (If you have a heat equation, I’m not sure why you are computing dudx at all; don’t you want to compute ∂²u/∂x² in a single step? Do you have an advection term?)

(Your circshift approach is even worse because it allocates multiple new arrays just to get du.)

There are plenty of micro-optimizations you could try as well, of course, e.g. combining 0.5/dx into a precomputed multiplicative factor, trying @simd or @avx, etcetera, but I would think carefully about the overall structure of what you are doing first.

Matrix-free approaches (i.e. direct loops for implementing the stencil), can be more efficient that methods which allocate a sparse matrix. However, if you want to use implicit timestepping here, you’ll probably want to store your stencil in a banded matrix (tridiagonal for order 2) and save the LU or Cholesky factorization.

stevengj · November 30, 2020, 5:44pm

If you type dump(D1_O2) you’ll see that it is actually quite a lightweight object — it doesn’t store a big sparse matrix or anything like that, just a handful of numbers. So it should be cheap to recompute as needed.

That being said, there is an intermediate between declaring a bunch of global variables and reconstructing objects on each call. You can define a data structure that holds all of the information you need for your simulation and pass that around. The Parameters.jl package may be helpful here.

ChrisRackauckas · November 30, 2020, 9:09pm

Here’s an example showing how to optimize a finite difference code:

https://tutorials.sciml.ai/html/introduction/03-optimizing_diffeq_code.html

And here’s an example showing how to use GPUs in a finite difference code:

You can also use DiffEqOperators to generate the operators:

If you’re lazy, you can write bad code and let ModelingToolkit.jl do some code analysis and code generation to fix it. Here’s a tutorial where it does automated parallelism on a finite difference code and builds a symbolic Jacobian as well:

https://mtk.sciml.ai/dev/tutorials/auto_parallel/

And following what Steven said, here’s a tutorial that shows how to tell the implicit ODE solver what your sparsity pattern is:

https://diffeq.sciml.ai/stable/tutorials/advanced_ode_example/#Automatic-Sparsity-Detection

Together those tutorials should be more than enough to piece together a solution.

Topic		Replies	Views
Solving the diffusion equation in Julia Performance	23	5330	December 22, 2022
Optimization of differentiation using Fourier transforms General Usage	8	286	December 5, 2023
2d finite differences implementation for a Numerical Analysis lecture Performance numerics , finitediff	4	2240	June 23, 2022
Generating finite-difference stencils Numerics differentiation	19	3472	July 23, 2024
(seems) a more efficient way to use diffEqOperator.jl Modelling & Simulations	5	517	July 2, 2021

Implementing finite difference with periodic B.C

Related topics