How to tackle 2D reaction-diffusion equation?

Can anyone point me in the right direction of how to solve a 2D reaction-diffusion equation of the following kind:

p - d*c = -D * (d^2c/dx^2 + d^2c/dy^2),

where p is the production rate, d the degradation rate, c the concentration and D the diffusivity. I have tables for both d(x,y) and p(x,y) where I have the parameter values per x,y pair stored. (Later I’d also vary D, thus potentially also D(x,y))

I have tried the simplest approach of using a finite-difference solver, but so far I’ve struggled to get the currently available packages to work.
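For reference, the steady state above can be solved with nothing more than a hand-rolled fixed-point (Jacobi) iteration over the 5-point Laplacian. A rough pure-Julia sketch (the function name, defaults, and zero-Dirichlet boundary are made up for illustration):

```julia
# Rough sketch (pure Julia, no packages): Jacobi iteration for the steady
# state D*∇²c + p - d*c = 0 with c = 0 on the boundary. p and d are matrices
# holding the tabulated per-cell values of p(x,y) and d(x,y).
function solve_steady_state(p, d; D=1.0, h=0.1, tol=1e-8, maxiter=100_000)
    nx, ny = size(p)
    c  = zeros(nx, ny)   # boundaries stay at the Dirichlet value 0
    c2 = zeros(nx, ny)
    for _ in 1:maxiter
        maxchange = 0.0
        for j in 2:ny-1, i in 2:nx-1
            # 5-point stencil, rearranged to solve each node for c[i,j]
            nb   = c[i-1,j] + c[i+1,j] + c[i,j-1] + c[i,j+1]
            cnew = (D / h^2 * nb + p[i,j]) / (4 * D / h^2 + d[i,j])
            maxchange = max(maxchange, abs(cnew - c[i,j]))
            c2[i,j] = cnew
        end
        c, c2 = c2, c    # swap buffers
        maxchange < tol && break
    end
    return c
end
```

A spatially varying D(x,y) would go into face-averaged coefficients instead of the single scalar D, and a real solver would use something much faster than plain Jacobi.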

Grateful for any help.


Here is a diffusion example:

You can add reaction to it. It is using the package ParallelStencil.jl which empowers domain scientists to write architecture-agnostic high-level code for parallel high-performance stencil computations on GPUs and CPUs.

In addition, you might want to use implicit time stepping and convergence accelerators. Here you’ll find a presentation of a methodology using the pseudo-transient method and ParallelStencil.jl (and ImplicitGlobalGrid.jl).


In the diffusion example, the code for visualizing the result was missing, but I found it here.

There is a file in the examples including visualization for the 50-lines-multi-xpu-example. You can extract from there what you need to visualize the single-XPU example.

I added a link to it in the documentation: #9 . Thanks for pointing out that this link was missing.


I’ve rewritten the example code in 2D, but I found that init_global_grid requires 3D, so I’ve been setting nz=0. Is there a wiser way to do this?

That example is a highly stiff 2D reaction-diffusion equation solved in about 1 second using a mixture of automated sparsity detection, high order adaptive implicit methods, and just a bunch of optimizations in DifferentialEquations.jl. In 2021 you might want to replace the by-hand discretization with ParallelStencil.jl (or use DiffEqOperators.jl which I hope will soon use ParallelStencil).


Also if you’re new to Julia, the following has a tutorial on optimizing differential equation code, where the big example is a 2D reaction-diffusion. It uses different tricks (no sparsity tricks shown here), and focuses more on the code optimization.

This blog post focuses on GPUs:

And the MTK automatic optimization and parallelism tools all showcase 2D reaction diffusion.


You don’t need to define nz explicitly; just pass 1 as the third argument: init_global_grid(nx, ny, 1)

We have found over the years that treating 2-D (and 1-D) problems as 3-D problems with dimension z (and y) of size 1 (one cell) is a very practical, pragmatic approach for both library development and user code development. It often makes it simpler to create library/pre-/post-processing code that works for 1-D to 3-D problems, and it leads to shorter code (fewer cases to consider).
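A minimal pure-Julia illustration of that trick (not ParallelStencil code; the kernel below is made up for illustration): one Laplacian kernel written for 3-D arrays also serves 2-D and 1-D data stored with trailing dimensions of size 1, because the stencil in a size-1 dimension simply contributes nothing.

```julia
# One Laplacian kernel for 1-D/2-D/3-D data: dimensions of size 1 are
# skipped by the interior checks, so the same code handles all cases.
function laplacian!(L, c, h)
    nx, ny, nz = size(c)
    fill!(L, 0.0)
    for k in 1:nz, j in 1:ny, i in 1:nx
        1 < i < nx && (L[i,j,k] += (c[i-1,j,k] - 2c[i,j,k] + c[i+1,j,k]) / h^2)
        1 < j < ny && (L[i,j,k] += (c[i,j-1,k] - 2c[i,j,k] + c[i,j+1,k]) / h^2)
        1 < k < nz && (L[i,j,k] += (c[i,j,k-1] - 2c[i,j,k] + c[i,j,k+1]) / h^2)
    end
    return L
end

# A 2-D field is just an nx × ny × 1 array:
c2d = rand(8, 8, 1)
L2d = laplacian!(similar(c2d), c2d, 0.1)
```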


There are many options: ApproxFun.jl, Gridap.jl, and also JuAFEM, to name a few.

You can also look at SurveyofPDEPackages

Shameless plug: for bifurcation analysis of 2D reaction-diffusion, you can look at BifurcationKit, which relies on the aforementioned packages (Chris, Samo,…)


Since we are in 2021, is there any example available of a 2D diffusion equation using the solvers for stiff ODEs and PDEs in DifferentialEquations.jl combined with ParallelStencil.jl?

Thanks in advance!

ParallelStencil turned out not to be as good an idea as splitting by the equations in a fully symbolic analysis.

shows the state of it all, and you can tell it to generate parallelized sparse Jacobians to use with the stiff ODE solvers. This area is getting the most churn in the ecosystem right now, though, so I’d say wait for the summer to be done unless you want to be a guinea pig.


Hmm interesting. I’m still playing with a toy model, so I’ll definitely wait until summer for a proper implementation. Do you have any material which discusses the approach/advantages of the MOL vs ParallelStencil.jl? I’d like to educate myself in the topic.

Thanks a lot!

The problem is the memory handling. If you know about stencil compilers and BLAS internals, you know that they make a lot of deep assumptions about working on “easy” linear operators. So for u' = A*u + f(u), the stencil compiler would only handle A*u, and a separate loop would then apply f(u). It turns out that just fusing those two operations can be more beneficial: once f contains any non-trivial operation, its cost dominates, and the stencil compiler’s careful cache-walking structure has no clear impact because the hardware can keep up with the memory accesses in the meantime. So it seems better to fuse the nonlinearity in, in which case a stencil compiler doesn’t make much sense for the nonlinear PDE case. It still makes a ton of sense for linear PDEs, though.


OK, so after doing some reading, if I got it right: ParallelStencil.jl uses a 100% numerical approach with no sparsity tricks and no prior matrix assumptions. Basically, it is a very generic approach whose strong point is heavy parallelization on XPUs. On the other hand, the new stuff from DiffEqOperators.jl based on a symbolic MOL uses symbolic specialization with equation-level knowledge to largely reduce the number of operations to be performed, with the trade-off of larger compiled code compared to a generic approach.

I guess the symbolic MOL can still take advantage of heavy parallelization on XPUs? How much faster do you expect it to be compared to ParallelStencil.jl?

No. Stencil compilers definitely make use of sparsity. What they are really doing is specializing the iteration to take caching behavior into account. If you’re not familiar with BLAS internals, see things like:

So it matters a lot in matrix multiplication. Stencil compilers can do similar tricks:

However, notice that all of this is on simple trivial linear operations: if you are only doing * and +, and you’re only doing a single one of them, then this can matter a lot. So this shows up in linear PDEs because they boil down to A*x for some stencil A.

But when you have a nonlinear equation like A*x .+ f.(x), things change. If you have a single exp or something in f, then f will already dominate the cost, and so you’re wasting your time optimizing the * and + part. Remember, even floating-point division is costly in comparison, so if f contains a /, you’re already in the zone where optimizing the stencil part isn’t your biggest worry. For reaction-diffusion, this means any case with Michaelis-Menten or Hill functions.

So assume f is “trivial enough” that you still want to optimize the stencil part all of the way (f(x) = -x^2 for example). There are two ways to approach this:

  • Stencil compilers are fast, so calculate y=A*x then do y .+ f.(x)
  • Write it out in dot products and go linearly down the x
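As a concrete plain-Julia illustration of those two options (hypothetical code, a 5-point Laplacian standing in for A), here are both evaluation strategies for du = A*u .+ f.(u); they compute the same thing, and the whole discussion is about which is faster for a given f:

```julia
f(u) = -u^2   # the "trivial enough" nonlinearity from the text

# Option 1: stencil pass first, then a second pass for the nonlinearity.
function rhs_split!(du, u, D, h)
    nx, ny = size(u)
    for j in 2:ny-1, i in 2:nx-1   # pass 1: stencil only (y = A*u)
        du[i,j] = D * (u[i-1,j] + u[i+1,j] + u[i,j-1] + u[i,j+1] - 4u[i,j]) / h^2
    end
    @views du[2:nx-1, 2:ny-1] .+= f.(u[2:nx-1, 2:ny-1])   # pass 2: y .+ f.(u)
    return du
end

# Option 2: one fused loop walking the array once.
function rhs_fused!(du, u, D, h)
    nx, ny = size(u)
    for j in 2:ny-1, i in 2:nx-1   # stencil and f in the same pass
        du[i,j] = D * (u[i-1,j] + u[i+1,j] + u[i,j-1] + u[i,j+1] - 4u[i,j]) / h^2 +
                  f(u[i,j])
    end
    return du
end
```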

In the case of one semilinear PDE, it really comes down to implementation as to which will be faster. But now what about when you have systems of semilinear PDEs, like in reaction-diffusion? Now you have something like A*x + f.(x[:,:,1], x[:,:,2]) as your standard mathematical operation. You need two very different points of the array at all times on every line, so some of the assumptions of the cache-oblivious algorithms are now broken: if you have a system of n dependent variables, the cache space available to each of them is now 1/n the size of what you had before, and very quickly the loop sizes for L1 or L2 cache start to drop off. So when you have big enough systems of PDEs, or if f is costly enough in any of them, the stencil compiler is not the big savior you’d hope it to be.
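For a sense of the shape of such a system RHS, here is a hypothetical fused two-species kernel in plain Julia (the reaction terms are made up). Note how every line needs both dependent variables, which is what cuts the per-variable cache budget as described:

```julia
f1(u, v) = u*v - u   # made-up reaction terms, just for shape
f2(u, v) = u - u*v

# Fused RHS for a two-species reaction-diffusion system, with the species
# stacked along the third array dimension: u = x[:,:,1], v = x[:,:,2].
function rhs_system!(dx, x, D, h)
    nx, ny, _ = size(x)
    for j in 2:ny-1, i in 2:nx-1
        # 5-point Laplacian of species s at node (i, j)
        lap(s) = (x[i-1,j,s] + x[i+1,j,s] + x[i,j-1,s] + x[i,j+1,s] -
                  4x[i,j,s]) / h^2
        dx[i,j,1] = D * lap(1) + f1(x[i,j,1], x[i,j,2])
        dx[i,j,2] = D * lap(2) + f2(x[i,j,1], x[i,j,2])
    end
    return dx
end
```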

So then at the end of the day, stencil compilers are amazing tools, but in so many of these different edge cases the standard case that they optimize becomes rounding error, and sometimes just fusing f into the other part of the loop just becomes a better idea.

We show an example in the tutorials of devectorizing for manual fusing:

We’re playing around with ideas for super SLP-vectorization and using reinforcement learning to reorder equations to better schedules on the fused code, and also parallelizing on the fused code (like in Automated Sparse Parallelism of Julia Functions via Tracing · Symbolics.jl), but this is all requiring the symbolic analysis of Symbolics.jl. So this is what leads to the direction to then perform the lowering in this way.

There’s still lots of research to do here, and some of this is based on quick calculations that probably should get benchmarked in more detail, but hopefully that gives the full picture as to why, months later, we’ve switched away from worrying about stencil compilers to worrying about symbolic computing.


Thanks a lot for taking the time to explain this in such detail, Chris! OK, so if I got this correctly, what I originally thought was the difference between ParallelStencil.jl and symbolic MOL only applies to the nonlinear f(x) part. The A*x part is already very well optimized with stencil compilers.

I totally see the sense in going for such a solution in DifferentialEquations.jl, since it will generally perform better for both simple and complex cases. However, in my particular case I’m interested in solving a nonlinear 2D diffusion equation (without reaction), which is probably a rather simple problem compared to the examples you are referring to. That sort of problem happens to be the one used in the ParallelStencil.jl tutorial (i.e. glacier ice flow), perhaps hinting at why ParallelStencil.jl might perform well for it. What I’m wondering now is whether, for such a problem, it would be worth going (waiting) for a symbolic MOL in DiffEqOperators.jl, or whether using ParallelStencil.jl would be almost as good. Or would the nonlinear diffusion term (with x^n and / operations) already be way too expensive for a stencil compiler? Just as some extra context: the end goal of this would be to turn this problem into a UDE with neural networks (in case one of the two solutions works better with DiffEqFlux.jl).
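For concreteness, the kind of kernel I have in mind is something like this rough explicit-step sketch of ∂H/∂t = ∇·(H^n ∇H) (pure Julia, all names made up, zero-flux boundaries):

```julia
# One explicit time step of nonlinear diffusion with D(H) = H^n, in flux
# form: fluxes are evaluated at cell faces, with zero-flux boundaries.
function step_nonlinear_diffusion!(H, dt, h; n=3)
    nx, ny = size(H)
    qx = zeros(nx + 1, ny)   # x-face fluxes; boundary faces stay at 0
    qy = zeros(nx, ny + 1)   # y-face fluxes
    for j in 1:ny, i in 2:nx
        Dface = (H[i-1,j]^n + H[i,j]^n) / 2   # D(H) averaged to the face
        qx[i,j] = -Dface * (H[i,j] - H[i-1,j]) / h
    end
    for j in 2:ny, i in 1:nx
        Dface = (H[i,j-1]^n + H[i,j]^n) / 2
        qy[i,j] = -Dface * (H[i,j] - H[i,j-1]) / h
    end
    for j in 1:ny, i in 1:nx   # divergence of the fluxes
        H[i,j] -= dt * ((qx[i+1,j] - qx[i,j]) + (qy[i,j+1] - qy[i,j])) / h
    end
    return H
end
```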

Yes, at that point the computation is too heavy, so the stencil compiler doesn’t add much.

Yeah… performance testing in this domain is what led me down this path.

Let’s clarify what ParallelStencil is and what it is not, to clear up any confusion from the previous messages.

  • provides an abstraction layer empowering domain scientists to write architecture-agnostic stencil-based computations (finite-differences, finite-volumes).
  • targets matrix-free implementations and iterative solution approaches to take advantage of parallel hardware such as GPUs and many-core CPUs, in order to solve complex nonlinear real-world multi-physics problems.
  • makes it possible to achieve close-to-hardware-limit performance in memory-bound applications.
  • combined with ImplicitGlobalGrid.jl, scales to thousands of GPUs with a parallel efficiency of over 95%.

ParallelStencil is not:

  • a black-box PDE solver.
  • an interface for symbolic calculations (though it does not exclude workflows involving them).
  • related to BLAS or any matrix-based A*x + b “direct” solvers.
  • designed for solving linear or toy problems, even though it naturally handles them as well.

In conclusion, ParallelStencil is not a stand-alone PDE-solving black box; it should rather be considered a tool for writing high-performance building blocks to solve a wide range of systems of PDEs, with a focus on large nonlinear and coupled multi-physics systems. A successful and performant approach for solving such systems, as well as a real-world use case, were presented at JuliaCon 2020. The approach consists in using ParallelStencil (and ImplicitGlobalGrid) in combination with the pseudo-transient method and convergence accelerators like numerical damping. Moreover, that JuliaCon presentation discusses the performance of a nonlinear shallow-ice application (containing exactly the H^n case discussed in this thread): it shows that 72% of the theoretical performance upper bound is achieved, demonstrating the high performance attainable with ParallelStencil when solving nonlinear problems.


Thanks Ludovic for taking the time to clarify all this. This is a nice complement to our discussion from last week.

I will try both methods and see how I feel about each one. I’m curious to see if it would be possible to create Universal Differential Equations using this approach in ParallelStencil.jl without the DiffEqFlux.jl functions.

BTW @Elrod demonstrated what I was saying against the LoopVectorization.jl stencil compilation tooling:

and that stencil compiler is fairly well optimized: Image Filtering · LoopVectorization.jl . But again, it’s pretty clear why the theoretical assumptions of stencil compilers break down in this case, and why alternative forms of loop fusion make more sense once you end up in the nonlinear case, so it’s really just benchmarks showcasing the clear theory above. It also makes sense from a parallel and GPU codegen perspective, because it reduces the total number of kernel calls.

But anyways, I don’t quite know why that would cause controversy :sweat_smile:; it’s just a theory-backed, repeatable measurement. DiffEqOperators.jl is a black-box PDE solver that uses a symbolic compiler to generate fast lowerings to ODE and nonlinear solvers, and it doesn’t lower to stencil compilers in a split-convolution form because that does not benchmark as well as other forms of loop fusion on nonlinear PDEs. Using a stencil compiler is probably the easiest way to build a PDE discretization by hand, but there are good reasons why that can be suboptimal when nonlinearities are involved (in the fully linear case, it would be the optimal thing to do, though!).
