Solving the diffusion equation in Julia

I’m learning how to solve PDEs numerically.
The code I wrote in Julia 1.2.0 is as follows:

using PyCall
so = pyimport("scipy.optimize")

using PyPlot

I = im

h_s = 0.1
h_t = 0.02
T = 100.0
L_x = 30.0
L_y = 30.0

ddx(u) = (circshift(u, (1,0)) - circshift(u, (-1,0)))/(2*h_s)
ddy(u) = (circshift(u, (0,1)) - circshift(u, (0,-1)))/(2*h_s)

curl(v_x, v_y) = ddx(v_y) - ddy(v_x)
div(v_x, v_y) = ddx(v_x) + ddy(v_y)
laplacian(u) = (circshift(u, (1,0)) + circshift(u, (-1,0)) + circshift(u, (0, 1)) + circshift(u, (0,-1)) - 4*u)/(h_s^2)

# Define the mesh
mesh_x = 0:h_s:L_x
mesh_y = 0:h_s:L_y
mesh_T = 0:h_t:T

# Indices for start and end points of the domain
x_i = 1
x_f = length(mesh_x)
y_i = 1
y_f = length(mesh_y)

# Declare arrays storing the result
gridSize = (length(mesh_x), length(mesh_y))
u = [zeros(gridSize) for t=mesh_T]

# Initial condition
u[1] = [exp(-((x-2)^2+(y-2)^2)) for x=mesh_x, y=mesh_y]

function BC(U)
    U[x_i,:] .= 0.0
    U[x_f,:] .= 0.0
    U[:,y_i] .= 0.0
    U[:,y_f] .= 0.0
    return U
end

function F(U)
    U = reshape(U, (x_f, y_f))
    return vec(BC(laplacian(U)))
end

function Crank_Nicolson(U)
    U = vec(BC(U))
    function f(u_next)
        return u_next - U - h_t/2*(F(U)+F(u_next))
    end
    sol = so.root(f, U, method="krylov")
    return reshape(sol["x"], (x_f, y_f))
end

for t in 1:length(mesh_T)-1
    u[t+1] = Crank_Nicolson(u[t])
end
# imshow(u[end])
# show()

I have a Python version of this code.
The Julia version runs in about 6m30s, versus about 4m40s for the Python version.
I noticed there is a “warmup” (compilation) period for Julia of about 20s.
Even accounting for this, Python is still faster than Julia.

The most time-consuming part of this program is solving the time evolution.
Crank-Nicolson is an implicit method, so a nonlinear equation has to be solved at each step.
For this part, I use the scipy.optimize.root function in both the Julia and the Python versions.
I would therefore expect both versions to spend a similar amount of time here,
but the Julia version took more than 25% longer than the Python one.

Does anyone have ideas on this?
Thank you.

Hi there - lots of performance pitfalls! Too much is in global scope. It should all become obvious if you read through the performance tips section of the manual.

FWIW: it is very easy to write slow code in Julia and not quite as easy as one would like to write fast code. It takes some getting used to…


Maybe the most important thing: make all your global variables const.

There’s a lot of unnecessary memory allocation happening here: laplacian(u) will, for example, allocate several new domain-sized arrays that are then immediately discarded. Try to rewrite your inner loops so that they don’t allocate memory. You should also be able to cast your F(U) as a linear operator and solve it with A\b.
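For the linear diffusion problem above, the A\b suggestion can be sketched as follows: build the 2d Laplacian once as a sparse matrix and take a Crank-Nicolson step with a direct linear solve, so no nonlinear root-finder is needed. This is only an illustrative sketch, not code from the thread: laplacian_matrix, the grid size n, and the step sizes are made-up names and values, and it assumes homogeneous Dirichlet boundaries.

```julia
using SparseArrays, LinearAlgebra

# 1d second-difference matrix with Dirichlet boundaries; the 2d Laplacian
# on an n-by-n grid is the Kronecker sum of two of these.
function laplacian_matrix(n, h)
    D = spdiagm(-1 => ones(n - 1), 0 => -2 .* ones(n), 1 => ones(n - 1)) ./ h^2
    return kron(sparse(I, n, n), D) + kron(D, sparse(I, n, n))
end

n, h, ht = 50, 0.1, 0.02
A = laplacian_matrix(n, h)
M_left  = I - (ht / 2) * A   # Crank-Nicolson: (I - dt/2 A) u_next = (I + dt/2 A) u
M_right = I + (ht / 2) * A
u = vec([exp(-((i * h - 2)^2 + (j * h - 2)^2)) for i in 1:n, j in 1:n])
u_next = M_left \ (M_right * u)   # one time step via a sparse linear solve
```

Since the operator never changes between steps, the factorization of M_left could also be computed once (e.g. with lu) and reused for every time step.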

For debugging purposes, I’d recommend shrinking the domain and the number of timesteps considerably to allow for quicker iteration.


You are writing numpy code in Julia. That’s OK, you will get roughly numpy performance.

If you want good performance, you need to change your coding style. Don’t be afraid of loops, avoid temporary allocations, and mind the column major memory layout.


With your settings (300x300 mesh, 500 time steps), FinEtools takes 31 seconds.

julia> versioninfo()
Julia Version 1.2.0
Commit c6da87ff4b (2019-08-20 00:03 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-4650U CPU @ 1.70GHz
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, haswell)
  JULIA_EDITOR = "C:\Users\PK\AppData\Local\atom\app-1.40.1\atom.exe"  -a

Thanks for all the replies.
I am used to numpy, and indeed the code is a direct translation from numpy.
One thing I’m confused about is arrays in Julia.
Should I use more loops or more array-wise operations?
Coming from Python, I try to avoid loops as much as possible.
For example, I would use circshift to implement the Laplacian rather than loops.
Is this bad?

Thank you.

Thank you.
I considered FEM before, but it seems like a pretty advanced method, and I don’t yet understand the basic principles.
Is FinEtools an easy tool to solve PDEs?
The performance looks amazing.

Numpy code is written in ‘vectorized’ style to amortize the overhead of the Python interpreter, but that’s not an issue for Julia (loops are fast!). You can also use loop fusion to express vectorized operations if it improves clarity.
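For instance (a minimal, self-contained sketch with made-up array sizes), an explicit Euler-style update can be written as one fused broadcast:

```julia
# The whole right-hand side fuses into a single loop and writes into `u`
# in place, allocating no temporary arrays.
u   = zeros(4, 4)
du  = ones(4, 4)      # stand-in for a computed time derivative
h_t = 0.02
@. u = u + h_t * du   # equivalent to u .= u .+ h_t .* du
```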

It depends, you can often use dot-vectorized operations and get great performance, especially when you completely avoid unnecessary allocations.

But in cases like ddx, ddy, laplacian, curl, div, etc. it is indeed bad, performance-wise, because you get lots of unnecessary temporary arrays. You would be better off writing loops, and maybe even doing them in-place.
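As a sketch of what the loop-based, in-place style could look like for the five-point Laplacian (the function names here are illustrative, not from the thread; Base's @allocated gives a rough view of the allocation difference, though BenchmarkTools.@btime is more reliable for actual timing):

```julia
# Allocating circshift version (periodic shifts, several temporaries per call)
lap_shift(u, h) = (circshift(u, (1, 0)) + circshift(u, (-1, 0)) +
                   circshift(u, (0, 1)) + circshift(u, (0, -1)) - 4u) / h^2

# Non-allocating loop version over the interior; `out` is preallocated,
# and `i` is the innermost loop variable to match column-major layout.
function lap_loop!(out, u, h)
    nx, ny = size(u)
    @inbounds for j in 2:ny-1, i in 2:nx-1
        out[i, j] = (u[i-1, j] + u[i+1, j] + u[i, j-1] + u[i, j+1] - 4u[i, j]) / h^2
    end
    return out
end

u, out = rand(300, 300), zeros(300, 300)
lap_shift(u, 0.1); lap_loop!(out, u, 0.1)        # warm up (compile) both
alloc_shift = @allocated lap_shift(u, 0.1)
alloc_loop  = @allocated lap_loop!(out, u, 0.1)  # expected: far fewer bytes
```

On interior points the two versions agree; they differ only at the boundary, where circshift wraps around periodically.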

My example Julia code for time-integration of a 1d nonlinear PDE (Kuramoto-Sivashinsky) illustrates how to do this kind of calculation more naturally in Julia. The problem and algorithm are different: 1d instead of 2d, and spectral in space rather than finite differences. But I do show in the “in-place” and “unrolled” versions of the code that eliminating allocation of temporaries and unrolling vector operations into for loops gets performance within a few percent of the Fortran equivalent.

@stillyslalom @DNF
I just tested the performance of different ways to implement the Laplacian.
Loops are the fastest. Views are about the same speed but use more memory.
Thank you so much.

Thank you. That is very useful information.


All of the things I’ve written are for nonlinear PDEs, but it should be helpful. This has a discussion on reaction-diffusion PDEs which shows how devectorizing can really help, along with explaining non-allocating coding styles.

And if you want things to auto-GPU, here’s a formulation of the Brusselator reaction-diffusion PDE discretization in a way that is amenable to auto-GPUification with tools like GPUifyLoops.jl


Thank you. My ultimate goal is also a nonlinear problem. This is very helpful.

Hi Chris - I very much admire your intellect. You’ve posted examples of Julia + reaction-diffusion PDEs in several places. Some links are broken (including the Brusselator one here), and some articles are outdated given the rapid developments in Julia.

What in your opinion (taking this answer as a hint) is the simplest way to define and solve a reaction diffusion equation that is very large, stiff and solved on the GPU? You hinted here (Solving PDEs in Julia - Nextjournal) at the end of the article that things were headed in a particular direction, but I’ve been unable to find an example of this.

If you are interested in solving PDEs, check out MethodOfLines.jl. Possibly the simplest way to define and solve a PDE in Julia if you just want an answer fast.

It currently suffers from high compile times for big problems and is not yet GPU-compatible, but work is ongoing to rectify this for the 1.0 release.

Remind me to answer in a few days. I am trying to get the new SciML docs showcase launched, and that’s where these things should link, and I’ll have a much more comprehensive answer. I’ve been holding off until this release though since it’s so close.


Will do!

I think if I could broaden the question:

I am your ‘target audience’, that is, those modeling life processes with differential equations, but only amateur (in the sense of amateur radio, perhaps) at discretization and parallelization.

Specifically, I work in cardiac electrophysiology and am interested in reaction-diffusion systems for cardiac wave propagation. As you know well, this is no different from the other treatments you’ve posted (I think I’ve spent the last year reading just about everything you’ve posted), except that the PDEs are generally modeled as ODEs, and these ODE functions are prone to stiffness and are highly branching, messy things that don’t fit any of the examples for optimization/parallelism I’ve found posted.

It seems ModelingToolkit is the way to go. When you are done with your awesome work, I’d be interested in knowing, and hopefully the general community would be as well, as this probably represents one of the more complex cases, what best practice is for solving this — JuliasHeart/Tor_Ord_Land.jl at main · jtgreen/JuliasHeart · GitHub — in 2d at least, GPU parallelized, using a five-point stencil or something similar, recognizing that the work done previously has already broken the PDE down into an ODE solution.

I’ll ping you back later as requested.

I’m proselytizing Julia heavily for what it’s worth!


Can you post a link to the definition of this PDE system? I’d like to make sure MethodOfLines.jl supports this.