Solving the diffusion equation in Julia

I’m learning how to solve PDEs numerically.
The code I wrote in Julia 1.2.0 is as follows:

using PyCall
so = pyimport("scipy.optimize")

using PyPlot

I = im

h_s = 0.1
h_t = 0.02
T = 100.0
L_x = 30.0
L_y = 30.0

ddx(u) = (circshift(u, (1,0)) - circshift(u, (-1,0)))/(2*h_s)
ddy(u) = (circshift(u, (0,1)) - circshift(u, (0,-1)))/(2*h_s)

curl(v_x, v_y) = ddx(v_y) - ddy(v_x)
div(v_x, v_y) = ddx(v_x) + ddy(v_y)
laplacian(u) = (circshift(u, (1,0)) + circshift(u, (-1,0)) + circshift(u, (0, 1)) + circshift(u, (0,-1)) - 4*u)/(h_s^2)

# Define the mesh
mesh_x = 0:h_s:L_x
mesh_y = 0:h_s:L_y
mesh_T = 0:h_t:T

# Indices for start and end points of the domain
x_i = 1
x_f = length(mesh_x)
y_i = 1
y_f = length(mesh_x)

# Declare arrays storing the result
gridSize = (length(mesh_x), length(mesh_y))
u = [zeros(gridSize) for t=mesh_T]

# Initial condition
u[1] = [exp(-((x-2)^2+(y-2)^2)) for x=mesh_x, y=mesh_y]

function BC(U)
    U[x_i,:] .= 0.0
    U[x_f,:] .= 0.0
    U[:,y_i] .= 0.0
    U[:,y_f] .= 0.0
    return U
end

function F(U)
    U = reshape(U, (x_f, y_f))
    return vec(BC(laplacian(U)))
end

function Crank_Nicolson(U)
    U = vec(BC(U))
    function f(u_next)
        return u_next - U - h_t/2*(F(U)+F(u_next))
    end
    sol = so.root(f, U, method="krylov")
    return reshape(sol["x"], (x_f, y_f))
end

for t in 1:length(mesh_T)-1
    u[t+1] = Crank_Nicolson(u[t])
end
        
# imshow(u[end])
# show()
        

I have a python version of this code.
The time of running for julia is about 6m30s, and 4m40 for python code.
I noticed there is a “warmup” period for julia and it’s about 20s.
Even considering this, it seems python is still faster than julia.

The most time-consuming part of this program is solving the time evolution.
Crank-Nicolson is an implicit method so there is an nonlinear equation to be solved.
For this part, I use the scipy.optimize.root function for both julia and python version.
Thus I would expect both versions should spend similar time,
but the result is julia spent more than 25% time than python.

Does anyone have ideas on this?
Thank you.

Hi there - lots of performance pitfalls! Too much in the global scope. It should all become obvious if you read through the performance tips:

https://docs.julialang.org/en/v1/manual/performance-tips/index.html

FWIW: it is very easy to write slow code in Julia and not quite as easy as one would like to write fast code. It takes some getting used to…

2 Likes

Maybe the most important thing : make all your global variables const .

There’s a lot of unnecessary memory allocation happening here–laplacian(u) will, for example, allocate five new domain-sized arrays that are then immediately discarded. Try to rewrite your inner loops so that they don’t allocate memory. You should also be able to cast your F(U) as a linear operator and solve it with A\b.

For debugging purposes, I’d recommend shrinking the domain and the number of timesteps considerably to allow for quicker iteration.

2 Likes

You are writing numpy code in Julia. That’s OK, you will get roughly numpy performance.

If you want good performance, you need to change your coding style. Don’t be afraid of loops, avoid temporary allocations, and mind the column major memory layout.

2 Likes

With your settings (300x300 mesh, 500 time steps), FinEtools takes 31 seconds.

julia> versioninfo()
Julia Version 1.2.0
Commit c6da87ff4b (2019-08-20 00:03 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-4650U CPU @ 1.70GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, haswell)
Environment:
  JULIA_EDITOR = "C:\Users\PK\AppData\Local\atom\app-1.40.1\atom.exe"  -a
  JULIA_NUM_THREADS = 2
1 Like

Thanks for all the replies.
I am used to numpy, and indeed the code is a direction translation from numpy.
One thing I’m confused is about the arrays in julia.
Should I use more loops or array-wise operations?
Since I’m from python, I try to avoid loop as much as possible.
For example, I would use circshift to implement the laplacian rather than loops.
Is this bad?

Thank you.

@PetrKryslUCSD
Thank you.
I considered FEM before, but it seems a pretty advanced method and I cannot even understand the basic principle.
Is FinEtools an easy tool to solve PDEs?
The performance looks amazing.

Numpy code is written in ‘vectorized’ style to amortize the overhead of the Python interpreter, but that’s not an issue for Julia (loops are fast!). You can also use loop fusion to express vectorized operations if it improves clarity.

It depends, you can often use dot-vectorized operations and get great performance, especially when you completely avoid unnecessary allocations.

But in cases like ddx, ddy, laplacian, curl, div, etc. it is indeed bad, performance-wise, because you get lots of unnecessary temporary arrays. You would be better off writing loops, and maybe even doing them in-place.

My example Julia code for time-integration of a 1d nonlinear PDE (Kuramaoto-Sivashinksy) illustrates how to do this kind of calculation more naturally in Julia. The problem and algorithm are different: 1d instead of 2d, and spectral in space rather than finite differences. But I do show in the “in-place” and “unrolled” versions of the code that eliminating allocation of temporaries and unrolling vector operations into for loops gets performance within a few percent of the Fortran equivalent.

@stillyslalom @DNF
I just tested the performance of different ways to implement the laplacian.
Loop is the fastest. View is kinds of the same but uses more memory.
Thank you so much.

@John_Gibson
Thank you. That is very useful information.

1 Like

All of the things I’ve written are for nonlinear PDEs, but it should be helpful. This has a discussion on reaction-diffusion PDEs which shows how devectorizing can really help, along with explaining non-allocating coding styles.

http://tutorials.juliadiffeq.org/html/introduction/03-optimizing_diffeq_code.html

And if you want things to auto-GPU, here’s a formulation of the Brusselator reaction-diffusion PDE discretization in a way that is amenable to auto-GPUification with tools like GPUifyLoops.jl

Thank you. My ultimate goal is also a nonlinear problem. This is very helpful.