How to solve implicitly a 2D Heat diffusion equation / ParallelStencil / DifferentialEquation /

Eurel · May 5, 2022, 1:25pm

Context

I am no simulation nor numerical expert and am absolutely not familiar with differential Equations.
So I am trying to simulate heat diffusion in a 2D non uniform material on a Cartesian rectilinear mesh (varying Δy and Δx) with the given properties for each cell at l, m coordinates

Ω : surface
ρ : volumic mass
cp : thermal capcity
λ : thermal conductivity

I intend to run it on really really large domains, with distributed memory multiple CPUs and possibly multiple GPUs.
Thus i choose ParallelStencil.jl coupled with ImplicitGlobalGrid.jl .

The Right hand side (laplacian?) of the equation is the following

Explicit

I managed to implement easily a fairly efficient explicit version using with the following kernel

@parallel_indices (ix,iy) function step_explicit_2D_indices!(
H :: AbstractArray, # Temperature at time n
H2 :: AbstractArray, # Temperature at time n+1
λ ,
Δx ,
Δy ,
_ρcpΩ ,
Δt ,
_avΔx,
_avΔy)
if (ix>1 && ix<size(H,1) && iy>1 && iy<size(H,2))
    H2[ix,iy] =  H[ix,iy] +

        # Diffusivity
        (Δt * _ρcpΩ[ix,iy]) *

        #Right Hand Side/laplacian
        # X flux
        ((@avHarm_R(λ) * @d_R(H, ix, iy)    * _avΔx[ix]) + 
            (@avHarm_L(λ) * @d_L(H, ix, iy) * _avΔx[ix-1])* Δy[iy-1] + 

        # Y Flux
        ((@avHarm_T(λ) * @d_T(H, ix, iy)    * _avΔy[iy]) + 
            (@avHarm_B(λ) * @d_B(H, ix, iy) * _avΔy[iy-1])* Δx[ix-1]));
end
return nothing
end

with

@avHarm<S> Harmonic average of λ on a given Side
@d_<S>     Heat difference on a given side
_avΔx      Arithmetic average of Deltas x/y

Implicit

However I’d want it to be unconditionally stable given the time step Δt, so i have to solve it Implicitly.

Direct Method

I think (i might be wrong) that direct linear solver (Tn+1 = A \ Tn ) will not work well at all for large domains, I tried this approach, constructing a sparse five banded laplacian matrix (CSC format) ( nx* ny by nx*ny), the backslash Operator solving the linear system, utilizes a single thread.

Is it possible to solve linear systems in parallel?

Itterative

I did find an example of Implicit iterative solver on the following Github repo of ParallelStencil workshop at Juliacon 2021
But I don’t have the knowledge to adapt it to my situation and fail to gasp the concept of iterative solving.

How could i iteratively solve the heat equation with ParallelStencil given my model?

There is also DifferentialEquations.jl but i fail to understand how to discretize my equation with it and it appears complicated to me. I am not sure if it can do distributed computing.
Could DifferentialEquation.jl be used in my case?

Thanks for any help

luraess · May 6, 2022, 10:23am

Hi @Eurel, welcome and thanks for posting your question. If you are interested in running simulations on large domains using multi-CPUs/GPUs, ParallelStencil and ImplicitGlobalGrid may be very appropriate.

How could i iteratively solve the heat equation with ParallelStencil given my model?

I prepared 3 codes solving the equation you are interested in using an explicit and implicit approach. You can find the codes in this GitHub PTdiffusion_mwe repository. The repo contains 3 codes:

diffusion_2D_expl_xpu.jl the explicit solver
diffusion_2D_impl_xpu.jl the implicit pseudo-transient solver
diffusion_2D_expl_mxpu.jl the explicit multi-CPU/GPU solver

Some additional info is in the repo’s README, including further references to a paper on the pseudo-transient method and a Julia GPU course (https://pde-on-gpu.vaw.ethz.ch) which may be a valuable source of information.

ChrisRackauckas · May 6, 2022, 12:27pm

Yes, and there’s an issue which shows how to do it:

github.com/omlins/ParallelStencil.jl

Handling boundaries

opened 07:35AM - 22 Jun 21 UTC

closed 09:04AM - 22 Jun 21 UTC

ChrisRackauckas

For fun I was putting together some method of lines demos. Here's ParallelStenci…l.jl used with Runge-Kutta Chebyshev methods: ```julia const USE_GPU = true using ParallelStencil, OrdinaryDiffEq using ParallelStencil.FiniteDifferences3D @static if USE_GPU @init_parallel_stencil(CUDA, Float64, 3); else @init_parallel_stencil(Threads, Float64, 3); end @parallel function diffusion3D_step!(T2, T, Ci, lam, dx, dy, dz) @inn(T2) = lam*@inn(Ci)*(@d2_xi(T)/dx^2 + @d2_yi(T)/dy^2 + @d2_zi(T)/dz^2); return end function diffusion3D(alg) # Physics lam = 1.0; # Thermal conductivity cp_min = 1.0; # Minimal heat capacity lx, ly, lz = 10.0, 10.0, 10.0; # Length of domain in dimensions x, y and z. # Numerics nx, ny, nz = 256, 256, 256; # Number of gridpoints dimensions x, y and z. nt = 100; # Number of time steps dx = lx/(nx-1); # Space step in x-dimension dy = ly/(ny-1); # Space step in y-dimension dz = lz/(nz-1); # Space step in z-dimension # Array initializations T = @zeros(nx, ny, nz); T2 = @zeros(nx, ny, nz); Ci = @zeros(nx, ny, nz); # Initial conditions (heat capacity and temperature with two Gaussian anomalies each) Ci .= 1.0./( cp_min .+ Data.Array([5*exp(-(((ix-1)*dx-lx/1.5))^2-(((iy-1)*dy-ly/2))^2-(((iz-1)*dz-lz/1.5))^2) + 5*exp(-(((ix-1)*dx-lx/3.0))^2-(((iy-1)*dy-ly/2))^2-(((iz-1)*dz-lz/1.5))^2) for ix=1:size(T,1), iy=1:size(T,2), iz=1:size(T,3)]) ) T .= Data.Array([100*exp(-(((ix-1)*dx-lx/2)/2)^2-(((iy-1)*dy-ly/2)/2)^2-(((iz-1)*dz-lz/3.0)/2)^2) + 50*exp(-(((ix-1)*dx-lx/2)/2)^2-(((iy-1)*dy-ly/2)/2)^2-(((iz-1)*dz-lz/1.5)/2)^2) for ix=1:size(T,1), iy=1:size(T,2), iz=1:size(T,3)]) T2 .= T; # Assign also T2 to get correct boundary conditions. function f(du,u,p,t) @show t @parallel diffusion3D_step!(du, u, Ci, lam, dx, dy, dz); end prob = ODEProblem(f, T, (0.0,1.0)) sol = solve(prob, alg, save_everystep = false, save_start = false) end sol = diffusion3D(ROCK2()) using Plots heatmap(Array(sol[end])) ``` However, I couldn't figure out the API for specifying BCs. Is there a way to specifically mention the boundary stencil or do you have to just write a loop after the interior calculation?

We generally find using a Runge-Kutta Chebyshev method as quite a bit faster than the other techniques in this kind of scenario, so we’d recommend looking at ROCK2 or ROCK4 before looking at something implicit, though things like FBDF (maybe requires FBDF(autodiff=false)) or CVODE_BDF (also look at CVODE_BDF(linear_solver=:GMRES) or CG, depending on positive-definiteness of the Jacobian). These should be much much faster than the handwritten time stepping loops, last time I checked it was >10x.

For more details on optimizing time-step choices for large PDE codes, you may want to check out this tutorial:

https://diffeq.sciml.ai/stable/tutorials/advanced_ode_example/

It should parallelize using UMFPACK, though using MKLPardiso might be a good choice here:

http://linearsolve.sciml.ai/dev/solvers/solvers/#Pardiso.jl

Though most likely that will be outperformed by a good tolerance choice of iLU (or multigrid) mixed with GMRES/CG.

rveltz · May 6, 2022, 1:37pm

for iterative methods, ie for inverting I-dt Delta, you might need a preconditioner

ChrisRackauckas · May 6, 2022, 1:40pm

Yes, and the preconditioners are defined in the tutorial https://diffeq.sciml.ai/stable/tutorials/advanced_ode_example/#Adding-a-Preconditioner. If you can get the sparsity pattern (which is generated automatically in the tutorial using Symbolics.jl), then iLU and algebraic multigrid preconditioners are pretty easy to specify.

Eurel · May 9, 2022, 9:09am

So by using DifferentialEquations, if i got your code example right, I do not have to write the explicit time stepping anymore, I just have to write a kernel exhibiting the evolution of du (T2) over u (T) independently of the Δt ? And then the algorithm I choose will dynamically tune the the time stepping over the solving part ?
It is indeed way faster than iterating by hand !

If I want to solve this implicitly using DifferentialEquations, do i have simply to declare the problem as a SplitODEProblem and use other algorithms listed here ?

ChrisRackauckas · May 9, 2022, 12:14pm

Yes indeed.

No, if you just do something like solve(prob,CVODE_BDF(linear_solver=:GMRES)) then it’s already going to do implicit with Newton-Krylov. You just change the solver, nothing else is required.

SplitODEProblem is only required if you want to be doing IMEX (mixing implicit and explicit time steps).

Eurel · May 13, 2022, 8:43am

I contented myself of copypasting the sparse jacobian declaration and algebraic multigrid the preconditionner here as I thought it would do it automaticaly. I found algebraicmultigrid better than iLU

using ModelingToolkit,DifferentialEquations, Plots, OrdinaryDiffEq
using LinearAlgebra, SparseArrays , Symbolics
using BenchmarkTools
using AlgebraicMultigrid


function algebraicmultigrid2(W,du,u,p,t,newW,Plprev,Prprev,solverdata)
    if newW === nothing || newW
      A = convert(AbstractMatrix,W)
      Pl = AlgebraicMultigrid.aspreconditioner(AlgebraicMultigrid.ruge_stuben(A, presmoother = AlgebraicMultigrid.Jacobi(rand(size(A,1))), postsmoother = AlgebraicMultigrid.Jacobi(rand(size(A,1)))))
    else
      Pl = Plprev
    end
    Pl,nothing
end

using IncompleteLU
function incompletelu(W,du,u,p,t,newW,Plprev,Prprev,solverdata)
  if newW === nothing || newW
    Pl = ilu(convert(AbstractMatrix,W), τ = 0.01)
  else
    Pl = Plprev
  end
  Pl,nothing
end
Base.eltype(::IncompleteLU.ILUFactorization{Tv,Ti}) where {Tv,Ti} = Tv

With the following diffusion function

# Harmonic average
macro Harm(A,B) esc(:(2 .*($A .* $B)./($A .+ $B) ))  end
# Arithmetic average
macro Avg(A,B)  esc(:(($A .+ $B).*0.5)) end
# Difference
macro Dif(A,B)  esc(:($A .- $B)) end

function diffuse!(dT,T,p,t)
        nx, ny, λ, _ρcpΩ, Δx, Δy = p
        _ρcpΩInn = view(_ρcpΩ,2:nx-1,2:ny-1)

        ΔxLeft  = view(Δx,1:nx-2) 
        ΔxRight = view(Δx,3:nx) 
        ΔxInn   = view(Δx,2:nx-1)
        ΔyBot   = view(Δy,1:ny-2)
        ΔyTop   = view(Δy,3:ny)
        ΔyInn   = view(Δy,2:ny-1)

        λLeft   = view(λ,1:nx-2 ,2:ny-1) 
        λRight  = view(λ,3:nx   ,2:ny-1)
        λTop    = view(λ,2:nx-1 ,3:ny)
        λBot    = view(λ,2:nx-1 ,1:ny-2)
        λInn    = view(λ,2:nx-1,2:ny-1)

        TLeft   = view(T,1:nx-2 ,2:ny-1) 
        TRight  = view(T,3:nx   ,2:ny-1)
        TTop    = view(T,2:nx-1 ,3:ny)
        TBot    = view(T,2:nx-1 ,1:ny-2)
        TInn    = view(T,2:nx-1,2:ny-1)
        
        dTInn   = view(dT,2:nx-1,2:ny-1) 

        @. dTInn =  _ρcpΩInn * ( 
            (( (@Harm(λRight,λInn)  * @Dif(TRight,TInn) / @Avg(ΔxRight,ΔxInn)) +
                (@Harm(λInn,λLeft)  * @Dif(TLeft,TInn)  / @Avg(ΔxInn,ΔxLeft))) * ΔyInn )+
            (( (@Harm(λTop,λInn)    * @Dif(TTop,TInn)   / @Avg(ΔyTop,ΔyInn)) +
                (@Harm(λInn,λBot)   * @Dif(TBot,TInn)   / @Avg(ΔyInn,ΔyBot))) * ΔxInn )
            )

end

Runnable script:

# Domain Definition
tmax    = 0.001
tspan   = (0.0,tmax)
d       = 128
nx, ny  = d, d
lx, ly  = 1.0, 1.0
dx, dy  = (lx,ly) ./ (nx,ny)
Δx = fill(dx,nx)
Δy = fill(dy,ny)
λ      = 1 .*ones(nx,ny)
_ρcpΩ  = 1.0 ./( Δx' .* Δy)

# Init Cond
T = zeros(nx,ny)
T  .= Array([100*exp(-(((ix)*dx-lx/2)/2)^2-(((iy)*dy-ly/2)/2)^2) +
                          50*exp(-(((ix)*dx-lx/2)/2)^2-(((iy)*dy-ly/2)/2)^2) for ix=1:size(T,1), iy=1:size(T,2)])


# For Visualisation
Tmin = minimum(T)
Tmax = maximum(T)

# Params
p = [nx, ny, λ, _ρcpΩ, Δx, Δy]

# Sparsity patern of Jacobian
dT = copy(T)
@time jac_sparsity = Symbolics.jacobian_sparsity((du,u)->diffuse!(du,u,p,0.0),dT,T)
f     = ODEFunction(diffuse!;jac_prototype=float.(jac_sparsity))
prob  = ODEProblem(f,T,tspan,p)

# Solver preconditioner
prec = algebraicmultigrid2

# 0.016s  Single threaded
@btime sol = solve(prob,ROCK4(),save_everystep=false)

# 0.138s Single threaded
@btime sol = solve(prob,RK4(),save_everystep=false)

##  autodiff=true

# 0.327s Multithreaded
@btime sol =  solve(prob,FBDF(linsolve=UMFPACKFactorization(),precs=prec,concrete_jac=true),save_everystep=false)
# 2.572s Multithreaded
@btime sol =  solve(prob,FBDF(linsolve=KrylovJL_GMRES(),precs=prec,concrete_jac=true),save_everystep=false)

## autodiff=false

# 0.304s Multithreaded
@btime sol = solve(prob,FBDF(linsolve=UMFPACKFactorization(),autodiff=false,precs=prec,concrete_jac=true),save_everystep=false)
#0.409s Multithreaded
@btime sol = solve(prob,FBDF(linsolve=KrylovJL_GMRES(),autodiff=false,precs=prec,concrete_jac=true),save_everystep=false)

# 0.257s  Multithreaded
@btime sol = solve(prob,TRBDF2(linsolve=UMFPACKFactorization(),precs=prec,concrete_jac=true),save_everystep=false)

and then there is Sundials CVODE_BDF that is blazing fast

using Sundials
#0.010s Singlethreaded
@btime sol = solve(prob,CVODE_BDF(linear_solver=:GMRES),save_everystep=false)

Is there something wrong with my discretisation of the Heat Diffusion equation, with my Jacobian or preconditionners for FBDF, QNDF and TRBDF2 to be that much outperformed?

Why are ROCK4 and RK4 and CVODE_BDF runing on only one thread whereas TRBDF and FBDF are using (some of) the ones I launched Julia with?

ChrisRackauckas · May 13, 2022, 9:38am

What did QNDF and KenCarp47 give? ROCK2? It looks like this is in the “essentially non-stiff” regime, so look at Vern7, Tsit5, and BS3 as well.

It depends on the solver process. Since your f function is not parallelized in this definition, it’s only going to multithread if the solver does it. Things like UMFPACK factorizations will multithread in the factorization, and that’s why you see it.

Eurel · May 13, 2022, 10:00am

QNDF exceeded my patience
here are the other ones

# 0.514s
@btime sol = solve(prob,KenCarp47(linsolve=UMFPACKFactorization(),precs=prec,concrete_jac=true),save_everystep=false)

# 0.034s
@btime sol = solve(prob,ROCK2(),save_everystep=false)

# 0.154s
@btime sol = solve(prob,Vern7(),save_everystep=false)

# 0.105s
@btime sol = solve(prob,Tsit5(),save_everystep=false)

# 0.058s
@btime sol = solve(prob,BS3(),save_everystep=false)

I see i have to redefine the problem as an EsembleProblem to get Multithreading or GPU computing working, gonna check it out

ChrisRackauckas · May 13, 2022, 10:07am

Okay yes, this is essentially non-stiff. Look at VCABM

rveltz · May 13, 2022, 12:13pm

For iterative methods to match sundials, you have to tune the precision of the linear solver. I recently found that the defaults are too “precise”, hence requires a lot of (linear) iterations. Basically, you have to match the settings of GMRES from cvode

Eurel · May 13, 2022, 12:38pm

I have to tune the precision of KrylovJL_GMRES / UMFPACKFactorization / KLUFactorization ?

rveltz · May 13, 2022, 1:24pm

of KrylovJL_GMRES

Eurel · May 13, 2022, 1:38pm

I don’t know how to proceed
Does it takes parameters or how could I retrieve settings of GMRES from cvode ?

rveltz · May 13, 2022, 1:53pm

Here is an example that I use

algMF = Rodas4P(linsolve = KrylovJL_GMRES(rtol = 1e-12, atol = 1e-10, itmax = 20, verbose = 0), concrete_jac = false, precs = precFwave)

By turning versbose = 1, you can monitor the number of iterations. You can also limit itmax and see if solve returns instability. If not, decrease itmax or rtol

Eurel · May 13, 2022, 2:21pm

Is the solver supposed to prompt it when there’s an instability?

ChrisRackauckas · May 13, 2022, 7:25pm

It does. Question then becomes about balance. If you iterate too long, then you are doing useless compute. If you don’t iterate long enough, then a solve is unstable, it causes a step rejection and a smaller time step, and thus more compute. There’s a happy middle, and we haven’t optimized the setups of the Krylov methods to automatically hit that at this point.

Topic		Replies	Views
DifferentialEquations.jl FEM example General Usage question	31	4448	September 4, 2019
The best linear solver for sparse bandedblockbandedmatricies Performance diffeq , linearalgebra , sparse	30	5636	May 24, 2020
Diffusion by using CUDA.jl, DifferentialEquations and DiffEqGPU GPU question , diffeq	6	1250	March 29, 2022
How to tackle 2D reaction-diffusion equation? Numerics	19	4238	June 13, 2021
DifferentialEquations -- what goes wrong? General Usage	14	2428	November 13, 2019