Help me outperform Matlab in numerical solution of semidiscretized PDE system as much as possible

I am new to Julia and I am considering to adopt it in my professional life, where I often have to solve systems of partial differential equations (often mixed with algebraic and ordinary differential equations).

I typically discretize such systems in space manually and solve the semidiscretized system in time with some differential equation solver. So far, I am doing such work exclusively in Matlab and I would like to find out by how much faster (and potentially more elegant & readable) Julia can be for such endeavours.

To approach this question, I designed an example physical system that is similar in nature to what I encounter frequently in my professional life. The system of equations describes (under some assumptions) one dimensional flow of a gas through a porous solid structure, where heat is exchanged from gas to the solid matrix. In the solid matrix, a temperature-dependent chemical reaction is triggered. The governing equations are as follows:

(1) Energy balance of the solid material: (\varrho c)_{sol} \space \dot{T}_{sol} = + \dot{q}
(2) Energy balance of the gas: 0 = \varrho u \space c_{p,gas}\frac{\partial T_{gas}}{\partial x} - \dot{q}
(3) Momentum balance determining gas flow rate: \varrho u = \zeta \sqrt{\triangle p}
(4) Solid reactand species balance: \dot{c}_{R} = - r_R
(5) Constitutive equation for heat flow from gas to solid matrix: \dot{q} = \alpha A (T_{gas} - T_{sol})
(6) Constitutive equation for reactand reaction rate: r_R = 5.5\cdot 10^6 \cdot \exp(\frac{-5680}{273+T_{gas}})

(\varrho c)_{sol} = 2\cdot 10^6 \frac{J}{m^3 K}, \zeta = 5\cdot10^{-3} \frac{kg \sqrt{Pa}}{m^2 s}, \alpha A = 10^4 \frac{W}{K}, c_{p,gas} = 10^3 \frac{J}{kg K} are parameters of the system.

Boundary condition for equation (2) is a gas inlet temperature T_{gas}(t,x=0) = T_{in}.
\triangle p in equation (3) is a boundary condition as well, which can in the general case depend on time (but in the following will not).

In the following, I will consider a discretization of above system, for the time being with a spatial discretization into N = 100 control volumes, using upwind differencing. This yields a system of DAEs with gas temperature T_{gas} (in N discrete volumes) and mass flux \varrho u being algebraic solution variables and solid temperature T_{sol} and reactant concentration c_R being differential variables (both in N discrete volumes). For reasons of simplicity, I model gas mass flux \varrho u to be constant in the whole domain, so one equation is enough to describe that in the discretized state. In the case of N = 100, the overall DAE will have 301 equations. The system can be formulated in mass matrix form. Being a DAE system originating from a PDE semidiscretizations, my problem does not fully match the according DifferentialEquations.jl documentation, since they (as far as I have seen) mostly talk about ODE systems resulting from PDE semidiscretizations.

Baseline Matlab implementation
For systems of DAEs in mass matrix form, Matlab only offers two solvers (ode15s and ode23t), which is of course a huge disadvantage to how many solvers Julia offers.

On my machine, solving the DAE problem with the two mentioned solvers takes about 3.5 seconds. If in the course of discussion, sharing the Matlab scripts becomes necessary, I will do so.

Implementation of numerical solution in Julia
I read up a little on how to make differential equation solving in Julia fast (mainly from documentation of DifferentialEquations.jl and texts/talks by @ChrisRackauckas from here and there).

Based on that, I came up with a first implementation of the solution of above system in Julia, which I formulate in mass matrix form and solve using a variety of different solvers. I compare the benchmarked performances of that with a Matlab script that does the same, and I would like to see how much faster Julia can be for that task.

And here is the Julia script:

using DifferentialEquations
using LinearAlgebra
using BenchmarkTools
using Sundials
using DASKR
using DASSL
using MKL # read somewhere that it may be faster than using BLAS, and it indeed was.
using Plots

# Generate the constants
N = 100
dx = 0.5 / N
q = dualcache(zeros(N,1))
rR = dualcache(zeros(N,1))
ϱu = 0
dp = 100e2
Tin = 200

# generate index lists to find the solution variables easily in solution vector
indices_Tgas = 1:N
indices_Tsol = indices_Tgas .+ N
indices_mflux = indices_Tsol[end] + 1
indices_RConcentrationSolid = indices_mflux[end] .+ (1:N)

indices = (indices_Tgas, indices_Tsol, indices_mflux, indices_RConcentrationSolid)

p = (N, # discretization intervals
     dx, # control volume length
     dp, # boundary condition: pressure differential across domain
     Tin, # boundary condition: Gas inlet temperature
     q, # caching variable for q
     rR, # caching variable for rR
     ϱu, # caching variable for ϱu (mass flux)
     indices_Tgas, # indices for gas temperature in solution vector
     indices_Tsol, # indices for solids temperature in solution vector
     indices_mflux,
     indices_RConcentrationSolid) # index for mass flux in solution vector



function DiscretizedSystemMassMatrixForm!(du,u,p,t)
  N,dx, dp, Tin, q, rR, ϱu, indices_Tgas, indices_Tsol, indices_mflux, indices_RConcentrationSolid = p
  Tgas = @view u[indices_Tgas]
  Tsol = @view u[indices_Tsol]
  dTgas = @view du[indices_Tgas]
  dTsol = @view du[indices_Tsol]
  mflux = @view u[indices_mflux]
  dmflux = @view du[indices_mflux]
  RConcsol  = @view u[indices_RConcentrationSolid]
  dRConcsol  = @view du[indices_RConcentrationSolid]

  # I need to get the right cache variables from dual cache to avoid exceptions during automatic differentiation
  q = get_tmp(q,u)
  rR = get_tmp(rR,u)

  ϱu = mflux[1]
  
  @. q = 1e4 * (Tgas - Tsol)
  @. rR = 5.5e6 * exp(-5680 /(273 + Tsol)) * (1 - exp(-RConcsol/50))

  # gas energy balance
  @inbounds begin
    dTgas[1] =  ϱu * 1000 * (Tin - Tgas[2]) - q[1] * dx
  end
  @inbounds for k in 2:N
    dTgas[k] = ϱu * 1000 * (Tgas[k-1] - Tgas[k]) - q[k] * dx
  end

  
  # solids energy balance
  @inbounds for k in 1:N
    dTsol[k] = q[k] / (1000 * 2000)
  end

  # gas momentum balance
  @inbounds begin
    dmflux[1] = - mflux[1] + 5e-3 * sqrt(dp)
  end

  # reactant concentration equations
  @inbounds for k in 1:N
    dRConcsol[k] = -rR[k]
  end
  
end

# build initial conditions
u0 = zeros(N*3 + 1)
u0[indices_Tgas] .= Tin
u0[indices_Tsol] .= 20
u0[indices_mflux] = 0.5
u0[indices_RConcentrationSolid] .= 3000

# build mass matrix
M = zeros(3*N+1,3*N+1)
M[indices_Tsol,indices_Tsol] = I(N)
M[indices_RConcentrationSolid,indices_RConcentrationSolid] = I(N)

# build ODE Problem (with mass matrix, so it is a DAE)
f = ODEFunction(DiscretizedSystemMassMatrixForm!, mass_matrix=M)
probDAE = ODEProblem(f,u0,(0.0,1500),p)

# solve the problem, e.g. with Rodas4
solDAE  = solve(probDAE,Rodas4(), abstol=1e-6, reltol=1e-6)

# benchmark the solution. Best result (0.13s) with QBDF, worst with RadauIIA5 (1.3s)
@benchmark solve(probDAE,QBDF()) 

# I could not get modellingtoolkitize to work
#using ModelingToolkit
#de = modelingtoolkitize(probDAE)
#probDAE_MTK = ODEProblem(de,[],(0.0,1500),jac=true,sparse=true)

Some notes and questions about my implementation:

  • With above code, I typically already achieve a run time of the solve of 0.3 - 0.5 seconds. QBDF performs especially well with about 0.14s. This is of course already much better than the Matlab baseline, but I would like to see and learn whether more can be achieved.

  • In the equation for r_R, you see that I added a smooth multiplicative term making sure that the reaction rate approaches zero when the reactant concentration reaches zero. This is my standard approach, which usually works in my Matlab implementations. In the vicinity of zero concentration, when the smooth multiplicator becomes active, the solvers are forced to take small steps, leading to reduced performance. I am very open to criticism here and would love to hear better approaches (callbacks?) that work well in Julia in such situations.

  • [Answered by @vettert - thank you] I know that in order to get rid of the last few allocations in my DAE function, I need to broadcast / make broadcast fusion on the equations of Q and rR. However, when doing so, the solvers abort with MethodError: no method matching... followed by an - for me - uninterpretable and massive error message. What am I doing wrong?

  • I tried to modellingtoolkitize my DAE function, but that failed with the following error: LoadError: ArgumentError: invalid index: α₈ of type Num. What did I do wrong? And by the way, what benefit can I expect when using the modellingtoolkitized system?

  • [Answered by @ChrisRackauckas - thank you] Can and should I make my mass matrix sparse? How to do so?

  • What other approaches and code tweaks can you suggest to improve the performance even more? How far can we push it?

To stay as concise as possible, I will leave my code implementing the solution of above system in the fully implicit form out of the discussion. That code does in my view not add much to the discussion, despite enabling me to use Sundial’s IDA (which performed very well, 0.15s, surprisingly in view of the more general nature of DAE formulations it aims at).

Sorry for the lengthy post. I worked on making it concise for quite a bit. I am very interested to get comments and suggestions. Thanks in advance.

5 Likes

Good to see a somewhat familiar domain example on here! Welcome to the community!

I’ll have a closer look at the whole thing later on, but let me comment on this bit here:

The error message comes from the fact that the solver tries to perform automatic differentiation using ForwardDiff.jl to calculate Jacobians etc.

To do this, it uses Dual numbers; think about them as a point value and partial derivatives at that point all in one number. The problem now is your cache array is made for Floats and the Dual numbers won’t fit into it.

The problem is solvable by supplying cache variables that automatically change between the necessary dualcache and the float cache; depending on what is needed in a given function evaluation. There’s a package for this:

And to see a concrete usage, there is a FAQ entry on diffeq.jl:
https://diffeq.sciml.ai/stable/basics/faq/#Autodifferentiation-and-Dual-Numbers

Hope it helps to resolve at least the bit with the allocations :slight_smile:

4 Likes

You can try to use Octavian instead of MKL for matrix multiply. It outperforms MKL a lot for numerical evolution of some PDEs on my machine.

Thank you for the warm welcome and especially for your assistance. In fact I was able to do the necessary changes you proposed to my code and now my DAE function is fully non-allocating :slight_smile: That increased my performance by approximately 10%.

I will edit my above post to reflect the change.

Thank you for the suggestion. I don’t think that I use matrix multiplication in my DAE code. Even the line of code evaluating the r_R equation is applying element-wise operations on vectors, no?

If this is the case, can I expect a speedup of using Octavian vs. MKL? This might be the case if the solver would be able to make use of this under the hood, I guess.

don’t you need to make these const? see:
https://docs.julialang.org/en/v1/manual/performance-tips/

because you do reference them directly in a loop instead of passing them as arguments?

No, elementwise operations don’t need any linear algebra libraries like Octavian or MKL.

1 Like

Don’t I unpack these variables from the parameter argument passed to the DAE function and hence, they are local? Or am I missing something?

Oh you’ve right, this should be fine

sparse(M). Do it if it’s big and sparse.

What’s more important is Jacobian sparsity. Did you read this tutorial?

https://diffeq.sciml.ai/stable/tutorials/advanced_ode_example/

1 Like

Thank you, I must have missed this part of the tutorial earlier. It seems promising.

I tried:

using Symbolics
du0 = copy(u0)
jac_sparsity = Symbolics.jacobian_sparsity((du,u)->DiscretizedSystemMassMatrixForm!(du,u,p,0.0),du0,u0)

This gives me the following error:

MethodError: no method matching Float64(::Num)

I suspect that this is due to how I defined parts of my parameter vector as dualcache. If I change my DAE function and parameter vector to not use dual cache variables at all, getting the sparsity works.

What can I do to get above lines of code working with my parameter vector definition and DAE function?

Just have an allocating version or something to make it easy.

Thanks, I followed your advice and indeed was able to derive and use the jacobian sparsity pattern for a decent speed up.

I was able to do so by having a second version of the DAE function which does not use dual cache, but otherwise is a 100% copy of the original DAE function. This is of course quite ugly, as I now have pretty much the same function twice. Is there a better way?

Additionally, please allow me to repeat some questions from my original post that have not been adressed in the thread yet. Would love to get some feedback on those :slight_smile:

It’s ugly, and we really just need to improve our sparsity interpreter to avoid it. @shashi can you think of any tricks that would work?

This would be the same issue as the sparsity one.

In general, the FAQ and PSA are helpful.

https://diffeq.sciml.ai/stable/basics/faq/

Thanks a lot. That is a lot of material to digest.

Thanks for the hint. I tried to do exactly that: I used a version of the DAE function which does not use dual caches and tried to modellingtoolkitize the ODEProblem defined by that DAE function:

using DifferentialEquations
using LinearAlgebra

# Generate the constants
N = 100
dx = 0.5 / N
q = dualcache(zeros(N,1))
rR = dualcache(zeros(N,1))
ϱu = 0
dp = 100e2
Tin = 200

# generate index lists to find the solution variables easily in solution vector
indices_Tgas = 1:N
indices_Tsol = indices_Tgas .+ N
indices_mflux = indices_Tsol[end] + 1
indices_RConcentrationSolid = indices_mflux[end] .+ (1:N)

indices = (indices_Tgas, indices_Tsol, indices_mflux, indices_RConcentrationSolid)

p = (N, # discretization intervals
     dx, # control volume length
     dp, # boundary condition: pressure differential across domain
     Tin, # boundary condition: Gas inlet temperature
     q, # caching variable for q
     rR, # caching variable for rR
     ϱu, # caching variable for ϱu (mass flux)
     indices_Tgas, # indices for gas temperature in solution vector
     indices_Tsol, # indices for solids temperature in solution vector
     indices_mflux,
     indices_RConcentrationSolid) # index for mass flux in solution vector


function DiscretizedSystemMassMatrixForm_noDualcache!(du,u,p,t)
  N,dx, dp, Tin, q, rR, ϱu, indices_Tgas, indices_Tsol, indices_mflux, indices_RConcentrationSolid = p
  Tgas = @view u[indices_Tgas]
  Tsol = @view u[indices_Tsol]
  dTgas = @view du[indices_Tgas]
  dTsol = @view du[indices_Tsol]
  mflux = @view u[indices_mflux]
  dmflux = @view du[indices_mflux]
  RConcsol  = @view u[indices_RConcentrationSolid]
  dRConcsol  = @view du[indices_RConcentrationSolid]
  
  q = 1e4 * (Tgas - Tsol)
  ϱu = mflux[1]

  rR = 5.5e6 .* exp.(.-5680 ./(273 .+ Tsol)) .* (1 .- exp.(.-RConcsol./50))

  # gas energy balance
  @inbounds begin
    dTgas[1] =  ϱu * 1000 * (Tin - Tgas[1]) - q[1] * dx
  end
  @inbounds for k in 2:N
    dTgas[k] = ϱu * 1000 * (Tgas[k-1] - Tgas[k]) - q[k] * dx
  end

  
  # solids energy balance
  @inbounds for k in 1:N
    dTsol[k] = q[k] / (1000 * 2000)
  end

  # gas momentum balance
  @inbounds begin
    dmflux[1] = - mflux[1] + 5e-3 * sqrt(dp)
  end

  # reactant concentration equations
  @inbounds for k in 1:N
    dRConcsol[k] = -rR[k]
  end
  
  
end

# build initial conditions
u0 = zeros(N*3 + 1)
u0[indices_Tgas] .= Tin
u0[indices_Tsol] .= 20
u0[indices_mflux] = 0.5
u0[indices_RConcentrationSolid] .= 3000

# build mass matrix
Ii = [indices_Tsol; indices_RConcentrationSolid]; 
Jj = [indices_Tsol; indices_RConcentrationSolid]; 
V = ones(length(Ii));
Msparse = sparse(Ii,Jj,V, 3*N+1, 3*N+1)


using ModelingToolkit
f_DAE = ODEFunction(DiscretizedSystemMassMatrixForm_noDualcache!, mass_matrix=Msparse)
probDAE = ODEProblem(f_DAE,u0,(0.0,1500),p)
de = modelingtoolkitize(probDAE)

Unfortunately, modelingtoolkitize still throws the same error as before:

ArgumentError: invalid index: α₈ of type Num

What am I doing wrong?

It looks like you can’t pass the index definitions through the parameters since ModelingToolkit wants to make all the parameters symbolic. I think this code is missing a couple of using statements too. This seems to work:

using DifferentialEquations
using DifferentialEquations.DiffEqBase: dualcache
using LinearAlgebra
using ModelingToolkit
using SparseArrays

# Generate the constants
const N = 100
dx = 0.5 / N
q = dualcache(zeros(N,1))
rR = dualcache(zeros(N,1))
ϱu = 0
dp = 100e2
Tin = 200

# generate index lists to find the solution variables easily in solution vector
const indices_Tgas = 1:N
const indices_Tsol = indices_Tgas .+ N
const indices_mflux = indices_Tsol[end] + 1
const indices_RConcentrationSolid = indices_mflux[end] .+ (1:N)

# indices = (indices_Tgas, indices_Tsol, indices_mflux, indices_RConcentrationSolid)

p = (# N, # discretization intervals
     dx, # control volume length
     dp, # boundary condition: pressure differential across domain
     Tin, # boundary condition: Gas inlet temperature
     q, # caching variable for q
     rR, # caching variable for rR
     ϱu, # caching variable for ϱu (mass flux)
     # indices_Tgas, # indices for gas temperature in solution vector
     # indices_Tsol, # indices for solids temperature in solution vector
     # indices_mflux,
     # indices_RConcentrationSolid, # index for mass flux in solution vector
)

function DiscretizedSystemMassMatrixForm_noDualcache!(du,u,p,t)
  # N,dx, dp, Tin, q, rR, ϱu, indices_Tgas, indices_Tsol, indices_mflux, indices_RConcentrationSolid = p
  dx, dp, Tin, q, rR, ϱu = p
  Tgas = @view u[indices_Tgas]
  Tsol = @view u[indices_Tsol]
  dTgas = @view du[indices_Tgas]
  dTsol = @view du[indices_Tsol]
  mflux = @view u[indices_mflux]
  dmflux = @view du[indices_mflux]
  RConcsol  = @view u[indices_RConcentrationSolid]
  dRConcsol  = @view du[indices_RConcentrationSolid]

  q = 1e4 * (Tgas - Tsol)
  ϱu = mflux[1]

  rR = 5.5e6 .* exp.(.-5680 ./(273 .+ Tsol)) .* (1 .- exp.(.-RConcsol./50))

  # gas energy balance
  @inbounds begin
    dTgas[1] =  ϱu * 1000 * (Tin - Tgas[1]) - q[1] * dx
  end
  @inbounds for k in 2:N
    dTgas[k] = ϱu * 1000 * (Tgas[k-1] - Tgas[k]) - q[k] * dx
  end


  # solids energy balance
  @inbounds for k in 1:N
    dTsol[k] = q[k] / (1000 * 2000)
  end

  # gas momentum balance
  @inbounds begin
    dmflux[1] = - mflux[1] + 5e-3 * sqrt(dp)
  end

  # reactant concentration equations
  @inbounds for k in 1:N
    dRConcsol[k] = -rR[k]
  end


end

# build initial conditions
u0 = zeros(N*3 + 1)
u0[indices_Tgas] .= Tin
u0[indices_Tsol] .= 20
u0[indices_mflux] = 0.5
u0[indices_RConcentrationSolid] .= 3000

# build mass matrix
Ii = [indices_Tsol; indices_RConcentrationSolid];
Jj = [indices_Tsol; indices_RConcentrationSolid];
V = ones(length(Ii));
Msparse = sparse(Ii,Jj,V, 3*N+1, 3*N+1)


f_DAE = ODEFunction(DiscretizedSystemMassMatrixForm_noDualcache!, mass_matrix=Msparse)
probDAE = ODEProblem(f_DAE,u0,(0.0,1500),p)
de = modelingtoolkitize(probDAE)
2 Likes

Thank you for the assistance. I got the MTKized system to work


# build initial conditions
u0 = zeros(N*3 + 1)
u0[indices_Tgas] .= Tin
u0[indices_Tsol] .= 20
u0[indices_mflux] = 0.5
u0[indices_RConcentrationSolid] .= 3000

# build mass matrix
Ii = [indices_Tsol; indices_RConcentrationSolid];
Jj = [indices_Tsol; indices_RConcentrationSolid];
V = ones(length(Ii));
Msparse = sparse(Ii,Jj,V, 3*N+1, 3*N+1)


f_DAE = ODEFunction(DiscretizedSystemMassMatrixForm_noDualcache!, mass_matrix=Msparse)
probDAE = ODEProblem(f_DAE,u0,(0.0,1500),p_noDualCache)
de = modelingtoolkitize(probDAE)

# MTKized system back to numerical problem:
probDAE_MTK1 = ODEProblem(de,[],(0.0,1500),p_noDualCache,jac=true,sparse=true)

# solve the numerical problem - works
sol_MTK1 = solve(probDAE_MTK1,QBDF())

# pretty much as fast as solving non-MTKized system with Jacobian sparsity, as in above posts.
@benchmark solve(probDAE_MTK1,QBDF()) 

To my surpise, the MTKized system does not solve faster than the original system with Jacobian sparsity information, although the MTKized system should use the analytic Jacobian, or am I mistaken?

Anyway. In a next step, I tried to structual_simplify the system and solve it:

# Now try to structural_simplify the system before solving it
de2 = structural_simplify(de)

probDAE_MTK2 = ODEProblem(de2,[],(0.0,1500),p_noDualCache,jac=true,sparse=true)

sol_MTK2 = solve(probDAE_MTK2,QBDF())

Unfortunately, I get an UndefVarError: x201 not defined. Why is that? What can I do to get it to work?

Thank you!