I am new to Julia and I am considering to adopt it in my professional life, where I often have to solve systems of partial differential equations (often mixed with algebraic and ordinary differential equations).
I typically discretize such systems in space manually and solve the semidiscretized system in time with some differential equation solver. So far, I am doing such work exclusively in Matlab and I would like to find out by how much faster (and potentially more elegant & readable) Julia can be for such endeavours.
To approach this question, I designed an example physical system that is similar in nature to what I encounter frequently in my professional life. The system of equations describes (under some assumptions) one dimensional flow of a gas through a porous solid structure, where heat is exchanged from gas to the solid matrix. In the solid matrix, a temperaturedependent chemical reaction is triggered. The governing equations are as follows:
(1) Energy balance of the solid material: (\varrho c)_{sol} \space \dot{T}_{sol} = + \dot{q}
(2) Energy balance of the gas: 0 = \varrho u \space c_{p,gas}\frac{\partial T_{gas}}{\partial x}  \dot{q}
(3) Momentum balance determining gas flow rate: \varrho u = \zeta \sqrt{\triangle p}
(4) Solid reactand species balance: \dot{c}_{R} =  r_R
(5) Constitutive equation for heat flow from gas to solid matrix: \dot{q} = \alpha A (T_{gas}  T_{sol})
(6) Constitutive equation for reactand reaction rate: r_R = 5.5\cdot 10^6 \cdot \exp(\frac{5680}{273+T_{gas}})(\varrho c)_{sol} = 2\cdot 10^6 \frac{J}{m^3 K}, \zeta = 5\cdot10^{3} \frac{kg \sqrt{Pa}}{m^2 s}, \alpha A = 10^4 \frac{W}{K}, c_{p,gas} = 10^3 \frac{J}{kg K} are parameters of the system.
Boundary condition for equation (2) is a gas inlet temperature T_{gas}(t,x=0) = T_{in}.
\triangle p in equation (3) is a boundary condition as well, which can in the general case depend on time (but in the following will not).
In the following, I will consider a discretization of above system, for the time being with a spatial discretization into N = 100 control volumes, using upwind differencing. This yields a system of DAEs with gas temperature T_{gas} (in N discrete volumes) and mass flux \varrho u being algebraic solution variables and solid temperature T_{sol} and reactant concentration c_R being differential variables (both in N discrete volumes). For reasons of simplicity, I model gas mass flux \varrho u to be constant in the whole domain, so one equation is enough to describe that in the discretized state. In the case of N = 100, the overall DAE will have 301 equations. The system can be formulated in mass matrix form. Being a DAE system originating from a PDE semidiscretizations, my problem does not fully match the according DifferentialEquations.jl documentation, since they (as far as I have seen) mostly talk about ODE systems resulting from PDE semidiscretizations.
Baseline Matlab implementation
For systems of DAEs in mass matrix form, Matlab only offers two solvers (ode15s and ode23t), which is of course a huge disadvantage to how many solvers Julia offers.
On my machine, solving the DAE problem with the two mentioned solvers takes about 3.5 seconds. If in the course of discussion, sharing the Matlab scripts becomes necessary, I will do so.
Implementation of numerical solution in Julia
I read up a little on how to make differential equation solving in Julia fast (mainly from documentation of DifferentialEquations.jl and texts/talks by @ChrisRackauckas from here and there).
Based on that, I came up with a first implementation of the solution of above system in Julia, which I formulate in mass matrix form and solve using a variety of different solvers. I compare the benchmarked performances of that with a Matlab script that does the same, and I would like to see how much faster Julia can be for that task.
And here is the Julia script:
using DifferentialEquations
using LinearAlgebra
using BenchmarkTools
using Sundials
using DASKR
using DASSL
using MKL # read somewhere that it may be faster than using BLAS, and it indeed was.
using Plots
# Generate the constants
N = 100
dx = 0.5 / N
q = dualcache(zeros(N,1))
rR = dualcache(zeros(N,1))
ϱu = 0
dp = 100e2
Tin = 200
# generate index lists to find the solution variables easily in solution vector
indices_Tgas = 1:N
indices_Tsol = indices_Tgas .+ N
indices_mflux = indices_Tsol[end] + 1
indices_RConcentrationSolid = indices_mflux[end] .+ (1:N)
indices = (indices_Tgas, indices_Tsol, indices_mflux, indices_RConcentrationSolid)
p = (N, # discretization intervals
dx, # control volume length
dp, # boundary condition: pressure differential across domain
Tin, # boundary condition: Gas inlet temperature
q, # caching variable for q
rR, # caching variable for rR
ϱu, # caching variable for ϱu (mass flux)
indices_Tgas, # indices for gas temperature in solution vector
indices_Tsol, # indices for solids temperature in solution vector
indices_mflux,
indices_RConcentrationSolid) # index for mass flux in solution vector
function DiscretizedSystemMassMatrixForm!(du,u,p,t)
N,dx, dp, Tin, q, rR, ϱu, indices_Tgas, indices_Tsol, indices_mflux, indices_RConcentrationSolid = p
Tgas = @view u[indices_Tgas]
Tsol = @view u[indices_Tsol]
dTgas = @view du[indices_Tgas]
dTsol = @view du[indices_Tsol]
mflux = @view u[indices_mflux]
dmflux = @view du[indices_mflux]
RConcsol = @view u[indices_RConcentrationSolid]
dRConcsol = @view du[indices_RConcentrationSolid]
# I need to get the right cache variables from dual cache to avoid exceptions during automatic differentiation
q = get_tmp(q,u)
rR = get_tmp(rR,u)
ϱu = mflux[1]
@. q = 1e4 * (Tgas  Tsol)
@. rR = 5.5e6 * exp(5680 /(273 + Tsol)) * (1  exp(RConcsol/50))
# gas energy balance
@inbounds begin
dTgas[1] = ϱu * 1000 * (Tin  Tgas[2])  q[1] * dx
end
@inbounds for k in 2:N
dTgas[k] = ϱu * 1000 * (Tgas[k1]  Tgas[k])  q[k] * dx
end
# solids energy balance
@inbounds for k in 1:N
dTsol[k] = q[k] / (1000 * 2000)
end
# gas momentum balance
@inbounds begin
dmflux[1] =  mflux[1] + 5e3 * sqrt(dp)
end
# reactant concentration equations
@inbounds for k in 1:N
dRConcsol[k] = rR[k]
end
end
# build initial conditions
u0 = zeros(N*3 + 1)
u0[indices_Tgas] .= Tin
u0[indices_Tsol] .= 20
u0[indices_mflux] = 0.5
u0[indices_RConcentrationSolid] .= 3000
# build mass matrix
M = zeros(3*N+1,3*N+1)
M[indices_Tsol,indices_Tsol] = I(N)
M[indices_RConcentrationSolid,indices_RConcentrationSolid] = I(N)
# build ODE Problem (with mass matrix, so it is a DAE)
f = ODEFunction(DiscretizedSystemMassMatrixForm!, mass_matrix=M)
probDAE = ODEProblem(f,u0,(0.0,1500),p)
# solve the problem, e.g. with Rodas4
solDAE = solve(probDAE,Rodas4(), abstol=1e6, reltol=1e6)
# benchmark the solution. Best result (0.13s) with QBDF, worst with RadauIIA5 (1.3s)
@benchmark solve(probDAE,QBDF())
# I could not get modellingtoolkitize to work
#using ModelingToolkit
#de = modelingtoolkitize(probDAE)
#probDAE_MTK = ODEProblem(de,[],(0.0,1500),jac=true,sparse=true)
Some notes and questions about my implementation:

With above code, I typically already achieve a run time of the solve of 0.3  0.5 seconds. QBDF performs especially well with about 0.14s. This is of course already much better than the Matlab baseline, but I would like to see and learn whether more can be achieved.

In the equation for r_R, you see that I added a smooth multiplicative term making sure that the reaction rate approaches zero when the reactant concentration reaches zero. This is my standard approach, which usually works in my Matlab implementations. In the vicinity of zero concentration, when the smooth multiplicator becomes active, the solvers are forced to take small steps, leading to reduced performance. I am very open to criticism here and would love to hear better approaches (callbacks?) that work well in Julia in such situations.

[Answered by @vettert  thank you] I know that in order to get rid of the last few allocations in my DAE function, I need to broadcast / make broadcast fusion on the equations of Q and rR. However, when doing so, the solvers abort with
MethodError: no method matching...
followed by an  for me  uninterpretable and massive error message. What am I doing wrong? 
I tried to
modellingtoolkitize
my DAE function, but that failed with the following error:LoadError: ArgumentError: invalid index: α₈ of type Num
. What did I do wrong? And by the way, what benefit can I expect when using themodellingtoolkitize
d system? 
[Answered by @ChrisRackauckas  thank you] Can and should I make my mass matrix sparse? How to do so?

What other approaches and code tweaks can you suggest to improve the performance even more? How far can we push it?
To stay as concise as possible, I will leave my code implementing the solution of above system in the fully implicit form out of the discussion. That code does in my view not add much to the discussion, despite enabling me to use Sundial’s IDA (which performed very well, 0.15s, surprisingly in view of the more general nature of DAE formulations it aims at).
Sorry for the lengthy post. I worked on making it concise for quite a bit. I am very interested to get comments and suggestions. Thanks in advance.