Chris’s post “The Essential Tools of Scientific Machine Learning” discussed several tools that are “essential for scientific machine learning”, including probabilistic programming, structured linear algebra, discretization of partial differential equations, LU and QR factorizations, uncertainty quantification, and global sensitivity analysis. Are these tools essential only for conventional numerical methods, or are they also essential for neural-network-based PDE solvers? If the latter, is there a reference explaining how each of these tools can contribute to NN-based solvers? I am new to PDE solving, so apologies if this is a naive question.
Thanks.
Another question: can NN-based PDE solvers completely replace numerical ones? If not, what are the remaining challenges? Is there a reference discussing this problem in detail?
Thanks.
Check out the package: NeuralNetDiffEq
It uses Flux to solve PDEs. The literature has shown that these tools can solve high-dimensional BSDEs very fast. If you read the issues in this package, you will also see work on using deep learning to solve general PDEs.
NN PDE solvers provide an additional strategy for solving certain classes of PDEs.
It is too early to say how broad a class of PDEs they can solve…
Copying my reply here as well.
Neural network based solvers are quite slow for forward problems, and they aren’t necessarily faster on inverse problems either. They can, however, solve an interesting set of problems that are difficult for traditional methods, such as problems in very high dimensions (e.g., 100-dimensional PDEs) or problems where you want to merge the model with data (i.e., physics-informed neural networks, where the physics acts as a regularizer for small datasets). On conventional 2D or 3D PDEs they are much slower than well-tuned classical methods, and with good reason: the classical methods use all of the information about the problem to make it as simple as possible, while the neural network approach is a hammer applied to a big optimization problem. That said, the way to improve the neural approaches is to add more traditional information to them, so mixing FEM with neural networks and the like is where the literature is going. That has the potential to actually outperform a classical method, but it needs all of the tools of the classical methods.
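To make "physics as a regularizer" concrete, here is a minimal Python sketch of a physics-informed loss for the ODE u'(t) = -k u(t) on [0, 1], given only one data point u(0) = 1. Assumption: the trial function is a quadratic polynomial instead of a neural network, so the loss (data misfit plus the squared ODE residual at collocation points) is quadratic in the parameters and can be minimized exactly through normal equations rather than with an optimizer. All function names are hypothetical.

```python
import math

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_physics_informed(k=1.0, data=((0.0, 1.0),), n_colloc=21, w_phys=1.0):
    """Fit u(t) = th0 + th1*t + th2*t^2 to data plus the residual of u' + k*u = 0.

    Every loss term is linear in theta, so minimizing
    sum_data (u - y)^2 + w_phys * mean_colloc (u' + k*u)^2
    reduces to the weighted normal equations (A^T W A) theta = A^T W y.
    """
    rows, rhs, weights = [], [], []
    for t, y in data:                      # data rows: u(t) - y
        rows.append([1.0, t, t * t])
        rhs.append(y)
        weights.append(1.0)
    for i in range(n_colloc):              # physics rows: u'(t) + k*u(t) - 0
        t = i / (n_colloc - 1)
        rows.append([k, 1.0 + k * t, 2.0 * t + k * t * t])
        rhs.append(0.0)
        weights.append(w_phys / n_colloc)
    AtA = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights)) for j in range(3)]
           for i in range(3)]
    Atb = [sum(w * r[i] * y for r, y, w in zip(rows, rhs, weights)) for i in range(3)]
    return solve_linear(AtA, Atb)

theta = fit_physics_informed()
u1 = theta[0] + theta[1] + theta[2]        # value of the fitted u at t = 1
print(f"u(1) = {u1:.4f}, exact = {math.exp(-1):.4f}")
```

With a single data point, the data term alone cannot pin down three parameters; the residual term supplies the missing information. A real PINN replaces the polynomial with a network and the exact solve with gradient-based training, which is a large part of why it is slow on forward problems.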
Thanks very much for the answer. Are there more detailed references or data showing how much slower NN-based PDE solvers are compared with conventional ones?
It’s mentioned in most of the papers on the topic (see, e.g., the physics-informed neural network papers) that these methods are not fast for forward problems. I don’t have a good source that has done a systematic study of exactly that, but if you give it a try a few times you’ll see it’s not even close.
I have found some papers pointing out the slowness of NN solvers. Thanks.
Could you recommend a few references that (potentially) optimize NN solvers with FEM (numerical) techniques? For example, I am wondering whether optimizing linear solves (the “Distributed Dense, Structured, and Sparse Linear Algebra” tool) is still essential for NN solvers. After reading the 18.337J/6.338J lectures on solving PDEs, my understanding is that even for numerical PDE methods, we most likely need to solve linear systems only for stiff PDEs. As for NN solvers, although some of them (PINN, “Learning data-driven discretizations for partial differential equations”) discretize the PDEs, they do not appear to build or solve linear systems.
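To show where linear solves enter a classical solver, here is a minimal Python sketch, assuming the 1D heat equation u_t = D u_xx on [0, 1] with zero Dirichlet boundaries and a central-difference discretization (an illustrative choice, not taken from the lectures). A backward (implicit) Euler step requires solving (I - r L) u^{n+1} = u^n with r = D Δt/Δx², which is a tridiagonal system; the Thomas algorithm solves it in O(N) operations. Function names are hypothetical.

```python
import math

def thomas(a, b, c, d):
    """Solve a tridiagonal system: a = sub-diagonal, b = diagonal, c = super-diagonal, d = rhs.

    a[0] and c[-1] are unused. No pivoting, which is safe here because the
    backward-Euler heat matrix is strictly diagonally dominant.
    """
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):                       # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):              # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def implicit_heat_step(u, r):
    """One backward-Euler step on the interior values u (boundary values are 0).

    Solves (1 + 2r) u_i^{n+1} - r u_{i-1}^{n+1} - r u_{i+1}^{n+1} = u_i^n.
    """
    n = len(u)
    return thomas([-r] * n, [1.0 + 2.0 * r] * n, [-r] * n, list(u))

# Initial condition sin(pi x); the exact solution decays like exp(-pi^2 D t).
N, D, dt, steps = 49, 1.0, 1e-3, 100
dx = 1.0 / (N + 1)
x = [(i + 1) * dx for i in range(N)]
u = [math.sin(math.pi * xi) for xi in x]
r = D * dt / dx**2          # r = 2.5 here, well above the explicit limit r <= 0.5
for _ in range(steps):
    u = implicit_heat_step(u, r)
mid = u[N // 2]             # grid point x = 0.5, time t = 0.1
print(f"u(0.5, 0.1) = {mid:.4f}, exact = {math.exp(-math.pi**2 * 0.1):.4f}")
```

Explicit methods avoid this solve but are then limited in step size; in 2D or 3D the same step produces large sparse systems, which is where structured and sparse linear algebra matters.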
Another question related to NN PDE solvers: you mentioned some numerical solvers in the universal differential equations post:
There is this property of ODEs called stiffness, and when it comes into play, the simple Runge-Kutta method or Adams-Bashforth-Moulton methods are no longer stable enough to accurately solve the equations. Thus when looking at the universal partial differential equations, we had to make use of a set of ODE solvers which have package implementations in Julia and Fortran.
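A minimal Python sketch of what stiffness means in practice, assuming the standard linear test equation u' = -λu with λ = 1000 (a textbook example, not the UDE system from the post). Explicit (forward) Euler multiplies u by (1 - λΔt) each step and blows up once Δt > 2/λ, while implicit (backward) Euler multiplies by 1/(1 + λΔt) and decays for any Δt > 0.

```python
def forward_euler(lam, u0, dt, steps):
    """Explicit Euler for u' = -lam*u: u_{n+1} = (1 - lam*dt) * u_n."""
    u = u0
    for _ in range(steps):
        u = u + dt * (-lam * u)
    return u

def backward_euler(lam, u0, dt, steps):
    """Implicit Euler for u' = -lam*u: solve u_{n+1} = u_n - dt*lam*u_{n+1}."""
    u = u0
    for _ in range(steps):
        # Closed-form solve of the scalar implicit equation; for a system of
        # equations this step becomes a linear (or nonlinear) solve.
        u = u / (1.0 + lam * dt)
    return u

lam, dt, steps = 1000.0, 0.01, 50   # dt = 0.01 is 5x the explicit stability limit 2/lam
print("explicit:", forward_euler(lam, 1.0, dt, steps))   # factor -9 per step: blows up
print("implicit:", backward_euler(lam, 1.0, dt, steps))  # factor 1/11 per step: decays
```

The true solution decays to essentially zero almost immediately, so accuracy would allow large steps; only stability forces the explicit method to take Δt < 0.002.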
My understanding is that NeuralNetDiffEq.jl and DiffEqFlux.jl provide differentiable PDE solvers, while DifferentialEquations.jl provides conventional numerical solvers (Runge-Kutta methods, Adams-Bashforth-Moulton, etc.). Do you mean that we can, or should, also differentiate the numerical solvers?
I don’t know about that direction, but the other direction (optimizing FEM techniques with NNs) is something people are exploring, in work like this: https://www.sciencedirect.com/science/article/abs/pii/S0263823118303537 . Searching along those lines turns up about 20 other papers. I know the dolfin-adjoint people are getting into this as well, but I don’t know whether they’ve put out a paper yet.
Pretty much all PDEs are inherently stiff under some measure. CFL constants give the maximal step size for stability, which is a measure of stiffness. There are of course many different ways to handle this stiffness. One very common way is to use implicit methods (for certain classes of PDEs), but other choices, like multirate methods, do exist (for semi-stiff equations). However, implicit methods have traditionally done extremely well, so I think you’ll increasingly see them mixed with neural network approaches. They will lag behind other methods in development, though, mostly because they are much more difficult to build.
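As a concrete instance of a CFL-type step-size restriction, here is a minimal Python sketch, assuming forward Euler on the 1D heat equation u_t = D u_xx with central differences and zero boundaries (an illustrative choice). The scheme is stable only when r = D Δt/Δx² ≤ 1/2, so halving Δx forces Δt to shrink by a factor of four. The sketch runs the same number of steps just below and just above that bound from a spike initial condition and reports the largest absolute value.

```python
def explicit_heat(u0, r, steps):
    """Forward Euler for u_t = D u_xx with zero Dirichlet boundaries; returns max |u|.

    r = D*dt/dx**2; each step applies u_i <- u_i + r*(u_{i-1} - 2 u_i + u_{i+1}).
    """
    u = list(u0)
    n = len(u)
    for _ in range(steps):
        padded = [0.0] + u + [0.0]          # boundary values u = 0
        u = [padded[i + 1] + r * (padded[i] - 2.0 * padded[i + 1] + padded[i + 2])
             for i in range(n)]
    return max(abs(v) for v in u)

N, steps = 49, 200
spike = [0.0] * N
spike[N // 2] = 1.0                          # excites every grid mode, including the highest
dx = 1.0 / (N + 1)
print("dt_max for D = 1:", 0.5 * dx**2)      # largest stable explicit step
print("r = 0.45:", explicit_heat(spike, 0.45, steps))   # update is a convex combination: stays <= 1
print("r = 0.55:", explicit_heat(spike, 0.55, steps))   # highest mode grows ~1.2x per step
```

Since the bound scales with Δx², refining the grid makes explicit stepping very expensive, which is why implicit methods are attractive for diffusive PDEs.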
Another place where this will likely show up is in Hessian calculations, due to gradient pathologies. [2001.04536] “Understanding and mitigating gradient pathologies in physics-informed neural networks” details the gradient issues of PINNs quite well.
You definitely can/should differentiate numerical solvers. DifferentialEquations.jl provides differentiable numerical solvers (DiffEqSensitivity.jl) that NeuralNetDiffEq.jl and DiffEqFlux.jl build on for the neural-based methods. A lot of papers are easily generalized by thinking about it as a problem of differentiable numerical methods.
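A minimal Python sketch of what differentiating a numerical solver means, assuming forward-mode automatic differentiation through a hand-written dual-number class (this is not the DiffEqSensitivity.jl API). Running fixed-step explicit Euler on u' = -p·u with a dual-valued parameter p yields du(T)/dp of the discrete solution alongside its value. For comparison, the exact solution u(T) = u0·e^(-pT) has du/dp = -T·u0·e^(-pT).

```python
import math

class Dual:
    """Dual number val + der*eps with eps^2 = 0; der carries the derivative."""
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der
    def _lift(self, other):
        return other if isinstance(other, Dual) else Dual(float(other))
    def __add__(self, other):
        o = self._lift(other)
        return Dual(self.val + o.val, self.der + o.der)
    __radd__ = __add__
    def __mul__(self, other):
        o = self._lift(other)
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)  # product rule
    __rmul__ = __mul__
    def __neg__(self):
        return Dual(-self.val, -self.der)

def euler_solve(f, u0, p, T, steps):
    """Fixed-step explicit Euler; works unchanged for floats or Dual numbers."""
    dt = T / steps
    u = u0
    for _ in range(steps):
        u = u + dt * f(u, p)
    return u

f = lambda u, p: -p * u                  # right-hand side of u' = -p*u
p = Dual(2.0, 1.0)                       # seed dp/dp = 1
uT = euler_solve(f, 1.0, p, T=1.0, steps=1000)
print("u(T) =", uT.val, " du/dp =", uT.der)
print("exact:", math.exp(-2.0), -math.exp(-2.0))
```

Forward mode costs roughly one extra solve per parameter, so for many parameters (e.g., neural network weights) reverse-mode or adjoint sensitivity methods are usually preferred.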
Want to do the Multistep PINN? That method is just DiffEqFlux where you use a method like VCABM3 and set adaptive=false. How do you make it adaptive? Don’t set adaptive=false. How do you make it handle stiff equations? Replace the ODE solver choice.
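For readers unfamiliar with the multistep family, here is a minimal Python sketch of a fixed-step (non-adaptive) third-order Adams-Bashforth-Moulton predictor-corrector, assuming RK4 to generate the two extra starting values. This is a simplified constant-step scheme for illustration only; VCABM3 is a variable-coefficient method whose step-size adaptivity is what adaptive=false turns off, and its internals are not reproduced here.

```python
import math

def rk4_step(f, t, y, h):
    """Classical 4th-order Runge-Kutta step, used only to start the multistep method."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def abm3_fixed(f, t0, y0, h, steps):
    """Fixed-step 3rd-order Adams-Bashforth-Moulton (PECE) for a scalar ODE y' = f(t, y)."""
    ts, ys = [t0], [y0]
    for _ in range(2):                       # bootstrap two extra starting values with RK4
        ys.append(rk4_step(f, ts[-1], ys[-1], h))
        ts.append(ts[-1] + h)
    fs = [f(t, y) for t, y in zip(ts, ys)]
    for n in range(2, steps):
        t, y = ts[n], ys[n]
        # Predict with the 3-step Adams-Bashforth formula.
        yp = y + h / 12 * (23 * fs[n] - 16 * fs[n - 1] + 5 * fs[n - 2])
        # Evaluate at the prediction, then correct with the 2-step (3rd-order) Adams-Moulton formula.
        yc = y + h / 12 * (5 * f(t + h, yp) + 8 * fs[n] - fs[n - 1])
        ts.append(t + h)
        ys.append(yc)
        fs.append(f(t + h, yc))              # final Evaluate reuses the corrected value
    return ts, ys

ts, ys = abm3_fixed(lambda t, y: -y, 0.0, 1.0, 0.01, 100)
print(ts[-1], ys[-1], math.exp(-ts[-1]))
```

Each step reuses stored derivative values from earlier steps, which is what makes the method cheap per step but also why changing the step size requires variable coefficients.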
Similarly, the UDE paper explains how this method is just a specific SDE in a differentiable implementation of Euler-Maruyama (EM() in DiffEq), so how do you generalize that to adaptive time stepping? LambaEM(). Stiff equations? ImplicitEM(), or SROCK(), and so on. So it turns out that a lot of these methods can be generalized if you just think about them in the format of a differentiable DE solver, and the advantage is that you then get all of the optimizations of the solver directly.
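A minimal Python sketch of the fixed-step schemes behind this discussion, assuming the scalar SDE dX = -a X dt + b dW (an Ornstein-Uhlenbeck process chosen for illustration): explicit Euler-Maruyama, and a drift-implicit variant that treats only the drift term implicitly. The drift-implicit scheme belongs to the same family as ImplicitEM(), but this sketch is not that implementation, and it omits the error-based step-size control that adaptive methods such as LambaEM() add. Both schemes share the same Brownian increments, so any difference comes only from the drift treatment.

```python
import math
import random

def em_paths(a, b, x0, dt, steps, seed=0):
    """Simulate dX = -a*X dt + b dW with explicit and drift-implicit Euler-Maruyama.

    Explicit:        X_{n+1} = X_n - a*X_n*dt + b*dW_n
    Drift-implicit:  X_{n+1} = X_n - a*X_{n+1}*dt + b*dW_n
                  => X_{n+1} = (X_n + b*dW_n) / (1 + a*dt)
    """
    rng = random.Random(seed)
    xe = xi = x0
    for _ in range(steps):
        dw = rng.gauss(0.0, math.sqrt(dt))   # Brownian increment ~ N(0, dt), shared by both
        xe = xe - a * xe * dt + b * dw
        xi = (xi + b * dw) / (1.0 + a * dt)
    return xe, xi

# Stiff drift: a*dt = 5, while the explicit scheme's variance stays bounded only if a*dt < 2.
xe, xi = em_paths(a=500.0, b=1.0, x0=1.0, dt=0.01, steps=200)
print("explicit EM:", xe)
print("drift-implicit EM:", xi)
```

The implicit update here is a scalar division; for systems it becomes a linear or nonlinear solve per step, the same trade-off as in the ODE case.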