Why do we need 3 chains to solve a PDE using NeuralPDE?

I am working through examples from the NeuralPDE.jl package and have a question. I am looking at the inverse problem example (i.e., parameter estimation) and just need to get my thinking straight. This is more of a conceptual/theory question than a Julia one.

Let’s consider the Lorenz system as given in the example.

$$
\begin{aligned}
x' &= \sigma(y - x) \\
y' &= x(\rho - z) - y \\
z' &= xy - \beta z
\end{aligned}
$$

The parameters to be estimated here are $\sigma$, $\rho$, and $\beta$. From my understanding of the theory of PINNs, the underlying NN would have an input dimension of 1 (corresponding to time t) and an output dimension of 3 (corresponding to the outputs x, y, and z). In other words, there is just one NN being trained.
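
For concreteness, here is a minimal Lux sketch of the single-network architecture I expected (the hidden width `n` is an arbitrary choice of mine, not something from the example):

```julia
using Lux

# Hypothetical single network: 1 input (time t) -> 3 outputs (x, y, z).
# n is an assumed hidden-layer width, not a value from the example.
n = 16
single_chain = Lux.Chain(Dense(1, n, Lux.σ), Dense(n, n, Lux.σ), Dense(n, 3))
```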

However, in the example, I see there are three independent NNs being trained, each with an output dimension of 1 (so three neural networks, one each for x, y, and z):

```julia
chain1 = Lux.Chain(Dense(input_, n, Lux.σ), Dense(n, n, Lux.σ), Dense(n, n, Lux.σ),
                   Dense(n, 1))
chain2 = Lux.Chain(Dense(input_, n, Lux.σ), Dense(n, n, Lux.σ), Dense(n, n, Lux.σ),
                   Dense(n, 1))
chain3 = Lux.Chain(Dense(input_, n, Lux.σ), Dense(n, n, Lux.σ), Dense(n, n, Lux.σ),
                   Dense(n, 1))
discretization = NeuralPDE.PhysicsInformedNN(
    [chain1, chain2, chain3],
    NeuralPDE.GridTraining(dt),
    param_estim = true, # whether the parameters of the differential equation should be sent to the additional_loss function
    additional_loss = additional_loss)
```

I don’t really understand why we need three chains, each with a single output (i.e., dim = 1), instead of just one chain with 3 outputs. The documentation does say:

> `chain`: a vector of Flux.jl or Lux.jl chains with a d-dimensional input and a 1-dimensional output corresponding to each of the dependent variables.

but it does not answer my question. Could anyone recommend some readings?

Edit: In the dev version of the documentation, there is now an example of parameter estimation for the Lotka–Volterra model (a system with 2 equations and 4 parameters) in a Bayesian framework. In this example, the NN is exactly what I expected it to be; i.e., input of dimension 1 and output of dimension 2.
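
For reference, the single network in that example has roughly this shape (a sketch; the widths and activation are my assumptions):

```julia
using Lux

# Sketch of the single-network shape in the Lotka–Volterra BNNODE example:
# 1 input (t) -> 2 outputs (prey, predator). Widths here are assumptions.
chain = Lux.Chain(Dense(1, 8, tanh), Dense(8, 8, tanh), Dense(8, 2))
```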

So now the documentation has two examples of similar models, but one uses the BNNODE interface (with a single NN) and the other uses the PhysicsInformedNN interface (with multiple NNs, one per equation).

@ChrisRackauckas tagging you for visibility and would appreciate your comment on this.

You can find the answer here: NeuralPDE systems of PDEs. In short, it is because you have 3 equations, so 1 NN per equation.

Thanks!

I am wondering why, in one of the examples, they are able to use a single neural network even though the system has more than 1 equation (see here). That example uses the BNNODE API, but I was wondering whether it’s possible to convert it to the PhysicsInformedNN API as well (I am guessing that PhysicsInformedNN is more primitive and should be able to handle the Bayesian example too).

The ODE formulation specializes on the fact that everything is differentiated at most once, and everything is differentiated at least once: every state appears with exactly one first derivative.

No. This is addressed in the writeup that explains NeuralPDE.jl.

The summary of that part is that with high-order automatic differentiation you get really bad scaling: the cost is multiplicative between the number of outputs and the derivative order. So you only want to differentiate what you actually need. If you pool everything into one neural network and only one term needs a second or third derivative, you’re pretty much stuck differentiating every output to third order, which has a cubic cost. Splitting the networks gives only linear cost growth, so it’s much cheaper.
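
To make that concrete, here is a small ForwardDiff sketch (with `u1` and `u2` as hypothetical stand-ins for two 1-output networks) of how splitting lets each output be differentiated only to the order its equation needs:

```julia
using ForwardDiff

# Hypothetical stand-ins for two separate 1-output networks:
# suppose the system needs u1''' but only u2'.
u1(t) = sin(t)
u2(t) = exp(-t)

d(f) = t -> ForwardDiff.derivative(f, t)  # first-derivative operator

d3u1 = d(d(d(u1)))  # third derivative of u1 alone
d1u2 = d(u2)        # first derivative of u2 alone

# With one 2-output network u(t) = [u1(t), u2(t)], taking third-order
# derivatives of the whole output vector would also push u2 through
# third-order AD, paying the high-order cost for a derivative the
# equations never use.
```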

I do plan to make it so it can be a choice in the future, but that’ll take some work.


@ChrisRackauckas

I spent some time today reading the writeup NeuralPDE (arxiv.org) and also this paper on using ANNs to solve ODEs, and I now have a fairly good understanding, but I am still not sure what you mean by “differentiated only once”. I have two questions:

  1. If I have a system of ODEs with constant parameters, should I stick with the NNODE API to solve the system?

  2. When is it required to use the PhysicsInformedNN API?

And I may create another topic for this, but what if my system of ODEs has a time-dependent parameter? Consider, for example, this basic SIR system:

$$
\begin{aligned}
S' &= -b(t)\, S I \\
I' &= b(t)\, S I - g I
\end{aligned}
$$

Given data for I, I’d like to solve the inverse problem of finding b(t), but I am not sure how to use the package to do that; it seems to me that I would need another NN for b(t). The sketch below shows what I have in mind.
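
Here is roughly the symbolic setup I am imagining, treating b(t) as a third unknown function with its own chain. I don’t know whether the package actually supports this for an inverse problem; everything below is my guess, not a working example:

```julia
using NeuralPDE, Lux, ModelingToolkit

@parameters t g
@variables S(..) I(..) b(..)
Dt = Differential(t)

# SIR with a time-dependent transmission rate b(t), written as if b were a
# third dependent variable (my guess at a formulation; untested).
eqs = [Dt(S(t)) ~ -b(t) * S(t) * I(t),
       Dt(I(t)) ~ b(t) * S(t) * I(t) - g * I(t)]

# One chain per unknown function, including an extra NN for b(t).
n = 12
chainS = Lux.Chain(Dense(1, n, Lux.σ), Dense(n, 1))
chainI = Lux.Chain(Dense(1, n, Lux.σ), Dense(n, 1))
chainb = Lux.Chain(Dense(1, n, Lux.σ), Dense(n, 1))
```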

Thanks in advance for your help, if you have the time!

Yes

PDEs

That’s physics-informed neural operators territory, which should get added soon.

Thanks @ChrisRackauckas! So, in the documentation there are two examples of inverse problems:

  1. Under the ODE section, this example uses the NNODE API to optimize the parameters of a system of ordinary differential equations. Here, a single NN with output dimension equal to the number of equations is used (see the sketch after this list).

  2. Also, under the PDE PINN section, this example uses the PhysicsInformedNN API for the inverse problem (though it’s still a system of ordinary differential equations), with 3 separate chains corresponding to the three solutions x(t), y(t), and z(t).
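
For reference, a condensed sketch of the NNODE-style setup from the first example (`additional_loss` is a placeholder here; the real one fits the data, and `prob` is the ODE problem defined in the docs):

```julia
using NeuralPDE, Lux, OptimizationOptimisers

# One chain with 3 outputs, one per state of the ODE system.
chain = Lux.Chain(Dense(1, 16, Lux.σ), Dense(16, 3))

# Placeholder for the data-fitting loss; the docs example compares the
# network solution phi(t, θ) against measured data here.
additional_loss(phi, θ) = 0.0

alg = NNODE(chain, OptimizationOptimisers.Adam(0.01);
            param_estim = true, additional_loss = additional_loss)
# sol = solve(prob, alg, maxiters = 5000)  # `prob` as defined in the example
```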

Should we consolidate the two or get rid of the second example? At the very least, perhaps we can state what’s different between them. I am happy to create an issue/PR if you let me know what the main differences are.

That one is fine. It should probably be replaced with a PDE example, though.