Why do we need 3 chains to solve a PDE using NeuralPDE?

I am working through examples from the NeuralPDE.jl package and have a question. I am looking at the inverse problem example (i.e., parameter estimation) and just need to get my thinking straight. This is more of a conceptual/theory question than a Julia one.

Let’s consider the Lorenz system as given in the example.

$$
\begin{aligned}
x' &= \sigma(y - x) \\
y' &= x(\rho - z) - y \\
z' &= xy - \beta z
\end{aligned}
$$

The parameters to be estimated here are $\sigma$, $\rho$, and $\beta$. From my understanding of the theory of PINNs, the underlying NN would have an input dimension of 1 (corresponding to time $t$) and an output dimension of 3 (corresponding to the outputs $x$, $y$, and $z$). In other words, there is just one NN being trained.
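
Concretely, the single network I had in mind would look something like this (the hidden width n is arbitrary, purely for illustration):

    using Lux

    n = 8  # arbitrary hidden width, for illustration only
    # One network for the whole system: input t, outputs (x, y, z).
    single_chain = Lux.Chain(Dense(1, n, Lux.σ), Dense(n, n, Lux.σ), Dense(n, 3))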

However, in the example, I see three independent NNs being trained, each with an output dimension of 1 (so three neural networks, corresponding to x, y, and z):

    using NeuralPDE, Lux

    # input_ is the input dimension (1, for time t) and n is the hidden
    # width; dt and additional_loss are defined earlier in the example.
    input_ = 1
    n = 8

    chain1 = Lux.Chain(Dense(input_, n, Lux.σ), Dense(n, n, Lux.σ),
                       Dense(n, n, Lux.σ), Dense(n, 1))
    chain2 = Lux.Chain(Dense(input_, n, Lux.σ), Dense(n, n, Lux.σ),
                       Dense(n, n, Lux.σ), Dense(n, 1))
    chain3 = Lux.Chain(Dense(input_, n, Lux.σ), Dense(n, n, Lux.σ),
                       Dense(n, n, Lux.σ), Dense(n, 1))

    discretization = NeuralPDE.PhysicsInformedNN(
        [chain1, chain2, chain3],
        NeuralPDE.GridTraining(dt),
        # whether the parameters of the differential equation should be
        # sent to the additional_loss function:
        param_estim = true,
        additional_loss = additional_loss)

I don’t really understand why we would need three chains, each with a single output (i.e., dim = 1), instead of just one chain with 3 outputs. The documentation does say:

chain : a vector of Flux.jl or Lux.jl chains with a d-dimensional input and a 1-dimensional output corresponding to each of the dependent variables.

but that does not answer my question. Could anyone recommend some reading on this?

Edit: In the dev version of the documentation, there is now an example of parameter estimation for the Lotka-Volterra model (a system with 2 equations and 4 parameters) in a Bayesian framework. In this example, the NN is exactly what I expected: input of dimension 1 and output of dimension 2.
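
For reference, the chain in that Bayesian example is a single network along these lines (the layer sizes here are illustrative, not necessarily the exact ones from the docs):

    using Lux

    # One chain for the whole 2-equation system: input t, outputs (x, y).
    chain = Lux.Chain(Dense(1, 6, tanh), Dense(6, 6, tanh), Dense(6, 2))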

So now the documentation has two examples of similar models: one uses the BNNODE interface (with a single NN) and the other uses the PhysicsInformedNN interface (with multiple NNs, one per equation).

@ChrisRackauckas, tagging you for visibility; I would appreciate your comment on this.

You can find the answer here: NeuralPDE systems of PDEs. In short, it is because you have 3 equations, so 1 NN per equation.

Thanks!

I am wondering why, in one of the examples, a single neural network can be used even though the system has more than one equation (see here). That example uses the BNNODE API, but could it be converted to the PhysicsInformedNN API as well? (I am guessing that PhysicsInformedNN is more primitive and should be able to handle the Bayesian example too.)

The ODE formulation specializes on the fact that everything is differentiated exactly once: nothing needs a higher-order derivative, and every output needs that first derivative.
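
Spelled out: in the first-order form $u'(t) = f(u, t)$, the residual only ever needs the first $t$-derivative of the network output, for example

$$
\mathcal{L} = \sum_i \left\| \hat{u}'(t_i) - f(\hat{u}(t_i), t_i) \right\|^2,
$$

so a single vector-output network pays no extra differentiation cost over split ones.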

No. This is addressed in the writeup that explains NeuralPDE.jl.

The summary of that part is that high-order automatic differentiation scales really badly: the cost is multiplicative between the number of outputs and the derivative order, so you only want to differentiate what you actually need. If you pool everything together into one neural network and only one term needs a second or third derivative, you're pretty much stuck differentiating everything to third order, which has a cubic cost. Splitting the networks gives only linear cost growth, so it's much cheaper.
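
Here is a hedged sketch of that scaling argument (toy code, not NeuralPDE.jl's internals; the layer sizes and the use of nested ForwardDiff calls are my own illustrative choices). Suppose the system needs a third derivative of u but only a first derivative of v: with one combined network, every nested-derivative evaluation runs the whole network with third-order dual numbers, so v's part pays the third-order cost too; with split networks, only the u-network does.

    using Lux, ForwardDiff, Random

    rng = Random.default_rng()
    n = 8
    # Combined network: one input t, two outputs (u, v).
    combined = Lux.Chain(Dense(1, n, tanh), Dense(n, 2))
    ps, st = Lux.setup(rng, combined)

    net(t) = first(combined([t], ps, st))  # forward pass, both outputs
    u(t) = net(t)[1]
    v(t) = net(t)[2]
    d(f) = t -> ForwardDiff.derivative(f, t)

    # u needs third order, but each nested pass runs the WHOLE combined
    # network, so v's weights get dragged through third-order duals too.
    u_ttt = d(d(d(u)))(0.5)
    v_t   = d(v)(0.5)

    # Split networks: only the u-network ever sees third-order duals;
    # the v-network would be differentiated just once, which is cheap.
    unet = Lux.Chain(Dense(1, n, tanh), Dense(n, 1))
    vnet = Lux.Chain(Dense(1, n, tanh), Dense(n, 1))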

I do plan to make it a choice in the future, but that will take some work.
