I’m playing around with machine learning (Flux) for dynamic systems. To make things concrete and simple, I have invented a scalar system:

which could represent a well-mixed tank reactor where y could be concentration and u could be (scaled) temperature. Not a very realistic model, but it has the advantage of allowing a graphical presentation.

Here is what I have done:

- Building a continuous ML model mapping (y,u) \rightarrow \frac{dy}{dt}, \mathrm{FNN}_\mathrm{c}(\cdot)
- Building a discrete ML model mapping (y_t,u_t) \rightarrow y_{t+1}, \mathrm{FNN}_\mathrm{d}(\cdot)
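
For concreteness, the two mappings can be sketched in Flux roughly as below; the layer sizes and activations are illustrative, not necessarily what I actually used:

```julia
using Flux

# Continuous model FNN_c: (y, u) -> dy/dt (scalar output).
m_c = Chain(Dense(2, 8, tanh), Dense(8, 1))

# Discrete model FNN_d: (y_t, u_t) -> y_{t+1}.
m_d = Chain(Dense(2, 8, tanh), Dense(8, 1))

m_c(Float32[15, 1])   # 1-element vector, the modeled dy/dt at (y, u) = (15, 1)
```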

Both approaches lead to their own “difficulties”. The first concerns solving, with the DifferentialEquations package, the ODE

\frac{dy}{dt} = \mathrm{FNN}_\mathrm{c}(y,u);
the second concerns the interpretation of the model: should I use the predictor

\hat{y}_{t+1} = \mathrm{FNN}_\mathrm{d}(y_t, u_t)

(which is some nonlinear moving average of known values; I denote it MAX), or should I use the predictor

\hat{y}_{t+1} = \mathrm{FNN}_\mathrm{d}(\hat{y}_t, u_t)

with known \hat{y}_0 = y_0 (which is some auto-regression with input; I denote it ARX)?
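
To make the two interpretations concrete, here is a sketch with a hypothetical linear function standing in for the trained network:

```julia
# Hypothetical linear stand-in for the trained network FNN_d: (y_t, u_t) -> y_{t+1};
# with Flux this would be something like fnn_d(y, u) = m_d(Float32[y, u])[1].
fnn_d(y, u) = 0.8y + 0.2u

# "MAX": every prediction uses the *measured* previous output y[t].
predict_max(fnn, y, u) = [fnn(y[t], u[t]) for t in 1:length(u)]

# "ARX": predictions are fed back; only the initial value y0 is measured.
function predict_arx(fnn, y0, u)
    ŷ = y0
    out = Float64[]
    for ut in u
        ŷ = fnn(ŷ, ut)       # feed the previous *prediction* back in
        push!(out, ŷ)
    end
    return out
end

y = [15.0, 12.2, 9.96]       # measured y_0, y_1, y_2
u = [1.0, 1.0, 1.0]
predict_max(fnn_d, y, u)     # predicts y_1, y_2, y_3 from measurements
predict_arx(fnn_d, y[1], u)  # predicts y_1, y_2, y_3 from y_0 alone
```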

So… I ran experiments with the simple model, leading to:

which can be given a parametric (in time) 3D presentation:

Here, the `:surface` plot is the true surface of the original ODE, while the `:path3d` lines are the results of the experiments, i.e., the data I can use for building the model.
First,

- I trained the continuous mapping \mathrm{FNN}_\mathrm{c}(\cdot).
- I tried to solve the model \frac{dy}{dt} = \mathrm{FNN}_\mathrm{c}(y,u) using the `DifferentialEquations` package. **Failure**, with an error message which I suspect is related to inconsistent types.
- Then, I solved the ODE using my own, simple Euler implementation… with results (solid line: solution of the original ODE; dotted line: solution of the ODE with the mapping \mathrm{FNN}_\mathrm{c}(\cdot)):

Here, the “Single experiment” case uses the experiment with y(0) = 15, while the “Randomized experiment” case uses data from all experiments, with the order of the data randomly permuted before training.
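
The Euler scheme I used is essentially the following sketch, here with a hypothetical linear right-hand side standing in for the trained \mathrm{FNN}_\mathrm{c}:

```julia
# Forward Euler for dy/dt = f(y, u) on a fixed time grid with step dt.
# `f` is a stand-in for the trained mapping FNN_c.
function euler(f, y0, u, dt)
    y = Vector{Float64}(undef, length(u) + 1)
    y[1] = y0
    for t in 1:length(u)
        y[t+1] = y[t] + dt * f(y[t], u[t])
    end
    return y
end

f(y, u) = -y + u                          # hypothetical RHS, not the original model
ysim = euler(f, 15.0, fill(1.0, 100), 0.05)
```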

Second,

- I trained the discrete mapping using all data. The results of the “MAX” and the “ARX” interpretations are shown below:

- As seen, the “ARX” interpretation performs pathetically poorly.

Questions related to the “discrete” model:

A. I use the loss function `loss_d(x, y) = mean((m_d(x) .- y).^2)`.

B. I assume this loss function is equivalent to the sum of *one-step-ahead prediction errors* (except for some scaling), \sum_t (y_t - \hat{y}_{t|t-1})^2, where \hat{y}_{t|t-1} is the one-step-ahead prediction of y_t using known y_{t-1}?

C. Is it possible to get the effect of minimizing the “ballistic” predictions, \sum_t (y_t - \hat{y}_{t|0})^2, where \hat{y}_{t|0} is the “ballistic” prediction \hat{y}_t using known y_0? How would I then have to specify the `loss` function?
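
One way I imagine getting this effect is to make the loss itself roll the model forward recursively. A sketch, with a plain function standing in for the network:

```julia
# "Ballistic" (multi-step) loss: roll the model forward from y0 on its own
# predictions, then compare the whole predicted trajectory with measurements.
# `model(y, u)` is a stand-in for the trained network m_d.
function loss_ballistic(model, y0, u, y_meas)
    ŷ = y0
    s = 0.0
    for t in 1:length(y_meas)
        ŷ = model(ŷ, u[t])              # feed the prediction back in
        s += (y_meas[t] - ŷ)^2
    end
    return s / length(y_meas)
end
```

With Flux, `model` would call the network, and taking gradients of this loss would backpropagate through the whole recursion, which is what would make the training “ballistic”; whether that trains well in practice is exactly my question.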

D. Because the data are generated from a first-order ODE, I would assume that it should be possible to get good predictions with what I call the “ARX” approach as well. Is the failure due to insufficient training?

E. Would the “ARX” approach fare better if I extend the model to (y_t,y_{t-1},...,u_t,u_{t-1},...)\rightarrow y_{t+1} (even though I know the model is first order…)?
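
Building such a lagged regressor could look like the following sketch (`na` output lags, `nb` input lags; the names and layout are mine, with one sample per column as Flux expects):

```julia
# Sketch: build lagged regressors (y_t, ..., y_{t-na+1}, u_t, ..., u_{t-nb+1})
# with one-step-ahead target y_{t+1}.
function lagged_data(y, u, na, nb)
    lag = max(na, nb)
    cols = Vector{Vector{Float64}}()
    for t in lag:length(y)-1
        yp = [y[t-i] for i in 0:na-1]    # y_t, y_{t-1}, ...
        up = [u[t-j] for j in 0:nb-1]    # u_t, u_{t-1}, ...
        push!(cols, vcat(yp, up))
    end
    X = reduce(hcat, cols)               # inputs, one sample per column
    Y = reshape(y[lag+1:end], 1, :)      # one-step-ahead targets
    return X, Y
end
```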

F. Would an `RNN` (recurrent NN) handle the “ARX” approach better?