I’m playing around with machine learning (Flux) for dynamic systems. To make things concrete and simple, I have invented a scalar system:
which could represent a well-mixed tank reactor, where y could be concentration and u could be (scaled) temperature. Not a very realistic model, but with the advantage of allowing for graphical presentation.
Here is what I have done:
- Building a continuous ML model mapping (y,u) \rightarrow \frac{dy}{dt}, \mathrm{FNN}_\mathrm{c}(\cdot)
- Building a discrete ML model mapping (y_t,u_t) \rightarrow y_{t+1}, \mathrm{FNN}_\mathrm{d}(\cdot)
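To make the two mappings concrete, here is a minimal Flux sketch. The layer sizes and activations are my guesses, since the post does not state the architectures:

```julia
using Flux

# Hypothetical architectures -- sizes/activations are guesses.
# Both networks take the pair (y, u) as a 2-element input vector.
FNN_c = Chain(Dense(2 => 8, tanh), Dense(8 => 1))  # (y, u)     -> dy/dt
FNN_d = Chain(Dense(2 => 8, tanh), Dense(8 => 1))  # (y_t, u_t) -> y_{t+1}
```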
Both approaches lead to their own “difficulties”. The first concerns solving the resulting ODE with the DifferentialEquations package:
The second concerns the interpretation of the model: should I use the predictor:
(which is some nonlinear, moving average of known values; I denote it MAX), or should I use the predictor:
with known \hat{y}_0 = y_0 (which is some auto-regression with input; I denote it ARX)?
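In code, the two interpretations differ only in what is fed back into the network. A hedged sketch (the function names are mine, and `m` stands for the trained discrete network):

```julia
# "MAX": every prediction uses the *measured* y_t (one-step-ahead).
predict_max(m, y, u) = [first(m([y[t], u[t]])) for t in 1:length(y)-1]

# "ARX": predictions are fed back recursively, starting from ŷ_0 = y_0.
function predict_arx(m, y0, u)
    ŷ = [y0]
    for t in 1:length(u)
        push!(ŷ, first(m([ŷ[end], u[t]])))
    end
    return ŷ
end
```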
So… I ran experiments with the simple model, leading to:
which can be given a parametric (time), 3D presentation as:
Here, the `:surface` plot is the true surface of the original ODE, while the `:path3d` lines are the results of the experiments, i.e., the data I can use for building the model.
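For reference, a plot of that kind can be produced along these lines with Plots.jl. Here `f(y, u)` is only a placeholder for the right-hand side of the original ODE, and the ranges are guesses:

```julia
using Plots

f(y, u) = -y + u                      # placeholder for the true right-hand side
ys = range(0.0, 20.0, length = 50)    # guessed y-range
us = range(-1.0, 1.0, length = 50)    # guessed u-range

# True dy/dt surface over the (u, y) grid ...
surface(us, ys, (u, y) -> f(y, u), alpha = 0.5,
        xlabel = "u", ylabel = "y", zlabel = "dy/dt")
# ... with one experiment's trajectory drawn on top:
# plot!(u_data, y_data, dydt_data, seriestype = :path3d)
```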
First,
- I trained the continuous mapping \mathrm{FNN}_\mathrm{c}(\cdot).
- I tried to solve the model \frac{dy}{dt} = \mathrm{FNN}_\mathrm{c}(y,u) using the DifferentialEquations package. This failed with an error message which I suspect is related to inconsistent types??
- Then, I solved the ODE using my own, simple Euler implementation, with results (solid line: solution of the original ODE; dotted line: solution of the ODE with the mapping \mathrm{FNN}_\mathrm{c}(\cdot)):
Here, the “Single experiment” case uses the experiment with y(0) = 15, while the “Randomized experiment” case uses data from all experiments, with the order of the data randomly permuted before training.
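Regarding the solver failure: Flux initializes weights as Float32, while the solver state here would be Float64, and that mismatch is a common cause of such type errors. A hedged sketch of a type-consistent setup (`u_of_t` is a hypothetical input signal, and the time span is a guess):

```julia
using DifferentialEquations, Flux

FNN_c64 = Flux.f64(FNN_c)            # promote the network weights to Float64

function rhs(y, p, t)
    u = u_of_t(t)                    # hypothetical input signal u(t)
    first(FNN_c64([y, u]))           # network output is a 1-vector; take the scalar
end

prob = ODEProblem(rhs, 15.0, (0.0, 10.0))  # y(0) = 15, as in the single experiment
sol  = solve(prob, Tsit5())
```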
Second,
- I trained the discrete mapping using all data. The results of the “MAX” and the “ARX” interpretations are shown below:
- As seen, the “ARX” interpretation performs pathetically poorly.
Questions related to the “discrete” model:
A. I use the loss function `loss_d(x, y) = mean((m_d(x) .- y).^2)`.
B. I assume this loss function is equivalent to the sum of one-step-ahead prediction errors (except for some scaling), \sum_t (y_t - \hat{y}_{t|t-1})^2, where \hat{y}_{t|t-1} is the one-step-ahead prediction of y_t using the known y_{t-1}?
C. Is it possible to get the effect of minimizing the “ballistic” predictions, \sum_t (y_t - \hat{y}_{t|0})^2, where \hat{y}_{t|0} is the “ballistic” prediction of y_t using only the known y_0? How would I then have to specify the loss function?
D. Because the data are generated from a first-order ODE, I would assume that it should be possible to get good predictions also with what I call the “ARX” approach. Is the failure due to insufficient training?
E. Would the “ARX” approach fare better if I extend the model to (y_t,y_{t-1},...,u_t,u_{t-1},...)\rightarrow y_{t+1} (even though I know the model is first order…)?
F. Would an RNN (recurrent neural network) handle the “ARX” approach better?
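Regarding question C: one option is to roll the model out from the known y_0 inside the loss itself, so that gradients flow through the whole trajectory (Zygote can differentiate such a loop). A sketch under my notation assumptions:

```julia
# Multi-step ("ballistic") loss: roll the model out from the known y_0
# and sum the squared errors over the whole horizon.
function loss_ballistic(m, y, u)
    ŷ = y[1]                        # known initial value y_0
    l = zero(ŷ)
    for t in 1:length(y)-1
        ŷ = first(m([ŷ, u[t]]))     # feed the prediction back in
        l += abs2(y[t+1] - ŷ)
    end
    return l / (length(y) - 1)
end
```

Training on this loss directly penalizes the errors the “ARX” rollout accumulates, rather than only the one-step errors.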
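Regarding question F: an RNN carries a hidden state between steps, which matches the recursive structure of the “ARX” rollout, so it is a natural candidate. A sketch (sizes are guesses, and the `Flux.reset!` API varies between Flux versions):

```julia
# The hidden state plays the role of the fed-back ŷ; the input is u_t only.
rnn_model = Chain(RNN(1 => 8, tanh), Dense(8 => 1))

function loss_rnn(m, y, u)
    Flux.reset!(m)                  # fresh hidden state for each sequence
    sum(abs2, first(m([u[t]])) - y[t+1] for t in 1:length(y)-1) / (length(y) - 1)
end
```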