Fun with Flux and Differential equations?

I’m playing around with machine learning (Flux) for dynamic systems. To make things concrete and simple, I have invented a scalar system:

\frac{dy}{dt} = 2(4-y)-10^2\exp(-1/u)y^2 + 10^3\exp(-1/u)y

which could represent a well-mixed tank reactor, where y could be concentration and u could be (scaled) temperature. Not a very realistic model, but it has the advantage of allowing for graphical presentation.
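
A minimal sketch of how the true model can be simulated with DifferentialEquations (the constant input u = 3 here is just an arbitrary choice for illustration):

using DifferentialEquations

# true scalar model in out-of-place form f(y, p, t); p holds the (here constant) input u
f_true(y, p, t) = 2*(4 - y) - 1e2*exp(-1/p)*y^2 + 1e3*exp(-1/p)*y

prob_true = ODEProblem(f_true, 15.0, (0.0, 2.0), 3.0)   # y(0) = 15, u = 3
sol_true  = solve(prob_true, Tsit5())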

Here is what I have done:

  1. Building a continuous ML model mapping (y,u) \rightarrow \frac{dy}{dt}, \mathrm{FNN}_\mathrm{c}(\cdot)
  2. Building a discrete ML model mapping (y_t,u_t) \rightarrow y_{t+1}, \mathrm{FNN}_\mathrm{d}(\cdot)
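
In Flux, these two mappings can be set up as small feed-forward networks, e.g. (a sketch; the layer widths are arbitrary):

using Flux

# FNN_c: (y, u) -> dy/dt, used as the right-hand side of an ODE
m_c = Chain(Dense(2, 10, tanh), Dense(10, 1))

# FNN_d: (y_t, u_t) -> y_{t+1}, a one-step-ahead map
m_d = Chain(Dense(2, 10, tanh), Dense(10, 1))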

Both approaches lead to their own “difficulties”. The first is related to using the DifferentialEquations package to solve:

\frac{dy}{dt} = \mathrm{FNN}_\mathrm{c}(y,u),

while the second is related to the interpretation of the model – should I use the predictor:

\hat{y}_{t+1} = \mathrm{FNN}_\mathrm{d}(y_t,u_t)

(which is some nonlinear moving average of known values – I denote it MAX), or should I use the predictor:

\hat{y}_{t+1} = \mathrm{FNN}_\mathrm{d}(\hat{y}_t,u_t)

with known \hat{y}_0 = y_0 (which is an autoregression with exogenous input – I denote it ARX).
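
In code, the only difference between the two interpretations is which value is fed back into the network (a sketch, given the trained m_d, measured sequences y and u, and Tracker.data to strip the tracking):

# "MAX": every prediction uses the measured y[t]
yhat_max = copy(y)
for t in 1:length(y)-1
    yhat_max[t+1] = Tracker.data(m_d([y[t], u[t]]))[1]
end

# "ARX": the prediction is fed back, so only y[1] is ever used
yhat_arx = copy(y)
for t in 1:length(y)-1
    yhat_arx[t+1] = Tracker.data(m_d([yhat_arx[t], u[t]]))[1]
end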

So… I ran experiments with the simple model, leading to:


which can be given a parametric (time), 3D presentation as:

Here, the :surface plot is the true surface of the original ODE, while the :path3d lines are the results of the experiments, i.e., the data I can use for building the model.

First,

  • I trained the continuous mapping \mathrm{FNN}_\mathrm{c}(\cdot),
  • I tried to solve the model \frac{dy}{dt} = \mathrm{FNN}_\mathrm{c}(y,u) using the DifferentialEquations package. This failed with an error message which I suspect is related to inconsistent types(?).
  • Then I solved the ODE using my own, simple Euler implementation… with results (solid line: solution of the original ODE; dotted line: solution of the ODE with mapping \mathrm{FNN}_\mathrm{c}(\cdot)):

    Here, the “Single experiment” case uses the experiment with y(0) = 15, while the “Randomized experiment” case uses data from all experiments, with the order of the data randomly permuted before training.

Second,

  • I trained the discrete mapping using all data. The results of the “MAX” and the “ARX” interpretations are shown below:
  • As seen, the “ARX” interpretation performs pathetically poorly.

Questions related to the “discrete” model:
A. I use the loss function loss_d(x, y) = mean((m_d(x) .- y).^2).
B. I assume this loss function is equivalent to the sum of one-step-ahead prediction errors (up to some scaling), \sum_t (y_t - \hat{y}_{t|t-1})^2, where \hat{y}_{t|t-1} is the one-step-ahead prediction of \hat{y}_t based on the known y_{t-1}?
C. Is it possible to get the effect of minimizing the “ballistic” predictions, \sum_t (y_t - \hat{y}_{t|0})^2 where \hat{y}_{t|0} is the “ballistic” prediction \hat{y}_t using known y_0? How would I then have to specify the loss function?
D. Because the data are generated from a first order ODE, I would assume that it should be possible to get good predictions also with what I call the “ARX” approach. Is the failure due to insufficient training?
E. Would the “ARX” approach fare better if I extend the model to (y_t,y_{t-1},...,u_t,u_{t-1},...)\rightarrow y_{t+1} (even though I know the model is first order…)?
F. Would the RNN (recurrent NN) handle the “ARX” approach better?

1 Like

\hat{y}_{t+1} = \mathrm{FNN}_\mathrm{d}(\hat{y}_t,u_t)

In this case (you call it ARX), you do not feed new data into the predictions; you only use y_0 to form all of the predictions. This corresponds to what I think you call the “ballistic” prediction (I would call it simulation, and it is frequently referred to as simulation-error minimization, as opposed to prediction-error minimization).

Both \hat{y}_{t+1} = \mathrm{FNN}_\mathrm{d}(\hat{y}_t,u_t) and \hat{y}_{t+1} = \mathrm{FNN}_\mathrm{d}(y_t,u_t) are in fact ARX models; the difference lies in what data you feed into them. Only the first is an option for multi-step prediction/simulation.

The simulation-error minimization is significantly harder and in some sense equivalent to using an RNN, since the last \hat y depends on all the intermediate \hat y, which are equivalent to the hidden state of an RNN. While minimizing the prediction errors is a convex problem if the model is linear, minimizing the simulation errors is not.
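
In your notation, the two criteria being contrasted are

\min_\theta \sum_t \left(y_t - \hat{y}_{t|t-1}\right)^2 \quad \text{(prediction error)} \qquad \text{vs.} \qquad \min_\theta \sum_t \left(y_t - \hat{y}_{t|0}\right)^2 \quad \text{(simulation error)},

where \hat{y}_{t|t-1} is formed from the measured y_{t-1}, while \hat{y}_{t|0} is rolled out from y_0 alone.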

Did you Flux.data the TrackedArrays away? You do have to be quite careful with Flux here, given that it always generates Tracked types.

Here is my loop in my Euler solver:

# one explicit Euler step per sample, with the NN m_c as the right-hand side
for j in 2:Ntm
    cA_nn[j] = cA_nn[j-1] + dt*Tracker.data(m_c([cA_nn[j-1],Uc[i][j-1]]))[1]
end

When using the DifferentialEquations solver, I used Tracker.data(m_c([cA_nn[j-1],Uc[i][j-1]]))[1] in the function definition. I can try to recreate the error message.

Comment 1: OK – “simulation-error” minimization – what I called “ballistic-error” minimization, and what someone else might call “shooting-error” minimization – is definitely harder because (i) if the initial parameter guesses make the system unstable, it is hardly possible to evaluate the cost function, and (ii) this problem has multiple local minima (in almost every case), even for linear systems.

Comment 2: Do Flux and other ML platforms support “simulation-error” minimization? I just discussed this with a Ph.D. student (who works with ML – I am just an “amateur” in the field), and he suggested that I can get “simulation-error” minimization if I include the simulator in the formulation of the loss function. But that would be rather complicated, I think…
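
Just to make it concrete for myself, I picture the “simulator in the loss” idea as something like the following (a rough sketch with my own names; I have not checked how gracefully Flux’s tracker differentiates through such a loop):

# simulation-error ("ballistic") loss: roll m_d forward from Y[1] inside the loss
function loss_sim(Y, U)
    yhat = Y[1]
    l = 0.0
    for t in 1:length(Y)-1
        yhat = m_d([yhat, U[t]])[1]   # feed the model its own prediction
        l += (Y[t+1] - yhat)^2
    end
    return l/(length(Y) - 1)
end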

Comment 3: Naming conventions…

Both \hat{y}_{t+1}=\mathrm{FNN}_\mathrm{d}(\hat{y}_t,u_t) and \hat{y}_{t+1}=\mathrm{FNN}_\mathrm{d}(y_t,u_t) are in fact ARX models

I’d say yes and no. The underlying model is ARX (or: NARX) of the form y_{t+1}=\mathrm{FNN}_\mathrm{d}(y_t,u_t). However, there is nothing autoregressive in the second predictor \hat{y}_{t+1}=\mathrm{FNN}_\mathrm{d}(y_t,u_t) – the word “autoregressive” really means that the prediction depends on past values (= regressive) of itself (= auto).

You are probably right in the sense that the literature denotes it an ARX (or: NARX) predictor – I should try to locate my good ole’ copy of my Ph.D. course syllabus from 1987 and brush up my knowledge.
But the predictor \hat{y}_{t+1}=\mathrm{FNN}_\mathrm{d}(y_t,u_t), although it may be denoted “ARX”, really computes the prediction from some “moving average” of past data – and as such it is not dynamic and cannot become unstable.

Comment 4: OK – I’ll take a look at the RNN. My understanding is that an RNN layer contains a state, which in signal-processing terminology implies an “infinite impulse response”. Limited-memory RNNs probably have a “long” but “finite impulse response”, and in that sense would be similar to FNNs with the “ARX” data structure but with multiply delayed outputs in the input data set.
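
For comparison, the two alternatives in Flux terms might look something like this (a sketch; layer widths arbitrary):

using Flux

# recurrent alternative: the RNN cell carries its own hidden state
# (reset it between experiments with Flux.reset!(m_rnn))
m_rnn = Chain(RNN(2, 10, tanh), Dense(10, 1))

# FNN with extra delays: (y_t, y_{t-1}, u_t, u_{t-1}) -> y_{t+1}
m_dly = Chain(Dense(4, 10, tanh), Dense(10, 1))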

Most often, the autoregression does not refer to the predictor using its own prediction; rather, it refers to the signal y_t being a function of itself, i.e., of y_{t-k}, no matter where these previous values of y come from.

An RNN does contain a state, but so does any model where you input previous predictions to form new predictions. The state may be implicit in this case, but all intermediate predictions serve the same role as the state in an RNN.

BTW, I see we read the same book. The first author was on the committee for my defense a couple of months ago :slight_smile:

1 Like

I’m starting to think you haven’t read DiffEqFlux.jl – A Julia Library for Neural Differential Equations

or haven’t seen the library GitHub - SciML/DiffEqFlux.jl: Universal neural differential equations with O(1) backprop, GPUs, and stiff+non-stiff DE solvers, demonstrating scientific machine learning (SciML) and physics-informed machine learning methods

Answers: yes, and yes – I haven’t read the web page, and I haven’t looked at the library :blush: – I have delayed looking at these fancier things; I want to understand the basics first, but it is good to know of the tools.

I have seen the name DiffEqFlux floating around. What I might find useful is a tool integrating Flux and DifferentialEquations which makes it simple to develop an ML predictor (a “surrogate model”; here, ML = machine learning, not maximum likelihood) based on a complex differential equation model. Such a package would certainly be useful, but a complete package would be rather extensive, including things such as optimal experiment design to get as good data as possible before training the model. And: should the “surrogate” model be continuous time or discrete time? In Farrell and Polycarpou’s 2006 book,


they tend to emphasize continuous-time models by estimating derivatives through some filtering (but essentially for experimental data). If one has a noise-free differential equation model to create the data, a DAE solver already computes the derivatives.
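
In that case the derivative targets for training \mathrm{FNN}_\mathrm{c}(\cdot) come essentially for free, e.g. (a sketch, assuming sampled states y_data, inputs u_data, and the true right-hand side f_true from the sketch above):

# derivative targets straight from the known right-hand side – no numerical differentiation needed
dydt_data = [f_true(y, u, 0.0) for (y, u) in zip(y_data, u_data)]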

I’m not sure if DiffEqFlux does this (ease developing surrogate models) – your hint may indicate that it is related to “shooting-error” prediction; anyway, thanks for the reminder.

It’s not in the library, but it does help for this. More enhancements to it are coming out at JuliaCon.

1 Like

For fixing your problem, you probably want to do the 2-dispatch approach from the Flux.jl README. That’s my best guess as to what the issue is without seeing code.

Here is what I try to do:

# Differential Equations model of predictor
uc_f = LinearInterpolation(uc,tm)
#
function mod_cs(dy,y,p,t)
    dy = Tracker.data(m_cs([y,uc_f(t)]))[1]
end

Testing the function, I get:

julia> mod_cs(0,15,0,0)
-28.649235f0

If I try to solve the model using DifferentialEquations, this is what happens:

tspan = (0,2.)
prob = ODEProblem(mod_cs,10.,tspan)

and then:

julia> sol = solve(prob,Tsit5(),reltol=1e-8,abstol=1e-8)
MethodError: no method matching similar(::Float64)
Closest candidates are:
  similar(!Matched::ZMQ.Message, !Matched::Type{T}, !Matched::Tuple{Vararg{Int64,N}} where N) where T at C:\Users\user\.julia\packages\ZMQ\ABGOx\src\message.jl:93
  similar(!Matched::DataStructures.IntSet) at deprecated.jl:55
  similar(!Matched::DataFrames.StackedVector, !Matched::Type, !Matched::Union{Integer, AbstractUnitRange}...) at C:\Users\user\.julia\packages\DataFrames\m9gd9\src\abstractdataframe\reshape.jl:384
  ...

Stacktrace:
 [1] alg_cache(::Tsit5, ::Float64, ::Float64, ::Type, ::Type, ::Type, ::Float64, ::Float64, ::ODEFunction{true,typeof(mod_cs),LinearAlgebra.UniformScaling{Bool},Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing}, ::Float64, ::Float64, ::Float64, ::Nothing, ::Bool, ::Type{Val{true}}) at C:\Users\user\.julia\packages\OrdinaryDiffEq\TUKTm\src\caches\low_order_rk_caches.jl:356
 [2] #__init#292.....

I have tried to convert the Float32 number to Float64, with no effect.

If you want to solve a scalar differential equation, use the out-of-place format (u,p,t):

function mod_cs(y,p,t)
    Tracker.data(m_cs([y,uc_f(t)]))[1]
end
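
and then construct and solve the problem exactly as you did before; the scalar (out-of-place) dispatch will be used:

prob = ODEProblem(mod_cs, 10., tspan)
sol  = solve(prob, Tsit5(), reltol=1e-8, abstol=1e-8)
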
2 Likes

:blush: … that was simple, and solved the problem.