Problem setup for UODE testing

Hi all,
I’m following these Universal ODE examples, trying to get one of my own systems to work. Having a good deal of issues getting things to solve & I hope someone can see the error I’m making.

Problem is that I see no errors turning up, some of the function calls are just spooling forever. I’ve turned on status bars on the ODE solver calls and I either get
ODE 0%| | ETA: N/A
or stuck somewhere around 25% with the ETA just continually climbing up, never going down.

My system:

\begin{align} \frac{dB}{dt} &= r_B\frac{N}{N+H_1}B-c_B B^2 - p_r \frac{PB^2}{H_4^2+B^2}+i_B,\\ \frac{dP}{dt} &= p_r p_e \frac{PB^2}{H_4^2+B^2}\frac{V}{H_2+V} - c_P P^2 - U_1(B,P,V) +i_P, \\ \frac{dV}{dt} &= r_V V - U_2(B,P,V) - m_V \frac{VB^2}{H_3^2+B^2}. \end{align}

Where U_{1,2} are the (arbitrary) portions of my system I’d like to try and recover. They are m_p P and c_V V^2 respectively in truth.

The tspan for my problem is large compared to the L-V example. The system runs on a time unit of days and the dynamics I’m interested in cover 30 years (10,950 days). A normal solve with Tsit5() is practically instantaneous over that range.

I’ve set up the training set to cover 3 years, with 30 data points over that span.

Current code:

using OrdinaryDiffEq
using ModelingToolkit
using DataDrivenDiffEq
using LinearAlgebra, DiffEqSensitivity, Optim
using DiffEqFlux, Flux
using JLD2

println("Generate data")
function lake!(du, u, p, t)
    B, P, V = u
    N, ib, ip, r, H₁, H₂, H₃, H₄, cb, cp, prmax, ce, mp, rv, cv, mv = p

    FR = B^2 / (B^2 + H₄^2)

    du[1] =
        dB = ib + r * (N / (N + H₁)) * B - cb * B^2 - prmax * FR * P
    du[2] =
        dP = ip + ce * prmax * FR * P * (V / (V + H₂)) - mp * P - cp * P^2
    du[3] =
        dV = rv * V - cv * V^2 - mv * (V * B^2 / (H₃^2 + B^2))
end

# Define the experiment
datasize = 30
tspan = (0.0f0, 1095.0f0) # Full data would be 10950: 30 years in days. Currently 3 years with 30 training points
tsteps = range(tspan[1], tspan[2], length = datasize)
u0 = Float32[60.036, 0.738, 11.654] # X1
p_ = Float32[
    0.7,
    2e-5,
    2e-5,
    7.5e-3,
    0.5,
    20,
    20,
    15,
    7.5e-5,
    2.75e-4,
    5e-2,
    0.14,
    2.25e-3,
    7e-3,
    6e-5,
    7e-3,
]
prob = ODEProblem(lake!, u0, tspan, p_)
solution = solve(prob, Vern7(), abstol = 1e-12, reltol = 1e-12, maxiters = 1e7, saveat = tsteps)

# Ideal data
X = Array(solution)
# Add noise to the ideal data
Xₙ = X + Float32(1e-3) * randn(eltype(X), size(X))

# Training implementation

# Input layer (BPV), hidden layer, ouptput layer (u1, u2)
ann_L = FastChain(FastDense(3, 32, tanh), FastDense(32, 32, tanh), FastDense(32, 2))
p = initial_params(ann_L)

function dudt_(u, p, t)
    B, P, V = u
    N, ib, ip, r, H₁, H₂, H₃, H₄, cb, cp, prmax, ce, mp, rv, cv, mv = p_
    # z should be u1, u2 correct?
    z = ann_L(u, p)

    FR = B^2 / (B^2 + H₄^2)

    [ib + r * (N / (N + H₁)) * B - cb * B^2 - prmax * FR * P,
     ip + ce * prmax * FR * P * (V / (V + H₂)) - z[1] - cp * P^2,
     rv * V - z[2] - mv * (V * B^2 / (H₃^2 + B^2))]
end

prob_nn = ODEProblem(dudt_, u0, tspan, p)
# contrete_solve has been depreciated apparently, although line below fails if I just use solve at present
# Depending on initialization, this line is the one that gets a portion of the way then just stops.
# Other times it solves almost instantly.
sol_nn = concrete_solve(prob_nn, Tsit5(), u0, p, saveat = solution.t)

function predict(θ)
    Array(concrete_solve(
        prob_nn,
        Vern7(),
        u0,
        θ,
        saveat = solution.t,
        abstol = 1e-6,
        reltol = 1e-6,
        maxiters = 1e7,
        sensealg = InterpolatingAdjoint(autojacvec = ReverseDiffVJP()),
    ))
end

function loss(θ)
    pred = predict(θ)
    sum(abs2, Xₙ .- pred), pred
end

# Test - this works without issue so long as sol_nn does.
loss(p)

losses = []

callback(θ, l, pred) = begin
    push!(losses, l)
     if !isempty(losses)
        println("Current loss after $(length(losses)) iterations: $(losses[end])")
     end
    false
end

# First train with ADAM for better convergence
res1 = DiffEqFlux.sciml_train(loss, p, ADAM(0.01), cb=callback, maxiters=200)

# Rest of the script is irrelevant for us.

And here is where I run into issues. res1 never starts: the status bar just sits at Training 0%| | ETA: N/A which gives me little to go on in terms of troubleshooting - there’s a lot of packages I’m unfamiliar with and the problem could be in any of them, or more likely, my set-up.

The system doesn’t seem to be stiff. I’ve tried with AutoTsit5(Rosenbrock23()) and AutoVern7(Rodas5()) just in case - no difference.
I’ve attempted to solve this on the gpu as well, by

u0 |> gpu
p |> gpu
p_ |> gpu
res1 = DiffEqFlux.sciml_train(loss, p, ADAM(0.01), cb=callback, maxiters=200)

but that also doesn’t do much of anything.

Can you see the issue at all?

A random neural network probably doesn’t give you something stable over that huge of a time span. You’ll want to play with the initial parameters of the neural network, or use a strategy like demonstrated in https://diffeqflux.sciml.ai/dev/examples/local_minima/

1 Like

Thanks for the suggestion Chris, although I don’t even seem to be getting to the point where the local minima strategy can be applied.

I’ve edited some input conditions to give at least some change over a smaller initial time window.

datasize = 30
tspan = (0.0f0, 10.0f0)                                                                                       
tsteps = range(tspan[1], tspan[2], length = datasize)                                                                                                                                          
u0 = Float32[90.984, 10.0, 80.816]                                                            
p_ = Float32[
    0.1,
    2e-5,
    2e-5,
    7.5e-3,
    0.5,
    20,
    20,
    15,
    7.5e-5,
    2.75e-4,
    5e-2,
    0.14,
    2.25e-3,
    7e-3,
    6e-5,
    7e-3,
]

The rest of the setup code is the same.
If I run l,pred = loss(p) before the training, here’s an example output:


(lines are the initial solution, markers are predictions based on the initial random parameter values)

The result is still pretty flat, but I don’t see how this could be getting stuck in a local minima?

If I alter the callback function to plot updates:

callback(θ, l, pred) = begin
    push!(losses, l)
    display(l)
    plt = plot(tsteps[1:size(pred,2)], X[:,1:size(pred,2)]', legend = false)
    scatter!(plt, tsteps[1:size(pred,2)], pred')
    display(plot(plt))
    false
end

The res1 call never manages to even display the loss value l, let alone update the plot. Something else must be amiss yeah?

Edit:
Parameter set that isn’t as flat over the training window, still the same issue:

datasize = 30
tspan = (0.0f0, 10.0f0)
tsteps = range(tspan[1], tspan[2], length = datasize)
u0 = Float32[60.036, 0.738, 11.654]
p_ = Float32[
    0.7,
    2e-5,
    2e-5,
    7.5e-3,
    0.5,
    20,
    20,
    15,
    7.5e-3, # 7.5e-5,
    2.75e-6, # 2.75e-4,
    5e-2,
    14.0, # 0.14
    2.25e-4, #mp
    5e-2, #7e-3
    6e-7, #cv
    7e-3,
]

Initial parameters guess before sciml_train:

Screenshot_20200807_182833

Second Edit: Actually, the parameter set above worked! It just took around 28 minutes to show the first step, then converged from there in 2 minutes.

So I can move on from here - thanks!

With the help of @ChrisRackauckas’ previous answer, I’ve managed to compress my timeline to a more functional range. Using the same equations as above, my new parameter set is

# Artificially collapse time scale to an appropreate range for the NN.
τ = 300
p_ = Float32[
    0.7,
    2e-5 * τ,
    2e-5 * τ,  
    7.5e-3 * τ,
    0.5,
    20,
    20,
    15,
    7.5e-5 * τ,
    2.75e-4 * τ,
    5e-2 * τ,   
    0.14,
    2.25e-3 * τ, #mp * τ, will need to unscale later
    7e-3 * τ,
    6e-5 * τ, #cv * τ, will need to unscale later
    7e-3 * τ,
]

Which enables my training data to be a more respectable

datasize = 30
tspan = (0.0f0, 3.0f0)
tsteps = range(tspan[1], tspan[2], length = datasize)

Thus, solving the NN portion is now a straightforward step, Yielding a good fit for my unknowns:

# Ideal data
# m_p P, c_v V^2
L̄ = [p_[13] * X[2,:]';p_[15] * (X[3,:] .^ 2)']
# Neural network guess
L̂ = ann_L(Xₙ, res2.minimizer)

Screenshot_20200811_101307

Now, I’m trying to recover m_p P and c_v V^2 using SInDy, but don’t seem to be having much luck.

Here are the details you’ll need to follow along:
Pretty simple basis:

@variables u[1:3]
h = Operation[u; u.^2; u.^3; sin.(u); cos.(u); 1]
basis = Basis(h, u)

My ‘real’ data:

# Ideal data, (B, P, V)
X = Array(solution)
# Add noise to the ideal data
Xₙ = X + Float32(1e-3) * randn(eltype(X), size(X))

Float32[60.03501 58.813198 57.746056 56.817295 56.002934 55.284653 54.649925 54.08962 53.590942 53.149014 52.751167 52.39717 52.081867 51.79673 51.542362 51.31143 51.102936 50.915524 50.74641 50.58813 50.444656 50.31366 50.191795 50.080654 49.977543 49.879482 49.787895 49.698246 49.618042 49.544212; 0.7365888 0.73702586 0.7382404 0.73592335 0.73757845 0.73639107 0.73894846 0.73881716 0.73736215 0.7374129 0.7376731 0.7376398 0.73752075 0.73964417 0.74060017 0.7400208 0.74057287 0.74170345 0.74299425 0.74339664 0.7477569 0.7492879 0.7483499 0.7524558 0.7527696 0.7543931 0.7592906 0.7618538 0.76314175 0.76721525; 11.653926 11.654253 11.671077 11.692297 11.721293 11.757507 11.797537 11.843286 11.894215 11.949576 12.006296 12.067214 12.133057 12.1960335 12.265695 12.333571 12.405001 12.475596 12.548772 12.623673 12.696516 12.772361 12.844946 12.921922 12.996918 13.073674 13.146856 13.222248 13.296648 13.370284]

and the neural network guess of my two unknowns:

Float32[0.49594843 0.49605167 0.49617392 0.4962982 0.49644905 0.49659967 0.49678326 0.49696237 0.49714184 0.49734688 0.4975723 0.4978165 0.4980893 0.49843413 0.4988172 0.49922848 0.4997449 0.50036114 0.5011006 0.50196916 0.5030898 0.50432986 0.5056533 0.50740856 0.50924486 0.51142704 0.51392204 0.5166917 0.5196348 0.52292186; 2.4320884 2.4411485 2.4520636 2.464492 2.4785047 2.4943013 2.5114942 2.5303664 2.5509937 2.5731423 2.5966578 2.6217601 2.648635 2.6758575 2.7052617 2.7353282 2.7669077 2.7988124 2.8319352 2.8665643 2.9005435 2.9360812 2.9712234 3.0073447 3.043308 3.079943 3.1147962 3.150848 3.1859117 3.2197812]

Following the documentation of DataDrivenDiffEq, I attempt

opt = SR3(1e-3, 1.0)
Ψ = SInDy(Xₙ[:, 2:end], L̂[:, 2:end], basis, opt=opt, maxiter=100000, normalize=true, denoise=true)

Which fails to identify a result for Equation 1:

julia> println(Ψ)
Sparse Identification Result
No. of Parameters : 2
Active terms : 2
   Equation 1 : 0
   Equation 2 : 2
Overall error (L2-Norm) : 5.511965
   Equation 1 : 2.7072322
   Equation 2 : 2.8047326
AICC :
   Equation 1 : -3.8355591
   Equation 2 : 0.8347673

SR3{Float64,UnionAll}(0.3, 1.0, ProximalOperators.NormL1) converged after 30400 iterations.

julia> print_equations(Ψ)
1 dimensional basis in ["u₁", "u₂", "u₃"]
f_1 = u₃ ^ 2 * p₁ + u₃ ^ 3 * p₂

julia> ODESystem(Ψ)
ERROR: AssertionError: length(b) == length(variables(b))

The documentation is currently a little thin on what to do at this point. I’ve (blindly) tried different optimisers and parameters, altered the ranges in my input arrays, etc. Nothing sane in the results. In this example, Chris uses two extra parameters:

  1. λ = exp10.(-7:0.1:3), which verbatim doesn’t help my situation and I’m unsure what makes sense for my system anyhow.
  2. f_target(x, w) = iszero(x[1]) ? Inf : norm(w.*x, 2), but the f_target keyword seems to have been dropped in the current version v0.3.2 of the SInDy function. Should I be using an older version?

My questions now would be:

  • How can I help the solver identify a 2 dimensional basis rather than dropping the first term in my result?
  • How do I identify sane optimiser defaults for any given system? (should I not be using SR3 in this instance for example)?
  • What range of \lambda makes sense for my problem and how do I identify that?
  • Is f_target still relevant?

You might want to ask @Julius_Martensen how the latest Pareto stuff is going, because that should be a bit better at recovering a sparse form.

But that said, how good was the sparse form it gave you? It might be interesting to have a regularization that prefers some basis functions over others as well, biasing towards lower order or something.

1 Like

Its getting there. We might fall back onto a mix of 0.3.1 and 0.3.2 given the open PR where the user is able to provide a feature space and an evaluation metric. I found some bugs / dumb mistakes today, so it might be better of in the future.

Choosing λ is indeed hard, given that no one knows a priori the appropriate size of the parameters. So this is try and error. Additionally, this is influenced by the normalize option, which scales the parameters. SR3 tends to perform better than STRRidge if properly tuned ( you can also allow it to diverge more from the ridge regression ). I would try to perform a hyperparameter optimization around these and the optimizers.

1 Like

It looks like it did grab a sparse form though, just a different one than intended, so it could just be unidentifiable given the data.

@Libbum you might want to sample more neural network input/outputs

1 Like

Thanks for the replies guys - will keep an eye on the open issues and play around with this system further.

This was an arbitrary test anyway and I’m not completely sure my system is sound in the first place!

Good suggestions as to where to go from here, and nice to know there’s nothing silly with my approach so far at least.

I couldn’t get that far to test due to this length assertion error when attempting to create the ODESystem

Hmm, the updated tested version should be here BTW: https://github.com/SciML/DataDrivenDiffEq.jl/blob/master/test/applications/partial_lotka_volterra.jl . I think you might want to fallback to v0.3.2 until that open PR is finalized though.

1 Like

The PR is stable and on master. I will publish a new version either today or tomorrow.

1 Like

Awesome, thanks!