NeuralODE loss function constrained optimization

Hello everybody, I'm trying to train my NeuralODE to create a surrogate model of a mechanical device.
Could you help me figure out how I can constrain the loss function so that it takes the nonlinearity of the model into account?

The problem I formulate is:

\begin{aligned} &\min_{\theta \in \mathbb{R}^n} \; \mathrm{LossFunction}(\tilde y(\theta), y) \\ &\text{subject to} \quad \tilde y \leq k \end{aligned}

Here \tilde y is the NeuralODE prediction and k is a constant value.
I wish to train the NN in a way that embeds the physical limits of the system I want to mimic.
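
For reference, the soft version I do know how to write simply adds a penalty term to the loss; a sketch (λ is a hand-tuned weight, and predict_neuralode is defined below):

# Soft-constraint sketch: penalize violations of ỹ ≤ k instead of enforcing them exactly
function loss_penalized(p, k, λ = 1f2)
    ỹ = predict_neuralode(p)
    mse = Flux.mse(ϕ, ỹ)
    penalty = sum(abs2, max.(ỹ .- k, 0))  # zero whenever the constraint holds
    return mse + λ * penalty
end

But I would prefer to impose the bound as a hard constraint.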

I tried to figure it out with Optimization.jl, but I have some problems understanding how I can pass \tilde y, the prediction of the NeuralODE, to cons.

using CSV,DataFrames,DifferentialEquations,Plots,Flux,DiffEqFlux,JLD,Random,Lux,Optimization,OptimizationOptimJL
using Flux: throttle
using Statistics
using IterTools: ncycle
using DiffEqFlux: group_ranges
rng = MersenneTwister(1234) # reproducible stream of random numbers

# Layer representing the system dynamics
struct LawLayer{F1} <: Lux.AbstractExplicitLayer 
  force::Float32
  parameters::F1
end

function LawLayer(force::Float32; parameters  = Lux.glorot_uniform)
  return LawLayer{typeof(parameters)}(force,parameters) 
end

function Lux.initialparameters(rng::AbstractRNG, l::LawLayer)
  weight = l.parameters(rng,3,1)
  return (weight)
end

Lux.initialstates(::AbstractRNG, ::LawLayer) = NamedTuple()
Lux.parameterlength(::LawLayer) = 3

function (l::LawLayer)(u::AbstractArray, ps, st::NamedTuple)
    
    F = Float32(l.force)
    a, b, c = ps

    z1 = u[2] 
    z2 = F*a .- (b*u[1] + c*u[2])

    return [z1,z2], st
end


# Container for the sequence of layers
struct CustomLayer{L1,L2,L3,L4,L5,L6} <: Lux.AbstractExplicitContainerLayer{(:dense1, :dense2, :dynamic, :dense3, :dense4, :dense5)}
  dense1::L1
  dense2::L2
  dynamic::L3
  dense3::L4
  dense4::L5
  dense5::L6
end

function (CL::CustomLayer)(u::AbstractArray, ps, st::NamedTuple)

  y1, st_de1 = CL.dense1(u, ps.dense1, st.dense1)
  y2, st_de2 = CL.dense2(y1, ps.dense2, st.dense2)
  y, st_dy = CL.dynamic(y2, ps.dynamic, st.dynamic)
  y3, st_de3 = CL.dense3(y, ps.dense3, st.dense3)
  y4, st_de4 = CL.dense4(y3, ps.dense4, st.dense4)
  y5, st_de5 = CL.dense5(y4, ps.dense5, st.dense5)

  out = y5 #[y1[1,:];y4]
  return out, (dynamic = st_dy, dense1 = st_de1, dense2 = st_de2, dense3 = st_de3, dense4 = st_de4, dense5 = st_de5)
end

# Data acquisition: increase the number of points
# Try to use a factor of 
t = Data.tout[1:10:end]
Errout = Data.Errout[1:10:end]
XSout = Data.XSout[1:10:end]
i_cmdout = Data.i_cmdout[1:10:end]
P12out = Data.P12out[1:10:end]
DXJout = Data.DXJout[1:10:end]
XJout = Data.XJout[1:10:end]

DXJout = 100 * DXJout # from m/s to mm/s
XJout = 100 * XJout # from m to mm
#Validation Data
DXJout_val = 100 * [Data_val1.DXJout, Data_val2.DXJout, Data_val3.DXJout]
XJout_val = 100 * [Data_val1.XJout, Data_val2.XJout, Data_val3.XJout]

DX_cap = 15*100
# datap = plot(t,XJout,label = "Data",lw = 1.8, title = "Position")
# valp = plot(t,XJout_val,label = "Data_val",lw = 1)
# datav = plot(t,DXJout,label = "Data",lw = 1.8, title = "Velocity")
# valv = plot(t,DXJout_val,label = "Data_val",lw = 1) # plot(datap,valp,datav,valv,layout = (4,1),size = (800,1200))


ϕ = [XJout DXJout]

tspan = Float32[t[1], t[end]]
x0 = Float32[XJout[1]; DXJout[1]]

act = swish
hidden_1 = 5 # cap at 50, 35, 25
hidden_2 = 5
hidden_3 = 5
learning_rate = 1e-3
opt = ADAM(learning_rate,(0.9,0.8)) # dynamic learning rate, to experiment with
opt = IPNewton()
opt = BFGS()
LOSS = []

F = Float32.([100 50 200 500])

NN = CustomLayer(Lux.Dense(2,hidden_1,act),
                    Lux.Dense(hidden_1,2,act),
                    LawLayer(F[1]),
                    Lux.Dense(2,hidden_2,act),
                    Lux.Dense(hidden_2,hidden_3),
                    Lux.Dense(hidden_3,2)) 

# NN = Lux.Chain(Lux.Dense(2,hidden_1,act),
#                     Lux.Dense(hidden_1,2,act),
#                     LawLayer(F[1]),
#                     Lux.Dense(2,hidden_2,act),
#                     Lux.Dense(hidden_2,2))

p, st = Lux.setup(rng, NN)

prob_neuralode = NeuralODE(NN, tspan, Tsit5(), saveat = t)
p_init = Lux.ComponentArray(p)

function predict_neuralode(p)
    prediction = Array(prob_neuralode(x0,p,st)[1])'
    return prediction
end
  

function loss_neuralode(p)
    ỹ = predict_neuralode(p)
    loss = Flux.mse(ϕ,ỹ) 
    return loss, ỹ
end


function cons(x,p)
  p = nothing
  y = predict_neuralode(x)[:,2]
end

cons(p_init, nothing)


#callback function
cb = function(p,l,y_tilde)
    train = plot(t,ϕ[:,1:2],layout = (2,1),size = (800,800),linewidth = 1.25,label = ["position" "speed"],
                grid=:y, gridwidth=1.5, xlabel = "time", yguidefontrotation = 90,aspect_ratio = 1)
    Plots.scatter!(t,y_tilde[:,1:2],label = ["position_NN" "speed_NN"])
    push!(LOSS,l)
    display(train)
    println("LOSS: ", LOSS[end])
    if l <= 2e-8
        flag = true
    else
        flag = false
    end
return flag
end


adtype = Optimization.AutoZygote()
adtype = Optimization.AutoForwardDiff()
optf = Optimization.OptimizationFunction((x, p) -> loss_neuralode(x), adtype; cons = cons)
optprob = Optimization.OptimizationProblem(optf, p_init, lcons = [-Inf], ucons = [DX_cap])
result_neuralode = Optimization.solve(optprob,opt,callback = throttle(cb,20),maxiters = 1000)

See the tutorial in Optimization.jl on using equality and inequality constraints:

https://docs.sciml.ai/Optimization/stable/tutorials/constraints/
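
The pattern there, condensed (the in-place cons writes each constraint value into res with .=):

using Optimization, OptimizationOptimJL

# Minimize Rosenbrock subject to x[1]^2 + x[2]^2 ≤ 0.8
rosenbrock(x, p) = (p[1] - x[1])^2 + p[2] * (x[2] - x[1]^2)^2
cons(res, x, p) = (res .= [x[1]^2 + x[2]^2])  # note .= : it mutates res rather than rebinding it

optf = OptimizationFunction(rosenbrock, Optimization.AutoForwardDiff(); cons = cons)
prob = OptimizationProblem(optf, zeros(2), [1.0, 100.0], lcons = [-Inf], ucons = [0.8])
sol = solve(prob, IPNewton())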

Thank you @ChrisRackauckas, I saw this tutorial, but I can't figure out how to implement the constraint in my loss function formulation.

NN = CustomLayer(Lux.Dense(2,hidden_1,act),
                    Lux.Dense(hidden_1,2,act),
                    LawLayer(F[1]),
                    Lux.Dense(2,hidden_2,act),
                    Lux.Dense(hidden_2,hidden_3),
                    Lux.Dense(hidden_3,2)) 

p, st = Lux.setup(rng, NN)

prob_neuralode = NeuralODE(NN, tspan, Tsit5(), saveat = t)
p_init = Lux.ComponentArray(p)

function predict_neuralode(p)
    prediction = Array(prob_neuralode(x0,p,st)[1])'
    return prediction
end

function loss_neuralode(p)
    ỹ = predict_neuralode(p)
    loss = Flux.mse(ϕ,ỹ) 
    return loss, ỹ
end

In this configuration:

cons = (res, x, p) -> res = predict_neuralode(x)[:,2]

adtype = Optimization.AutoForwardDiff()
optf = Optimization.OptimizationFunction((x, p) -> loss_neuralode(x), adtype; cons = cons)
optprob = Optimization.OptimizationProblem(optf, p_init, lcons = [-Inf], ucons = [DX_cap])
result_neuralode = Optimization.solve(optprob,opt_c,callback = cb ,maxiters = 100)

The error is:

ERROR: MethodError: no method matching Optim.BarrierLineSearchGrad(::Vector{Float32}, ::Matrix{Float32}, ::Optim.BarrierStateVars{Float64}, ::Optim.BarrierStateVars{Float64})

So I decided to convert the NN prediction into Float32

function predict_neuralode(p)
    prediction = Float32.(Array(prob_neuralode(x0,p,st)[1])')
    return prediction
end

function loss_neuralode(p)
    ỹ = predict_neuralode(p)
    loss = Flux.mse(ϕ,ỹ) 
    return loss, ỹ
end

In this case the error became:

ERROR: Cannot determine ordering of Dual tags Nothing and ForwardDiff.Tag{Optimization.var"#89#106"{OptimizationFunction{true, Optimization.AutoForwardDiff{nothing}, var"#31#32", Nothing, Nothing, Nothing, var"#29#30", Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED_NO_TIME), Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing}, SciMLBase.NullParameters}, Float32} 

The used libraries:

using CSV,DataFrames,DifferentialEquations,Plots,Flux,DiffEqFlux,JLD,Random,Lux,Optimization,OptimizationOptimJL,ForwardDiff
using Flux: throttle
using Statistics
using IterTools: ncycle
using DiffEqFlux: group_ranges

Any suggestions?

Your code isn’t runnable. But did you try using IPOPT?
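
Through Optimization.jl that would be roughly (untested; OptimizationMOI wraps MathOptInterface solvers such as Ipopt):

using OptimizationMOI, Ipopt
result = Optimization.solve(optprob, Ipopt.Optimizer())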

No, I’m trying to use Optimization.jl and the tutorial you shared.

Here is a toy formulation of my code (the data are meaningless; a simple pendulum stands in for the real dataset).

using CSV,DataFrames,DifferentialEquations,Plots,Flux,DiffEqFlux,JLD,Random,Lux,Optimization,OptimizationOptimJL,ForwardDiff
using Flux: throttle
using Statistics
using IterTools: ncycle
using DiffEqFlux: group_ranges
rng = MersenneTwister(1234)
# Layer representing the system dynamics
struct LawLayer{F1} <: Lux.AbstractExplicitLayer 
  force::Float64
  parameters::F1
end

function LawLayer(force::Float64; parameters  = Lux.glorot_uniform)
  return LawLayer{typeof(parameters)}(force,parameters) 
end

function Lux.initialparameters(rng::AbstractRNG, l::LawLayer)
  weight = l.parameters(rng,3,1)
  return (weight)
end

Lux.initialstates(::AbstractRNG, ::LawLayer) = NamedTuple()
Lux.parameterlength(::LawLayer) = 3

function (l::LawLayer)(u::AbstractArray, ps, st::NamedTuple)
    
    F = Float64(l.force)
    a, b, c = ps

    z1 = u[2] 
    z2 = F*a .- (b*u[1] + c*u[2])

    return [z1,z2], st
end


# Container for the sequence of layers
struct CustomLayer{L1,L2,L3,L4,L5,L6} <: Lux.AbstractExplicitContainerLayer{(:dense1, :dense2, :dynamic, :dense3, :dense4, :dense5)}
  dense1::L1
  dense2::L2
  dynamic::L3
  dense3::L4
  dense4::L5
  dense5::L6
end

function (CL::CustomLayer)(u::AbstractArray, ps, st::NamedTuple)

  y1, st_de1 = CL.dense1(u, ps.dense1, st.dense1)
  y2, st_de2 = CL.dense2(y1, ps.dense2, st.dense2)
  y, st_dy = CL.dynamic(y2, ps.dynamic, st.dynamic)
  y3, st_de3 = CL.dense3(y, ps.dense3, st.dense3)
  y4, st_de4 = CL.dense4(y3, ps.dense4, st.dense4)
  y5, st_de5 = CL.dense5(y4, ps.dense5, st.dense5)

  out = y5 #[y1[1,:];y4]
  return out, (dynamic = st_dy, dense1 = st_de1, dense2 = st_de2, dense3 = st_de3, dense4 = st_de4, dense5 = st_de5)
end

x0 = [0.0,2.0]
tspan = (0.0f0,10.0f0)
t = range(tspan[1],tspan[2],length = 101)

rng = Random.default_rng()
function Simple_pendulum(du,u,p,t)
    L = 0.5  # m
    g = 9.81 # m/s^2
    m = 0.1  # kg
    c = 1.1  # damping coefficient

    du[1] = u[2] #du[1] angular velocity - u[1] angular position
    du[2] = -(u[2]*c)-(g/L)*sin(u[1]) #du[2] = angular acceleration - u[2] angular velocity
    nothing
end

prob = ODEProblem(Simple_pendulum,x0,tspan)
ϕ = Array(solve(prob,Tsit5(),saveat = t))

DX_cap = Float64(maximum(ϕ))/2

tspan = Float64[t[1], t[end]]

act = swish
hidden_1 = 5 # cap at 50, 35, 25
hidden_2 = 5
hidden_3 = 5
learning_rate = 1e-3
opt = ADAM(learning_rate,(0.9,0.8)) # dynamic learning rate, to experiment with
opt_c = IPNewton()
opt2 = BFGS()
LOSS = []

F = Float64.([100 50 200 500])

NN = CustomLayer(Lux.Dense(2,hidden_1,act),
                    Lux.Dense(hidden_1,2,act),
                    LawLayer(F[1]),
                    Lux.Dense(2,hidden_2,act),
                    Lux.Dense(hidden_2,hidden_3),
                    Lux.Dense(hidden_3,2)) 

p, st = Lux.setup(rng, NN)

prob_neuralode = NeuralODE(NN, tspan, Tsit5(), saveat = t)
p_init = Lux.ComponentArray(p)

function predict_neuralode(p)
    prediction = Float32.(Array(prob_neuralode(x0,p,st)[1]))
    return prediction
end

function loss_neuralode(p)
    ỹ = predict_neuralode(p)
    loss = Flux.mse(ϕ,ỹ) 
    return loss, ỹ
end

#callback function
cb = function(p,l,y_tilde)
    train = plot(t,ϕ[:,1:2],layout = (2,1),size = (800,800),linewidth = 1.25,label = ["position" "speed"],
                grid=:y, gridwidth=1.5, xlabel = "time", yguidefontrotation = 90,aspect_ratio = 1)
    Plots.scatter!(t,y_tilde[:,1:2],label = ["position_NN" "speed_NN"])
    push!(LOSS,l)
    display(train)
    println("LOSS: ", LOSS[end])
    if l <= 2e-8
        flag = true
    else
        flag = false
    end
return flag
end

cons = (res, x, p) -> res = predict_neuralode(x)[2,:]

Float32.(predict_neuralode(p_init)[2,:])

adtype = Optimization.AutoForwardDiff()
optf = Optimization.OptimizationFunction((x, p) -> loss_neuralode(x), adtype; cons = cons)
optprob = Optimization.OptimizationProblem(optf, p_init, lcons = [-Inf], ucons = [DX_cap])
result_neuralode = Optimization.solve(optprob,opt_c,callback = cb ,maxiters = 100)

println("FIRST TRAINING COMPLETE!")

Thanks a lot in advance for your time!

Doing this and ForwardDiff are not going to combine well because you're stripping duals. Also, the entire rest of your code is in Float32. What happens when you remove this?

I would highly recommend making your whole setup either Float32 or Float64, right now you’re mixing them.
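
For example, to push everything to Float64 (a sketch; broadcasting Float64.() over a ComponentArray preserves its axes):

ϕ      = Float64.(ϕ)                       # data
x0     = Float64.(x0)                      # initial state
tspan  = (Float64(t[1]), Float64(t[end]))  # time span
p_init = Float64.(Lux.ComponentArray(p))   # parameters, still a ComponentArray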

So when I aligned all the code to Float64, the error became:

ERROR: MethodError: no method matching Optim.BarrierLineSearchGrad(::Vector{Float32}, ::Matrix{Float32}, ::Optim.BarrierStateVars{Float64}, ::Optim.BarrierStateVars{Float64})

If I convert to Float32, the error is the previous one.

The mixed value formats were an attempt to figure out the mismatch error.

That’s because it’s half Float32 and half Float64. Just make all of the computations one or the other.

Yes, I saw!
When I align the computation types, the problem is as follows:

ERROR: Cannot determine ordering of Dual tags Nothing and ForwardDiff.Tag{Optimization.var"#89#106"{OptimizationFunction{true, Optimization.AutoForwardDiff{nothing}, var"#38#39", Nothing, Nothing, Nothing, var"#36#37", Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED_NO_TIME), Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing}, SciMLBase.NullParameters}, Float32} 

I don't understand your comment about it.
Could you please explain again where I'm wrong?

Make every number in your code Float64, or make every number Float32.

I did, but nothing changed.
The error is the same as before:

ERROR: Cannot determine ordering of Dual tags Nothing and ForwardDiff.Tag{Optimization.var"#89#106"{OptimizationFunction{true, Optimization.AutoForwardDiff{nothing}, var"#38#39", Nothing, Nothing, Nothing, var"#36#37", Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED_NO_TIME), Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing}, SciMLBase.NullParameters}, Float32}

What is your current code? And you removed the conversion to Float32 in the loss, right?


So, using Float32 for all variables, the error is like the previous one (after removing the Float32 conversion in the loss function):

ERROR: MethodError: no method matching Optim.BarrierLineSearchGrad(::Vector{Float32}, ::Matrix{Float32}, ::Optim.BarrierStateVars{Float64}, ::Optim.BarrierStateVars{Float64})

but I can't figure out what I've missed.
So I decided to reformulate all the code using Float64, and in this case this error occurs:

ERROR: Detected non-constant types in an out-of-place ODE solve, i.e. for
du = f(u,p,t) we see typeof(du) !== typeof(u/t). This is not
supported by OrdinaryDiffEq.jl’s solvers. Please either make f
type-constant (i.e. typeof(du) === typeof(u/t)) or use the mutating
in-place form f(du,u,p,t) (which is type-constant by construction).

Searching around, I found that this error is probably related to the same cause, a mismatch between variable types, but in this case too I can't understand it.
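
If I read the message correctly, the mutating form it suggests would look something like this for my model (untested):

# In-place RHS: writing into du makes typeof(du) constant by construction
function nn_rhs!(du, u, p, t)
    du .= NN(u, p, st)[1]
end
prob_neuralode = ODEProblem(nn_rhs!, x0, tspan, p_init)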

Here is my actual code; the training dataset (DXJout, XJout) comes from a .csv that you can substitute with anything:

using CSV,DataFrames,DifferentialEquations,Plots,Flux,DiffEqFlux,JLD,Random,Lux,Optimization,OptimizationOptimJL,ForwardDiff
using Flux: throttle
using Statistics
using IterTools: ncycle
using DiffEqFlux: group_ranges
rng = MersenneTwister(1234) # reproducible stream of random numbers

# Layer representing the system dynamics
struct LawLayer{F1} <: Lux.AbstractExplicitLayer 
  force::Float64
  parameters::F1
end

function LawLayer(force::Float64; parameters  = Lux.glorot_uniform)
  return LawLayer{typeof(parameters)}(force,parameters) 
end

function Lux.initialparameters(rng::AbstractRNG, l::LawLayer)
  weight = l.parameters(rng,3,1)
  return (weight)
end

Lux.initialstates(::AbstractRNG, ::LawLayer) = NamedTuple()
Lux.parameterlength(::LawLayer) = 3

function (l::LawLayer)(u::AbstractArray, ps, st::NamedTuple)
    
    F = Float64(l.force)
    a, b, c = ps

    z1 = u[2] 
    z2 = F*a .- (b*u[1] + c*u[2])

    return [z1,z2], st
end


# Container for the sequence of layers
struct CustomLayer{L1,L2,L3,L4,L5,L6} <: Lux.AbstractExplicitContainerLayer{(:dense1, :dense2, :dynamic, :dense3, :dense4, :dense5)}
  dense1::L1
  dense2::L2
  dynamic::L3
  dense3::L4
  dense4::L5
  dense5::L6
end

function (CL::CustomLayer)(u::AbstractArray, ps, st::NamedTuple)

  y1, st_de1 = CL.dense1(u, ps.dense1, st.dense1)
  y2, st_de2 = CL.dense2(y1, ps.dense2, st.dense2)
  y, st_dy = CL.dynamic(y2, ps.dynamic, st.dynamic)
  y3, st_de3 = CL.dense3(y, ps.dense3, st.dense3)
  y4, st_de4 = CL.dense4(y3, ps.dense4, st.dense4)
  y5, st_de5 = CL.dense5(y4, ps.dense5, st.dense5)

  out = y5 #[y1[1,:];y4]
  return out, (dynamic = st_dy, dense1 = st_de1, dense2 = st_de2, dense3 = st_de3, dense4 = st_de4, dense5 = st_de5)
end

Data = CSV.read("C:/Users/Antonino Gandolfo/Documents/Tesi/Script/Servo/Data/simout_1.00000_run_MJ100.csv",DataFrame,types = Float64)
Data_val1 = CSV.read("C:/Users/Antonino Gandolfo/Documents/Tesi/Script/Servo/Data/simout_1.00000_run_MJ50.csv",DataFrame,types = Float64)
Data_val2 = CSV.read("C:/Users/Antonino Gandolfo/Documents/Tesi/Script/Servo/Data/simout_1.00000_run_MJ200.csv",DataFrame,types = Float64)
Data_val3 = CSV.read("C:/Users/Antonino Gandolfo/Documents/Tesi/Script/Servo/Data/simout_1.00000_run_MJ500.csv",DataFrame,types = Float64)


# Data acquisition: increase the number of points
# Try to use a factor of 
t = Data.tout[1:10:end]
Errout = Data.Errout[1:10:end]
XSout = Data.XSout[1:10:end]
i_cmdout = Data.i_cmdout[1:10:end]
P12out = Data.P12out[1:10:end]
DXJout = Data.DXJout[1:10:end]
XJout = Data.XJout[1:10:end]

DXJout = 100 * DXJout # from m/s to mm/s
XJout = 100 * XJout # from m to mm
#Validation Data
DXJout_val = 100 * [Data_val1.DXJout, Data_val2.DXJout, Data_val3.DXJout]
XJout_val = 100 * [Data_val1.XJout, Data_val2.XJout, Data_val3.XJout]

ϕ = [XJout DXJout]
DX_cap = maximum(ϕ)
tspan = [t[1], t[end]]
x0 = [XJout[1]; DXJout[1]]

act = swish
hidden_1 = 5 # cap at 50, 35, 25
hidden_2 = 5
hidden_3 = 5
learning_rate = Float64(1e-3)
opt =  ADAM(learning_rate,(0.9,0.8)) 
opt_c = IPNewton()
opt2 = BFGS()
LOSS = []

F = Float64.([100 50 200 500])

# NN = CustomLayer(Lux.Dense(2,hidden_1,act),
#                     Lux.Dense(hidden_1,2,act),
#                     LawLayer(F[1]),
#                     Lux.Dense(2,hidden_2,act),
#                     Lux.Dense(hidden_2,hidden_3),
#                     Lux.Dense(hidden_3,2)) 

NN = Lux.Chain(Lux.Dense(2,10,act),
              Lux.Dense(10,2))

p, st = Lux.setup(rng, NN)
p_init = Lux.ComponentArray(p)

prob_neuralode = NeuralODE(NN, tspan, Tsit5(), saveat = t)
#prob_neuralode = ODEProblem((u,p,t)->NN(u,p,st)[1], x0, tspan, p_init, saveat = t)

function predict_neuralode(p)
  Array(prob_neuralode(x0,p,st)[1])
end

function loss_neuralode(p)
    ỹ = predict_neuralode(p)
    loss = sum(abs2,ϕ-ỹ')
    return loss,ỹ
end

loss,y = loss_neuralode(p_init) 
#callback function
cb = function(p,l,y_tilde)
    train = plot(t,ϕ[:,1:2],layout = (2,1),size = (800,800),linewidth = 1.25,label = ["position" "speed"],
                grid=:y, gridwidth=1.5, xlabel = "time", yguidefontrotation = 90,aspect_ratio = 1)
    Plots.scatter!(t,y_tilde[:,1:2],label = ["position_NN" "speed_NN"])
    push!(LOSS,l)
    display(train)
    println("LOSS: ", LOSS[end])
    if l <= Float64(2e-8)
        flag = true
    else
        flag = false
    end
return flag
end

cons = (res, x, p) -> res = predict_neuralode(x)[2,:]

#adtype = Optimization.AutoZygote()
adtype = Optimization.AutoForwardDiff()
optf = Optimization.OptimizationFunction((x, p) -> loss_neuralode(x), adtype; cons = cons)
optprob = Optimization.OptimizationProblem(optf, p_init, lcons = [-Inf], ucons = [DX_cap])
result_neuralode = Optimization.solve(optprob,opt_c,maxiters = 100)

println("FIRST TRAINING COMPLETE!")

I aligned all the variables to Float32.
Now the error became:

ERROR: MethodError: no method matching Optim.IPNewtonState(::ComponentArrays.ComponentVector{Float32, Vector{Float32}, Tuple{ComponentArrays.Axis{(layer_1 = ViewAxis(1:15, Axis(weight = ViewAxis(1:10, ShapedAxis((5, 2), NamedTuple())), bias = ViewAxis(11:15, ShapedAxis((5, 1), NamedTuple())))), layer_2 = ViewAxis(16:27, Axis(weight = ViewAxis(1:10, ShapedAxis((2, 5), NamedTuple())), bias = ViewAxis(11:12, ShapedAxis((2, 1), NamedTuple())))), layer_3 = ViewAxis(28:30, ShapedAxis((3, 1), NamedTuple())), layer_4 = ViewAxis(31:45, Axis(weight = ViewAxis(1:10, ShapedAxis((5, 2), NamedTuple())), bias = ViewAxis(11:15, ShapedAxis((5, 1), NamedTuple())))), layer_5 = ViewAxis(46:57, Axis(weight = ViewAxis(1:10, ShapedAxis((2, 5), NamedTuple())), bias = ViewAxis(11:12, ShapedAxis((2, 1), NamedTuple())))))}}}, ::Float32, ::ComponentArrays.ComponentVector{Float32, Vector{Float32}, Tuple{ComponentArrays.Axis{(layer_1 = ViewAxis(1:15, Axis(weight = ViewAxis(1:10, ShapedAxis((5, 2), NamedTuple())), bias = ViewAxis(11:15, ShapedAxis((5, 1), NamedTuple())))), layer_2 = ViewAxis(16:27, Axis(weight = ViewAxis(1:10, ShapedAxis((2, 5), NamedTuple())), bias = ViewAxis(11:12, ShapedAxis((2, 1), NamedTuple())))), layer_3 = ViewAxis(28:30, ShapedAxis((3, 1), NamedTuple())), layer_4 = ViewAxis(31:45, Axis(weight = ViewAxis(1:10, ShapedAxis((5, 2), NamedTuple())), bias = ViewAxis(11:15, ShapedAxis((5, 1), NamedTuple())))), layer_5 = ViewAxis(46:57, Axis(weight = ViewAxis(1:10, ShapedAxis((2, 5), NamedTuple())), bias = ViewAxis(11:12, ShapedAxis((2, 1), NamedTuple())))))}}}, ::ComponentArrays.ComponentVector{Float32, Vector{Float32}, Tuple{ComponentArrays.Axis{(layer_1 = ViewAxis(1:15, Axis(weight = ViewAxis(1:10, ShapedAxis((5, 2), NamedTuple())), bias = ViewAxis(11:15, ShapedAxis((5, 1), NamedTuple())))), layer_2 = ViewAxis(16:27, Axis(weight = ViewAxis(1:10, ShapedAxis((2, 5), NamedTuple())), bias = ViewAxis(11:12, ShapedAxis((2, 1), NamedTuple())))), layer_3 = ViewAxis(28:30, ShapedAxis((3, 1), NamedTuple())), layer_4 = ViewAxis(31:45, Axis(weight = ViewAxis(1:10, ShapedAxis((5, 2), NamedTuple())), bias = ViewAxis(11:15, ShapedAxis((5, 1), NamedTuple())))), layer_5 = ViewAxis(46:57, Axis(weight = ViewAxis(1:10, ShapedAxis((2, 5), NamedTuple())), bias = ViewAxis(11:12, ShapedAxis((2, 1), NamedTuple())))))}}}, ::Float32, ::ComponentArrays.ComponentMatrix{Float32, Matrix{Float32}, Tuple{ComponentArrays.Axis{(layer_1 = ViewAxis(1:15, Axis(weight = ViewAxis(1:10, ShapedAxis((5, 2), NamedTuple())), bias = ViewAxis(11:15, ShapedAxis((5, 1), NamedTuple())))), layer_2 = ViewAxis(16:27, Axis(weight = ViewAxis(1:10, ShapedAxis((2, 5), NamedTuple())), bias = ViewAxis(11:12, ShapedAxis((2, 1), NamedTuple())))), layer_3 = ViewAxis(28:30, ShapedAxis((3, 1), NamedTuple())), layer_4 = ViewAxis(31:45, Axis(weight = ViewAxis(1:10, ShapedAxis((5, 2), NamedTuple())), bias = ViewAxis(11:15, ShapedAxis((5, 1), NamedTuple())))), layer_5 = ViewAxis(46:57, Axis(weight = ViewAxis(1:10, ShapedAxis((2, 5), NamedTuple())), bias = ViewAxis(11:12, ShapedAxis((2, 1), NamedTuple())))))}, ComponentArrays.Axis{(layer_1 = ViewAxis(1:15, Axis(weight = ViewAxis(1:10, ShapedAxis((5, 2), NamedTuple())), bias = ViewAxis(11:15, ShapedAxis((5, 1), NamedTuple())))), layer_2 = ViewAxis(16:27, Axis(weight = ViewAxis(1:10, ShapedAxis((2, 5), NamedTuple())), bias = ViewAxis(11:12, ShapedAxis((2, 1), NamedTuple())))), layer_3 = ViewAxis(28:30, ShapedAxis((3, 1), NamedTuple())), layer_4 = ViewAxis(31:45, Axis(weight = ViewAxis(1:10, ShapedAxis((5, 2), 
NamedTuple())), bias = ViewAxis(11:15, ShapedAxis((5, 1), NamedTuple())))), layer_5 = ViewAxis(46:57, Axis(weight = ViewAxis(1:10, ShapedAxis((2, 5), NamedTuple())), bias = ViewAxis(11:12, ShapedAxis((2, 1), NamedTuple())))))}}}, ::Int64, ::Vector{Int8}, ::ComponentArrays.ComponentVector{Float32, Vector{Float32}, Tuple{ComponentArrays.Axis{(layer_1 = ViewAxis(1:15, Axis(weight = ViewAxis(1:10, ShapedAxis((5, 2), NamedTuple())), bias = ViewAxis(11:15, ShapedAxis((5, 1), NamedTuple())))), layer_2 = ViewAxis(16:27, Axis(weight = ViewAxis(1:10, ShapedAxis((2, 5), NamedTuple())), bias = ViewAxis(11:12, ShapedAxis((2, 1), NamedTuple())))), layer_3 = ViewAxis(28:30, ShapedAxis((3, 1), NamedTuple())), layer_4 = ViewAxis(31:45, Axis(weight = ViewAxis(1:10, ShapedAxis((5, 2), NamedTuple())), bias = ViewAxis(11:15, ShapedAxis((5, 1), NamedTuple())))), layer_5 = ViewAxis(46:57, Axis(weight = ViewAxis(1:10, ShapedAxis((2, 5), NamedTuple())), bias = ViewAxis(11:12, ShapedAxis((2, 1), NamedTuple())))))}}}, ::Float32, ::Float32, ::Float32, ::Float32, ::Optim.BarrierStateVars{Float32}, ::Optim.BarrierStateVars{Float32}, ::Optim.BarrierStateVars{Float32}, ::Vector{Float32}, ::Matrix{Float32}, ::Float32, ::ComponentArrays.ComponentVector{Float32, Vector{Float32}, Tuple{ComponentArrays.Axis{(layer_1 = ViewAxis(1:15, Axis(weight = ViewAxis(1:10, ShapedAxis((5, 2), NamedTuple())), bias = ViewAxis(11:15, ShapedAxis((5, 1), NamedTuple())))), layer_2 = ViewAxis(16:27, Axis(weight = ViewAxis(1:10, ShapedAxis((2, 5), NamedTuple())), bias = ViewAxis(11:12, ShapedAxis((2, 1), NamedTuple())))), layer_3 = ViewAxis(28:30, ShapedAxis((3, 1), NamedTuple())), layer_4 = ViewAxis(31:45, Axis(weight = ViewAxis(1:10, ShapedAxis((5, 2), NamedTuple())), bias = ViewAxis(11:15, ShapedAxis((5, 1), NamedTuple())))), layer_5 = ViewAxis(46:57, Axis(weight = ViewAxis(1:10, ShapedAxis((2, 5), NamedTuple())), bias = ViewAxis(11:12, ShapedAxis((2, 1), NamedTuple())))))}}}, ::Float32, ::Optim.BarrierLineSearchGrad{Float32}, ::ComponentArrays.ComponentVector{Float32, Vector{Float32}, Tuple{ComponentArrays.Axis{(layer_1 = ViewAxis(1:15, Axis(weight = ViewAxis(1:10, ShapedAxis((5, 2), NamedTuple())), bias = ViewAxis(11:15, ShapedAxis((5, 1), NamedTuple())))), layer_2 = ViewAxis(16:27, Axis(weight = ViewAxis(1:10, ShapedAxis((2, 5), NamedTuple())), bias = ViewAxis(11:12, ShapedAxis((2, 1), NamedTuple())))), layer_3 = ViewAxis(28:30, ShapedAxis((3, 1), NamedTuple())), layer_4 = ViewAxis(31:45, Axis(weight = ViewAxis(1:10, ShapedAxis((5, 2), NamedTuple())), bias = ViewAxis(11:15, ShapedAxis((5, 1), NamedTuple())))), layer_5 = ViewAxis(46:57, Axis(weight = ViewAxis(1:10, ShapedAxis((2, 5), NamedTuple())), bias = ViewAxis(11:12, ShapedAxis((2, 1), NamedTuple())))))}}}, ::Int64)
Closest candidates are:
  Optim.IPNewtonState(::Tx, ::T, ::Tx, ::Tx, ::T, ::Matrix{T}, ::Any, ::Vector{Int8}, ::Tx, ::T, ::T, ::T, ::T, ::Optim.BarrierStateVars{T}, ::Optim.BarrierStateVars{T}, ::Optim.BarrierStateVars{T}, ::Vector{T}, ::Matrix{T}, ::T, ::Tx, ::T, ::Optim.BarrierLineSearchGrad{T}, ::Tx, ::Any) where {T, Tx} at C:\Users\Antonino Gandolfo\.julia\packages\Optim\tP8PJ\src\multivariate\solvers\constrained\ipnewton\ipnewton.jl:53

Is it possible that the NeuralODE formulation is not compatible with constrained optimization?

Make the state of the optimization problem be just a plain vector and reinterpret it into the componentarray. We can probably make that automatic.

As I understand it, I need to treat the state of my optimization problem (the weights of my NeuralODE) as a ComponentArray for DiffEqFlux.jl, in order to update it as usual.

NN = Lux.Chain(Lux.Dense(2,hidden_1,act),
                    Lux.Dense(hidden_1,2,act),
                    LawLayer(F[1]),
                    Lux.Dense(2,hidden_2,act),
                    Lux.Dense(hidden_2,2))


p, st = Lux.setup(rng, NN)
p_init = Lux.ComponentArray(p)

prob_neuralode = ODEProblem((u,p,t)->NN(u,p,st)[1], x0, tspan, p_init, saveat = t)

function predict_neuralode(p)
  Array(solve(prob_neuralode,Tsit5(),p = p))
  #Array(prob_neuralode(x0,p,st)[1])
end

function loss_neuralode(p)
    ỹ = predict_neuralode(p)
    loss = sum(abs2,ϕ-ỹ')
    return loss,ỹ
end
adtype = Optimization.AutoForwardDiff()
optf = Optimization.OptimizationFunction((x, p) -> loss_neuralode(x), adtype)

At the same time, do I need to rearrange p_init = vcat(p_init) into a plain vector to pass it to the constrained optimization?

optprob = Optimization.OptimizationProblem(optf, p_init, lcons = [-10.f0,-10.f0], ucons = [10.f0,10f0])
result_neuralode = Optimization.solve(optprob,opt_c,maxiters = 100)

How could I pass the two formulations of the optimization state to the same OptimizationProblem?
Have I got the point, or is there something I'm missing?

What I mean is (untested code):

NN = Lux.Chain(Lux.Dense(2,hidden_1,act),
                    Lux.Dense(hidden_1,2,act),
                    LawLayer(F[1]),
                    Lux.Dense(2,hidden_2,act),
                    Lux.Dense(hidden_2,2))


p, st = Lux.setup(rng, NN)
p_init = p

prob_neuralode = ODEProblem((u,p,t)->NN(u,p,st)[1], x0, tspan, p_init, saveat = t)

function predict_neuralode(p)
  Array(solve(prob_neuralode,Tsit5(),p = Lux.ComponentArray(p)))
end

function loss_neuralode(p)
    ỹ = predict_neuralode(p)
    loss = sum(abs2,ϕ-ỹ')
    return loss,ỹ
end
adtype = Optimization.AutoForwardDiff()
optf = Optimization.OptimizationFunction((x, p) -> loss_neuralode(x), adtype)
optprob = Optimization.OptimizationProblem(optf, p_init, lcons = [-10.f0,-10.f0], ucons = [10.f0,10f0])
result_neuralode = Optimization.solve(optprob,opt_c,maxiters = 100)

The optimization is just a normal array, but then it’s a ComponentArray in the cost.

OK, it is as I understood.
This implementation doesn't work; Lux.setup returns a NamedTuple as output.

Moreover, I think this is not the right way to proceed. As before, what I'm trying to do is optimize the NeuralODE parameters in such a way that the NeuralODE solution:

function predict_neuralode(p)
    Array(solve(prob_neuralode, Tsit5(), p = Lux.ComponentArray(p)))
end

respects both the observed dynamics (the dataset):

function loss_neuralode(p)
    ỹ = predict_neuralode(p)
    loss = sum(abs2, ϕ - ỹ')
    return loss, ỹ
end

and a physical constraint:

cons = (res, x, p) -> res = predict_neuralode(x)

As far as I understand, what we are trying to do this way is pass the NeuralODE parameters to the OptimizationFunction as a ComponentArray:

optf = Optimization.OptimizationFunction((x, p) -> loss_neuralode(x), adtype; cons = cons)

At the same time, the NeuralODE parameters have to be converted into a plain vector so that they can be submitted as input to the box optimization problem, and then re-converted into the ComponentArray of the NamedTuple for:

cons = (res,x,p) -> res = predict_neuralode(x)

Probably I messed up something in the process; I can't figure out this problem formulation.

Maybe there is an easier way to code a problem like this:

\begin{aligned} &\min_{\theta} \; \mathrm{LossFunction}(\tilde y(\theta), y) \\ &\text{subject to} \quad \tilde y \le k, \qquad k = \text{constant} \end{aligned}

@avikpal do you have some primitive for making the conversion to named tuples nicer here?

If the problem is resolved by having the parameters be a flat vector, then using CAs is probably the best bet.

using ComponentArrays

ps = ComponentArray(ps_named_tuple)
ax = getaxes(ps)
p = getdata(ps)

function predict_neuralode(p)
     Array(solve(prob_neuralode,Tsit5(),p = ComponentArray(p, ax)))
end
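
Then the constraint and the problem are both stated on the flat vector p (untested sketch; I collapse the trajectory to a scalar with maximum so that it matches the single lcons/ucons entry, and I write into res with .= instead of rebinding it):

# Constraint on the flat-vector state: cap the predicted velocity row at DX_cap
cons = (res, x, p) -> (res .= [maximum(predict_neuralode(x)[2, :])])

optf = Optimization.OptimizationFunction((x, p) -> loss_neuralode(x),
                                         Optimization.AutoForwardDiff(); cons = cons)
optprob = Optimization.OptimizationProblem(optf, p, lcons = [-Inf], ucons = [DX_cap])
result_neuralode = Optimization.solve(optprob, IPNewton(), maxiters = 100)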