Negative loss when doing parameter estimation of ODEs

When doing parameter estimation of ODEs using Optimization.jl (with the Optim.jl LBFGS backend), I'm seeing negative losses returned by the optimizer.

I set up a simple example which reproduces it, and I can’t see anything strange I’m doing that might be causing this. Is this intended behavior? I cannot find anything about negative losses in the documentation.

A problem that I assume is related: after enough iterations the loss suddenly jumps up to a higher value and stays there. Example below:

[plot: example loss history, showing the jump to a higher value]

Any advice would be appreciated.

using Logging
using DifferentialEquations, Optimization, OptimizationOptimJL, SciMLSensitivity
using Plots

# This is a minimal example of a problem I'm having where, during parameter
# estimation, the optimizer returns negative loss values, which should be impossible.

function test_ode!(du, u, p, t)
    ϵ, μ = p
    S, E = u
    # dS/dt
    du[1] = -μ*S + ϵ*E
    # dE/dt
    du[2] = μ*S - ϵ*E
end
function estimate_parameters()
    # Data for comparison
    E_data = [836154, 776521, 769818, 743232, 728789]

    u0_initial = [3_100_000.0, 500_000.0]
    u0_bounds = [(3_000_000, 4_000_000), (100_000, 1_000_000)]
    params_initial = [0.1, 0.05]
    params_bounds = [(0, 1), (0, 1)]
    p_bounds = vcat(u0_bounds, params_bounds)
    tspan = (0.0, 70.0)
    prob = ODEProblem(test_ode!, u0_initial, tspan, params_initial)

    function loss_function(p, prob)
        u0_current = vcat(p[1:length(u0_initial)], 0)
        params_current = p[length(u0_initial)+1:end]
        _prob = remake(prob, u0 = u0_current, p = params_current)
        sol = solve(_prob)
        E_sol = sol([0, 12, 24, 36, 48])[2, :]
        return sum(abs2, E_sol .- E_data)
    end

    loss_history = Float64[]
    function callback(p, l)
        @show l
        push!(loss_history, l)
        display(plot(loss_history))
        return false
    end
    # 3. Set up optimization
    opt_f = OptimizationFunction(loss_function, Optimization.AutoForwardDiff())
    p_init = vcat(u0_initial, params_initial)
    opt_prob = OptimizationProblem(opt_f, p_init, prob; lb = first.(p_bounds), ub = last.(p_bounds))
    return solve(opt_prob, LBFGS(); callback = callback)
end
estimate_parameters()

Try @show p, sum(abs2, E_sol .- E_data) in the loss function and see if you can find the point that is odd, and then use that to isolate the problem.
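Something like this, for example (a sketch that just adds the @show lines to the loss function from your post):

function loss_function(p, prob)
    u0_current = vcat(p[1:length(u0_initial)], 0)
    params_current = p[length(u0_initial)+1:end]
    _prob = remake(prob, u0 = u0_current, p = params_current)
    sol = solve(_prob)
    E_sol = sol([0, 12, 24, 36, 48])[2, :]
    # print the candidate parameters and the raw loss on every objective evaluation
    @show p
    @show sum(abs2, E_sol .- E_data)
    return sum(abs2, E_sol .- E_data)
end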

Here's some of the output of showing p and sum(abs2, E_sol .- E_data) in the loss function. When a negative loss is detected in the callback, I also @show p and @show l; the callback with that check is sketched just below, and the output follows it.
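A rough sketch of the instrumented callback (it's just the callback from the original example with the if block added; the @warn is what produces the warning in the output):

loss_history = Float64[]
function callback(p, l)
    push!(loss_history, l)
    if l < 0
        # the loss is a sum of squares, so a negative value should be impossible
        @warn "Negative loss encountered!"
        @show p
        @show l
    end
    display(plot(loss_history))
    return false
end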

p = [3.4604209116536845e6, 830465.4297554487, 0.6664641536533832, 0.14220388808557272]
sum(abs2, E_sol .- E_data) = 1.5369843438188548e9
p = ForwardDiff.Dual{ForwardDiff.Tag{var"#loss_function#19"{Vector{Float64}, Vector{Int64}}, Float64}, Float64, 4}[Dual{ForwardDiff.Tag{var"#loss_function#19"{Vector{Float64}, Vector{Int64}}, Float64}}(3.4604209116536845e6,1.0,0.0,0.0,0.0), Dual{ForwardDiff.Tag{var"#loss_function#19"{Vector{Float64}, Vector{Int64}}, Float64}}(830465.4297554487,0.0,1.0,0.0,0.0), Dual{ForwardDiff.Tag{var"#loss_function#19"{Vector{Float64}, Vector{Int64}}, Float64}}(0.6664641536533831,0.0,0.0,1.0,0.0), Dual{ForwardDiff.Tag{var"#loss_function#19"{Vector{Float64}, Vector{Int64}}, Float64}}(0.14220388808557272,0.0,0.0,0.0,1.0)]
sum(abs2, E_sol .- E_data) = Dual{ForwardDiff.Tag{var"#loss_function#19"{Vector{Float64}, Vector{Int64}}, Float64}}(1.5430743229291186e9,-56.35917163754493,-11391.753197848067,4.808266626457977e8,-1.2197653749892578e9)
p = [3.4604209116536845e6, 830465.4297554487, 0.6664641536533831, 0.14220388808557272]
sum(abs2, E_sol .- E_data) = 1.536984373201887e9
p = ForwardDiff.Dual{ForwardDiff.Tag{var"#loss_function#19"{Vector{Float64}, Vector{Int64}}, Float64}, Float64, 4}[Dual{ForwardDiff.Tag{var"#loss_function#19"{Vector{Float64}, Vector{Int64}}, Float64}}(3.4604209116536845e6,1.0,0.0,0.0,0.0), Dual{ForwardDiff.Tag{var"#loss_function#19"{Vector{Float64}, Vector{Int64}}, Float64}}(830465.4297554487,0.0,1.0,0.0,0.0), Dual{ForwardDiff.Tag{var"#loss_function#19"{Vector{Float64}, Vector{Int64}}, Float64}}(0.6664641536533832,0.0,0.0,1.0,0.0), Dual{ForwardDiff.Tag{var"#loss_function#19"{Vector{Float64}, Vector{Int64}}, Float64}}(0.14220388808557272,0.0,0.0,0.0,1.0)]
sum(abs2, E_sol .- E_data) = Dual{ForwardDiff.Tag{var"#loss_function#19"{Vector{Float64}, Vector{Int64}}, Float64}}(1.5430743229312682e9,-56.35917163425438,-11391.753197842045,4.808266627301178e8,-1.2197653747655334e9)
p = [3.4604209116536845e6, 830465.4297554487, 0.6664641536533832, 0.14220388808557272]
sum(abs2, E_sol .- E_data) = 1.5369843438188548e9
┌ Warning: Negative loss encountered!
└ @ Main ~/Projects/Academics/DrugEpidemiologyProject/src/ParameterEstimation/paraesttest.jl:42
p = Optimization.OptimizationState{Vector{Float64}, Float64, Vector{Float64}, Nothing, OptimizationState{Float64, LBFGS{Optim.InverseDiagonal, LineSearches.InitialStatic{Float64}, LineSearches.HagerZhang{Float64, Base.RefValue{Bool}}, Optim.var"#56#57"{Vector{Int64}, Vector{Int64}, Fminbox{LBFGS{Nothing, LineSearches.InitialStatic{Float64}, LineSearches.HagerZhang{Float64, Base.RefValue{Bool}}, Returns{Nothing}}, Float64, Optim.var"#39#41"}, Optim.BarrierWrapper{OnceDifferentiable{Float64, Vector{Float64}, Vector{Float64}}, Optim.BoxBarrier{Vector{Int64}, Vector{Int64}}, Float64, Float64, Vector{Float64}}}}}, Nothing}(27, [3.4604209116536845e6, 830465.4297554487, 0.6664641536533832, 0.14220388808557272], -1.1195267115541757e11, [-807.0098955840394, -720.7887640640856, 4.009274384657078e9, -1.5040220472508835e10], nothing,     27    -1.119527e+11     1.504022e+10
 * Current step size: 0.0017956504787880272
 * time: 8.30139684677124
 * g(x): [-807.0098955840394, -720.7887640640856, 4.009274384657078e9, -1.5040220472508835e10]
 * x: [3.4604209116536845e6, 830465.4297554487, 0.6664641536533832, 0.14220388808557272]
, nothing)
l = -1.1195267115541757e11
p = ForwardDiff.Dual{ForwardDiff.Tag{var"#loss_function#19"{Vector{Float64}, Vector{Int64}}, Float64}, Float64, 4}[Dual{ForwardDiff.Tag{var"#loss_function#19"{Vector{Float64}, Vector{Int64}}, Float64}}(3.4673319406535e6,1.0,0.0,0.0,0.0), Dual{ForwardDiff.Tag{var"#loss_function#19"{Vector{Float64}, Vector{Int64}}, Float64}}(830125.9104395622,0.0,1.0,0.0,0.0), Dual{ForwardDiff.Tag{var"#loss_function#19"{Vector{Float64}, Vector{Int64}}, Float64}}(0.6232680306192389,0.0,0.0,1.0,0.0), Dual{ForwardDiff.Tag{var"#loss_function#19"{Vector{Float64}, Vector{Int64}}, Float64}}(0.13276837272742242,0.0,0.0,0.0,1.0)]
sum(abs2, E_sol .- E_data) = Dual{ForwardDiff.Tag{var"#loss_function#19"{Vector{Float64}, Vector{Int64}}, Float64}}(1.5436740138958683e9,145.3223823686585,-11914.167805943318,-8.751885839238205e8,3.828618506728241e9)
p = [3.4673319406535e6, 830125.9104395622, 0.6232680306192389, 0.13276837272742242]
sum(abs2, E_sol .- E_data) = 1.5379024280406404e9

So you can see that the loss and parameters shown from inside the loss function are fine, but the loss that's passed to the callback is the negative value. In the OptimizationState there are also a few negative values printed, e.g. (27, [3.4604209116536845e6, 830465.4297554487, 0.6664641536533832, 0.14220388808557272], -1.1195267115541757e11, [-807.0098955840394, -720.7887640640856, 4.009274384657078e9, -1.5040220472508835e10], nothing, 27 -1.119527e+11 1.504022e+10), but I'm not quite sure what those numbers actually are. I took a look at the source here on line 132, which I believe is passing Optimization.OptimizationState(u = θ, objective = x[1], p = cache.p, iter = iter_count[]) to the callback function, which is subsequently printed out here. I'm not seeing the relationship between that and the output, though.

My guess is that those negative values are a possible next step in the optimization that is out of bounds? I'm having trouble navigating the source code to verify this, though. I'll also add that solving the same problem without bounds doesn't give this negative loss.
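(By "without bounds" I just mean building the same OptimizationProblem without the lb/ub keywords, roughly like this; the opt_prob_unbounded name is only for illustration:)

# same setup as the example above, just without lb/ub
opt_prob_unbounded = OptimizationProblem(opt_f, p_init, prob)
solve(opt_prob_unbounded, LBFGS(); callback = callback)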

Looking at line 132 again in Optimization.jl/src/lbfgsb.jl is even more confusing.

_loss = function (θ)
            x = cache.f(θ, cache.p)
            iter_count[] += 1
            cons_tmp .= zero(eltype(θ))
            cache.f.cons(cons_tmp, θ)
            cons_tmp[eq_inds] .= cons_tmp[eq_inds] - cache.lcons[eq_inds]
            cons_tmp[ineq_inds] .= cons_tmp[ineq_inds] .- cache.ucons[ineq_inds]
            opt_state = Optimization.OptimizationState(
                u = θ, objective = x[1], p = cache.p, iter = iter_count[])
            if cache.callback(opt_state, x...)
                error("Optimization halted by callback.")
            end
            return x[1] + sum(@. λ * cons_tmp[eq_inds] + ρ / 2 * (cons_tmp[eq_inds] .^ 2)) + 1 / (2 * ρ) * sum((max.(Ref(0.0), μ .+ (ρ .* cons_tmp[ineq_inds]))) .^ 2)
        end

The objective that is returned to the callback should just be the output of my loss_function. I don’t see anywhere it could be modified to return a negative value.

Is this only an issue with LBFGSB?

It seems to be both LBFGSB and BFGSB. Using simulated annealing or CMA-ES doesn't give the negative loss, at least in this example.

I recommend using OptimizationNLopt's LBFGS or OptimizationOptimJL's LBFGS then.
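For example, something along these lines (an untested sketch, reusing opt_prob from the example above):

# NLopt's L-BFGS via the OptimizationNLopt wrapper; the bounds are taken from opt_prob
using OptimizationNLopt
sol = solve(opt_prob, NLopt.LD_LBFGS())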

Open an issue with this? It does seem related to domain handling.