# BFGS very slow compared to BlackBoxOptim - How to improve performance

I am trying to fit a differential equation (using DiffEqParamEstim) with these properties:

• Lumped parameter linear thermal model with 3 states
• 8 interpolated external inputs (sampled at 50 Hz) using Interpolations.jl objects (shown below with `_itp` suffix)
• 8 parameters to fit
• Long data record to fit (600 s at 50 Hz); a single simulation runs in about 0.01 s

I have pasted a snippet of the code below for the differential equation. The variables not mentioned in the function are all constant floats enclosed in a wrapper function. (Apologies for the incomplete code; I tried to make a standalone example but couldn't replicate the slow behavior.)

```julia
function thermal_model!(du, u, p, t)
    # scaling parameters
    p_natconv, p_carcass2air, p_tread2carcass, p_air2ambient = p

    # interpolated external inputs
    fxtire = fxtire_itp(t)
    fytire = fytire_itp(t)
    fztire = fztire_itp(t)
    vx = vx_itp(t)
    alpha = alpha_itp(t)
    kappa = kappa_itp(t)
    h_splitter = h_splitter_itp(t)

    # arc length of tread area

    # heat transfer rates
    q_friction = p_friction * vx * (abs(fytire * tan(alpha)) + abs(fxtire * kappa))
    q_carcass2air = p_carcass2air * h_carcass2air * area_tread * (t_carcass - t_air)
    q_carcass2ambient_natconv = p_natconv * h_natconv * area_sidewall * (t_carcass - t_ambient)
    q_deflection = p_deflection * h_deflection * vx * abs(fztire)
    q_air2ambient = p_air2ambient * h_natconv * area_rim * (t_air - t_ambient)

    du[2] = der_t_carcass = (q_tread2carcass + q_deflection - q_carcass2air - q_carcass2ambient_natconv) / (m_carcass * cp_carcass)
    du[3] = der_t_air = (q_carcass2air - q_air2ambient) / (m_air * cp_air)
end
```

I first use BlackBoxOptim with parameter bounds to get the model into a reasonable region of parameter space, then use that result as the initial guess for BFGS(). I find that BlackBoxOptim gets the fit close, but BFGS() does not refine it even after running for a long time. On top of that, it is also extremely slow.

Loss Function:

```julia
cost_function = build_loss_objective(prob, Tsit5(), L2Loss(tsteps, transpose(data_ref));
                                     maxiters = 500000, verbose = true)
```

This is the code for BFGS() optimization which uses the result from BlackBoxOptim.

```julia
n_p = 8
l_b = repeat([0.0], n_p)
u_b = repeat([2.0e2], n_p)
result_bfgs = Optim.optimize(cost_function, l_b, u_b, p_bb, Fminbox(BFGS()))
```

My guess is it is something to do with the gradient calculation, but I don't know how to make it faster. I am using the default Tsit5() solver with adaptive timestep. I tried these approaches and failed:

• swapped to DataInterpolations.jl, and it was even slower.
• tried fitting only the first 100 s, but there is drift in the states (and it's still relatively slow compared to bboptimize).

I am new to Julia and I would appreciate any pointers on what might be the issue here.


If it takes 600 s to simulate on its own, I imagine it will take a lot longer to compute a ForwardDiff gradient. Are you sure it should be taking this long for a 3-state model? Maybe profile your code and check. Otherwise, try using BFGS() with the collocation cost described here: https://diffeq.sciml.ai/stable/analysis/parameter_estimation/#Two-Stage-method-(Non-Parametric-Collocation)

That should have much quicker gradient evaluations


Sorry, I should have been clearer. The data to fit is 600 seconds long; the simulation itself runs fairly fast (around 0.01 s). Edited the post to clarify.


You can check how fast a gradient evaluation is by calling cost(p, g), where g is an empty array the size of the gradient. I highly doubt it will take very long in such a small model. So I guess BFGS() is just taking a lot of iterations to get to the local minimum. You can check how many by probing the results structure, and see if it corresponds roughly to a 600-second optimisation time as a sanity check. Either way, you can warm-start BFGS by first using a collocation metric on your cost function (see my previous post), which may or may not help.
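For example, timing one gradient evaluation could look like this (a sketch; it assumes the DiffEqParamEstim objective supports the two-argument gradient call, and reuses `cost_function` and `p_bb` from the original post):

```julia
g = zeros(8)  # buffer the length of the parameter vector (n_p = 8)
t_grad = @elapsed cost_function(p_bb, g)  # the two-argument call mutates g with the gradient
println("one gradient evaluation took $t_grad s")
```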

Are you sure the function is differentiable? Is the Hessian possibly singular? Could your initial approximation of the Hessian be poor? Any of these will give a quasi-Newton method problems.

Most black box methods are derivative-free and can get you something useful, but it may take many, many calls to the objective. If black box is happy and BFGS is not, something's wrong with either the problem, the initial iterate for the solution, or (less likely) the Hessian. BFGS with a line search will perform poorly if the initial data are poor.

Blackbox vs gradient-based is one of the topics in (shameless plug)

```bibtex
author    = "C. T. Kelley",
title     = "{Iterative Methods for Optimization}",
publisher = "SIAM",
year      = 1999,
series    = "Frontiers in Applied Mathematics",
number    = 18
```


There is a chance that this function might not be smooth because I use `abs()`, `min()` and `max()` functions. Is there any way to check the Hessian in Julia?

I have simulated the same model in Modelica and it did not trigger any state events, which is why I would think this model does not have any discontinuities. However, Modelica uses symbolic manipulation and has its own way of reducing equations, so I don't know how it would compare to `DifferentialEquations.jl`.
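Edit: one way I found to probe the Hessian numerically is ForwardDiff (a sketch; it assumes the objective is ForwardDiff-compatible, and `p_bb` is the BlackBoxOptim result from earlier):

```julia
using ForwardDiff, LinearAlgebra

H = ForwardDiff.hessian(cost_function, p_bb)
@show eigvals(H)  # near-zero eigenvalues suggest a (nearly) singular Hessian
@show cond(H)     # a very large condition number also hurts BFGS
```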


Could you tell me which package this function is from? Does `p` mean the ODE problem here?

I tried your collocation suggestion with the code snippet below, and it has been running for 15+ minutes without stopping. I don't know if this is the right way to set up the two-stage method, since the documentation didn't have an example.

```julia
cost_twostage = two_stage_method(prob, tsteps, data_ref_tr;
                                 kernel = :Epanechnikov,
                                 mpg_autodiff = false,
                                 verbose = true)
n_p = 8
p0 = repeat([0.1], n_p)
l_b = repeat([0.0], n_p)
u_b = repeat([1.0e2], n_p)
result_bfgs_twostage = Optim.optimize(cost_twostage, l_b, u_b, p0, Fminbox(BFGS()))
```

Edit:

Running BFGS unbounded completes, but the fitted parameters look bad and it is extremely slow per `f(x)` call.

`result_bfgs_twostage = Optim.optimize(cost_twostage,p0,BFGS())`

```
* Status: success

* Candidate solution
    Final objective value:     9.827941e+02

* Found with
    Algorithm:     BFGS

* Convergence measures
    |x - x'|               = 0.00e+00 ≤ 0.0e+00
    |x - x'|/|x'|          = 0.00e+00 ≤ 0.0e+00
    |f(x) - f(x')|         = 0.00e+00 ≤ 0.0e+00
    |f(x) - f(x')|/|f(x')| = 0.00e+00 ≤ 0.0e+00
    |g(x)|                 = 8.45e-07 ≰ 1.0e-08

* Work counters
    Seconds run:   232  (vs limit Inf)
    Iterations:    245
    f(x) calls:    697
    ∇f(x) calls:   697
```

If your nonsmoothness comes only from functions like abs, max, and min, there are ways to fix it by approximating them with smooth functions. This works amazingly well, and the approximation does not have to be very good. I hate to burden you with technical papers, but the examples (not the theorems and proofs) in

```bibtex
author  = "X. Chen",
title   = "Smoothing methods for nonsmooth, nonconvex minimization",
year    = 2012,
journal = "Math. Prog.",
volume  = 134,
pages   = "71--99"
```

might help. There is even good stuff here


I removed all the potentially non-smooth elements and it's still pretty slow with BFGS. The only ones remaining are the interpolated inputs now.

So I guess the slowdown might be from `Interpolations.jl`. I don't have any other ideas otherwise.

Are you interpolating a nonsmooth function with a smooth interpolation? That is not a wise thing to do. Interpolating max with a 4th degree polynomial, for example, will give you some very annoying oscillations.

I understand your concern here. The interpolations are, however, on measurement data that have to be fed as inputs to the thermal model. I don't have a choice here.

In my case I was just using abs, max, and min to account for measurement dropouts. My point was that the speed of BFGS didn't budge even after taking out all the non-smooth terms.

What kind of interpolation are you using? Piecewise linear? Polynomial? Trig? …

How are you getting gradients? Finite differences is a high-risk way to do it if the objective has internal interpolations.

I am just using the `LinearInterpolation()` object from `Interpolations.jl`. I think it is a BSpline with degree 1.

I honestly do not know how the gradients are being computed. My guess is that the package is fully compatible with `ForwardDiff.jl`, so it should provide them automatically. (I might be wrong here.)

A code snippet from the package documentation is below. I have 8 of these interpolation objects as inputs to the model.

```julia
xs = 1:0.2:5
A = [f(x) for x in xs]  # f is some function defined elsewhere in the docs example

# linear interpolation
interp_linear = LinearInterpolation(xs, A)
```

If you are interpolating a non-differentiable function and/or computing gradients of a piecewise-linear interpolation, that could really confuse BFGS.
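If the piecewise-linear inputs do turn out to be the problem, a cubic B-spline interpolant from Interpolations.jl is a drop-in alternative with smooth gradients (a sketch reusing `xs`/`A` from the earlier snippet, with placeholder data standing in for the measurements; note `CubicSplineInterpolation` requires the knots to be a range):

```julia
using Interpolations

xs = 1:0.2:5
A = sin.(xs)  # placeholder data standing in for the measurements
interp_smooth = CubicSplineInterpolation(xs, A)  # C²-smooth, unlike LinearInterpolation
interp_smooth(2.3)  # evaluate between knots
```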


You might want to look at the new symbolic passes we've released:


Can you share a full code? What does your profile look like?


Hello @ChrisRackauckas, full code below. The code needs input data (which I don't own, so I can't share it publicly); it won't run without it.

```julia
## Functions and common constants for Thermal Tire Model
using Printf
using FileIO
using DataFrames
using Interpolations
using DifferentialEquations
using RecursiveArrayTools

## Build ODE Problem
function build_tire_thermal_ode(data, init_state, tspan, tstep, wheel_pos)
    ## convenience lambda functions to append wheel position
    wheelvar = (var) -> @sprintf("%s%s", var, wheel_pos)
    get_data = (var) -> data[!, wheelvar(var)]
    lin_interp = (var) -> LinearInterpolation(data[!, "Time"], get_data(var))
    get_init = (var) -> init_state[wheelvar(var)]

    ## Define Interpolations
    # TODO: figure out how to do multiple 1D interpolations of matrix columns; this is painful
    fxtire_itp = lin_interp("FxTire")
    fytire_itp = lin_interp("FyTire")
    fztire_itp = lin_interp("FzTire")
    vx_itp = lin_interp("vContactPatchX")
    alpha_itp = lin_interp("aTireSlip")
    kappa_itp = lin_interp("rTireSlip")
    h_splitter_itp = lin_interp("hSplitterDeflected")
    t_carcass_itp = lin_interp("TTireCarcass")
    t_air_itp = lin_interp("TTireAir")

    ## Nominal Tire Model parameters
    # using parameters from Limebeer and Mavros papers and scaling them using the fitting process
    # Environment
    t_ambient = init_state["TAmbient"]
    t_track = init_state["TTrack"]

    tire_size = [305, 54, 15]
    r_rim = 0.5 * tire_size[3] * 0.0254
    tire_width = tire_size[1] / 1000
    area_rim = 2 * pi * r_rim * tire_width
    area_sidewall = pi * (r_unloaded^2 - r_rim^2)

    # air
    tire_air_volume = area_sidewall * tire_width
    air_density = 2.5738
    m_air = air_density * tire_air_volume

    # carcass
    m_tire = 9.072

    # specific heat
    cp_air = 0.718e3 # J/Kg/K
    cp_carcass = 1.7652e3

    # heat transfer coeffs
    h_natconv = 5.0
    h_forcedconv = 7.0
    h_carcass2air = 8.0e1
    h_deflection = 2.79e-4 * 60.0

    ## Thermal Model - using nominal physical heat transfer coefficients
    smoothabs = x -> sqrt(x^2 + 1e-10) # smooth version of abs (1e-10, not 10^-10, which errors for Int)
    function tire_thermal_model!(du, u, p, t)
        # scaling parameters
        p_natconv, p_carcass2air, p_tread2carcass, p_air2ambient = p
        fxtire = fxtire_itp(t)
        fytire = fytire_itp(t)
        fztire = fztire_itp(t)
        vx = vx_itp(t)
        alpha = alpha_itp(t)
        kappa = kappa_itp(t)
        h_splitter = h_splitter_itp(t)

        # crude approximation of tire contact area purely based on geometry

        # heat transfer rates
        q_friction = p_friction * vx * (smoothabs(fytire * tan(alpha)) + smoothabs(fxtire * kappa))
        q_carcass2air = p_carcass2air * h_carcass2air * area_tread * (t_carcass - t_air)
        q_carcass2ambient_natconv = p_natconv * h_natconv * area_sidewall * (t_carcass - t_ambient)
        q_deflection = p_deflection * h_deflection * vx * smoothabs(fztire)
        q_air2ambient = p_air2ambient * h_natconv * area_rim * (t_air - t_ambient)

        du[2] = der_t_carcass = (q_tread2carcass + q_deflection - q_carcass2air - q_carcass2ambient_natconv) / (m_carcass * cp_carcass)
        du[3] = der_t_air = (q_carcass2air - q_air2ambient) / (m_air * cp_air)
    end

    ## Build ODE Problem
    # initial state
    u0 = map(get_init, ["TTireSurface", "TTireCarcass", "TTireAir"])
    n_p = 8
    p0 = repeat([0.1], n_p)
    ode_prob = ODEProblem(tire_thermal_model!, u0, tspan, p0)

    ## Get Reference Data
    tsteps = collect(tspan[1]:tstep:tspan[2])
    data_ref = convert(Array, VectorOfArray([t_tread_itp(tsteps), t_carcass_itp(tsteps), t_air_itp(tsteps)]))

    return (ode_prob, data_ref)
end

## ODE Solution Plotting
using Plots
plotly()

function solve_ode_prob(prob, p, tspan, tsteps)
    prob_remade = remake(prob; p = p, tspan = tspan)  # NB: fixed typo (was `tpsan`)
    return solve(prob_remade, Tsit5(), saveat = tsteps)
end

function evaluate_ode_prob(prob, p, tspan, tstep, data_ref, plot_title)
    tsteps = collect(tspan[1]:tstep:tspan[2])
    @time sol = solve_ode_prob(prob, p, tspan, tsteps)
    plt = plot(sol, title = plot_title, label = ["Tread" "Carcass" "Air"],
               xlabel = "Time (s)", ylabel = "Temperature (K)", legend = :topleft)
    # size=(1200,600)
    plot!(tsteps, data_ref, label = "", ls = :dot)
    display(plt)
    return sol
end

## Training - Build ODE Parameter Estimation Problem
using DifferentialEquations, DiffEqParamEstim
function build_ode_estimation_prob(tspan, tstep, data_train, init_train)
    tsteps = collect(tspan[1]:tstep:tspan[2])
    prob, data_ref = build_tire_thermal_ode(data_train, init_train, tspan, tstep, wheel_pos)
    cost_function = build_loss_objective(prob, Tsit5(), L2Loss(tsteps, transpose(data_ref));
                                         maxiters = 500000, verbose = true)
    return prob, data_ref, cost_function
end

using DataFrames, Printf, FileIO
data_dir = joinpath(pwd(), "data", "processed")
wheel_pos = "RF"
wheelvar = (var) -> @sprintf("%s%s", var, wheel_pos)
filename_train = "Phoenix_R18_50Hz"
filepath_train = joinpath(data_dir, @sprintf("%s.jld2", filename_train))
filename_validate = "Phoenix_R17_50Hz"
filepath_validate = joinpath(data_dir, @sprintf("%s.jld2", filename_validate))
# read initial conditions and reference data

## simulation constants
sampling_rate = 50.0 # Hz
tstep = 1 / sampling_rate
t_end = 600.0 # end time
tspan = (0.0, t_end)
tsteps = collect(tspan[1]:tstep:tspan[2])

## Parameters
n_p = 8
display_param = param_val -> display(Dict(param_names .=> param_val))
p0 = repeat([0.1], n_p)

## Build ODE Problem
prob, data_ref, cost_function = build_ode_estimation_prob((0.0, 600.0), tstep, data_train, init_train)
data_ref_tr = transpose(data_ref)

## BlackBoxOptim
# [2.010205352273235, 3.6567658848983844, 86.61267821640898, 0.039428524429510406, 18.151228262926487, 0.0188954545517476, 16.84746373039449, 0.06875481085963224]
using BlackBoxOptim
res = bboptimize(cost_function; SearchRange = (0.0, 1.0e2), MaxSteps = 100000, NumDimensions = 8)
p_bb = best_candidate(res)
display_param(p_bb)
_ = evaluate_ode_prob(prob, p_bb, (0.0, 600.0), tstep, data_ref, "BBOptimize")

## BFGS
using Optim
prob, data_ref, cost_function = build_ode_estimation_prob((0.0, 600.0), tstep, data_train, init_train)
l_b = repeat([0.0], n_p)
u_b = repeat([2.0e2], n_p)
result_bfgs = Optim.optimize(cost_function, l_b, u_b, p_bb, Fminbox(BFGS()))
print(result_bfgs)
p_bfgs = result_bfgs.minimizer
display_param(p_bfgs)
_ = evaluate_ode_prob(prob, p_bfgs, (0.0, 600.0), tstep, data_ref, "BFGS")

## Validation
init_validate[wheelvar("TTireCarcass")] = init_train[wheelvar("TTireCarcass")]
init_validate[wheelvar("TTireAir")] = init_train[wheelvar("TTireAir")]
init_validate["TAmbient"] = 288.706
init_validate["TTrack"] = 301.483
prob_val, data_val = build_tire_thermal_ode(data_validate, init_validate, tspan, tstep, wheel_pos)
_ = evaluate_ode_prob(prob_val, p_bfgs, tspan, tstep, data_val, "Validation")
```

This is the profiling result (https://pastebin.com/7MWyzkrS) for the BFGS optimization, using the code below (I had to time-limit the run, otherwise it wouldn't finish).

```julia
Profile.init(n = 10^8)
Profile.clear()
@profile Optim.optimize(cost_function, l_b, u_b, p_bb, Fminbox(BFGS()), Optim.Options(time_limit = 300.0))
Profile.print(format = :flat)
```

It's hard to help without being able to run it, so I'll just throw out some standard guesses.

BFGS requires gradients. What you're not showing is how many iterations it takes. My guess is that it's taking more than you thought because your tolerances on the gradient calculation are too high. Decrease the solver `abstol` and `reltol` for better gradients. Also, use `BFGS(initial_stepnorm = 0.01)` to not rely as much on the initial Hessian, or directly set the initial Hessian.

What can happen is that if you allow incorrect gradients, you can move to a bad part of parameter space where the problem is more stiff and the solving is slow, and it could take a while to get back to reasonable parameters (or it might never do it). So you want to avoid that by ensuring accuracy on the derivatives, and also by not letting the first few steps have too large an influence before an accurate Hessian approximation is found.
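Concretely, that could look something like this (a sketch reusing names from the earlier posts; it assumes `build_loss_objective` forwards the extra keyword arguments to the ODE solve, which is how DiffEqParamEstim is documented to behave):

```julia
# tighter solver tolerances -> more accurate loss and gradients
cost_function = build_loss_objective(prob, Tsit5(), L2Loss(tsteps, transpose(data_ref));
                                     abstol = 1e-10, reltol = 1e-10,
                                     maxiters = 500000, verbose = true)

# small initial step to avoid overtrusting the initial Hessian approximation
result_bfgs = Optim.optimize(cost_function, l_b, u_b, p_bb,
                             Fminbox(BFGS(initial_stepnorm = 0.01)))
```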


I mean call your cost function (in this case `cost_twostage`, the one you synthesised using DiffEqParamEstim) with two arguments. If you call it with two arguments, it mutates the second argument to give the gradient, which gives an idea of how long each individual gradient calculation is taking.

If even the two stage method is resulting in long times, then the gradient might be numerically bad. So the optimisation algorithm gets confused/makes bad steps and takes ages to settle. Why are you setting mpg_autodiff = false? That reverts to a finite difference gradient, which can be really inaccurate in the type of problem you have.

Sorry, I haven't used the two-stage method of the package before; I implemented my own (almost surely worse) version of it a while ago. It will always be really quick, as you're not differentiating through the ODE solution to get the gradient.

Also, I haven't mentioned it yet, but if you need more control over the optimization you should almost surely be using DiffEqFlux.jl. We're very soon going to be reforming the DifferentialEquations.jl documentation to help push people in that direction.
