# BFGS very slow compared to BlackBoxOptim - How to improve performance

If it takes 600s to simulate on it’s own, I imagine it will take a lot longer to compute a ForwardDiff gradient. Are you sure it should be taking this long, for a 3-state model? Maybe profile your code and check. Otherwise, try using BFGS() with the collocation cost described here: https://diffeq.sciml.ai/stable/analysis/parameter_estimation/#Two-Stage-method-(Non-Parametric-Collocation)

That should have much quicker gradient evaluations

1 Like

Sorry I should have been clearer. The data to fit is 600 seconds long. The simulation runs fairly fast (around 0.01 sec). Edited the post to clarify.

1 Like

You can check how fast gradient evaluation is taking by calling cost(p,g), where g is an empty array the size of the gradient. I highly doubt it will be taking very long in a such a small model. So I guess BFGS() is just taking a lot of iterations to get to the local minimum. You can check how many by probing the results structure, and see if it corresponds roughly to a 600 second optimisation time as a sanity check? Either way, you can warmstart BFGS by first using a collocation metric on your cost function (see my previous post), which might/not help

Are you sure the function is differentiable? Is the Hessian possibly singular? Could your initial approximation of the Hessian be poor? Any of these will give a quasi-Newton method problems.

Most black box methods are derivative-free and can get you something useful, but it may take many, many calls to the objective. If black box is happy and BFGS is not, something’s wrong with either the problem or the initial iterate for the solution or (less likely) the Hessian. BFGS with a line search will perform poorly if the initial data are poor.

Blackbox vs qradient-based is one of the topics in (shameless plug)

author=“C. T. Kelley”,
title="{Iterative Methods for Optimization}",
publisher=“SIAM”,
year=1999,
series=“Frontiers in Applied Mathematics”,
number=18

1 Like

There is a chance that this function might not be smooth because I use `abs()`, `min()` and `max()` functions. Is there anyway to check the Hessian in Julia ?

I have simulated the same model in Modelica and it did not trigger any state events which is how I would think this model does not have any discontinuities. However, Modelica uses symbolic manipulation and has its own way of reducing equations so I don’t know how it would compare to `DifferentialEquations.jl`

1 Like

Could you tell me which package this function is from ? does `p` mean the ODE problem here ?

I tried your collocation suggestion with the code snippet here and it is running for 15+ mins and still hasn’t stopped. I don’t know if this the right way to do the 2-stage method since the documentation didn’t have an example

``````cost_twostage = two_stage_method(prob,tsteps,data_ref_tr;
kernel= :Epanechnikov,
mpg_autodiff = false,
verbose = true)
n_p = 8
p0 = repeat([0.1], n_p)
l_b = repeat([0.0],n_p)
u_b = repeat([1.0e2],n_p)
result_bfgs_twostage = Optim.optimize(cost_twostage,l_b,u_b,p0, Fminbox(BFGS()))
``````

Edit:

running BFGS unbounded completes but the fitted parameters look bad and it is extremely slow per `f(x)` call

`result_bfgs_twostage = Optim.optimize(cost_twostage,p0,BFGS())`

``````* Status: success

* Candidate solution
Final objective value:     9.827941e+02

* Found with
Algorithm:     BFGS

* Convergence measures
|x - x'|               = 0.00e+00 ≤ 0.0e+00
|x - x'|/|x'|          = 0.00e+00 ≤ 0.0e+00
|f(x) - f(x')|         = 0.00e+00 ≤ 0.0e+00
|f(x) - f(x')|/|f(x')| = 0.00e+00 ≤ 0.0e+00
|g(x)|                 = 8.45e-07 ≰ 1.0e-08

* Work counters
Seconds run:   232  (vs limit Inf)
Iterations:    245
f(x) calls:    697
∇f(x) calls:   697
``````

If your nonsmothness is only in functions like abs, max, and min. There are ways to fix that by approximating the function with a smooth function. This works amazingly well and the approximation does not have to be super good. I hate to burden you with technical papers, but the examples (not the theorems and proofs) in

author=“X. Chen”,
title=“Smoothing methods for nonsmooth, nonconvex minimization”,
year=2012,
journal=“Math. Prog.”,
volume=134,
pages=“71–99”

might help. There is even good stuff here

1 Like

I removed all the potential non-smooth elements and its still pretty slow on BFGS. The only ones remaining are the interpolated inputs now.

So I guess the slowdown might be from `Interpolations.jl`. I don’t have any other ideas otherwise.

Are you interpolating a nonsmooth function with a smooth interpolation? That is not a wise thing to do. Interpolating max with a 4th degree polynomial, for example, will hive you some very annoying oscillations.

I understand your concern here. The interpolations are however on measurement data that have to be fed as inputs to the thermal model. I don’t have a choice here.

In my case I was just using abs max and min to account for measurement drop outs. My point was the speed of BFGS didn’t budge even after taking out all the non smooth terms.

What kind of interpolation are you using? Piecewise linear? Polynomial? Trig? …

How are you getting gradients? Finite differences is a high-risk way to do it if the objective has internal interpolations.

I am just using the `LinearInterpolation()` object from `Interpolations.jl`. I think it is a BSpline with degree 1.

I honestly do not know gradients are being computed. My guess is the package is fully compatible with `ForwardDiff.jl` so it should provide it automatically. (I might be wrong here)

code snippet from the package documentation below. I have 8 of these interpolation objects as inputs to the model.

``````xs = 1:0.2:5
A = [f(x) for x in xs]

# linear interpolation
interp_linear = LinearInterpolation(xs, A)
``````

If you are interpolating a non-differentiable function and/or computing gradients of a PL interpolation, that could really confuse BFGS.

2 Likes

You might want to look at the new symbolic passes we’ve released:

1 Like

Can you share a full code? What does your profile look like?

1 Like

Hello @ChrisRackauckas, full code below. The code needs input data (which I don’t own so can’t share in public), it won’t run without it.

``````## Functions and common constants for Thermal Tire Model
using Printf
using FileIO
using DataFrames
using Interpolations
using DifferentialEquations
using RecursiveArrayTools

## Build ODE Problem
function build_tire_thermal_ode(data,init_state,tspan,tstep,wheel_pos)
## convenience lambda functions to append wheel position
wheelvar = (var) -> @sprintf("%s%s",var,wheel_pos)
get_data = (var) -> data[!, wheelvar(var)]
lin_interp = (var) -> LinearInterpolation(data[!, "Time"], get_data(var))
get_init = (var) -> init_state[wheelvar(var)]

## Define Interpolations
# TODO: figure out how to do multiple 1D interpolations of matrix columns; this is painful
fxtire_itp = lin_interp("FxTire")
fytire_itp = lin_interp("FyTire")
fztire_itp = lin_interp("FzTire")
vx_itp = lin_interp("vContactPatchX")
alpha_itp = lin_interp("aTireSlip")
kappa_itp = lin_interp("rTireSlip")
h_splitter_itp = lin_interp("hSplitterDeflected")
t_carcass_itp = lin_interp("TTireCarcass")
t_air_itp = lin_interp("TTireAir")

## Nominal Tire Model parameters
# using parameters from Limebeer and Mavros papers and scaling them using the fitting process
# Environment
t_ambient = init_state["TAmbient"]
t_track = init_state["TTrack"]

tire_size = [305, 54, 15]
r_rim = 0.5 * tire_size[3] * 0.0254
tire_width = tire_size[1] / 1000
area_rim = 2 * pi * r_rim * tire_width
area_sidewall = pi * (r_unloaded^2 - r_rim^2)

# air
tire_air_volume = area_sidewall * tire_width
air_density = 2.5738
m_air = air_density * tire_air_volume

# carcass
m_tire = 9.072

# specific heat
cp_air = 0.718e3 # J/Kg/K
cp_carcass = 1.7652e3

# heat transfer coeffs
h_natconv = 5.0
h_forcedconv = 7.0
h_carcass2air = 8.0e1
h_deflection = 2.79e-4*60.0

## Thermal Model - using nominal physical heat transfer coefficients
smoothabs = x -> sqrt(x^2 + 10^-10) # smooth version of abs
function tire_thermal_model!(du, u, p, t)
# scaling parameters
p_natconv, p_carcass2air, p_tread2carcass, p_air2ambient = p
fxtire = fxtire_itp(t)
fytire = fytire_itp(t)
fztire = fztire_itp(t)
vx = vx_itp(t)
alpha = alpha_itp(t)
kappa = kappa_itp(t)
h_splitter = h_splitter_itp(t)

# crude approximation of tire contact area purely based on geometry

# heat transfer rates
q_friction = p_friction * vx * (smoothabs(fytire * tan(alpha)) + smoothabs(fxtire * kappa))
q_carcass2air = p_carcass2air * h_carcass2air * area_tread * (t_carcass - t_air)
q_carcass2ambient_natconv = p_natconv * h_natconv * area_sidewall * (t_carcass - t_ambient)
q_deflection = p_deflection * h_deflection * vx * smoothabs(fztire)
q_air2ambient = p_air2ambient * h_natconv * area_rim * (t_air - t_ambient)

du[2] = der_t_carcass = (q_tread2carcass + q_deflection - q_carcass2air - q_carcass2ambient_natconv)/(m_carcass * cp_carcass)
du[3] = der_t_air = (q_carcass2air - q_air2ambient)/(m_air * cp_air)
end

## Build ODE Problem
# initial state
u0 = map(get_init,["TTireSurface","TTireCarcass","TTireAir"])
n_p = 8
p0 = repeat([0.1],n_p)
ode_prob = ODEProblem(tire_thermal_model!, u0, tspan, p0)

## Get Reference Data
tsteps = collect(tspan[1]:tstep:tspan[2])
data_ref = convert(Array, VectorOfArray([t_tread_itp(tsteps), t_carcass_itp(tsteps), t_air_itp(tsteps)]))

return (ode_prob,data_ref)
end

## ODE Solution Plotting
using Plots
plotly()

function solve_ode_prob(prob,p,tspan,tsteps)
prob_remade = remake(prob;p = p,tpsan = tspan)
return solve(prob_remade, Tsit5(), saveat = tsteps)
end

function evaluate_ode_prob(prob,p,tspan,tstep,data_ref,plot_title)
tsteps = collect(tspan[1]:tstep:tspan[2])
@time sol = solve_ode_prob(prob,p,tspan,tsteps)
plt = plot(sol, title = plot_title, label = ["Tread" "Carcass" "Air"], xlabel = "Time (s)", ylabel = "Temperature (K)",legend=:topleft)
# size=(1200,600)
plot!(tsteps, data_ref, label="", ls = :dot)
display(plt)
return sol
end

## Training - Build ODE Parameter Estimation Problem
using DifferentialEquations, DiffEqParamEstim
function build_ode_estimation_prob(tspan,tstep,data_train,init_train)
tsteps = collect(tspan[1]:tstep:tspan[2])
prob, data_ref = build_tire_thermal_ode(data_train,init_train,tspan,tstep,wheel_pos)
cost_function = build_loss_objective(prob, Tsit5(), L2Loss(tsteps, transpose(data_ref));
maxiters=500000,verbose=true)
return prob, data_ref,cost_function
end

using DataFrames, Printf, FileIO
data_dir = joinpath(pwd(), "data", "processed")
wheel_pos = "RF"
wheelvar = (var) -> @sprintf("%s%s",var,wheel_pos)
filename_train = "Phoenix_R18_50Hz"
filepath_train = joinpath(data_dir,@sprintf("%s.jld2",filename_train))
filename_validate = "Phoenix_R17_50Hz"
filepath_validate = joinpath(data_dir,@sprintf("%s.jld2",filename_validate))
# read initial conditions and reference data

## simulation constants
sampling_rate = 50.0 # Hz
tstep = 1/sampling_rate
t_end = 600.0 # end time
tspan = (0.0, t_end)
tsteps = collect(tspan[1]:tstep:tspan[2])

## Parameters
n_p = 8
display_param = param_val -> display(Dict(param_names.=>param_val))
p0 = repeat([0.1], n_p)

## Build ODE Problem
prob, data_ref,cost_function = build_ode_estimation_prob((0.0,600.0),tstep,data_train,init_train)
data_ref_tr = transpose(data_ref)

## BlackBoxOptim
# [2.010205352273235, 3.6567658848983844, 86.61267821640898, 0.039428524429510406, 18.151228262926487, 0.0188954545517476, 16.84746373039449, 0.06875481085963224]
using BlackBoxOptim
res = bboptimize(cost_function; SearchRange = (0.0, 1.0e2),MaxSteps=100000, NumDimensions = 8)
p_bb = best_candidate(res)
display_param(p_bb)
_ = evaluate_ode_prob(prob,p_bb,(0.0,600.0),tstep,data_ref,"BBOptimize")

## BFGS
using Optim
prob, data_ref,cost_function = build_ode_estimation_prob((0.0,600.0),tstep,data_train,init_train)
l_b = repeat([0.0],n_p)
u_b = repeat([2.0e2],n_p)
result_bfgs = Optim.optimize(cost_function,l_b,u_b,p_bb, Fminbox(BFGS()))
print(result_bfgs)
p_bfgs = result_bfgs.minimizer
display_param(p_bfgs)
_ = evaluate_ode_prob(prob,p_bfgs,(0.0,600.0),tstep,data_ref,"BFGS")

## Validation
init_validate[wheelvar("TTireCarcass")] = init_train[wheelvar("TTireCarcass")]
init_validate[wheelvar("TTireAir")] = init_train[wheelvar("TTireAir")]
init_validate["TAmbient"] = 288.706
init_validate["TTrack"] = 301.483
prob_val, data_val = build_tire_thermal_ode(data_validate,init_validate,tspan,tstep,wheel_pos)
_ = evaluate_ode_prob(prob_val,p_bfgs,tspan,tstep,data_val,"Validation")
``````

This is the profiling result (https://pastebin.com/7MWyzkrS) for BFGS optimization using the code below (I had to time limit the function otherwise it wouldn’t finish)

``````Profile.init(n = 10^8)
Profile.clear()
@profile Optim.optimize(cost_function,l_b,u_b,p_bb, Fminbox(BFGS()),Optim.Options(time_limit=300.0))
Profile.print(format=:flat)
``````

It’s hard to help without being able to run it, so I’ll just throw out some standard guesses.

BFGS requires gradients. What you’re not showing is how many iterations it takes. My guess is that it’s taking more than you thought because your tolerances on the gradient calculation are too high. Decrease the solver `abstol` and `reltol` for better gradients. Also, do `BFGS(initial_stepnorm = 0.01)` to not rely as much on the initial Hessian, or directly set the initial Hessian.

What can happen is that if you allow incorrect gradients, you can move to a bad part of parameter space where it’s more stiff and the solving is slow, and it could take awhile to get back to reasonable parameters (or might never do it). So you want to avoid that by ensuring accuracy on the derivatives, and also not allow the first few steps to have too large of an influence before an an accurate Hessian approximation is found.

2 Likes

I mean call your cost function (in this case cost_twostage) with two arguments. The one you synthesised using DiffEqParamEstim. If you call it with two arguments, it mutates the second argument to give the gradient. Which gives an idea of how long each individual gradient calculation is taking

If even the two stage method is resulting in long times, then the gradient might be numerically bad. So the optimisation algorithm gets confused/makes bad steps and takes ages to settle. Why are you setting mpg_autodiff = false? That reverts to a finite difference gradient, which can be really inaccurate in the type of problem you have.

Sorry I haven’t used the two stage method of the package before. I implemented my own (almost surely worse) version of it a while ago. It will always be really quick as you’re not differentiating through the ODE solution to get the gradient.

Also, haven’t mentioned it yet but almost surely if you need more control over the optimization you should be using DiffEqFlux.jl. We’re very soon going to be reforming the DifferentialEquations.jl documentation to help push people in that direction.

1 Like

A few suggestions:

1. Make sure the gradient is correct. A lot of the times that I struggled with gradient-based optimisation algorithms, the gradient was wrong. So define the cost function and check that it differentiates correctly using any AD package and finite difference. There might be an AD bug, unlikely but not impossible.
2. Try other algorithms except `BFGS`. If your cost function’s curvature is changing often, `BFGS` is likely a bad choice for an algorithm here because it tries to capture “global curvature information” in the approximate inverse Hessian which can be complete gibberish if your cost function’s curvature changes too often. `GradientDescent` and `ConjugateGradient` are 2 alternatives I would try.
3. Benchmark your function and its gradient and check for type instabilities with `Float64` inputs and `ForwardDiff.Dual` inputs. It’s possible that your function is type stable when run with 1 input type but not type stable when run with another input type.
4. Consider using reverse-mode AD to define the gradient if it’s too slow. You can pass the gradient function explicitly to Optim.
5. Loosen the tolerance as Chris suggests above and see how loose it’s allowed to be while still converging to a reasonable solution.
1 Like