BFGS very slow compared to BlackBoxOptim - How to improve performance

vsskanth · October 29, 2020, 5:37pm

I am trying to fit a differential equation (using DiffEqParamEstim) with these properties:

Lumped parameter linear thermal model with 3 states
8 interpolated external inputs (sampled at 50 Hz) using Interpolations.jl objects (shown below with _itp suffix)
8 parameters to fit
Long simulation time (data length - 600s) - runs in 0.01 secs

I have pasted a snippet of the code below for the differential equation. The variables not mentioned in the function are all constant floats enclosed in a wrapper function. (apologies for the incomplete code, I tried to make a standalone example but couldn’t replicate the slow behavior)

function thermal_model!(du, u, p, t)
  # scaling parameters
  p_friction, p_forcedconv, p_tread2road, p_deflection,
  p_natconv, p_carcass2air, p_tread2carcass, p_air2ambient = p
  fxtire = fxtire_itp(t)
  fytire = fytire_itp(t)
  fztire = fztire_itp(t)
  vx = vx_itp(t)
  alpha = alpha_itp(t)
  kappa = kappa_itp(t)
  r_loaded = r_loaded_itp(t)
  h_splitter = h_splitter_itp(t)

  # arc length of tread area
   theta_1 = acos(min(r_loaded - h_splitter, r_unloaded) / r_unloaded)
   theta_2 = acos(min(r_loaded, r_unloaded) / r_unloaded)
  area_tread_forced_air = r_unloaded * (theta_1 - theta_2) * tire_width
  area_tread_contact = tire_width * 2 * sqrt(max(r_unloaded^2 - r_loaded^2, 0))

  q_friction = p_friction * vx * (abs(fytire * tan(alpha)) + abs(fxtire * kappa))
  q_tread2ambient_forcedconv = p_forcedconv * h_forcedconv * area_tread_forced_air * (t_tread - t_ambient) * vx^0.805
  q_tread2ambient_natconv = p_natconv * h_natconv * (area_tread - area_tread_contact) * (t_tread - t_ambient)
  q_tread2carcass = p_tread2carcass * h_tread2carcass * area_tread * (t_tread - t_carcass)
  q_carcass2air = p_carcass2air * h_carcass2air * area_tread * (t_carcass - t_air)
  q_carcass2ambient_natconv = p_natconv * h_natconv * area_sidewall * (t_carcass - t_ambient)
  q_tread2road = p_tread2road * h_tread2road * area_tread_contact * (t_tread - t_track)
  q_deflection = p_deflection * h_deflection * vx * abs(fztire)
  q_air2ambient = p_air2ambient * h_natconv * area_rim * (t_air - t_ambient)

    du[1] = der_t_tread = (q_friction - q_tread2carcass - q_tread2road - q_tread2ambient_forcedconv - q_tread2ambient_natconv)/(m_tread * cp_tread)
    du[2] = der_t_carcass = (q_tread2carcass + q_deflection - q_carcass2air - q_carcass2ambient_natconv)/(m_carcass * cp_carcass)
    du[3] = der_t_air = (q_carcass2air - q_air2ambient)/(m_air * cp_air)
end

I first use BlackBoxOptim with parameter bounds to first get the model to a reasonable parameter space and then use that as an initial guess for BFGS(). I find that BlackBoxOptim is able to get the fit close but BFGS() is not refining the fit even after running for a long time. On top of that, it is also extremely slow.

Loss Function:

cost_function = build_loss_objective(prob, Tsit5(), L2Loss(tsteps, transpose(data_ref)),maxiters=500000,verbose=true)

This is the code for BFGS() optimization which uses the result from BlackBoxOptim.

n_p = 8
l_b = repeat([0.0],n_p)
u_b = repeat([2.0e2],n_p)
result_bfgs = Optim.optimize(cost_function,l_b,u_b,p_bb, Fminbox(BFGS()))

My guess is it is something to with the gradient calculation but I don’t know how I can make it faster. I am using the default Tsit5() solver with adaptive timestep. I tried these approaches and failed:

swapped to DataInterpolations.jl and it was even slower.
tried fitting only 100s but there is drift in the states (its still relatively slower than bboptimize)

I am new to Julia and I would appreciate any pointers on what might be the issue here.

Dhruva2 · October 29, 2020, 5:44pm

If it takes 600s to simulate on it’s own, I imagine it will take a lot longer to compute a ForwardDiff gradient. Are you sure it should be taking this long, for a 3-state model? Maybe profile your code and check. Otherwise, try using BFGS() with the collocation cost described here: Parameter Estimation and Bayesian Analysis · DifferentialEquations.jl

That should have much quicker gradient evaluations

vsskanth · October 29, 2020, 5:48pm

Sorry I should have been clearer. The data to fit is 600 seconds long. The simulation runs fairly fast (around 0.01 sec). Edited the post to clarify.

Dhruva2 · October 29, 2020, 5:58pm

You can check how fast gradient evaluation is taking by calling cost(p,g), where g is an empty array the size of the gradient. I highly doubt it will be taking very long in a such a small model. So I guess BFGS() is just taking a lot of iterations to get to the local minimum. You can check how many by probing the results structure, and see if it corresponds roughly to a 600 second optimisation time as a sanity check? Either way, you can warmstart BFGS by first using a collocation metric on your cost function (see my previous post), which might/not help

ctkelley · October 29, 2020, 6:46pm

Are you sure the function is differentiable? Is the Hessian possibly singular? Could your initial approximation of the Hessian be poor? Any of these will give a quasi-Newton method problems.

Most black box methods are derivative-free and can get you something useful, but it may take many, many calls to the objective. If black box is happy and BFGS is not, something’s wrong with either the problem or the initial iterate for the solution or (less likely) the Hessian. BFGS with a line search will perform poorly if the initial data are poor.

Blackbox vs qradient-based is one of the topics in (shameless plug)

author=“C. T. Kelley”,
title=“{Iterative Methods for Optimization}”,
publisher=“SIAM”,
address=“Philadelphia”,
year=1999,
series=“Frontiers in Applied Mathematics”,
number=18

vsskanth · October 29, 2020, 7:32pm

There is a chance that this function might not be smooth because I use abs(), min() and max() functions. Is there anyway to check the Hessian in Julia ?

I have simulated the same model in Modelica and it did not trigger any state events which is how I would think this model does not have any discontinuities. However, Modelica uses symbolic manipulation and has its own way of reducing equations so I don’t know how it would compare to DifferentialEquations.jl

vsskanth · October 29, 2020, 7:32pm

Could you tell me which package this function is from ? does p mean the ODE problem here ?

I tried your collocation suggestion with the code snippet here and it is running for 15+ mins and still hasn’t stopped. I don’t know if this the right way to do the 2-stage method since the documentation didn’t have an example

cost_twostage = two_stage_method(prob,tsteps,data_ref_tr;
  kernel= :Epanechnikov,
  loss_func = L2Loss(tsteps, data_ref_tr;colloc_grad = colloc_grad(tsteps,data_ref_tr)),
  mpg_autodiff = false,
  verbose = true)
n_p = 8
p0 = repeat([0.1], n_p)
l_b = repeat([0.0],n_p)
u_b = repeat([1.0e2],n_p)
result_bfgs_twostage = Optim.optimize(cost_twostage,l_b,u_b,p0, Fminbox(BFGS()))

Edit:

running BFGS unbounded completes but the fitted parameters look bad and it is extremely slow per f(x) call

result_bfgs_twostage = Optim.optimize(cost_twostage,p0,BFGS())

* Status: success

 * Candidate solution
    Final objective value:     9.827941e+02

 * Found with
    Algorithm:     BFGS

 * Convergence measures
    |x - x'|               = 0.00e+00 ≤ 0.0e+00
    |x - x'|/|x'|          = 0.00e+00 ≤ 0.0e+00
    |f(x) - f(x')|         = 0.00e+00 ≤ 0.0e+00
    |f(x) - f(x')|/|f(x')| = 0.00e+00 ≤ 0.0e+00
    |g(x)|                 = 8.45e-07 ≰ 1.0e-08

 * Work counters
    Seconds run:   232  (vs limit Inf)
    Iterations:    245
    f(x) calls:    697
    ∇f(x) calls:   697

ctkelley · October 29, 2020, 7:57pm

If your nonsmothness is only in functions like abs, max, and min. There are ways to fix that by approximating the function with a smooth function. This works amazingly well and the approximation does not have to be super good. I hate to burden you with technical papers, but the examples (not the theorems and proofs) in

author=“X. Chen”,
title=“Smoothing methods for nonsmooth, nonconvex minimization”,
year=2012,
journal=“Math. Prog.”,
volume=134,
pages=“71–99”

might help. There is even good stuff here

vsskanth · October 29, 2020, 9:59pm

I removed all the potential non-smooth elements and its still pretty slow on BFGS. The only ones remaining are the interpolated inputs now.

So I guess the slowdown might be from Interpolations.jl. I don’t have any other ideas otherwise.

ctkelley · October 29, 2020, 11:00pm

Are you interpolating a nonsmooth function with a smooth interpolation? That is not a wise thing to do. Interpolating max with a 4th degree polynomial, for example, will hive you some very annoying oscillations.

vsskanth · October 30, 2020, 12:15am

I understand your concern here. The interpolations are however on measurement data that have to be fed as inputs to the thermal model. I don’t have a choice here.

In my case I was just using abs max and min to account for measurement drop outs. My point was the speed of BFGS didn’t budge even after taking out all the non smooth terms.

ctkelley · October 30, 2020, 12:24am

What kind of interpolation are you using? Piecewise linear? Polynomial? Trig? …

How are you getting gradients? Finite differences is a high-risk way to do it if the objective has internal interpolations.

vsskanth · October 30, 2020, 6:01pm

I am just using the LinearInterpolation() object from Interpolations.jl. I think it is a BSpline with degree 1.

I honestly do not know gradients are being computed. My guess is the package is fully compatible with ForwardDiff.jl so it should provide it automatically. (I might be wrong here)

code snippet from the package documentation below. I have 8 of these interpolation objects as inputs to the model.

xs = 1:0.2:5
A = [f(x) for x in xs]

# linear interpolation
interp_linear = LinearInterpolation(xs, A)

ctkelley · October 30, 2020, 8:45pm

If you are interpolating a non-differentiable function and/or computing gradients of a PL interpolation, that could really confuse BFGS.

ChrisRackauckas · October 30, 2020, 9:10pm

You might want to look at the new symbolic passes we’ve released:

https://github.com/JuliaComputing/StructuralTransformations.jl

ChrisRackauckas · October 30, 2020, 9:12pm

Can you share a full code? What does your profile look like?

vsskanth · October 31, 2020, 6:11pm

Hello @ChrisRackauckas, full code below. The code needs input data (which I don’t own so can’t share in public), it won’t run without it.

## Functions and common constants for Thermal Tire Model
using Printf
using FileIO
using DataFrames
using Interpolations
using DifferentialEquations
using RecursiveArrayTools

## Build ODE Problem
function build_tire_thermal_ode(data,init_state,tspan,tstep,wheel_pos)
  ## convenience lambda functions to append wheel position
  wheelvar = (var) -> @sprintf("%s%s",var,wheel_pos)
  get_data = (var) -> data[!, wheelvar(var)]
  lin_interp = (var) -> LinearInterpolation(data[!, "Time"], get_data(var))
  get_init = (var) -> init_state[wheelvar(var)]

  ## Define Interpolations
  # TODO: figure out how to do multiple 1D interpolations of matrix columns; this is painful
  fxtire_itp = lin_interp("FxTire")
  fytire_itp = lin_interp("FyTire")
  fztire_itp = lin_interp("FzTire")
  vx_itp = lin_interp("vContactPatchX")
  alpha_itp = lin_interp("aTireSlip")
  kappa_itp = lin_interp("rTireSlip")
  r_loaded_itp = lin_interp("rTireLoaded")
  h_splitter_itp = lin_interp("hSplitterDeflected")
  t_carcass_itp = lin_interp("TTireCarcass")
  t_tread_itp = lin_interp("TTireSurface")
  t_air_itp = lin_interp("TTireAir")

  ## Nominal Tire Model parameters
  # using parameters from Limebeer and Mavros papers and scaling them using the fitting process
  # Environment
  t_ambient = init_state["TAmbient"]
  t_track = init_state["TTrack"]

  tire_size = [305, 54, 15]
  r_rim = 0.5 * tire_size[3] * 0.0254
  r_unloaded = init_state[wheelvar("rTireUnloaded")]
  tire_width = tire_size[1] / 1000
  area_tread = 2 * pi * r_unloaded * tire_width
  area_rim = 2 * pi * r_rim * tire_width
  area_sidewall = pi * (r_unloaded^2 - r_rim^2)

  # air
  tire_air_volume = area_sidewall * tire_width
  air_density = 2.5738
  m_air = air_density * tire_air_volume

  # tread
  tread_depth = 0.00254
  tread_density = 1.10e03
  tread_volume = tread_depth * tread_density
  m_tread = area_tread * tread_volume

  # carcass
  m_tire = 9.072
  m_carcass = m_tire - m_tread
  carcass_depth = tread_depth * 2
  carcass_volume = carcass_depth * area_tread

  # road
  road_depth = 20e-3

  # specific heat
  cp_air = 0.718e3 # J/Kg/K
  cp_carcass = 1.7652e3
  cp_tread = 1.560e3

  # heat transfer coeffs
  h_tread2road = 7.5e-1 / road_depth
  h_tread2carcass = 3.03e-1 / carcass_depth
  h_natconv = 5.0
  h_forcedconv = 7.0
  h_carcass2air = 8.0e1
  h_deflection = 2.79e-4*60.0

  ## Thermal Model - using nominal physical heat transfer coefficients
  smoothabs = x -> sqrt(x^2 + 10^-10) # smooth version of abs
  function tire_thermal_model!(du, u, p, t)
    t_tread, t_carcass, t_air = u
    # scaling parameters
    p_friction, p_forcedconv, p_tread2road, p_deflection,
    p_natconv, p_carcass2air, p_tread2carcass, p_air2ambient = p
    fxtire = fxtire_itp(t)
    fytire = fytire_itp(t)
    fztire = fztire_itp(t)
    vx = vx_itp(t)
    alpha = alpha_itp(t)
    kappa = kappa_itp(t)
    r_loaded = r_loaded_itp(t)
    h_splitter = h_splitter_itp(t)

    # crude approximation of tire contact area purely based on geometry
    area_tread_forced_air = h_splitter * tire_width
    area_tread_contact = tire_width * 2 * sqrt(r_unloaded^2 - r_loaded^2)

    # heat transfer rates
    q_friction = p_friction * vx * (smoothabs(fytire * tan(alpha)) + smoothabs(fxtire * kappa))
    q_tread2ambient_forcedconv = p_forcedconv * h_forcedconv * area_tread_forced_air * (t_tread - t_ambient) * vx^0.805
    q_tread2ambient_natconv = p_natconv * h_natconv * (area_tread - area_tread_contact) * (t_tread - t_ambient)
    q_tread2carcass = p_tread2carcass * h_tread2carcass * area_tread * (t_tread - t_carcass)
    q_carcass2air = p_carcass2air * h_carcass2air * area_tread * (t_carcass - t_air)
    q_carcass2ambient_natconv = p_natconv * h_natconv * area_sidewall * (t_carcass - t_ambient)
    q_tread2road = p_tread2road * h_tread2road * area_tread_contact * (t_tread - t_track)
    q_deflection = p_deflection * h_deflection * vx * smoothabs(fztire)
    q_air2ambient = p_air2ambient * h_natconv * area_rim * (t_air - t_ambient)

    du[1] = der_t_tread = (q_friction - q_tread2carcass - q_tread2road - q_tread2ambient_forcedconv - q_tread2ambient_natconv)/(m_tread * cp_tread)
    du[2] = der_t_carcass = (q_tread2carcass + q_deflection - q_carcass2air - q_carcass2ambient_natconv)/(m_carcass * cp_carcass)
    du[3] = der_t_air = (q_carcass2air - q_air2ambient)/(m_air * cp_air)
  end

  ## Build ODE Problem
  # initial state
  u0 = map(get_init,["TTireSurface","TTireCarcass","TTireAir"])
  n_p = 8
  p0 = repeat([0.1],n_p)
  ode_prob = ODEProblem(tire_thermal_model!, u0, tspan, p0)

  ## Get Reference Data
  tsteps = collect(tspan[1]:tstep:tspan[2])
  data_ref = convert(Array, VectorOfArray([t_tread_itp(tsteps), t_carcass_itp(tsteps), t_air_itp(tsteps)]))

  return (ode_prob,data_ref)
end

## ODE Solution Plotting
using Plots
plotly()

function solve_ode_prob(prob,p,tspan,tsteps)
    prob_remade = remake(prob;p = p,tpsan = tspan)
    return solve(prob_remade, Tsit5(), saveat = tsteps)
end

function evaluate_ode_prob(prob,p,tspan,tstep,data_ref,plot_title)
  tsteps = collect(tspan[1]:tstep:tspan[2])
  @time sol = solve_ode_prob(prob,p,tspan,tsteps)
  plt = plot(sol, title = plot_title, label = ["Tread" "Carcass" "Air"], xlabel = "Time (s)", ylabel = "Temperature (K)",legend=:topleft)
  # size=(1200,600)
  plot!(tsteps, data_ref, label="", ls = :dot)
  display(plt)
  return sol
end

## Training - Build ODE Parameter Estimation Problem
using DifferentialEquations, DiffEqParamEstim
function build_ode_estimation_prob(tspan,tstep,data_train,init_train)
  tsteps = collect(tspan[1]:tstep:tspan[2])
  prob, data_ref = build_tire_thermal_ode(data_train,init_train,tspan,tstep,wheel_pos)
  cost_function = build_loss_objective(prob, Tsit5(), L2Loss(tsteps, transpose(data_ref));
  maxiters=500000,verbose=true)
  return prob, data_ref,cost_function
end

## Load data
using DataFrames, Printf, FileIO
data_dir = joinpath(pwd(), "data", "processed")
wheel_pos = "RF"
wheelvar = (var) -> @sprintf("%s%s",var,wheel_pos)
filename_train = "Phoenix_R18_50Hz"
filepath_train = joinpath(data_dir,@sprintf("%s.jld2",filename_train))
filename_validate = "Phoenix_R17_50Hz"
filepath_validate = joinpath(data_dir,@sprintf("%s.jld2",filename_validate))
# read initial conditions and reference data
data_train,init_train = load(filepath_train, "df_data", "init_state")
data_validate,init_validate = load(filepath_validate, "df_data", "init_state")

## simulation constants
sampling_rate = 50.0 # Hz
tstep = 1/sampling_rate
t_end = 600.0 # end time
tspan = (0.0, t_end)
tsteps = collect(tspan[1]:tstep:tspan[2])

## Parameters
n_p = 8
param_names = [:p_friction,:p_forcedconv,:p_tread2road,:p_deflection, :p_natconv, :p_carcass2air, :p_tread2carcass, :p_air2ambient]
display_param = param_val -> display(Dict(param_names.=>param_val))
p0 = repeat([0.1], n_p)

## Build ODE Problem
prob, data_ref,cost_function = build_ode_estimation_prob((0.0,600.0),tstep,data_train,init_train)
data_ref_tr = transpose(data_ref)

## BlackBoxOptim
# [2.010205352273235, 3.6567658848983844, 86.61267821640898, 0.039428524429510406, 18.151228262926487, 0.0188954545517476, 16.84746373039449, 0.06875481085963224]
using BlackBoxOptim
res = bboptimize(cost_function; SearchRange = (0.0, 1.0e2),MaxSteps=100000, NumDimensions = 8)
p_bb = best_candidate(res)
display_param(p_bb)
_ = evaluate_ode_prob(prob,p_bb,(0.0,600.0),tstep,data_ref,"BBOptimize")

## BFGS
using Optim
prob, data_ref,cost_function = build_ode_estimation_prob((0.0,600.0),tstep,data_train,init_train)
l_b = repeat([0.0],n_p)
u_b = repeat([2.0e2],n_p)
result_bfgs = Optim.optimize(cost_function,l_b,u_b,p_bb, Fminbox(BFGS()))
print(result_bfgs)
p_bfgs = result_bfgs.minimizer
display_param(p_bfgs)
_ = evaluate_ode_prob(prob,p_bfgs,(0.0,600.0),tstep,data_ref,"BFGS")

## Validation
init_validate[wheelvar("TTireCarcass")] = init_train[wheelvar("TTireCarcass")]
init_validate[wheelvar("TTireAir")] = init_train[wheelvar("TTireAir")]
init_validate["TAmbient"] = 288.706
init_validate["TTrack"] = 301.483
prob_val, data_val = build_tire_thermal_ode(data_validate,init_validate,tspan,tstep,wheel_pos)
_ = evaluate_ode_prob(prob_val,p_bfgs,tspan,tstep,data_val,"Validation")

This is the profiling result (https://pastebin.com/7MWyzkrS) for BFGS optimization using the code below (I had to time limit the function otherwise it wouldn’t finish)

Profile.init(n = 10^8)
Profile.clear()
@profile Optim.optimize(cost_function,l_b,u_b,p_bb, Fminbox(BFGS()),Optim.Options(time_limit=300.0))
Profile.print(format=:flat)

ChrisRackauckas · November 1, 2020, 1:41am

It’s hard to help without being able to run it, so I’ll just throw out some standard guesses.

BFGS requires gradients. What you’re not showing is how many iterations it takes. My guess is that it’s taking more than you thought because your tolerances on the gradient calculation are too high. Decrease the solver abstol and reltol for better gradients. Also, do BFGS(initial_stepnorm = 0.01) to not rely as much on the initial Hessian, or directly set the initial Hessian.

What can happen is that if you allow incorrect gradients, you can move to a bad part of parameter space where it’s more stiff and the solving is slow, and it could take awhile to get back to reasonable parameters (or might never do it). So you want to avoid that by ensuring accuracy on the derivatives, and also not allow the first few steps to have too large of an influence before an an accurate Hessian approximation is found.

Dhruva2 · November 2, 2020, 9:38pm

I mean call your cost function (in this case cost_twostage) with two arguments. The one you synthesised using DiffEqParamEstim. If you call it with two arguments, it mutates the second argument to give the gradient. Which gives an idea of how long each individual gradient calculation is taking

If even the two stage method is resulting in long times, then the gradient might be numerically bad. So the optimisation algorithm gets confused/makes bad steps and takes ages to settle. Why are you setting mpg_autodiff = false? That reverts to a finite difference gradient, which can be really inaccurate in the type of problem you have.

Sorry I haven’t used the two stage method of the package before. I implemented my own (almost surely worse) version of it a while ago. It will always be really quick as you’re not differentiating through the ODE solution to get the gradient.

ChrisRackauckas · November 3, 2020, 3:19am

Also, haven’t mentioned it yet but almost surely if you need more control over the optimization you should be using DiffEqFlux.jl. We’re very soon going to be reforming the DifferentialEquations.jl documentation to help push people in that direction.

Topic		Replies	Views
Differential Equations and Parameter Estimation General Usage	21	3571	July 2, 2019
Parameter estimation for Fractional differential equations New to Julia optimization	2	106	June 3, 2025
Optim and differential equations? Specific Domains	45	3417	July 5, 2019
Parameter estimation of an ODE in Julia is slower than in R Performance optim , r , differentialequation , parameters	25	1709	August 7, 2022
Why do I get the same parameter after the OptimizationProblem General Usage question	23	784	March 7, 2024

BFGS very slow compared to BlackBoxOptim - How to improve performance

Related topics