BFGS very slow compared to BlackBoxOptim - How to improve performance

If it takes 600s to simulate on it’s own, I imagine it will take a lot longer to compute a ForwardDiff gradient. Are you sure it should be taking this long, for a 3-state model? Maybe profile your code and check. Otherwise, try using BFGS() with the collocation cost described here: https://diffeq.sciml.ai/stable/analysis/parameter_estimation/#Two-Stage-method-(Non-Parametric-Collocation)

That should have much quicker gradient evaluations

1 Like

Sorry I should have been clearer. The data to fit is 600 seconds long. The simulation runs fairly fast (around 0.01 sec). Edited the post to clarify.

1 Like

You can check how fast gradient evaluation is taking by calling cost(p,g), where g is an empty array the size of the gradient. I highly doubt it will be taking very long in a such a small model. So I guess BFGS() is just taking a lot of iterations to get to the local minimum. You can check how many by probing the results structure, and see if it corresponds roughly to a 600 second optimisation time as a sanity check? Either way, you can warmstart BFGS by first using a collocation metric on your cost function (see my previous post), which might/not help

Are you sure the function is differentiable? Is the Hessian possibly singular? Could your initial approximation of the Hessian be poor? Any of these will give a quasi-Newton method problems.

Most black box methods are derivative-free and can get you something useful, but it may take many, many calls to the objective. If black box is happy and BFGS is not, something’s wrong with either the problem or the initial iterate for the solution or (less likely) the Hessian. BFGS with a line search will perform poorly if the initial data are poor.

Blackbox vs qradient-based is one of the topics in (shameless plug)

author=“C. T. Kelley”,
title="{Iterative Methods for Optimization}",
publisher=“SIAM”,
address=“Philadelphia”,
year=1999,
series=“Frontiers in Applied Mathematics”,
number=18

1 Like

There is a chance that this function might not be smooth because I use abs(), min() and max() functions. Is there anyway to check the Hessian in Julia ?

I have simulated the same model in Modelica and it did not trigger any state events which is how I would think this model does not have any discontinuities. However, Modelica uses symbolic manipulation and has its own way of reducing equations so I don’t know how it would compare to DifferentialEquations.jl

1 Like

Could you tell me which package this function is from ? does p mean the ODE problem here ?

I tried your collocation suggestion with the code snippet here and it is running for 15+ mins and still hasn’t stopped. I don’t know if this the right way to do the 2-stage method since the documentation didn’t have an example

cost_twostage = two_stage_method(prob,tsteps,data_ref_tr;
  kernel= :Epanechnikov,
  loss_func = L2Loss(tsteps, data_ref_tr;colloc_grad = colloc_grad(tsteps,data_ref_tr)),
  mpg_autodiff = false,
  verbose = true)
n_p = 8
p0 = repeat([0.1], n_p)
l_b = repeat([0.0],n_p)
u_b = repeat([1.0e2],n_p)
result_bfgs_twostage = Optim.optimize(cost_twostage,l_b,u_b,p0, Fminbox(BFGS()))

Edit:

running BFGS unbounded completes but the fitted parameters look bad and it is extremely slow per f(x) call

result_bfgs_twostage = Optim.optimize(cost_twostage,p0,BFGS())

* Status: success

 * Candidate solution
    Final objective value:     9.827941e+02

 * Found with
    Algorithm:     BFGS

 * Convergence measures
    |x - x'|               = 0.00e+00 ≤ 0.0e+00
    |x - x'|/|x'|          = 0.00e+00 ≤ 0.0e+00
    |f(x) - f(x')|         = 0.00e+00 ≤ 0.0e+00
    |f(x) - f(x')|/|f(x')| = 0.00e+00 ≤ 0.0e+00
    |g(x)|                 = 8.45e-07 ≰ 1.0e-08

 * Work counters
    Seconds run:   232  (vs limit Inf)
    Iterations:    245
    f(x) calls:    697
    ∇f(x) calls:   697

If your nonsmothness is only in functions like abs, max, and min. There are ways to fix that by approximating the function with a smooth function. This works amazingly well and the approximation does not have to be super good. I hate to burden you with technical papers, but the examples (not the theorems and proofs) in

author=“X. Chen”,
title=“Smoothing methods for nonsmooth, nonconvex minimization”,
year=2012,
journal=“Math. Prog.”,
volume=134,
pages=“71–99”

might help. There is even good stuff here

1 Like

I removed all the potential non-smooth elements and its still pretty slow on BFGS. The only ones remaining are the interpolated inputs now.

So I guess the slowdown might be from Interpolations.jl. I don’t have any other ideas otherwise.

Are you interpolating a nonsmooth function with a smooth interpolation? That is not a wise thing to do. Interpolating max with a 4th degree polynomial, for example, will hive you some very annoying oscillations.

I understand your concern here. The interpolations are however on measurement data that have to be fed as inputs to the thermal model. I don’t have a choice here.

In my case I was just using abs max and min to account for measurement drop outs. My point was the speed of BFGS didn’t budge even after taking out all the non smooth terms.

What kind of interpolation are you using? Piecewise linear? Polynomial? Trig? …

How are you getting gradients? Finite differences is a high-risk way to do it if the objective has internal interpolations.

I am just using the LinearInterpolation() object from Interpolations.jl. I think it is a BSpline with degree 1.

I honestly do not know gradients are being computed. My guess is the package is fully compatible with ForwardDiff.jl so it should provide it automatically. (I might be wrong here)

code snippet from the package documentation below. I have 8 of these interpolation objects as inputs to the model.

xs = 1:0.2:5
A = [f(x) for x in xs]

# linear interpolation
interp_linear = LinearInterpolation(xs, A)

If you are interpolating a non-differentiable function and/or computing gradients of a PL interpolation, that could really confuse BFGS.

2 Likes

You might want to look at the new symbolic passes we’ve released:

1 Like

Can you share a full code? What does your profile look like?

1 Like

Hello @ChrisRackauckas, full code below. The code needs input data (which I don’t own so can’t share in public), it won’t run without it.

## Functions and common constants for Thermal Tire Model
using Printf
using FileIO
using DataFrames
using Interpolations
using DifferentialEquations
using RecursiveArrayTools

## Build ODE Problem
function build_tire_thermal_ode(data,init_state,tspan,tstep,wheel_pos)
  ## convenience lambda functions to append wheel position
  wheelvar = (var) -> @sprintf("%s%s",var,wheel_pos)
  get_data = (var) -> data[!, wheelvar(var)]
  lin_interp = (var) -> LinearInterpolation(data[!, "Time"], get_data(var))
  get_init = (var) -> init_state[wheelvar(var)]

  ## Define Interpolations
  # TODO: figure out how to do multiple 1D interpolations of matrix columns; this is painful
  fxtire_itp = lin_interp("FxTire")
  fytire_itp = lin_interp("FyTire")
  fztire_itp = lin_interp("FzTire")
  vx_itp = lin_interp("vContactPatchX")
  alpha_itp = lin_interp("aTireSlip")
  kappa_itp = lin_interp("rTireSlip")
  r_loaded_itp = lin_interp("rTireLoaded")
  h_splitter_itp = lin_interp("hSplitterDeflected")
  t_carcass_itp = lin_interp("TTireCarcass")
  t_tread_itp = lin_interp("TTireSurface")
  t_air_itp = lin_interp("TTireAir")

  ## Nominal Tire Model parameters
  # using parameters from Limebeer and Mavros papers and scaling them using the fitting process
  # Environment
  t_ambient = init_state["TAmbient"]
  t_track = init_state["TTrack"]

  tire_size = [305, 54, 15]
  r_rim = 0.5 * tire_size[3] * 0.0254
  r_unloaded = init_state[wheelvar("rTireUnloaded")]
  tire_width = tire_size[1] / 1000
  area_tread = 2 * pi * r_unloaded * tire_width
  area_rim = 2 * pi * r_rim * tire_width
  area_sidewall = pi * (r_unloaded^2 - r_rim^2)

  # air
  tire_air_volume = area_sidewall * tire_width
  air_density = 2.5738
  m_air = air_density * tire_air_volume

  # tread
  tread_depth = 0.00254
  tread_density = 1.10e03
  tread_volume = tread_depth * tread_density
  m_tread = area_tread * tread_volume

  # carcass
  m_tire = 9.072
  m_carcass = m_tire - m_tread
  carcass_depth = tread_depth * 2
  carcass_volume = carcass_depth * area_tread

  # road
  road_depth = 20e-3

  # specific heat
  cp_air = 0.718e3 # J/Kg/K
  cp_carcass = 1.7652e3
  cp_tread = 1.560e3

  # heat transfer coeffs
  h_tread2road = 7.5e-1 / road_depth
  h_tread2carcass = 3.03e-1 / carcass_depth
  h_natconv = 5.0
  h_forcedconv = 7.0
  h_carcass2air = 8.0e1
  h_deflection = 2.79e-4*60.0

  ## Thermal Model - using nominal physical heat transfer coefficients
  smoothabs = x -> sqrt(x^2 + 10^-10) # smooth version of abs
  function tire_thermal_model!(du, u, p, t)
    t_tread, t_carcass, t_air = u
    # scaling parameters
    p_friction, p_forcedconv, p_tread2road, p_deflection,
    p_natconv, p_carcass2air, p_tread2carcass, p_air2ambient = p
    fxtire = fxtire_itp(t)
    fytire = fytire_itp(t)
    fztire = fztire_itp(t)
    vx = vx_itp(t)
    alpha = alpha_itp(t)
    kappa = kappa_itp(t)
    r_loaded = r_loaded_itp(t)
    h_splitter = h_splitter_itp(t)

    # crude approximation of tire contact area purely based on geometry
    area_tread_forced_air = h_splitter * tire_width
    area_tread_contact = tire_width * 2 * sqrt(r_unloaded^2 - r_loaded^2)

    # heat transfer rates
    q_friction = p_friction * vx * (smoothabs(fytire * tan(alpha)) + smoothabs(fxtire * kappa))
    q_tread2ambient_forcedconv = p_forcedconv * h_forcedconv * area_tread_forced_air * (t_tread - t_ambient) * vx^0.805
    q_tread2ambient_natconv = p_natconv * h_natconv * (area_tread - area_tread_contact) * (t_tread - t_ambient)
    q_tread2carcass = p_tread2carcass * h_tread2carcass * area_tread * (t_tread - t_carcass)
    q_carcass2air = p_carcass2air * h_carcass2air * area_tread * (t_carcass - t_air)
    q_carcass2ambient_natconv = p_natconv * h_natconv * area_sidewall * (t_carcass - t_ambient)
    q_tread2road = p_tread2road * h_tread2road * area_tread_contact * (t_tread - t_track)
    q_deflection = p_deflection * h_deflection * vx * smoothabs(fztire)
    q_air2ambient = p_air2ambient * h_natconv * area_rim * (t_air - t_ambient)

    du[1] = der_t_tread = (q_friction - q_tread2carcass - q_tread2road - q_tread2ambient_forcedconv - q_tread2ambient_natconv)/(m_tread * cp_tread)
    du[2] = der_t_carcass = (q_tread2carcass + q_deflection - q_carcass2air - q_carcass2ambient_natconv)/(m_carcass * cp_carcass)
    du[3] = der_t_air = (q_carcass2air - q_air2ambient)/(m_air * cp_air)
  end

  ## Build ODE Problem
  # initial state
  u0 = map(get_init,["TTireSurface","TTireCarcass","TTireAir"])
  n_p = 8
  p0 = repeat([0.1],n_p)
  ode_prob = ODEProblem(tire_thermal_model!, u0, tspan, p0)

  ## Get Reference Data
  tsteps = collect(tspan[1]:tstep:tspan[2])
  data_ref = convert(Array, VectorOfArray([t_tread_itp(tsteps), t_carcass_itp(tsteps), t_air_itp(tsteps)]))

  return (ode_prob,data_ref)
end

## ODE Solution Plotting
using Plots
plotly()

function solve_ode_prob(prob,p,tspan,tsteps)
    prob_remade = remake(prob;p = p,tpsan = tspan)
    return solve(prob_remade, Tsit5(), saveat = tsteps)
end

function evaluate_ode_prob(prob,p,tspan,tstep,data_ref,plot_title)
  tsteps = collect(tspan[1]:tstep:tspan[2])
  @time sol = solve_ode_prob(prob,p,tspan,tsteps)
  plt = plot(sol, title = plot_title, label = ["Tread" "Carcass" "Air"], xlabel = "Time (s)", ylabel = "Temperature (K)",legend=:topleft)
  # size=(1200,600)
  plot!(tsteps, data_ref, label="", ls = :dot)
  display(plt)
  return sol
end

## Training - Build ODE Parameter Estimation Problem
using DifferentialEquations, DiffEqParamEstim
function build_ode_estimation_prob(tspan,tstep,data_train,init_train)
  tsteps = collect(tspan[1]:tstep:tspan[2])
  prob, data_ref = build_tire_thermal_ode(data_train,init_train,tspan,tstep,wheel_pos)
  cost_function = build_loss_objective(prob, Tsit5(), L2Loss(tsteps, transpose(data_ref));
  maxiters=500000,verbose=true)
  return prob, data_ref,cost_function
end

## Load data
using DataFrames, Printf, FileIO
data_dir = joinpath(pwd(), "data", "processed")
wheel_pos = "RF"
wheelvar = (var) -> @sprintf("%s%s",var,wheel_pos)
filename_train = "Phoenix_R18_50Hz"
filepath_train = joinpath(data_dir,@sprintf("%s.jld2",filename_train))
filename_validate = "Phoenix_R17_50Hz"
filepath_validate = joinpath(data_dir,@sprintf("%s.jld2",filename_validate))
# read initial conditions and reference data
data_train,init_train = load(filepath_train, "df_data", "init_state")
data_validate,init_validate = load(filepath_validate, "df_data", "init_state")

## simulation constants
sampling_rate = 50.0 # Hz
tstep = 1/sampling_rate
t_end = 600.0 # end time
tspan = (0.0, t_end)
tsteps = collect(tspan[1]:tstep:tspan[2])

## Parameters
n_p = 8
param_names = [:p_friction,:p_forcedconv,:p_tread2road,:p_deflection, :p_natconv, :p_carcass2air, :p_tread2carcass, :p_air2ambient]
display_param = param_val -> display(Dict(param_names.=>param_val))
p0 = repeat([0.1], n_p)

## Build ODE Problem
prob, data_ref,cost_function = build_ode_estimation_prob((0.0,600.0),tstep,data_train,init_train)
data_ref_tr = transpose(data_ref)

## BlackBoxOptim
# [2.010205352273235, 3.6567658848983844, 86.61267821640898, 0.039428524429510406, 18.151228262926487, 0.0188954545517476, 16.84746373039449, 0.06875481085963224]
using BlackBoxOptim
res = bboptimize(cost_function; SearchRange = (0.0, 1.0e2),MaxSteps=100000, NumDimensions = 8)
p_bb = best_candidate(res)
display_param(p_bb)
_ = evaluate_ode_prob(prob,p_bb,(0.0,600.0),tstep,data_ref,"BBOptimize")

## BFGS
using Optim
prob, data_ref,cost_function = build_ode_estimation_prob((0.0,600.0),tstep,data_train,init_train)
l_b = repeat([0.0],n_p)
u_b = repeat([2.0e2],n_p)
result_bfgs = Optim.optimize(cost_function,l_b,u_b,p_bb, Fminbox(BFGS()))
print(result_bfgs)
p_bfgs = result_bfgs.minimizer
display_param(p_bfgs)
_ = evaluate_ode_prob(prob,p_bfgs,(0.0,600.0),tstep,data_ref,"BFGS")

## Validation
init_validate[wheelvar("TTireCarcass")] = init_train[wheelvar("TTireCarcass")]
init_validate[wheelvar("TTireAir")] = init_train[wheelvar("TTireAir")]
init_validate["TAmbient"] = 288.706
init_validate["TTrack"] = 301.483
prob_val, data_val = build_tire_thermal_ode(data_validate,init_validate,tspan,tstep,wheel_pos)
_ = evaluate_ode_prob(prob_val,p_bfgs,tspan,tstep,data_val,"Validation")

This is the profiling result (https://pastebin.com/7MWyzkrS) for BFGS optimization using the code below (I had to time limit the function otherwise it wouldn’t finish)

Profile.init(n = 10^8)
Profile.clear()
@profile Optim.optimize(cost_function,l_b,u_b,p_bb, Fminbox(BFGS()),Optim.Options(time_limit=300.0))
Profile.print(format=:flat)

It’s hard to help without being able to run it, so I’ll just throw out some standard guesses.

BFGS requires gradients. What you’re not showing is how many iterations it takes. My guess is that it’s taking more than you thought because your tolerances on the gradient calculation are too high. Decrease the solver abstol and reltol for better gradients. Also, do BFGS(initial_stepnorm = 0.01) to not rely as much on the initial Hessian, or directly set the initial Hessian.

What can happen is that if you allow incorrect gradients, you can move to a bad part of parameter space where it’s more stiff and the solving is slow, and it could take awhile to get back to reasonable parameters (or might never do it). So you want to avoid that by ensuring accuracy on the derivatives, and also not allow the first few steps to have too large of an influence before an an accurate Hessian approximation is found.

2 Likes

I mean call your cost function (in this case cost_twostage) with two arguments. The one you synthesised using DiffEqParamEstim. If you call it with two arguments, it mutates the second argument to give the gradient. Which gives an idea of how long each individual gradient calculation is taking

If even the two stage method is resulting in long times, then the gradient might be numerically bad. So the optimisation algorithm gets confused/makes bad steps and takes ages to settle. Why are you setting mpg_autodiff = false? That reverts to a finite difference gradient, which can be really inaccurate in the type of problem you have.

Sorry I haven’t used the two stage method of the package before. I implemented my own (almost surely worse) version of it a while ago. It will always be really quick as you’re not differentiating through the ODE solution to get the gradient.

Also, haven’t mentioned it yet but almost surely if you need more control over the optimization you should be using DiffEqFlux.jl. We’re very soon going to be reforming the DifferentialEquations.jl documentation to help push people in that direction.

1 Like

A few suggestions:

  1. Make sure the gradient is correct. A lot of the times that I struggled with gradient-based optimisation algorithms, the gradient was wrong. So define the cost function and check that it differentiates correctly using any AD package and finite difference. There might be an AD bug, unlikely but not impossible.
  2. Try other algorithms except BFGS. If your cost function’s curvature is changing often, BFGS is likely a bad choice for an algorithm here because it tries to capture “global curvature information” in the approximate inverse Hessian which can be complete gibberish if your cost function’s curvature changes too often. GradientDescent and ConjugateGradient are 2 alternatives I would try.
  3. Benchmark your function and its gradient and check for type instabilities with Float64 inputs and ForwardDiff.Dual inputs. It’s possible that your function is type stable when run with 1 input type but not type stable when run with another input type.
  4. Consider using reverse-mode AD to define the gradient if it’s too slow. You can pass the gradient function explicitly to Optim.
  5. Loosen the tolerance as Chris suggests above and see how loose it’s allowed to be while still converging to a reasonable solution.
1 Like