Connection between BFGS and ROOT.Minuit. Stopping criteria

Dear experts on optimization problems,

I am trying to establish a connection between the common tools in High Energy Physics (HEP) and Julia ecosystem.

In HEP, the main tool for optimization is Minuit library. Nowadays, the original Fortran code is literally converted to C++ and used entirely for all problems just because it is so reliable. (here are Yggdrasil binaries thanks to @jstrube, @giordano)

On the non-HEP side, one of the most reliable and widely used optimizers seems to be BFGS, particularly in Julia in NLopt or Optim.

Actually, it might turn that the two libraries are implementing the same algorithm with different stopping conditions.

Here is a quote from [the original Minuit paper] on the implemented algorithm (https://www.sciencedirect.com/science/article/pii/0010465575900399).
image

and the stopping criteria

So,

  • Is Minuit doing BFGS or its ancestor?
  • From my experience, EDM is a good indicator of convergence. Is there anything similar in the current implementation of optimizers?
  • Is there a way to compute EDM with Julia, e.g. in Optim?

Thanks

(pin @pkofod, @andreasnoack, @anriseth from Optim/bfgs.jl)

1 Like

It’s doing something very similar to BFGS, look up variable metric or quasi-Newton methods. The stopping criterion is a test of a “hypothesis” that the gradient is zero that only makes sense for statistical problems. No this is not implemented in Optim or NLopt but you could do it with a callback I think. Many of these methods (DFP, BFGS, …) come from the econometrics/statistics literature so you will sometimes find such stopping criterions in the original papers, but they’re rarely used in general purpose software.

2 Likes

Many thanks for the reply.

What should I look for? Could you expand, please?

Yes, I would like to try something in this spirit. What do you have in mind for a callback?
EDM requires hessian that is very expensive unless it is a by-product of minimization.

the objective stopping condition looks like a great thing to have. With a numerical threshold g_tol I have the impression that sometimes it takes ages to reach 1e-8. However, it is not clear how to adjust it, i.e. if it is safe to do

The stopping criteria can be implemented as follows.

"""
See also MIGrad in Chapter 4: Minuit Commands, https://root.cern.ch/download/minuit.pdf
"""
function miniutestop(state)
    mt = state.metadata
    edm = mt["g(x)"]' * mt["~inv(H)"] * mt["g(x)"] / 2
    edm < 1e-3 * 0.1 * 1.0
end

Here is a minimal working example:

using Distributions, Optim

"""
See also MIGrad in Chapter 4: Minuit Commands, https://root.cern.ch/download/minuit.pdf
""" 
function miunitstop(state)
    mt = state.metadata
    edm = mt["g(x)"]' * mt["~inv(H)"] * mt["g(x)"] / 2
    edm < 1e-3 * 0.1 * 1.0
end

data = randn(1000)
res = optimize([-Inf, 0.0], [Inf, Inf], [0.0, 1.0], Fminbox(BFGS()), Optim.Options(extended_trace=true, callback=miunitstop)) do pars
    -loglikelihood(Normal(pars...), data)
end

Please note that the callback has to be added in Optim.Options along with extended_trace=true for those metadata to be accessible.

Note also that the inverted hessian used in the calculation of EDM is just an approximate one that happens to be used in BFGS. So, after the minimum is found, you may want to calculate an exact one with Zygote.hessian(pars -> -loglikelihood(Normal(pars...), data), res.minimizer) (especially when you need to invert the hessian matrix to get the covariance matrix).


Demo

The miunitstop can reduce the number of calls by ~100x, with effectively the same results:

julia> res = optimize([-Inf, 0.0], [Inf, Inf], [0.0, 1.0], Fminbox(BFGS()), Optim.Options(extended_trace=true, callback=miunitstop)) do pars
           -loglikelihood(Normal(pars...), data)
       end
 * Status: failure

 * Candidate solution
    Final objective value:     1.421423e+03

 * Found with
    Algorithm:     Fminbox with BFGS

 * Convergence measures
    |x - x'|               = 1.34e-02 ≰ 0.0e+00
    |x - x'|/|x'|          = 1.34e-02 ≰ 0.0e+00
    |f(x) - f(x')|         = 0.00e+00 ≤ 0.0e+00
    |f(x) - f(x')|/|f(x')| = 0.00e+00 ≤ 0.0e+00
    |g(x)|                 = 2.06e-02 ≰ 1.0e-08

 * Work counters
    Seconds run:   0  (vs limit Inf)
    Iterations:    1
    f(x) calls:    9
    ∇f(x) calls:   9


julia> res2 = optimize([-Inf, 0.0], [Inf, Inf], [0.0, 1.0], Fminbox(BFGS())) do pars
           -loglikelihood(Normal(pars...), data)
       end
 * Status: success (objective increased between iterations)

 * Candidate solution
    Final objective value:     1.421423e+03

 * Found with
    Algorithm:     Fminbox with BFGS

 * Convergence measures
    |x - x'|               = 9.17e-11 ≰ 0.0e+00
    |x - x'|/|x'|          = 9.15e-11 ≰ 0.0e+00
    |f(x) - f(x')|         = 0.00e+00 ≤ 0.0e+00
    |f(x) - f(x')|/|f(x')| = 0.00e+00 ≤ 0.0e+00
    |g(x)|                 = 3.75e-08 ≰ 1.0e-08

 * Work counters
    Seconds run:   0  (vs limit Inf)
    Iterations:    4
    f(x) calls:    713
    ∇f(x) calls:   713

julia> res.minimizer
2-element Vector{Float64}:
 0.013204458490624406
 1.002498042266284

julia> res2.minimizer
2-element Vector{Float64}:
 0.013199480602918739
 1.002487701716321
5 Likes

that’s pretty amazing, for the record, in my proof of concept package GitHub - JuliaHEP/LiteHF.jl: Light-weight HistFactory in pure Julia, attempts to be compatible with `pyhf` json format
(this is suppose to be pyhf or HistFactory in Julia), I use something like
LiteHF.jl/teststatistics.jl at 69439e0e2ac0669e6c38a79ebafc4046cb749b9d · JuliaHEP/LiteHF.jl · GitHub

which diesn’t reproduce BFGS or ROOT Minuit result numerically, but gives very close final result. It’s potentially useful to include your implementation as a legacy/sanity check option since it uses Optim.jl which is a dependency already

2 Likes