Connection between BFGS and ROOT.Minuit. Stopping criteria

misha_mikhasenko · October 21, 2020, 8:19am

Dear experts on optimization problems,

I am trying to establish a connection between the common tools in High Energy Physics (HEP) and Julia ecosystem.

In HEP, the main tool for optimization is the Minuit library. Nowadays, the original Fortran code has been translated almost literally into C++, and it is used for practically every problem simply because it is so reliable (here are the Yggdrasil binaries, thanks to @jstrube and @giordano).

On the non-HEP side, one of the most reliable and widely used optimizers seems to be BFGS, which is available in Julia through NLopt and Optim.

Actually, it might turn out that the two libraries implement the same algorithm with different stopping conditions.

Here is a quote from the original Minuit paper (https://www.sciencedirect.com/science/article/pii/0010465575900399) on the implemented algorithm:

and on the stopping criterion:

So,

Is Minuit doing BFGS, or one of its ancestors?
In my experience, the EDM (estimated distance to minimum) is a good indicator of convergence. Is there anything similar in the current implementations of optimizers?
Is there a way to compute the EDM in Julia, e.g. in Optim?
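For reference, Minuit's EDM is gᵀ V g / 2, where g is the gradient and V the (approximate) inverse Hessian. The sketch below assumes nothing from Optim (the helper name `edm` is hypothetical) and checks the defining property on a quadratic: with the exact inverse Hessian, the EDM equals the vertical distance f(x) - f(x*) exactly.

```julia
using LinearAlgebra

# EDM = gᵀ V g / 2, with g the gradient and V the inverse Hessian (hypothetical helper).
edm(g::AbstractVector, V::AbstractMatrix) = dot(g, V * g) / 2

# Quadratic test function f(x) = xᵀAx/2 - bᵀx, with minimum at x* = A \ b.
A = [4.0 1.0; 1.0 3.0]
b = [1.0, 2.0]
f(x) = dot(x, A * x) / 2 - dot(b, x)
xstar = A \ b

x = [0.5, -0.3]          # an arbitrary point away from the minimum
g = A * x - b            # exact gradient at x

# With V = A⁻¹ the two numbers agree up to rounding.
(edm(g, inv(A)), f(x) - f(xstar))
```

For a non-quadratic objective the identity only holds approximately near the minimum, and V is usually the optimizer's running estimate rather than the exact inverse Hessian.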

Thanks

(pinging @pkofod, @andreasnoack, @anriseth from Optim/bfgs.jl)

pkofod · October 21, 2020, 10:57am

It’s doing something very similar to BFGS; look up variable-metric or quasi-Newton methods. The stopping criterion is a test of a “hypothesis” that the gradient is zero, which only makes sense for statistical problems. No, this is not implemented in Optim or NLopt, but I think you could do it with a callback. Many of these methods (DFP, BFGS, …) come from the econometrics/statistics literature, so you will sometimes find such stopping criteria in the original papers, but they’re rarely used in general-purpose software.
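To make the variable-metric idea concrete, here is a minimal, self-contained sketch of a BFGS loop that uses a Minuit-style EDM test as its stopping rule. It is not Optim's or Minuit's implementation: the function name `bfgs_edm`, the simple Armijo backtracking line search, and the default tolerances are all assumptions chosen for illustration.

```julia
using LinearAlgebra

# Sketch: BFGS with an EDM stopping rule (hypothetical helper, not a library API).
# f: objective, grad: gradient function, x0: starting point.
function bfgs_edm(f, grad, x0; edm_tol = 1e-4, maxiter = 200)
    x = float.(collect(x0))
    g = grad(x)
    n = length(x)
    V = Matrix{Float64}(I, n, n)              # running inverse-Hessian estimate
    for iter in 1:maxiter
        edm = dot(g, V * g) / 2               # estimated distance to minimum
        edm < edm_tol && return x, edm, iter - 1
        d = -V * g                            # quasi-Newton search direction
        # Backtracking line search enforcing the Armijo condition.
        α, fx = 1.0, f(x)
        while f(x + α * d) > fx + 1e-4 * α * dot(g, d) && α > 1e-10
            α /= 2
        end
        s = α * d
        xnew = x + s
        gnew = grad(xnew)
        y = gnew - g
        sy = dot(s, y)
        if sy > 1e-12                         # skip the update if curvature condition fails
            ρ = 1 / sy
            W = I - ρ * s * y'
            V = W * V * W' + ρ * s * s'       # BFGS update of the inverse Hessian
        end
        x, g = xnew, gnew
    end
    return x, dot(g, V * g) / 2, maxiter
end
```

Usage on a quadratic, where the answer is known (x* = A \ b):

```julia
A = [4.0 1.0; 1.0 3.0]; b = [1.0, 2.0]
xmin, final_edm, iters = bfgs_edm(x -> dot(x, A * x) / 2 - dot(b, x), x -> A * x - b, zeros(2); edm_tol = 1e-12)
```

The same EDM test can be expressed as an Optim callback instead of a hand-written loop, which is the route suggested above.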

misha_mikhasenko · October 21, 2020, 6:54pm

Many thanks for the reply.

What should I look for? Could you expand, please?

Yes, I would like to try something in this spirit. What do you have in mind for a callback?
The EDM requires the (inverse) Hessian, which is very expensive to compute unless it is a by-product of the minimization.

The objective stopping condition looks like a great thing to have. With the numerical threshold g_tol, I have the impression that it sometimes takes ages to reach 1e-8. However, it is not clear how to adjust it, i.e. if it is safe to do
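One reason a fixed gradient threshold is hard to tune: the size of the gradient depends on the units of the parameters, while the EDM does not change under a linear rescaling x = D·u. The sketch below (a hypothetical, deliberately badly scaled quadratic) shows the max-abs gradient changing by a factor of 10 while the EDM stays the same.

```julia
using LinearAlgebra

edm(g, V) = dot(g, V * g) / 2            # hypothetical helper, EDM = gᵀ V g / 2

# Badly scaled quadratic in the original parameters x.
A = [1e4 0.0; 0.0 1e-2]                  # Hessian in x
b = [1.0, 1.0]
x = [0.0, 0.0]                           # a point away from the minimum
g_x = A * x - b                          # gradient in x

# Reparametrize x = D*u: gradient becomes D*g_x, Hessian becomes D*A*D.
D = Diagonal([1e-2, 10.0])
g_u = D * g_x
A_u = D * A * D

(norm(g_x, Inf), norm(g_u, Inf))         # max-abs gradient: 1.0 vs 10.0
(edm(g_x, inv(A)), edm(g_u, inv(A_u)))   # EDM is identical in both parametrizations
```

So a g_tol of 1e-8 means something different for every choice of units, whereas an EDM threshold is tied to the scale of the objective itself.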

Yuan-Ru-Lin · January 21, 2023, 6:26pm

The stopping criterion can be implemented as a callback, as follows.

"""
See also MIGrad in Chapter 4: Minuit Commands, https://root.cern.ch/download/minuit.pdf
"""
function miunitstop(state)
    mt = state.metadata
    edm = mt["g(x)"]' * mt["~inv(H)"] * mt["g(x)"] / 2
    edm < 1e-3 * 0.1 * 1.0
end

Here is a minimal working example:

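using Distributions, Optim

"""
See also MIGrad in Chapter 4: Minuit Commands, https://root.cern.ch/download/minuit.pdf
"""
function miunitstop(state)
    mt = state.metadata   # only populated when extended_trace=true
    # EDM = gᵀ V g / 2, with V the BFGS approximation of the inverse Hessian
    edm = mt["g(x)"]' * mt["~inv(H)"] * mt["g(x)"] / 2
    # Minuit-style rule EDM < 0.001 * tolerance * UP, here tolerance = 0.1 and UP = 1.0
    # (UP = 0.5 is the usual convention for a negative log-likelihood).
    # Returning true tells Optim to stop.
    edm < 1e-3 * 0.1 * 1.0
end

data = randn(1000)
# Parameters are (μ, σ); the lower bound keeps σ positive inside Fminbox
res = optimize([-Inf, 0.0], [Inf, Inf], [0.0, 1.0], Fminbox(BFGS()), Optim.Options(extended_trace=true, callback=miunitstop)) do pars
    -loglikelihood(Normal(pars...), data)
end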

Please note that the callback has to be passed to Optim.Options together with extended_trace=true for the metadata to be accessible.

Note also that the inverse Hessian used to compute the EDM is only the approximation that BFGS happens to maintain. After the minimum is found, you may therefore want to compute the exact Hessian with Zygote.hessian(pars -> -loglikelihood(Normal(pars...), data), res.minimizer), especially if you need to invert it to get the covariance matrix.
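As a package-free illustration of that last step, the sketch below swaps Zygote for a central finite-difference Hessian (the helpers `nll` and `fd_hessian` are hypothetical) and inverts it at the maximum-likelihood point of the same Normal model. For this model the result should match the textbook standard errors σ̂/√n for μ and σ̂/√(2n) for σ.

```julia
using LinearAlgebra, Statistics, Random

# Negative log-likelihood of N(μ, σ) with the constant term dropped; p = [μ, σ].
nll(p, data) = length(data) * log(p[2]) + sum(abs2, data .- p[1]) / (2 * p[2]^2)

# Central finite-difference Hessian (stand-in for Zygote.hessian).
function fd_hessian(f, x; h = 1e-4)
    n = length(x)
    H = zeros(n, n)
    for i in 1:n, j in 1:n
        ei = zeros(n); ei[i] = h
        ej = zeros(n); ej[j] = h
        H[i, j] = (f(x + ei + ej) - f(x + ei - ej) - f(x - ei + ej) + f(x - ei - ej)) / (4h^2)
    end
    return H
end

Random.seed!(1)
data = randn(1000)
mu_hat = mean(data)
sigma_hat = sqrt(mean(abs2, data .- mu_hat))   # maximum-likelihood σ
H = fd_hessian(p -> nll(p, data), [mu_hat, sigma_hat])
cov_mat = inv(H)                               # covariance of (μ, σ) for a -logL objective
sqrt.(diag(cov_mat))                           # ≈ [σ̂/√n, σ̂/√(2n)]
```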

Demo

The miunitstop callback reduces the number of function calls by roughly two orders of magnitude (9 vs 713 f(x) calls below), with effectively the same result:

julia> res = optimize([-Inf, 0.0], [Inf, Inf], [0.0, 1.0], Fminbox(BFGS()), Optim.Options(extended_trace=true, callback=miunitstop)) do pars
           -loglikelihood(Normal(pars...), data)
       end
 * Status: failure

 * Candidate solution
    Final objective value:     1.421423e+03

 * Found with
    Algorithm:     Fminbox with BFGS

 * Convergence measures
    |x - x'|               = 1.34e-02 ≰ 0.0e+00
    |x - x'|/|x'|          = 1.34e-02 ≰ 0.0e+00
    |f(x) - f(x')|         = 0.00e+00 ≤ 0.0e+00
    |f(x) - f(x')|/|f(x')| = 0.00e+00 ≤ 0.0e+00
    |g(x)|                 = 2.06e-02 ≰ 1.0e-08

 * Work counters
    Seconds run:   0  (vs limit Inf)
    Iterations:    1
    f(x) calls:    9
    ∇f(x) calls:   9


julia> res2 = optimize([-Inf, 0.0], [Inf, Inf], [0.0, 1.0], Fminbox(BFGS())) do pars
           -loglikelihood(Normal(pars...), data)
       end
 * Status: success (objective increased between iterations)

 * Candidate solution
    Final objective value:     1.421423e+03

 * Found with
    Algorithm:     Fminbox with BFGS

 * Convergence measures
    |x - x'|               = 9.17e-11 ≰ 0.0e+00
    |x - x'|/|x'|          = 9.15e-11 ≰ 0.0e+00
    |f(x) - f(x')|         = 0.00e+00 ≤ 0.0e+00
    |f(x) - f(x')|/|f(x')| = 0.00e+00 ≤ 0.0e+00
    |g(x)|                 = 3.75e-08 ≰ 1.0e-08

 * Work counters
    Seconds run:   0  (vs limit Inf)
    Iterations:    4
    f(x) calls:    713
    ∇f(x) calls:   713

julia> res.minimizer
2-element Vector{Float64}:
 0.013204458490624406
 1.002498042266284

julia> res2.minimizer
2-element Vector{Float64}:
 0.013199480602918739
 1.002487701716321
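One way to quantify "effectively the same result" is to express the difference between the two minimizers in units of the statistical uncertainty. The sketch below copies the two minimizers from the output above and assumes the standard Normal-MLE errors σ/√n and σ/√(2n) with n = 1000.

```julia
# Minimizers copied from the two runs above.
res_min  = [0.013204458490624406, 1.002498042266284]   # with miunitstop
res2_min = [0.013199480602918739, 1.002487701716321]   # default stopping
n = 1000

# Approximate standard errors for (μ, σ) of a Normal MLE.
se = [res2_min[2] / sqrt(n), res2_min[2] / sqrt(2n)]

# Difference between the runs in units of the statistical error.
ratio = abs.(res_min .- res2_min) ./ se
```

Both components come out at the 1e-4 level, i.e. far below the statistical precision of the fit.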

jling · January 30, 2023, 7:48pm

That’s pretty amazing. For the record, in my proof-of-concept package JuliaHEP/LiteHF.jl (light-weight HistFactory in pure Julia, attempting to be compatible with the `pyhf` JSON format; it is supposed to be pyhf or HistFactory in Julia), I use something like
LiteHF.jl/teststatistics.jl (JuliaHEP/LiteHF.jl, commit 69439e0e2ac0669e6c38a79ebafc4046cb749b9d)

which doesn’t reproduce the BFGS or ROOT Minuit results numerically, but gives a very close final result. It could be useful to include your implementation as a legacy/sanity-check option, since it uses Optim.jl, which is already a dependency.

jling · April 1, 2025, 7:34pm

I don’t know why, but in my current project I find that Minuit often agrees more closely with NelderMead, especially a boxed version of it. Do you have any insight there?

I guess Minuit.Migrad and NelderMead both do not require a gradient… but otherwise I can’t think of why they would be similar. Do they both use a simplex?

Yuan-Ru-Lin · April 3, 2025, 11:17pm

MIGRAD does require the gradient. The Minuit documentation, Sec. 5.1.1, states:

[MIGRAD’s] main weakness is that it depends heavily on knowledge of the first derivatives, and fails miserably if they are very inaccurate.

I need more context. I saw your messages on the HEP Slack channel; let me take a closer look at them, and we can continue this discussion there.

jling · April 3, 2025, 11:32pm

Right, but by default it would fall back to a numerical approximation of the gradient if the user does not provide one?

At one point I was calling iminuit via PythonCall, so it could not have had access to an autodiff gradient. (If you’re reading this after 2025, you should use Minuit2.jl instead of calling the Python wrapper.)
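For context on what "numerical approximation of the gradient" means, here is a generic central-difference gradient. It is only a sketch: `central_grad` is a hypothetical helper, the fixed step h is an assumption, and Minuit's own numerical-derivative routine is more elaborate. It is checked against the analytic gradient of the Rosenbrock function.

```julia
# Central-difference gradient: g[i] ≈ (f(x + h eᵢ) - f(x - h eᵢ)) / 2h.
function central_grad(f, x; h = 1e-6)
    g = similar(x, Float64)
    for i in eachindex(x)
        e = zeros(length(x)); e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2h)
    end
    return g
end

# Rosenbrock function and its analytic gradient, used as a reference.
rosen(x) = (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2
rosen_grad(x) = [-2 * (1 - x[1]) - 400 * x[1] * (x[2] - x[1]^2), 200 * (x[2] - x[1]^2)]

x = [-1.2, 1.0]
maximum(abs, central_grad(rosen, x) - rosen_grad(x))   # small approximation error
```

Each gradient costs 2n extra function evaluations, and its accuracy depends on the step size, which matters for a method that, as quoted above, "fails miserably" with inaccurate derivatives.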

misha_mikhasenko · April 12, 2025, 3:25pm

With Minuit2 being easy to access, I’d love to see a comparison of Minuit2 and BFGS. I remember that the algorithms are very similar, but Minuit2 has some settings magic:

parameter ranges (addressed)
stopping criterion (addressed)
initial steps: I have a vague memory that it determines the initial step sizes. I sometimes use them for initial_invH

BFGS(; initial_invH = x -> initial_invH)
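A sketch of how rough per-parameter uncertainties (for example, Minuit-style initial step sizes) could be turned into a diagonal initial inverse Hessian. It assumes Minuit's convention that the covariance is V = 2·UP·H⁻¹, so H⁻¹ ≈ diag(σ²)/(2·UP), with UP = 0.5 for a negative log-likelihood and 1.0 for a χ². The helper `invH_from_errors` and the numbers in `sigma_guess` are hypothetical.

```julia
using LinearAlgebra

# Hypothetical helper: diagonal initial inverse Hessian from uncertainty guesses.
# Uses V = 2*UP*H⁻¹  ⇒  H⁻¹ ≈ diag(σ²) / (2*UP).
invH_from_errors(sigma; up = 0.5) = Matrix(Diagonal(sigma .^ 2 ./ (2up)))

sigma_guess = [0.03, 0.02]              # illustrative guesses for (μ, σ)
invH0 = invH_from_errors(sigma_guess)   # UP = 0.5: negative log-likelihood

# It could then be passed to Optim as in the line above, e.g.
#   BFGS(; initial_invH = x -> invH0)
```

Starting both optimizers from the same point and the same initial curvature information would make a "which arrives first" comparison more meaningful.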

It would be cool to set up the same initial state for Minuit2 and BFGS and see which one arrives first.
