So, my student has a problem where he's just been letting maximum_a_posteriori run for as long as it needs, which is taking 2000+ seconds. I suggested trying abstol or reltol to make sure it wasn't spending all its time refining the last representable decimal place. Sure enough, using
maxtime=200:
203.660970 seconds (10.85 M allocations: 6.851 GiB, 0.69% gc time)
ModeResult with maximized lp of -22226.09
maxtime=100:
103.657475 seconds (7.72 M allocations: 5.732 GiB, 0.38% gc time)
ModeResult with maximized lp of -22228.50
maxtime=60:
63.676032 seconds (6.35 M allocations: 5.243 GiB, 0.57% gc time)
ModeResult with maximized lp of -22237.03
Obviously in these problems, running for more than 100-ish seconds has highly diminishing returns for the log probability density. But there are many of these problems on different data sets, and rather than guessing a sufficient time, it'd be better to say: stop when the lp doesn't improve by more than, say, 0.5 (i.e. abstol=0.5).
But when I set abstol, it ALWAYS runs to the end of the maxtime; it never stops early, even with abstol=50 or more. This is using the LBFGS() algorithm. Same with reltol.
Are abstol and reltol not honored? Or maybe just not with this algorithm? Or what?
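For reference, the call pattern in question looks roughly like this (model and data here are placeholders, not our actual code):

# Hypothetical sketch of the call under discussion; model(data) stands in for
# our real Turing model. We expected abstol to stop the run well before maxtime.
result = maximum_a_posteriori(model(data), Optim.LBFGS();
                              maxtime = 200, abstol = 0.5)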
I thought this might be a known issue so someone would just pipe up, but apparently not?
Anyway, I'll work on an MWE and post it later today; then we can see if it's in fact a problem or just something specific to our code.
using Turing, Distributions, Optim, OptimizationOptimJL, PDMats, LinearAlgebra

# Trivial model: a 20-dimensional zero-mean MvNormal, so the mode is at zero
@model function mwe(mat)
    a ~ MvNormal(fill(0.0, 20), mat)
end

covmat = PDiagMat(rand(Uniform(0.1, 10), 20))
initval = fill(1.0, 20)

# Optimize without and with abstol
vec = maximum_a_posteriori(mwe(covmat), Optim.LBFGS(); initial_params = initval, maxiters = 50)
vec2 = maximum_a_posteriori(mwe(covmat), Optim.LBFGS(); initial_params = initval, abstol = 12.0, maxiters = 50)

# Distance of each solution from the true mode at zero
norm(vec.values) |> display
norm(vec2.values) |> display

# Shortfall of each achieved lp relative to the exact maximum
lpmax = logpdf(MvNormal(fill(0.0, 20), covmat), fill(0.0, 20))
diff = lpmax - vec.lp
diff2 = lpmax - vec2.lp
OK, so doing this, the second optimization does come to an earlier stop, with a larger error in the lp value. So evidently abstol DID work. Now I'll show this to my student and see if there's something we can do to make abstol work for us as well.
I’ll come back and report what we find.
Using my MWE on the current version of Turing works, but I think on the older version of Turing that was in our Manifest.toml from a while back it DOES NOT work. At least, my colleague reports that the MWE doesn't stop early using abstol with that older version. I'm not sure which version that was, but it may be that a bug got fixed between whenever we added Turing to our project and now.
So, if you're using optimization to find maximum a posteriori estimates with Turing, use at least a late-2025-era version of all relevant packages.
dlakelan's colleague here. Apologies for the confusion, but the problem seems not to be related to the Turing version. Rather, abstol sometimes works and sometimes doesn't, depending on the problem. Setting abstol = 12.0 does not stop early if the covariance matrix is highly correlated.
using Turing, Distributions, LinearAlgebra, Optim

N = 20

@model function mwe(mat)
    a ~ MvNormal(fill(0.0, size(mat)[1]), mat)
end

# covmat1 is diagonal; covmat2 is rank-one plus a small ridge, i.e. highly correlated
covmat1 = Diagonal(rand(Uniform(0.1, 10), N))
x = rand(Uniform(0.1, 10), N);
covmat2 = x * x' + 0.01I
initval = fill(10.0, N);

##### model with covariance matrix 1
vec = maximum_a_posteriori(mwe(covmat1), Optim.LBFGS(); initial_params = initval);
vec2 = maximum_a_posteriori(mwe(covmat1), Optim.LBFGS(); initial_params = initval, abstol = 12.0);
norm(vec.values) |> display
norm(vec2.values) |> display
lpmax = logpdf(MvNormal(fill(0.0, N), covmat1), fill(0.0, N));
diff = lpmax - vec.lp
diff2 = lpmax - vec2.lp

##### model with covariance matrix 2
vec = maximum_a_posteriori(mwe(covmat2), Optim.LBFGS(); initial_params = initval);
vec2 = maximum_a_posteriori(mwe(covmat2), Optim.LBFGS(); initial_params = initval, abstol = 12.0);
lpmax = logpdf(MvNormal(fill(0.0, N), covmat2), fill(0.0, N));
diff = lpmax - vec.lp
diff2 = lpmax - vec2.lp
While the first does stop early, the second does not.
So, I ran the code adding show_trace=true and raising abstol to 12000.0, and in fact the correlated case does terminate early with the abstol, by one iteration. Basically, this fairly simple quadratic problem converges so quickly that it's not a great test case, but abstol DOES work with current versions of Turing etc.:
Without the abstol it does iterations 0, 1, 2, 3, 4; with the abstol it stops after iteration 3.
I don't know what abstol is actually measuring. Here are the iteration number, function value, and gradient norm from the trace:
Iter     Function value   Gradient norm
   0     3.222598e+07     3.314722e+04
   1     9.291964e+02     8.245096e-01
   2    -2.231984e+01     1.105896e-03
   3    -2.231984e+01     3.099994e-08
   4    -2.231984e+01     7.543506e-24
So if abstol is 12000.0, I would think it should stop at iteration 2, where the function improvement was about 951.5. Instead it does iteration 3 and stops there, where the function improvement is 0 to 7 decimal places; without the abstol it stops at iteration 4, where the function improvement was 0 to many decimal places and the gradient was 0 to 24 decimal places.
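One way to see what the tolerance is compared against would be to run the same negative log-density through Optim directly and check which convergence criterion Optim reports. A minimal sketch, assuming Turing's abstol ends up as Optim's f_abstol option (that mapping is my assumption, not something I've verified):

# Optimize the same quadratic objective with Optim directly and inspect
# which convergence flag fires; f_abstol here is assumed to play the role
# that abstol plays in the Turing call.
using Distributions, LinearAlgebra, Optim

N = 20
x = rand(Uniform(0.1, 10), N)
covmat2 = x * x' + 0.01I
dist = MvNormal(fill(0.0, N), covmat2)
f(z) = -logpdf(dist, z)

res = optimize(f, fill(10.0, N), LBFGS(),
               Optim.Options(f_abstol = 12000.0, show_trace = true))
display(res)  # the summary reports which criterion triggered the stop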
I don't have time right now to look into this (sorry), but I would suggest trying to cut out the Turing model, i.e. just use Optimization.jl directly on the function
using Distributions, LinearAlgebra

mat = ...  # the covariance matrix in question
dist = MvNormal(fill(0.0, size(mat)[1]), mat)
f(x) = -logpdf(dist, x)
and see if you still observe the same behaviour.
The maximum_a_posteriori function is pretty much a thin wrapper around this. Turing also makes sure the parameters are transformed to unconstrained space, but MvNormal is already unconstrained, so that should be a no-op.
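Following that suggestion, here's a minimal self-contained sketch of the direct Optimization.jl route (the diagonal test matrix and the AutoForwardDiff choice are mine, just to make it runnable; swap in covmat2 from the MWE above to test the correlated case):

using Distributions, LinearAlgebra, ForwardDiff
using Optimization, OptimizationOptimJL

# Stand-in covariance matrix; replace with covmat2 to reproduce the failure case
N = 20
mat = Diagonal(rand(Uniform(0.1, 10), N))
dist = MvNormal(fill(0.0, N), mat)
f(x, p) = -logpdf(dist, x)  # Optimization.jl objectives take (x, p)

optf = OptimizationFunction(f, Optimization.AutoForwardDiff())
prob = OptimizationProblem(optf, fill(10.0, N))
sol = solve(prob, LBFGS(); abstol = 12.0)  # check whether abstol stops this early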