So, my student has a problem where he's just been letting maximum_a_posteriori run for as long as it needs, which is taking 2000+ seconds. I suggested trying abstol or reltol to make sure it wasn't spending all its time refining the last representable decimal place. Sure enough, using
maxtime=200:
203.660970 seconds (10.85 M allocations: 6.851 GiB, 0.69% gc time)
ModeResult with maximized lp of -22226.09
maxtime=100:
103.657475 seconds (7.72 M allocations: 5.732 GiB, 0.38% gc time)
ModeResult with maximized lp of -22228.50
maxtime=60:
63.676032 seconds (6.35 M allocations: 5.243 GiB, 0.57% gc time)
ModeResult with maximized lp of -22237.03
Obviously in these problems, running for more than 100-ish seconds has highly diminishing returns for the log probability density. But there are many of these problems on different data sets, and rather than guessing a sufficient time, it'd be better to say: stop when the lp doesn't improve by more than, say, 0.5 (i.e. abstol=0.5).
But when I set abstol, it ALWAYS runs to the end of the maxtime; it never stops early, even with abstol=50 or more. This is using the LBFGS() algorithm. Same with reltol.
Are abstol and reltol not honored? Or maybe just not with this algorithm? Or what?
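For reference, the call pattern in question looks roughly like this (model and data here are placeholders, not our actual code):

# Hypothetical sketch of the call under discussion; model(data) stands in for
# our real Turing model. We expected abstol to stop the run well before maxtime.
result = maximum_a_posteriori(model(data), Optim.LBFGS();
                              maxtime = 200, abstol = 0.5)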
I thought this might be a known issue so someone would just pipe up, but apparently not?
Anyway, I'll work on an MWE and post it later today; then we can see if it's in fact a problem or just something specific to our code.
using Turing, Distributions, Optim, OptimizationOptimJL, PDMats, LinearAlgebra

# Trivial model: a 20-dimensional zero-mean MvNormal, so the mode is at zero
@model function mwe(mat)
    a ~ MvNormal(fill(0.0, 20), mat)
end

covmat = PDiagMat(rand(Uniform(0.1, 10), 20))
initval = fill(1.0, 20)

# Optimize without and with abstol
vec = maximum_a_posteriori(mwe(covmat), Optim.LBFGS(); initial_params = initval, maxiters = 50)
vec2 = maximum_a_posteriori(mwe(covmat), Optim.LBFGS(); initial_params = initval, abstol = 12.0, maxiters = 50)

# Distance of each solution from the true mode at zero
norm(vec.values) |> display
norm(vec2.values) |> display

# Shortfall of each achieved lp relative to the exact maximum
lpmax = logpdf(MvNormal(fill(0.0, 20), covmat), fill(0.0, 20))
diff = lpmax - vec.lp
diff2 = lpmax - vec2.lp
OK, so doing this, the second optimization does come to an earlier stop, with a larger error in the lp value. So evidently abstol DID work. Now I'll show this to my student and see if there's something we can do to make abstol work for us as well.
I’ll come back and report what we find.
Using my MWE on the current version of Turing works, but I think on the older version of Turing that was in our Manifest.toml from a while back it DOES NOT work. At least, my colleague reports that the MWE doesn't stop early using abstol with that older version. I'm not sure which version that was, but it may be that a bug got fixed between whenever we added Turing to our project and now.
So, if you're using optimization to find maximum a posteriori estimates with Turing, use at least a late-2025-era version of all relevant packages.
dlakelan's colleague here. Apologies for the confusion, but the problem seems not to be related to the Turing version. Rather, abstol sometimes works and sometimes doesn't, depending on the problem. Setting abstol = 12.0 does not stop early if the covariance matrix is highly correlated.
using Turing, Distributions, LinearAlgebra, Optim

N = 20

@model function mwe(mat)
    a ~ MvNormal(fill(0.0, size(mat)[1]), mat)
end

# covmat1 is diagonal; covmat2 is rank-one plus a small ridge, i.e. highly correlated
covmat1 = Diagonal(rand(Uniform(0.1, 10), N))
x = rand(Uniform(0.1, 10), N);
covmat2 = x * x' + 0.01I
initval = fill(10.0, N);

##### model with covariance matrix 1
vec = maximum_a_posteriori(mwe(covmat1), Optim.LBFGS(); initial_params = initval);
vec2 = maximum_a_posteriori(mwe(covmat1), Optim.LBFGS(); initial_params = initval, abstol = 12.0);
norm(vec.values) |> display
norm(vec2.values) |> display
lpmax = logpdf(MvNormal(fill(0.0, N), covmat1), fill(0.0, N));
diff = lpmax - vec.lp
diff2 = lpmax - vec2.lp

##### model with covariance matrix 2
vec = maximum_a_posteriori(mwe(covmat2), Optim.LBFGS(); initial_params = initval);
vec2 = maximum_a_posteriori(mwe(covmat2), Optim.LBFGS(); initial_params = initval, abstol = 12.0);
lpmax = logpdf(MvNormal(fill(0.0, N), covmat2), fill(0.0, N));
diff = lpmax - vec.lp
diff2 = lpmax - vec2.lp
While the first does stop early, the second does not.
So, I ran the code adding show_trace=true and raising abstol to 12000.0, and in fact the correlated case does terminate early with the abstol, by one iteration. Basically, this fairly simple quadratic problem converges so quickly that it's not a great test case, but abstol DOES work with current versions of Turing etc.:
Without the abstol it does iterations 0, 1, 2, 3, 4; with the abstol it stops after iteration 3.
I don't know what abstol is actually measuring. Here are the iteration number, function value, and gradient norm from the trace:
Iter     Function value   Gradient norm
   0     3.222598e+07     3.314722e+04
   1     9.291964e+02     8.245096e-01
   2    -2.231984e+01     1.105896e-03
   3    -2.231984e+01     3.099994e-08
   4    -2.231984e+01     7.543506e-24
So if abstol is 12000.0, I would think it should stop at iteration 2, where the function improvement was about 951.5. Instead it does iteration 3 and stops there, where the function improvement is 0 to 7 decimal places; without the abstol it stops at iteration 4, where the function improvement was 0 to many decimal places and the gradient was 0 to 24 decimal places.
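One way to see what the tolerance is compared against would be to run the same negative log-density through Optim directly and check which convergence criterion Optim reports. A minimal sketch, assuming Turing's abstol ends up as Optim's f_abstol option (that mapping is my assumption, not something I've verified):

# Optimize the same quadratic objective with Optim directly and inspect
# which convergence flag fires; f_abstol here is assumed to play the role
# that abstol plays in the Turing call.
using Distributions, LinearAlgebra, Optim

N = 20
x = rand(Uniform(0.1, 10), N)
covmat2 = x * x' + 0.01I
dist = MvNormal(fill(0.0, N), covmat2)
f(z) = -logpdf(dist, z)

res = optimize(f, fill(10.0, N), LBFGS(),
               Optim.Options(f_abstol = 12000.0, show_trace = true))
display(res)  # the summary reports which criterion triggered the stop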
I don't have time right now to look into this (sorry), but I would suggest trying to cut out the Turing model, i.e. just use Optimization.jl directly on the function
using Distributions, LinearAlgebra

mat = ...  # the covariance matrix in question
dist = MvNormal(fill(0.0, size(mat)[1]), mat)
f(x) = -logpdf(dist, x)
and see if you still observe the same behaviour.
The maximum_a_posteriori function is pretty much a thin wrapper around this. Turing also makes sure the parameters are transformed to unconstrained space, but MvNormal is already unconstrained, so that should be a no-op.
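Following that suggestion, here's a minimal self-contained sketch of the direct Optimization.jl route (the diagonal test matrix and the AutoForwardDiff choice are mine, just to make it runnable; swap in covmat2 from the MWE above to test the correlated case):

using Distributions, LinearAlgebra, ForwardDiff
using Optimization, OptimizationOptimJL

# Stand-in covariance matrix; replace with covmat2 to reproduce the failure case
N = 20
mat = Diagonal(rand(Uniform(0.1, 10), N))
dist = MvNormal(fill(0.0, N), mat)
f(x, p) = -logpdf(dist, x)  # Optimization.jl objectives take (x, p)

optf = OptimizationFunction(f, Optimization.AutoForwardDiff())
prob = OptimizationProblem(optf, fill(10.0, N))
sol = solve(prob, LBFGS(); abstol = 12.0)  # check whether abstol stops this early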