DynamicHMC: Reached maximum number of iterations while bisecting interval for epsilon

@Tamas_Papp

I saw there is a previous thread discussing this problem, but I don’t quite get the solution so I’m starting a new one.

“Reached maximum number of iterations while bisecting interval for ϵ.”

I’ve seen this error message before, and in that case it turned out that my gradients were wrong. For this new problem, however, I’ve triple-checked my log density and gradients and can’t find anything wrong with them. Moreover, the parameter values from these iterations are quite close to the truth, which suggests the sampler mostly works, yet I still get this error message. I also noticed that one of two things happens to the parameter values, the log density, and the gradients: either they stop changing after the first few iterations, or they oscillate between two sets of values. After many iterations, the error message above pops up.
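(For reference, the kind of check I did looks roughly like this, comparing the analytic gradient against ForwardDiff at random points; logdens and analytic_grad are just illustrative stand-ins, not my actual code.)

using ForwardDiff

# illustrative stand-ins for the actual log density and its analytic gradient
logdens(x) = -0.5 * sum(abs2, x)
analytic_grad(x) = -x

# compare the analytic gradient to the AD gradient at random points
for _ in 1:10
    x = randn(5)
    g_ad = ForwardDiff.gradient(logdens, x)
    @assert isapprox(analytic_grad(x), g_ad; rtol = 1e-8)
end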

I wonder if you have suggestions on what the issue might be?

It would be much easier to investigate what happens with a self-contained example I can run.

Unfortunately much of the code is unpublished work, but I’ll try to come up with a small example.

You can send it privately and I will treat it as confidential.

Thank you for offering to help; I really appreciate it.

I found the problem: I need to create a fresh gradient vector every time logdensity_and_gradient is called.

Previously I defined the problem as

struct MyProblem{T <: Real}
    ...
    ∇ :: Vector{T} # gradient
end

and calculated the gradient using

function LogDensityProblems.logdensity_and_gradient(problem::MyProblem{T}, pars) where T <: Real
    @unpack ..., ∇ = problem
    ...
    logl = ...
    copyto!(∇, ...) # writes into the buffer stored in the problem, shared across calls
    logl, ∇
end

But this won’t work, presumably because DynamicHMC holds on to the returned gradient vector, so overwriting the shared buffer on the next call corrupts values it has already stored. Instead, I need to use

function LogDensityProblems.logdensity_and_gradient(problem::MyProblem{T}, pars) where T <: Real
    ...
    logl = ...
    ∇ = fill(T(0), p) # allocate a fresh gradient vector on every call
    copyto!(∇, ...)
    logl, ∇
end
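For completeness, here is a minimal self-contained sketch of the same pattern on a toy standard-normal problem (ToyProblem and the dimensions are illustrative, not my actual model; mcmc_with_warmup is the usual DynamicHMC entry point):

using LogDensityProblems, DynamicHMC, Random

# toy problem: p independent standard normals
struct ToyProblem
    p::Int
end

LogDensityProblems.dimension(problem::ToyProblem) = problem.p
LogDensityProblems.capabilities(::Type{ToyProblem}) = LogDensityProblems.LogDensityOrder{1}()
LogDensityProblems.logdensity(problem::ToyProblem, x) = -0.5 * sum(abs2, x)

function LogDensityProblems.logdensity_and_gradient(problem::ToyProblem, x)
    logl = -0.5 * sum(abs2, x)
    ∇ = -x   # fresh vector on every call (unary minus on an array allocates a new array)
    logl, ∇
end

results = mcmc_with_warmup(Random.default_rng(), ToyProblem(10), 1000)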

Although it runs now, I wonder if there is a way to avoid repeatedly allocating the gradient vector?

Not really, the API expects a fresh one. This keeps the DynamicHMC code really clean in a functional style, at a small cost (for small dimensions you can use an SVector, as in the sketch below; for large ones, the cost of AD should dominate except for trivial problems).

That said, this was not explicitly documented, so I did.
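For the small-dimension case mentioned above, a sketch of what returning an SVector could look like (TinyProblem is a hypothetical 2-parameter example, not from this thread):

using LogDensityProblems, StaticArrays

# hypothetical problem with two independent standard normal parameters
struct TinyProblem end

LogDensityProblems.dimension(::TinyProblem) = 2
LogDensityProblems.capabilities(::Type{TinyProblem}) = LogDensityProblems.LogDensityOrder{1}()

function LogDensityProblems.logdensity_and_gradient(::TinyProblem, x)
    logl = -0.5 * (x[1]^2 + x[2]^2)
    ∇ = SVector(-x[1], -x[2])   # stack-allocated; returning a "fresh" gradient costs nothing here
    logl, ∇
end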

Got it. Thank you very much for the explanation.


To see how much speed-up I could get by avoiding the repeated allocation (problem dimension p = 500,000), I tried defining the gradient vector as a global variable (global ∇ = zeros(p)) and updating it in the logdensity_and_gradient function. However, the gradient doesn’t seem to be updated correctly.

Is this expected?

Possibly, but hard to say anything concrete without an MWE.

That said, with a dimension that high, GC may be the least of your worries; NUTS itself may run into issues at that dimension (except in some special cases, such as independence or near-normality).
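As a rough way to check whether the allocation actually matters at p = 500,000, one can time the allocation of the gradient vector against a gradient-like pass over the data (a sketch with BenchmarkTools; the copyto! is only a stand-in for a very cheap analytic gradient):

using BenchmarkTools

p = 500_000
x = randn(p)
buf = similar(x)

# cost of allocating a fresh gradient vector of this size
@btime zeros($p);

# stand-in for a very cheap gradient written into a preallocated buffer;
# a realistic model's gradient will usually dominate the allocation above
@btime copyto!($buf, $x);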