In case anyone is curious, the MWE for the offending code is
function f(param)
decay = exp(param) # make it positive
α = exp(-decay)
end
This “works” (for a given value of “works”
) for eg param = 800.0 but produces NaNs with AD. This is called inside a numerical solver where the solution is around param = 0.7, but large values are visited while the parameter space is explored.
Since the purpose of the first exp is to convert a parameter in \mathbb{R} to a positive number, I replaced it with logaddexp(zero(param), param) which does not suffer from numerical issues.
In the end I found it with the _finiteD checks as above.