I’m trying to figure out how to interpret the occasional failure of an optimization routine to move from a candidate point. Below (at bottom) is output from `Optim` using `NewtonTrustRegion`, with `autodiff = true`. Neither the objective value (log likelihood) nor the gradient norm moves in iterations 8 through 11, but then both start moving again. There are also related behaviors (perhaps variants, perhaps distinct) in which the problem simply gets stuck (LL and gradient norm never move), or gets stuck with the gradient norm returning `NaN` while the LL still gives a valid value.
I haven’t figured out how to write an MWE that replicates this behavior, and I am unsure how to troubleshoot it, or even whether it is actually a problem (though I think it is).
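In case it is relevant, here is a sketch of how I can instrument the run (with `nll` and `x0` as placeholder names for my objective and starting point; `autodiff = :forward` is the ForwardDiff option in recent `Optim` versions):

```julia
using Optim

# store_trace/extended_trace record the iterate and gradient at every
# iteration, so I can check whether x itself moves during the stall or
# whether only the trust region is shrinking.
res = optimize(nll, x0, NewtonTrustRegion(),
               Optim.Options(store_trace = true, extended_trace = true);
               autodiff = :forward)

xs = Optim.x_trace(res)       # iterate at each recorded iteration
gn = Optim.g_norm_trace(res)  # gradient norms, for comparison
```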
Here are important details:
- The objective function is nested logit, and so is not very quadratic and not concave. My intuition is that a particular parameter `lam` enters choice probabilities like this: `P = exp(u1/lam) / (exp(u1/lam) + exp(u2/lam))`, and so there are overflow concerns as `lam → 0`. This value does pop up. I was concerned about `NaN` values messing up gradient construction, so I had coded in `isnan(P)` logic to deal with this. However, I am unsure how `isnan` plays with `autodiff`.
- So I instead modified the choice probs to be `P = 1.0 / (1.0 + exp((u2 - u1)/lam))`, which seems fully robust to `+/-Inf` values from no more than one of `{u1, u2}`, but is not robust to, e.g., `{Inf, Inf}`.
- I am aware of `LogExpFunctions.jl` and use its functions where I can, but there is no one function that necessarily works in all situations when trying to write choice probabilities. (See the sketch just after the `logeps` definition below.)
- The obj function accumulates `ll += logeps(P)` to deal with `P = 0.0`, where, to disallow `-Inf`, I define:
```julia
function logeps(x::T) where T
    log(max(eps(T), x))
end
```
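Concretely, is something like the following the intended `autodiff`-safe pattern? This is only a sketch; `choiceprob`, `logchoiceprob`, and `logP` are made-up helper names, not functions from `LogExpFunctions.jl` or my actual code. The idea is to stay in the log domain so that `P = 0.0` never has to be materialized and then logged:

```julia
using LogExpFunctions  # logistic, log1pexp, logsumexp

# Binary case: logistic avoids overflow in exp, and -log1pexp gives
# log(P) directly.
choiceprob(u1, u2, lam)    = logistic((u1 - u2) / lam)   # == 1/(1 + exp((u2-u1)/lam))
logchoiceprob(u1, u2, lam) = -log1pexp((u2 - u1) / lam)  # == log(choiceprob(...))

# Many-alternative case: log probabilities via logsumexp instead of
# normalizing exponentials, e.g. for alternative i within a nest:
logP(u::AbstractVector, lam, i) = u[i] / lam - logsumexp(u ./ lam)

# The likelihood accumulation would then be ll += logchoiceprob(u1, u2, lam)
# (or ll += logP(u, lam, i)) rather than ll += logeps(P).
```

If that is right, the `logeps` clamp would only be needed for quantities that genuinely have to pass through probability space.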
To summarize, I think my primary questions are:

- What is the best `autodiff`-compatible way to deal with overflow in choice probabilities in a setting like this (e.g., nested logit)?
- Is this definition of `logeps` compatible with `autodiff`? (A quick check is sketched after this list.)
- It may not be that `autodiff` is causing this behavior. If not, what are candidates?
- Does anyone have ideas for how I can troubleshoot this more? A lot of `Optim` and `autodiff` feels like a black box (I understand things in theory, but not always the implementation details).
- Are there functions/behaviors to avoid when writing complex objective functions intended to be used with `autodiff`?
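On question 2, the quickest check I can think of (again just a sketch; `nll` and `x_stuck` are placeholders for my objective and the parameter vector where the trace stalls) is to push ForwardDiff through `logeps` and the full objective directly. One thing that worries me is that the derivative of `logeps` is exactly zero whenever `x < eps(T)`, which would flatten the gradient in the clamped region:

```julia
using ForwardDiff

# Restating logeps from above so this snippet is self-contained.
logeps(x::T) where T = log(max(eps(T), x))

# In the clamped branch, max returns the constant eps(T), so the
# derivative should come back as exactly zero; outside it, 1/x.
ForwardDiff.derivative(logeps, 0.0)   # expect 0.0 (clamped)
ForwardDiff.derivative(logeps, 1.0)   # expect 1.0 (= 1/x)

# At a point where the trace stalls (x_stuck is a placeholder), inspect
# the raw gradient for non-finite entries, e.g. from Inf - Inf or 0 * Inf:
g = ForwardDiff.gradient(nll, x_stuck)
any(isnan, g), any(isinf, g)
```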
Reference Optim output:
```text
Iter     Function value   Gradient norm
...
     6     7.631247e+05     6.546539e+04
 * time: 22.29205012321472
     7     6.942308e+05     1.115398e+05
 * time: 26.592118978500366
     8     6.100967e+05     1.183649e+06
 * time: 31.020194053649902
     9     6.100967e+05     1.183649e+06
 * time: 31.09648108482361
    10     6.100967e+05     1.183649e+06
 * time: 31.188799142837524
    11     6.100967e+05     1.183649e+06
 * time: 31.308167934417725
    12     5.997016e+05     4.227758e+05
 * time: 35.722825050354004
```