# Gradient norm does not change in `Optim` using `autodiff`

I’m trying to figure out how to interpret the occasional failure of an optimization routine to move from a candidate point. Below (at bottom) is output from `Optim` using `NewtonTrustRegion`, with `autodiff = true`. Neither the objective value (log likelihood) nor the gradient norm moves in iterations 8 through 11, but then both start moving again. However, there are related behaviors (perhaps variants, perhaps distinct problems) in which the optimizer just gets stuck (LL and gradient norm never move), or gets stuck with the gradient norm returning `NaN` while the LL still gives a valid value.

I haven’t figured out how to write an MWE that replicates this behavior, and I am unsure how to troubleshoot it, or even whether it is a problem (though I think it is).

Here are important details:

• The objective function is a nested logit, so it is not very quadratic and not concave. A particular parameter `lam` enters the choice probabilities like this: `P = exp(u1/lam) / (exp(u1/lam) + exp(u2/lam))`, so my intuition is that there are overflow concerns as `lam` → 0. This value does pop up.
• I was concerned about `NaN` values messing up gradient construction, so I had coded in `isnan(P)` logic to deal with this. However, I am unsure how `isnan` plays with `autodiff`.
• So I instead modified the choice probabilities to be `P = 1.0 / (1.0 + exp((u2-u1)/lam))`, which seems fully robust to +/-`Inf` values from no more than one of `{u1, u2}`, but is not robust to, e.g., `{Inf, Inf}`.
• I am aware of `LogExpFunctions.jl` and use its functions where I can, but they do not necessarily cover every situation that comes up when writing choice probabilities.
• The objective function accumulates `ll += logeps(P)` to deal with `P=0.0`. To disallow `-Inf`, I define:
```julia
function logeps(x::T) where T
    # Clamp x from below at eps(T) before taking the log
    log(max(eps(T), x))
end
```
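One possible way around the overflow, as a minimal sketch rather than a tested fix for the full nested logit: since the objective only needs `log(P)`, the two-alternative form `P = 1 / (1 + exp(z))` with `z = (u2 - u1) / lam` can be evaluated directly on the log scale as `-log(1 + exp(z))`. The helper names `log1pexp_stable` and `logprob` below are made up for illustration; `LogExpFunctions.jl` ships its own `log1pexp`, which could be used instead. The sketch does not help with the `{Inf, Inf}` case, where `u2 - u1` is already `NaN`.

```julia
# log(1 + exp(z)) without overflow: for z > 0, factor exp(z) out of the sum
log1pexp_stable(z) = z > 0 ? z + log1p(exp(-z)) : log1p(exp(z))

# log P for P = 1 / (1 + exp((u2 - u1) / lam)), computed on the log scale
logprob(u1, u2, lam) = -log1pexp_stable((u2 - u1) / lam)

# Naive version for comparison: exp overflows to Inf, P becomes 0.0, log gives -Inf
naive_logprob(u1, u2, lam) = log(1.0 / (1.0 + exp((u2 - u1) / lam)))

@show logprob(0.0, 1.0, 1e-6)        # finite, roughly -1e6
@show naive_logprob(0.0, 1.0, 1e-6)  # -Inf
```

With this form, `ll += logprob(u1, u2, lam)` stays finite without clamping, so `logeps` would not be needed for this particular term.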

To summarize, I think my primary questions are:

1. What is the best `autodiff`-compatible way to deal with overflow in choice probabilities in a setting like this (e.g., nested logit)?

2. Is this definition of `logeps` compatible with `autodiff`?

3. It may not be that `autodiff` is causing this behavior. If not, what are candidates?

4. Does anyone have ideas for how I can troubleshoot this more? A lot of `Optim` and `autodiff` feels like a black box (I understand things in theory, but not always the implementation details).

5. Are there functions/behaviors to avoid when writing complex objective functions intended to be used with `autodiff`?
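On questions 2 and 4, one thing that can be checked without any packages is what the clamp in `logeps` does to slopes. The sketch below uses a hand-rolled central-difference gradient (`fd_gradient` and the `toy` objective are hypothetical, not part of my real code). Once `P` underflows to exactly `0.0`, `logeps(P)` is the constant `log(eps)`, so the slope with respect to the parameter behind `P` is zero. I would expect forward-mode autodiff to report the same zero there, since it differentiates whichever branch of `max` is taken, but that is an assumption worth verifying. The same helper can be compared against the autodiff gradient at a stuck point.

```julia
# logeps as defined in the post
logeps(x::T) where {T} = log(max(eps(T), x))

# Central-difference gradient; the step size h is an arbitrary illustrative default
function fd_gradient(f, x::AbstractVector{<:Real}; h = 1e-6)
    g = zeros(length(x))
    for i in eachindex(x)
        e = zeros(length(x))
        e[i] = h
        g[i] = (f(x .+ e) - f(x .- e)) / (2h)
    end
    return g
end

# Toy objective: exp(b[1]) plays the role of P and underflows for very negative b[1]
toy(b) = logeps(exp(b[1])) + b[2]^2

@show fd_gradient(toy, [-1.0, 1.0])    # both components informative
@show fd_gradient(toy, [-800.0, 1.0])  # first component is zero: the clamp is active
```

If many observations sit in the clamped region, their contribution to the gradient vanishes even though the LL itself stays finite.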

Reference `Optim` output:

```
Iter     Function value   Gradient norm

...

     6     7.631247e+05     6.546539e+04
 * time: 22.29205012321472
     7     6.942308e+05     1.115398e+05
 * time: 26.592118978500366
     8     6.100967e+05     1.183649e+06
 * time: 31.020194053649902
     9     6.100967e+05     1.183649e+06
 * time: 31.09648108482361
    10     6.100967e+05     1.183649e+06
 * time: 31.188799142837524
    11     6.100967e+05     1.183649e+06
 * time: 31.308167934417725
    12     5.997016e+05     4.227758e+05
 * time: 35.722825050354004
```

Do you have constraints? Perhaps the solver is working on decreasing infeasibility and the objective simply does not move.
My guess for the constant step norm is that `1.183649e+06` is the current trust-region radius and the three consecutive steps within the trust region make the trust-region constraint active.
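
To make the trust-region reasoning concrete, here is a toy one-dimensional sketch of a textbook accept/reject loop. It is not `Optim`'s implementation, and the function `trust_region_trace`, the thresholds `0.1` and `0.75`, and the shrink factor `0.25` are illustrative assumptions. When a trial step increases the objective it is rejected: the iterate, function value, and gradient all stay the same, and only the radius shrinks. A trace that prints one row per iteration would then repeat the same row until a step is finally accepted.

```julia
# Toy objective where full Newton steps overshoot badly for |x| > 1
f(x) = sqrt(1 + x^2)
g(x) = x / sqrt(1 + x^2)      # first derivative
h(x) = (1 + x^2)^(-3 / 2)     # second derivative (always positive)

function trust_region_trace(x, radius; iters = 6)
    trace = Tuple{Float64,Float64,Float64}[]  # (f, |g|, radius) at the start of each iteration
    for _ in 1:iters
        push!(trace, (f(x), abs(g(x)), radius))
        step = clamp(-g(x) / h(x), -radius, radius)        # Newton step clipped to the radius
        predicted = -(g(x) * step + 0.5 * h(x) * step^2)   # decrease promised by the quadratic model
        actual = f(x) - f(x + step)                        # decrease actually obtained
        rho = actual / predicted
        if rho > 0.1
            x += step                                      # accept the step
            rho > 0.75 && abs(step) == radius && (radius *= 2)
        else
            radius *= 0.25                                 # reject: x, f and g are unchanged
        end
    end
    return trace
end

for (i, (fx, gx, r)) in enumerate(trust_region_trace(3.0, 100.0))
    println(i, "  f = ", fx, "  |g| = ", gx, "  radius = ", r)
end
```

In this toy run the first four rows share the same `f` and `|g|` while the radius shrinks, and the fifth row shows progress, which is the same pattern as iterations 8 through 12 in the trace above.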

Thanks @cvanaret. No explicit constraints (see NB below), though I do use the old-school `lam = exp.(b[n:m])` trick to ensure some parameters are non-negative. So as `b[n:m]` → `-Inf`, `lam` → 0, hence my overflow worry.

But that’s only a couple of parameters; most parameters do not have that behavior, so I’m surprised that the gradient norm does not move at all.

NB: I’ve never quite gotten the `Fminbox` syntax down for my standard use (which uses an anonymous function assignment plus autodiff), e.g.:

```julia
using Optim

# Wrap the log likelihood in a closure over the data; derivatives come from forward-mode autodiff
tdf = TwiceDifferentiable(vars -> ll_emsimp(vars, choices, tt, hvec), bni; autodiff = :forward)

optimize(tdf, bni, method = NewtonTrustRegion(), iterations = 200, show_trace = true, show_every = 1, g_tol = 1e-4)
```