Optim user-supplied gradient not working

I’m estimating a model by maximum likelihood, and I’ve specified both the objective function and the gradient correctly. (I know this because I am able to reproduce the estimates from a Stata example using the exact same data.)

When I let Optim use ForwardDiff to compute the gradient and Hessian, I get the correct answer, and the gradient evaluated at the optimum is very close to zero (as expected). [see code below:]

    td = TwiceDifferentiable(f, bstart, autodiff = :forwarddiff)
    rs = optimize(td, bstart, LBFGS(; linesearch = LineSearches.BackTracking()), Optim.Options(iterations=100_000,g_tol=1e-8,f_tol=1e-8))
    q = Array{Float64}(undef, length(β))
    grad = g!(q,β)
    gradzero = g!(q,zeros(size(β)))

However, when I instead pass my user-defined gradient to Optim, the algorithm terminates at the initial point (and the gradient evaluates to what it would be at the initial point). [see code below; the only difference is that I added g! to the TwiceDifferentiable call]

    td = TwiceDifferentiable(f, g!, bstart, autodiff = :forwarddiff)
    rs = optimize(td, bstart, LBFGS(; linesearch = LineSearches.BackTracking()), Optim.Options(iterations=100_000,g_tol=1e-8,f_tol=1e-8))
    q = Array{Float64}(undef, length(β))
    grad = g!(q,β)
    gradzero = g!(q,zeros(size(β)))

The full code is available here (autodiff example) and here (user-defined example). [The two files are identical except for adding g! to the TwiceDifferentiable call]

I’m using Optim v0.18.1.

Does anyone know what might be going on? It seems like I’m missing something basic, but it isn’t immediately apparent to me.

Also, if anyone has performance tips, they would be most welcome! I plan on using the fg! capability once I figure out what’s going on with the g! call, since there is so much computational overlap between the objective and the gradient.

Finally, if anyone knows how to precompile this function, I’d really appreciate a pointer.

You might want to consider comparing the manually coded gradient to

  1. ForwardDiff, and

  2. finite differences (see the sketch after this list)
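
Something like the following would do both checks (a sketch only, using the f, g!, and bstart from your code above; the finite-difference step h is just a reasonable default):

    using ForwardDiff

    # gradient from the hand-coded g! (a fresh array, so nothing aliases)
    g_manual = similar(bstart)
    g!(g_manual, bstart)

    # gradient from ForwardDiff
    g_fd = ForwardDiff.gradient(f, bstart)

    # gradient from central finite differences (simple hand-rolled version)
    function fd_gradient(f, x; h = 1e-6)
        g = similar(x)
        for i in eachindex(x)
            xp = copy(x); xp[i] += h
            xm = copy(x); xm[i] -= h
            g[i] = (f(xp) - f(xm)) / (2h)
        end
        return g
    end

    [g_manual g_fd fd_gradient(f, bstart)]   # the three columns should agree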

link to offending line
You’re not writing into the correct cache, as you’re binding a completely new array to G. You can try replacing that line with G .= T(0).
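
To make the distinction concrete, here is a toy version of the two patterns (this is not your actual g!, which is in the linked file; T stands for the element type, as in your code):

    # Rebinding: G now refers to a brand-new local array, and the array Optim
    # passed in is never touched, so Optim never sees your gradient values and
    # terminates at the starting point.
    function g_wrong!(G, β)
        G = zeros(eltype(β), length(β))
        # ... fill the (new, local) G ...
        return G
    end

    # Mutating in place: this writes into the array Optim passed in, which is
    # what the `!` convention promises.
    function g_right!(G, β)
        G .= zero(eltype(β))     # same idea as G .= T(0)
        # ... accumulate into G with G[i] += ... ...
        return nothing
    end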


Doing something like

    q = Array{Float64}(undef, length(β))
    grad = g!(q,β)
    gradzero = g!(q,zeros(size(β)))

is likely to confuse you.

Note that here, q, grad and gradzero all point to the same array.

It is better not to use the return value of a mutating function (it can just return nothing), which makes it more obvious that there is only one q:

    q = Array{Float64}(undef, length(β))
    g!(q,β)
    g!(q,zeros(size(β)))
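
A tiny standalone illustration of the aliasing, with a toy function that has nothing to do with your model:

    bump!(v) = (v .+= 1; v)   # mutates its argument and also returns it

    q = zeros(3)
    grad = bump!(q)           # grad is not a copy ...
    grad === q                # true: both names refer to the same array
    bump!(q)                  # ... so this call also changes what grad shows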

Thanks for pointing that out; I’m still trying to get the hang of mutating functions, so I appreciate your advice.

I followed @Tamas_Papp’s advice and it looks like I’m still not understanding something basic.

I now have the following near the end of my main function:

    q = Array{Float64}(undef, length(β))
    g = x -> ForwardDiff.gradient(f, x)

    return β,se,ℓ,g!(q,β),g!(q,zeros(size(β))),g(β)

which, for the last three outputs, gives me

    julia> [g1 g0 gee]
    5×3 Array{Float64,2}:
       -93.6667    -93.6667   0.00814071
       -72.0       -72.0     -0.00332
        34.3333     34.3333  -0.0106256
        32.0        32.0      0.0055671
     -1345.0     -1345.0      0.0099657

I don’t understand why g!(q,β) (column 1) is giving something different from g(β) (column 3).

First of all, change your g! to return nothing, because you are making the same mistake I just pointed out: in your example the second call to g! overwrites the q that was set by the first call.

So you need to pass an independent q into each g! call if you want to compare its result on different inputs. In fact, the whole point of g! is to reuse the same array over and over so as not to reallocate it on each call.

So something like

    g1 = Array{Float64}(undef, length(β))
    g!(g1,β)
    g2 = similar(g1)
    g!(g2, zeros(size(β)))

Now g1 and g2 are two independent arrays.
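
And since you mentioned wanting to use fg! for the shared computations: the usual way to do that with Optim is only_fg!, roughly like this (a sketch only; the objective and gradient bodies are placeholders to fill in with your likelihood code):

    using Optim, LineSearches

    bstart = zeros(5)           # stand-in for your starting values

    # fg!(F, G, β): fill G with the gradient when G !== nothing and return the
    # objective value when F !== nothing, reusing the shared intermediate work.
    function fg!(F, G, β)
        # ... compute quantities used by both the objective and the gradient ...
        if G !== nothing
            G .= 0.0            # write into the caller's array; never rebind it
            # ... accumulate the gradient into G ...
        end
        if F !== nothing
            return 0.0          # return the (negative) log likelihood here
        end
        return nothing
    end

    rs = optimize(Optim.only_fg!(fg!), bstart,
                  LBFGS(; linesearch = LineSearches.BackTracking()),
                  Optim.Options(iterations = 100_000, g_tol = 1e-8, f_tol = 1e-8))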


Thank you for your patience. I now understand!
