Optim user-supplied gradient not working

I’m estimating a model by maximum likelihood, and I’ve specified both the objective function and the gradient correctly. (I know this because I am able to reproduce the estimates from a Stata example using the exact same data.)

When I let Optim use ForwardDiff to compute the gradient and Hessian, I get the correct answer, and the gradient evaluated at the optimum is very close to zero (as expected). [see code below:]

    td = TwiceDifferentiable(f, bstart, autodiff = :forwarddiff)
    rs = optimize(td, bstart, LBFGS(; linesearch = LineSearches.BackTracking()), Optim.Options(iterations=100_000,g_tol=1e-8,f_tol=1e-8))
    q = Array{Float64}(undef, length(β))
    grad = g!(q,β)
    gradzero = g!(q,zeros(size(β)))

However, when I instead pass my user-defined gradient to Optim, the algorithm terminates at the initial point (and the gradient evaluates to what it would be at the initial point). [see code below; the only difference is that I added g! to the TwiceDifferentiable call]

    td = TwiceDifferentiable(f, g!, bstart, autodiff = :forwarddiff)
    rs = optimize(td, bstart, LBFGS(; linesearch = LineSearches.BackTracking()), Optim.Options(iterations=100_000,g_tol=1e-8,f_tol=1e-8))
    q = Array{Float64}(undef, length(β))
    grad = g!(q,β)
    gradzero = g!(q,zeros(size(β)))

The full code is available here (autodiff example) and here (user-defined example). [The two files are identical except for adding g! to the TwiceDifferentiable call]

I’m using Optim v0.18.1.

Does anyone know what might be going on? It seems like I’m missing something basic, but it isn’t immediately apparent to me.

Also, if anyone has performance tips, they would be most welcome! I plan on using the fg! capability once I figure out what’s going on with the g! call, since there is so much computational overlap between the objective and the gradient.

Finally, if anyone knows how to precompile this function, I’d really appreciate a pointer.

You might want to consider comparing the manually coded gradient to

  1. ForwardDiff, and

  2. finite differences (see the sketch after this list)
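
Something like the following would do both checks (a sketch only, using the f, g!, and bstart from your code above; the finite-difference step h is just a reasonable default):

    using ForwardDiff

    # gradient from the hand-coded g! (a fresh array, so nothing aliases)
    g_manual = similar(bstart)
    g!(g_manual, bstart)

    # gradient from ForwardDiff
    g_fd = ForwardDiff.gradient(f, bstart)

    # gradient from central finite differences (simple hand-rolled version)
    function fd_gradient(f, x; h = 1e-6)
        g = similar(x)
        for i in eachindex(x)
            xp = copy(x); xp[i] += h
            xm = copy(x); xm[i] -= h
            g[i] = (f(xp) - f(xm)) / (2h)
        end
        return g
    end

    [g_manual g_fd fd_gradient(f, bstart)]   # the three columns should agree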

link to offending line
You’re not writing into the correct cache, as you’re binding a completely new array to G. You can try replacing that line with G .= T(0).
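
To make the distinction concrete, here is a toy version of the two patterns (this is not your actual g!, which is in the linked file; T stands for the element type, as in your code):

    # Rebinding: G now refers to a brand-new local array, and the array Optim
    # passed in is never touched, so Optim never sees your gradient values and
    # terminates at the starting point.
    function g_wrong!(G, β)
        G = zeros(eltype(β), length(β))
        # ... fill the (new, local) G ...
        return G
    end

    # Mutating in place: this writes into the array Optim passed in, which is
    # what the `!` convention promises.
    function g_right!(G, β)
        G .= zero(eltype(β))     # same idea as G .= T(0)
        # ... accumulate into G with G[i] += ... ...
        return nothing
    end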


Doing something like

    q = Array{Float64}(undef, length(β))
    grad = g!(q,β)
    gradzero = g!(q,zeros(size(β)))

is likely to confuse you.

Note that here, q, grad and gradzero all point to the same array.

It is better not to use the return value of a mutating function (it can just return nothing), which makes it more obvious that there is only one q:

    q = Array{Float64}(undef, length(β))
    g!(q,β)
    g!(q,zeros(size(β)))
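
A tiny standalone illustration of the aliasing, with a toy function that has nothing to do with your model:

    bump!(v) = (v .+= 1; v)   # mutates its argument and also returns it

    q = zeros(3)
    grad = bump!(q)           # grad is not a copy ...
    grad === q                # true: both names refer to the same array
    bump!(q)                  # ... so this call also changes what grad shows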

Thanks for pointing that out; I’m still trying to get the hang of mutating functions, so I appreciate your advice.

I followed @Tamas_Papp’s advice and it looks like I’m still not understanding something basic.

I now have the following near the end of my main function:

    q = Array{Float64}(undef, length(β))
    g = x -> ForwardDiff.gradient(f, x)

    return β,se,ℓ,g!(q,β),g!(q,zeros(size(β))),g(β)

which, for the last three outputs, gives me

    julia> [g1 g0 gee]
    5×3 Array{Float64,2}:
       -93.6667    -93.6667   0.00814071
       -72.0       -72.0     -0.00332
        34.3333     34.3333  -0.0106256
        32.0        32.0      0.0055671
     -1345.0     -1345.0      0.0099657

I don’t understand why g!(q,β) (column 1) is giving something different from g(β) (column 3).

First of all, change your g! to return nothing, because you are making the same mistake I just pointed out: in your example the second call to g! overwrites the q that was set by the first call.

So you need to pass an independent q into each g! call if you want to compare its result on different inputs. In fact, the whole point of g! is to reuse the same array over and over so as not to reallocate it on each call.

So something like

    g1 = Array{Float64}(undef, length(β))
    g!(g1,β)
    g2 = similar(g1)
    g!(g2, zeros(size(β)))

Now g1 and g2 are two independent arrays.
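
And since you mentioned wanting to use fg! for the shared computations: the usual way to do that with Optim is only_fg!, roughly like this (a sketch only; the objective and gradient bodies are placeholders to fill in with your likelihood code):

    using Optim, LineSearches

    bstart = zeros(5)           # stand-in for your starting values

    # fg!(F, G, β): fill G with the gradient when G !== nothing and return the
    # objective value when F !== nothing, reusing the shared intermediate work.
    function fg!(F, G, β)
        # ... compute quantities used by both the objective and the gradient ...
        if G !== nothing
            G .= 0.0            # write into the caller's array; never rebind it
            # ... accumulate the gradient into G ...
        end
        if F !== nothing
            return 0.0          # return the (negative) log likelihood here
        end
        return nothing
    end

    rs = optimize(Optim.only_fg!(fg!), bstart,
                  LBFGS(; linesearch = LineSearches.BackTracking()),
                  Optim.Options(iterations = 100_000, g_tol = 1e-8, f_tol = 1e-8))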


Thank you for your patience. I now understand!
