# Using gradient function with Optim

I am taking my first steps with Julia, trying to use Optim.jl to estimate a probit model. I have adapted some code I found:

``````
using DataFrames
using Distributions
using Optim

# -------- Normal distribution functions; prevent exact 0.0 and 1.0 --------

function normcdf(x::Vector{Float64})
    out = Distributions.cdf(Distributions.Normal(), x)
    out + (out .== 0.0) * eps(1.0) - (out .== 1.0) * eps(1.0)
end

function normpdf(x::Vector{Float64})
    Distributions.pdf(Distributions.Normal(), x)
end

# -------------------- Likelihood and gradient --------------------

function λ(β::Vector{Float64})
    q = 2d - 1
    q .* normpdf(q .* X * β) ./ normcdf(q .* X * β)
end

function LL(β::Vector{Float64})
    -sum(log.(normcdf((2d - 1) .* X * β)))
end

function g!(storage, β)
    storage[:] = -sum(λ(β) .* X, 1)
end

# ------------------------ Probit DGP ------------------------

N     = 10^5
k     = 4
u     = rand(Normal(), N)
X     = [ones(N) reshape(rand(Uniform(-1, 1), N * (k - 1)), (N, k - 1))]
βTrue = [2.0, 3, 5, 10]
y     = X * βTrue + u
d     = 1.0 * (y .>= 0)
``````

I then run this, first without supplying the gradient function:

``````
res1 = Optim.optimize(LL, zeros(k), method=BFGS())
``````

which works fine, but when I supply the gradient function:

``````
res2 = Optim.optimize(LL, g!, zeros(k), method=BFGS())
``````

it reports convergence despite not iterating, and the results are clearly incorrect. I presume I have made a simple mistake, but I can’t see it.

Have you tried checking your gradient? You can either do this manually (check that f(x + εh) ≈ f(x) + ε⟨g, h⟩ for small ε), or use Calculus.jl or one of the automatic differentiation packages. It looks like the bottleneck of your function is a matmul, so ReverseDiff.jl should be pretty efficient.
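The manual check described above can be sketched on a toy objective like this (the names `ftoy`, `gtoy!`, `x0`, and `h` are illustrative, not from the code in the question):

``````
# Toy objective with known gradient 2x, used to illustrate the check
ftoy(x) = sum(x .^ 2)

# Analytic gradient in Optim's mutating style (gradient buffer first)
function gtoy!(storage, x)
    storage .= 2 .* x
end

x0 = [1.0, -2.0, 3.0]
h  = randn(length(x0)); h ./= sqrt(sum(h .^ 2))  # random unit direction
g  = similar(x0); gtoy!(g, x0)

ε   = 1e-6
lhs = ftoy(x0 .+ ε .* h)           # f(x + εh)
rhs = ftoy(x0) + ε * sum(g .* h)   # f(x) + ε⟨g, h⟩

# If the gradient is correct, lhs and rhs agree up to O(ε²)
@assert abs(lhs - rhs) < 1e-8
``````

If the analytic gradient were wrong, the difference would shrink like O(ε) rather than O(ε²), which this check would catch.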

Thanks for the tip. For res1 (the successful run) I ran the following:

``````
q = Vector{Float64}(4)
g!(q, res1.minimizer)
``````

which showed that the gradient calculated by `g!` looks right:

``````
1×4 Array{Float64,2}:
-1.25987e-7  5.45146e-8  5.98943e-8  1.43703e-7
``````

I should say, I’m attempting this as a learning exercise. I just want to understand how to supply the gradient function so I can apply this understanding in a different (more complicated) model. Since I will apply that to a bigger dataset, I want the speed that analytical derivatives allow.

Incidentally, how do you see that the bottleneck is a matmul (and what is a matmul)? Apologies if these are basic questions; I’m not from a CS background.

You’re probably on Optim version 0.7.8 or lower. Back then you needed

``````
function g!(β, storage)
    storage[:] = -sum(λ(β) .* X, 1)
end
``````

The new versions of Optim (0.9.x) follow the usual Julia style, where you mutate the first argument.
So either switch the order of arguments as above, or upgrade Optim to the latest version.
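For reference, the mutate-first style on Optim 0.9+ looks like this; a minimal sketch on a toy quadratic objective (the names `ftoy` and `gtoy!` are illustrative, not from the question):

``````
using Optim

ftoy(x) = sum(x .^ 2)

# In Optim 0.9+ the gradient buffer is the FIRST argument (mutate-first style)
function gtoy!(storage, x)
    storage .= 2 .* x
end

res = optimize(ftoy, gtoy!, ones(2), BFGS())
``````

With the arguments in this order, `optimize` uses the analytic gradient instead of falling back to finite differences.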

As a warning: if you downloaded Julia via JuliaPro, odds are you’ll have a hard time upgrading.

Switching the arguments solved it - thank you! I was indeed using JuliaPro.