Optim, Forward Differentiation and OnceDifferentiable with MLE

optim

#1

Hi guys, I am estimating a model by Maximum Likelihood. So far, I’ve been using Nelder-Mead as the optimization method where I don’t need to provide any gradient or Hessian.

Convergence has been somewhat slow, and I usually need 10k+ iterations, so I’ve been trying to switch to a gradient-based method. It would also let me obtain standard errors without needing the bootstrap.

I’ve read through the examples in the Optim and NLSolversBase documentation and can replicate them in Julia. However, when I try to use, for example, automatic differentiation, I get the following MethodError:

julia> optimize(wrapmle, theta, BFGS(), Optim.Options(iterations = 100000); autodiff = :forward)

ERROR: MethodError: no method matching Float64(::ForwardDiff.Dual{ForwardDiff.Tag{typeof(wrapmle),Float64},Float64,9})
Closest candidates are:
  Float64(::Real, ::RoundingMode) where T<:AbstractFloat at rounding.jl:185
  Float64(::T<:Number) where T<:Number at boot.jl:725
  Float64(::Int8) at float.jl:60

If I try to use it as described in the Optim documentation, so that I could get the gradient and compute standard errors, I get the same error:

julia> func = OnceDifferentiable(wrapmle, theta; autodiff = :forward);
julia> optimize(func, theta)

ERROR: MethodError: no method matching Float64(::ForwardDiff.Dual{ForwardDiff.Tag{typeof(wrapmle),Float64},Float64,9})
Closest candidates are:
  Float64(::Real, ::RoundingMode) where T<:AbstractFloat at rounding.jl:185
  Float64(::T<:Number) where T<:Number at boot.jl:725
  Float64(::Int8) at float.jl:60

I don’t really understand what is going on here. I won’t post my mle function, as it is quite long, verbose, and highly specific. It does a lot of numerical integration by quadrature, and I use many pdf.(Normal(), …) and cdf.(Normal(), …) expressions to evaluate standard normal distributions. I also have around 25 parameters.

Is there a specific way I should be writing my objective function to enable automatic differentiation? What is causing this error? Thanks!

Edit: just an observation, but when using finite differencing instead of forward-mode AD, I don’t have this problem!


#2

It seems that your objective function is non-generic in some way w.r.t. the special number type that ForwardDiff uses to perform automatic differentiation. ForwardDiff’s specific requirements for differentiability are listed here, if that helps.
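A minimal illustration of what “non-generic” means here (function names are made up for the example): a method that restricts its argument to `Vector{Float64}` simply has no method for ForwardDiff’s `Dual` inputs, while an unannotated (or `Real`-typed) version works.

```julia
using ForwardDiff

f_strict(x::Vector{Float64}) = sum(abs2, x)  # rejects Dual inputs -> MethodError
f_generic(x) = sum(abs2, x)                  # accepts any element type

ForwardDiff.gradient(f_generic, [1.0, 2.0])   # works, returns [2.0, 4.0]
# ForwardDiff.gradient(f_strict, [1.0, 2.0]) # MethodError: no method matching f_strict(...)
```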


#3

Thanks for the reply! I am guessing that given the error message I’m getting, the requirement being violated is probably “The target function must be written generically enough to accept numbers of type T<:Real as input (or arrays of these numbers).”

However, I am very new to programming in general, and I don’t really understand what that means. For example, my objective function takes DataFrame objects as input. Unfortunately I am away from my computer and can’t test it, but could the DataFrame inputs be the problem?


#4

It should be simple as long as those inputs are fixed (data, and not parameters). You only need differentiability in the latter.

Perhaps if you show your code you can get more specific help (if you don’t want to share data because it is private or large, just include a function that generates random data).
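For instance, a wrapper that closes over the fixed data keeps it out of the differentiated arguments entirely (a sketch; `mle` here is a stand-in for the real likelihood, and the vector stands in for the DataFrame):

```julia
using Optim

# Hypothetical objective: parameters `theta`, fixed `data`.
mle(theta, data) = sum(abs2, theta .- data)

data = [1.0, 2.0, 3.0]              # fixed; never differentiated
wrapmle = theta -> mle(theta, data) # only `theta` varies

# ForwardDiff only ever sees `theta`, so the DataFrame/data type is irrelevant.
optimize(wrapmle, zeros(3), BFGS(); autodiff = :forward)
```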


#5

Okay, so I’ve put all necessary files in this gist.

If someone could indicate a direction, I would be very thankful!


#6

Simplifying this further would increase your chances of getting help. I am sure you can remove a lot of moving parts from this.


#7

Just a quick comment: in your gist, you use TwiceDifferentiable but pass the function to a first-order method; use OnceDifferentiable.

I don’t have time to run your whole example, and it’s not obvious from the code where the MethodError comes from. You can debug this further yourself by passing Dual numbers to wrapmle, or by running ForwardDiff.gradient on the wrapper function.
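That is, a quick way to reproduce the error outside of Optim (the body of `wrapmle` below is a stand-in, not the real objective):

```julia
using ForwardDiff

wrapmle(theta) = sum(abs2, theta)  # stand-in for the real likelihood

# If this call throws the same MethodError, the problem is inside
# wrapmle itself, not in how Optim is being invoked.
ForwardDiff.gradient(wrapmle, zeros(9))
```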


#8

I think the problem is that you preallocate arrays as Float64, but then try storing ForwardDiff.Dual in them. For example, line 24 will make Xs0 an array of whatever type Xs is (likely Float64), but then line 30 stores a ForwardDiff.Dual in Xs0. The code attempts to convert to a Float64, but no conversion exists, hence your error message. The solution is to make sure that your arrays are the same type as theta. This applies to Xs0, Xs1, integ0, integ1, temp0, temp1, and possibly others.
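A minimal reproduction of that pattern and its fix (`Xs0` named after the gist; the arithmetic is a stand-in):

```julia
using ForwardDiff

# Reproduces the error: a Float64 buffer cannot store Dual numbers.
function obj_bad(theta)
    Xs0 = zeros(length(theta))  # element type hard-coded to Float64
    Xs0 .= theta .^ 2           # setindex! tries Float64(::Dual) -> MethodError
    return sum(Xs0)
end

# Fix: let the buffer's element type follow theta's.
function obj_ok(theta)
    Xs0 = similar(theta)        # or zeros(eltype(theta), length(theta))
    Xs0 .= theta .^ 2
    return sum(Xs0)
end

ForwardDiff.gradient(obj_ok, [1.0, 2.0])  # works, returns [2.0, 4.0]
```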


#9

Hi Paul and anriseth, this has been really helpful! To be honest (and I am a little ashamed by this), I was trying to use ForwardDiff without really understanding what it was. After you guys replied here, I searched a bit about Dual Numbers and it made stuff much more clear!

So, instead of creating empty Float64 arrays to store intermediate results inside the function, I am creating Array{Any}(undef, size) and testing by passing Dual numbers to wrapmle! It seems to be going through now.

Thanks a million!


#10

This is great progress, but what you need to do to go all the way (and not risk killing performance) is to replace that Any with the element type of x, so it’s Dual when ForwardDiff is operating on your mle function and Float64 when Optim is evaluating it :slight_smile:


#11

Hi pkofod, how could I do this? For example,

using Optim

function testfc(x)
    test = Array{Any}(undef, 1, 2)
    test[1] = x[1]
    test[2] = x[2]
    (1.0 - test[1])^2 + 100.0 * (test[2] - test[1]^2)^2
end

func = TwiceDifferentiable(testfc, [0.0, 0.0]; autodiff = :forward)        
optimize(func, [0.0, 0.0], Newton())

Should I use an if statement? How do I tell the function when to change the type of the test array? I tried making the array a Dual type, but Julia tells me it is not a valid type.

In the documentation of ForwardDiff, they link to this Discourse post link, but to be honest, I don’t really understand the example and what is going on.


#12

Try

function testfc(x::AbstractArray{T}) where {T}
    test = Array{T}(undef, 1, 2)
    test[1] = x[1]
    test[2] = x[2]
    (1.0 - test[1])^2 + 100.0 * (test[2] - test[1]^2)^2
end

Also take a look at eltype. You could also do copy or similar.
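A sketch of the eltype route, without touching the signature (variable names illustrative):

```julia
x = [0.0, 0.0]
T = eltype(x)                  # Float64 here; Dual when ForwardDiff calls in
test = Array{T}(undef, 1, 2)   # equivalently: similar(x, 1, 2)
```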


#13

Wow, okay, I used eltype and the computation time of my function dropped from 24 seconds to just 5! I know assigning correct types is important for performance in Julia, but I really underestimated how big of a difference it can make! Thanks a lot!


#14

Most of the time spent in optimize is going to be evaluating the objective and gradient. If you use autodiff on an unstable objective to get the gradient, I could imagine the performance taking a severe hit as you observed.


#15

24 → 5 is actually not a lot; the penalty for type-unstable code is usually larger. Check that all your code is type-stable (read the “Performance tips” section of the manual and use @code_warntype to check).
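For example (a toy pair, assuming the `Any` pattern from earlier in the thread): `@code_warntype` highlights the `Any` in the first version, while the second infers cleanly.

```julia
function unstable(x)
    buf = Array{Any}(undef, length(x))  # element type erased to Any
    buf .= x
    sum(buf)
end

function stable(x)
    buf = similar(x)                    # element type follows x
    buf .= x
    sum(buf)
end

# @code_warntype unstable([1.0, 2.0])  # flags Any in red
# @code_warntype stable([1.0, 2.0])    # everything inferred
```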