Maximum Likelihood: Normal Linear Model

Hello,

I’m trying to optimize a simple normal linear model likelihood function and have two questions. The code I am using is below:

using Optim

nobs = 500
nvar = 1
β = ones(nvar)*3.0
x = [ones(nobs) randn(nobs,nvar-1)]
ε = randn(nobs)*0.5
y = x*β + ε

function LL_anon(X, Y, β, σ)
  -(-length(X)*log(2π)/2 - length(X)*log(σ) - (sum((Y - X*β).^2) / (2σ^2)))
end
LL_anon(X,Y, pars) = LL_anon(X,Y, pars...)

res2 = optimize(vars -> LL_anon(x,y, vars...), [1.0,1.0]) # Start values: β=1.0, σ=1.0

This actually works and I get the following:

 * Algorithm: Nelder-Mead
 * Starting Point: [1.0,1.0]
 * Minimizer: [2.980587812647935,0.5108406803949835]
 * Minimum: 3.736217e+02
 * Iterations: 47
 * Convergence: true
   *  √(Σ(yᵢ-ȳ)²)/n < 1.0e-08: true
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 92

My questions are as follows:

  1. My initial example has only an intercept and sigma and when I try and add another X variable, I get an error message and I’m not sure why?

MethodError: no method matching LL_anon(::Array{Float64,2}, ::Array{Float64,1}, ::Float64, ::Float64, ::Float64)
Closest candidates are:
  LL_anon(::Any, ::Any, ::Any, ::Any) at In[297]:2
  LL_anon(::Array{Float64,1}, ::Array{Float64,1}, ::Any, ::Any) at In[113]:2
  LL_anon(::Any, ::Any, ::Any) at In[297]:4
  ...

Stacktrace:
 [1] (::##245#246)(::Array{Float64,1}) at .\In[299]:1
 [2] value!!(::NLSolversBase.NonDifferentiable{Float64,Array{Float64,1},Val{false}}, ::Array{Float64,1}) at C:\Users\dolacomb\.julia\v0.6\NLSolversBase\src\interface.jl:9
 [3] initial_state(::Optim.NelderMead{Optim.AffineSimplexer,Optim.AdaptiveParameters}, ::Optim.Options{Float64,Void}, ::NLSolversBase.NonDifferentiable{Float64,Array{Float64,1},Val{false}}, ::Array{Float64,1}) at C:\Users\dolacomb\.julia\v0.6\Optim\src\multivariate/solvers/zeroth_order\nelder_mead.jl:136
 [4] optimize(::NLSolversBase.NonDifferentiable{Float64,Array{Float64,1},Val{false}}, ::Array{Float64,1}, ::Optim.NelderMead{Optim.AffineSimplexer,Optim.AdaptiveParameters}, ::Optim.Options{Float64,Void}) at C:\Users\dolacomb\.julia\v0.6\Optim\src\multivariate/optimize\optimize.jl:25
 [5] #optimize#151(::Array{Any,1}, ::Function, ::Tuple{##245#246}, ::Array{Float64,1}) at C:\Users\dolacomb\.julia\v0.6\Optim\src\multivariate/optimize\interface.jl:62
 [6] #optimize#148(::Array{Any,1}, ::Function, ::Function, ::Array{Float64,1}) at C:\Users\dolacomb\.julia\v0.6\Optim\src\multivariate/optimize\interface.jl:52
 [7] optimize(::Function, ::Array{Float64,1}) at C:\Users\dolacomb\.julia\v0.6\Optim\src\multivariate/optimize\interface.jl:52
  1. If I change the start value in my original working example to [2.0,2.0] I get the following error message:
log will only return a complex result if called with a complex argument. Try log(complex(x)).

Stacktrace:
 [1] nan_dom_err at .\math.jl:300 [inlined]
 [2] log at .\math.jl:419 [inlined]
 [3] LL_anon(::Array{Float64,2}, ::Array{Float64,1}, ::Float64, ::Float64) at .\In[302]:2
 [4] (::##251#252)(::Array{Float64,1}) at .\In[304]:1
 [5] value(::NLSolversBase.NonDifferentiable{Float64,Array{Float64,1},Val{false}}, ::Array{Float64,1}) at C:\Users\dolacomb\.julia\v0.6\NLSolversBase\src\interface.jl:19
 [6] update_state!(::NLSolversBase.NonDifferentiable{Float64,Array{Float64,1},Val{false}}, ::Optim.NelderMeadState{Array{Float64,1},Float64,Array{Float64,1}}, ::Optim.NelderMead{Optim.AffineSimplexer,Optim.AdaptiveParameters}) at C:\Users\dolacomb\.julia\v0.6\Optim\src\multivariate/solvers/zeroth_order\nelder_mead.jl:193
 [7] optimize(::NLSolversBase.NonDifferentiable{Float64,Array{Float64,1},Val{false}}, ::Array{Float64,1}, ::Optim.NelderMead{Optim.AffineSimplexer,Optim.AdaptiveParameters}, ::Optim.Options{Float64,Void}, ::Optim.NelderMeadState{Array{Float64,1},Float64,Array{Float64,1}}) at C:\Users\dolacomb\.julia\v0.6\Optim\src\multivariate/optimize\optimize.jl:51
 [8] optimize(::NLSolversBase.NonDifferentiable{Float64,Array{Float64,1},Val{false}}, ::Array{Float64,1}, ::Optim.NelderMead{Optim.AffineSimplexer,Optim.AdaptiveParameters}, ::Optim.Options{Float64,Void}) at C:\Users\dolacomb\.julia\v0.6\Optim\src\multivariate/optimize\optimize.jl:25
 [9] #optimize#151(::Array{Any,1}, ::Function, ::Tuple{##251#252}, ::Array{Float64,1}) at C:\Users\dolacomb\.julia\v0.6\Optim\src\multivariate/optimize\interface.jl:62
 [10] #optimize#148(::Array{Any,1}, ::Function, ::Function, ::Array{Float64,1}) at C:\Users\dolacomb\.julia\v0.6\Optim\src\multivariate/optimize\interface.jl:52
 [11] optimize(::Function, ::Array{Float64,1}) at C:\Users\dolacomb\.julia\v0.6\Optim\src\multivariate/optimize\interface.jl:52

Again, I’m not sure why this is happening and since start values are very important I’d like to know how to overcome this issue.

I’d like to attach the .ipynb file but cannot do so.

Any help would be greatly appreciated.

Your second issue arises when the optimiser attempts to set a negative value for σ: log(σ) becomes complex. If you need a real solution you can set boxed limits on your problem:

lower = [0.0,0.0]
upper = [Inf,Inf]
res2 = optimize(vars -> LL_anon(x,y, vars...),[2.0,2.0],lower,upper)

As for your first issue you can see that you’re passing a 2 dimensional array (::Array{Float64,2}) into X but it only wants a 1 dimensional one (::Array{Float64,1}). From the sound of it you’re constructing your X input incorrectly then. Can you show us how you’re setting this up?

1 Like

Thank you for that second solution, which is now obvious to me. Sometimes a second (or more) set of eyes is needed…

As to the first issue, I’m generating the data as follows:

nobs = 500
nvar = 2
β = ones(nvar)*3.0
x = [ones(nobs) randn(nobs,nvar-1)]
ε = randn(nobs)*0.5
y = x*β + ε

When I generate the x matrix, it is of type Array{Float64,2}, i.e. like so:

500×2 Array{Float64,2}:
 1.0  -0.3026   
 1.0   0.499314 
 1.0  -0.890789 
 1.0   0.0385097
 1.0   0.740315 
 1.0   0.361159 
 1.0   0.356262 
 1.0   0.596329 
 1.0   0.634283 
 1.0   1.36919  
 1.0   0.535277 
 1.0  -0.456864 
 1.0  -0.947847 

The first column is the intercept and the second is the first generated covariate. Again, I’m generating the x matrix as follows:

x = [ones(nobs) randn(nobs,nvar-1)]

For example, when nvar=2, I should have an x matrix with the intercept plus one other column for the other covariate.

Maybe I need to change the inputs to the anonymous function? Or perhaps change the optimization method? Again, I’m not sure what the problem is because it works for the intercept only model.

Again, thank you for your help as it is greatly appreciated.

I think your first issue is because the optimization function uses a single vector of parameter values, so when it’s splatted (...) it passes three separate arguments to the likelihood function rather than a vector of length two for β and a single argument for σ. A quick fix would be to explicitly index into vars for each subset of parameter values in your anonymous function. So something like

res2 = optimize(vars -> LL_anon(x, y, vars[1:nvar], vars[nvar + 1], [1.0, 1.0, 1.0])

I see what this code is supposed to be doing which is grabbing each parameter in a vector, i.e. vars[1:nvar] grabs all of the betas and vars[nvar+1] grabs the sigma.

However, when I try this code I receive the following error after running the complete code from the beginning:

MethodError: no method matching optimize(::##5#6)
Closest candidates are:
  optimize(::F<:Function, ::T<:AbstractFloat, ::T<:AbstractFloat; method, rel_tol, abs_tol, iterations, store_trace, show_trace, callback, show_every, extended_trace) where {F<:Function, T<:AbstractFloat} at C:\Users\dolacomb\.julia\v0.6\Optim\src\univariate/optimize/interface.jl:14
  optimize(::F<:Function, ::T<:AbstractFloat, ::T<:AbstractFloat, ::Optim.GoldenSection; rel_tol, abs_tol, iterations, store_trace, show_trace, callback, show_every, extended_trace, nargs...) where {F<:Function, T<:AbstractFloat} at C:\Users\dolacomb\.julia\v0.6\Optim\src\univariate/solvers/golden_section.jl:54
  optimize(::F<:Function, ::T<:AbstractFloat, ::T<:AbstractFloat, ::Optim.Brent; rel_tol, abs_tol, iterations, store_trace, show_trace, callback, show_every, extended_trace) where {F<:Function, T<:AbstractFloat} at C:\Users\dolacomb\.julia\v0.6\Optim\src\univariate/solvers/brent.jl:58
  ...

Again, your solutions makes perfect sense to me but obviously there is an issue.

Any other help would be appreciated. Thank you for taking the time to look at this.

I think I have figured this out and would like to post my solution for others and to ask some follow up questions. Here is the code I used:

using Optim

nobs = 5000
nvar = 3
coef = [-3.0,3.0,-3.0]
x = [ones(nobs) randn(nobs,nvar-1)]
ε = randn(nobs)
y = x*coef + ε

function LL_anon(X, Y, β, σ)
  llike = (-length(X)*log(2π)/2 - (length(X)*log(σ^2))/2 - (sum((Y - X*β).^2) / (2σ^2)))
    llike = -llike
end
LL_anon(X,Y, pars) = LL_anon(X,Y, pars...)

start = abs.(randn(nvar+1))

lower = [-Inf,-Inf,-Inf,0.0]
upper = [Inf,Inf,Inf,Inf]
res2 =  optimize(vars -> LL_anon(x, y, vars[1:nvar], vars[nvar+1]),start,lower,upper)

Results of Optimization Algorithm
 * Algorithm: Fminbox with Conjugate Gradient
 * Starting Point: [0.976620488369098,1.6087510271304073, ...]
 * Minimizer: [-2.9763390777806227,2.999366763084127, ...]
 * Minimum: 1.317942e+04
 * Iterations: 3
 * Convergence: true
   * |x - x'| ≤ 1.0e-32: false 
     |x - x'| = 3.11e-06 
   * |f(x) - f(x')| ≤ 1.0e-32 |f(x)|: true
     |f(x) - f(x')| = 3.24e-11 |f(x)|
   * |g(x)| ≤ 1.0e-08: false 
     |g(x)| = 2.75e-04 
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 175
 * Gradient Calls: 108

res2.minimizer
4-element Array{Float64,1}:
 -2.97634 
  2.99937 
 -2.9866  
  0.582568

The coefficients are pretty much spot on but the sigma should be 1, and it’s 0.58 so it’s a bit off and I’m not sure why.

I have a couple of follow up questions:

  1. I cannot see to change the optimization routine, i.e. to BFGS like so:
res2 =  optimize(vars -> LL_anon(x, y, vars[1:nvar], vars[nvar+1]),start,lower,upper,BFGS())

MethodError: objects of type Array{Float64,1} are not callable
Use square brackets [] for indexing an Array.

Is this possible or do I need t use NLopt or something else? I’d like to get a numerical Hessian from the output so I can get standard errors and t-statistics.

  1. Can this be populated with the number of variables, i.e. can the length of these vectors be made arbitrarily long with Inf, -Inf, for each variable?

I want to thank everyone for their help in this matter!

I got a slightly different version of you code working. Instead of setting the bounds using lower and upper (which didn’t work for me, but I didn’t try very hard), I just use log_σ and exponentiated it inside the function. This example includes using the OnceDifferentiable and TwiceDifferentiable constructors to get autodiff’d gradients and Hessians. Passing these to optimize chooses L-BFGS and Newton respectively.

using Optim

nobs = 500
nvar = 2
β = ones(nvar) * 3.0
x = [ones(nobs) randn(nobs, nvar - 1)]
ε = randn(nobs) * 0.5
y = x * β + ε

function LL_anon(X, Y, β, log_σ)
    σ = exp(log_σ)
    -(-length(X) * log(2π)/2 - length(X) * log(σ) - (sum((Y - X * β).^2) / (2σ^2)))
end

opt = optimize(vars -> LL_anon(x,y, vars[1:nvar], vars[nvar + 1]),
                [1.0, 1.0, 1.0])

# Use forward autodiff to get first derivative, then optimize
fun = OnceDifferentiable(vars -> LL_anon(x, y, vars[1:nvar], vars[nvar + 1]),
                         ones(3))
opt1 = optimize(fun1, ones(3))

# Get Hessian, use Newton!
fun2 = TwiceDifferentiable(vars -> LL_anon(x, y, vars[1:nvar], vars[nvar + 1]),
                           ones(3))
opt2 = optimize(fun2, ones(3))

1 Like

That’s a quite unfortunate error message. This happens because Optim thinks you are asking it to use BFGS on a problem where the objective is vars → LL_anon(x, y, vars[1:nvar], vars[nvar+1]), the gradient is start, the combined objective and gradient is lower, and the initial value of the problem is upper. You need to tell it that it is supposed to use the Fminbox solver with BFGS used internally. I almost don’t want to tell you how to do it, because the API is going to change very soon (when https://github.com/JuliaLang/METADATA.jl/pull/15162 is merged and Optim is updated to v0.15.0). However, in Optim v0.14.x that you’re probably using, the syntax is

optimize(f, x0, lower, upper, Fminbox{BFGS}()

but when Optim is updated (very soon!) it will be

optimize(f, lower, upper, x0, Fminbox(BFGS())

Edit: Optim is now updated

2 Likes

I would just do a change of variables from sigma to log_sigma = log(sigma). Now it is defined on the whole real line, and you don’t need constraints.

If you want the maximum of sigma itself, you don’t even need a Jacobian.

2 Likes

Thank you for taking the time to provide this example! It works like a charm! Just to be complete, I’m going to post the complete solution here so that others may be helped. Ultimately, I want to apply these techniques to spatial econometric models which should be forthcoming by the end of the summer.

using Optim

nobs = 500
nvar = 2
β = ones(nvar) * 3.0
x = [ones(nobs) randn(nobs, nvar - 1)]
ε = randn(nobs) * 0.5
y = x * β + ε

function LL_anon(X, Y, β, log_σ)
    σ = exp(log_σ)
    -(-length(X) * log(2π)/2 - length(X) * log(σ) - (sum((Y - X * β).^2) / (2σ^2)))
end

opt = optimize(vars -> LL_anon(x,y, vars[1:nvar], vars[nvar + 1]),
               ones(nvar+1))

# Use forward autodiff to get first derivative, then optimize
fun1 = OnceDifferentiable(vars -> LL_anon(x, y, vars[1:nvar], vars[nvar + 1]),
                         ones(nvar+1))
opt1 = optimize(fun1, ones(nvar+1))

Results of Optimization Algorithm

  • Algorithm: L-BFGS
  • Starting Point: [1.0,1.0,1.0]
  • Minimizer: [2.994204150985705,2.9900626550063305, …]
  • Minimum: 3.538340e+02
  • Iterations: 12
  • Convergence: true
    • |x - x’| ≤ 1.0e-32: false
      |x - x’| = 8.92e-12
    • |f(x) - f(x’)| ≤ 1.0e-32 |f(x)|: false
      |f(x) - f(x’)| = 9.64e-16 |f(x)|
    • |g(x)| ≤ 1.0e-08: true
      |g(x)| = 6.27e-09
    • Stopped by an increasing objective: true
    • Reached Maximum Number of Iterations: false
  • Objective Calls: 50
  • Gradient Calls: 50
opt1.minimizer
3-element Array{Float64,1}:
  2.9942 
  2.99006
 -1.0651  #Note: needs to be exponentiated
# Get Hessian, use Newton!
fun2 = TwiceDifferentiable(vars -> LL_anon(x, y, vars[1:nvar], vars[nvar + 1]),
                           ones(nvar+1))
opt2 = optimize(fun2, ones(nvar+1))

Results of Optimization Algorithm

  • Algorithm: Newton’s Method
  • Starting Point: [1.0,1.0,1.0]
  • Minimizer: [2.99420415098702,2.9900626550079026, …]
  • Minimum: 3.538340e+02
  • Iterations: 9
  • Convergence: true
    • |x - x’| ≤ 1.0e-32: false
      |x - x’| = 1.36e-11
    • |f(x) - f(x’)| ≤ 1.0e-32 |f(x)|: false
      |f(x) - f(x’)| = 1.61e-16 |f(x)|
    • |g(x)| ≤ 1.0e-08: true
      |g(x)| = 6.27e-09
    • Stopped by an increasing objective: true
    • Reached Maximum Number of Iterations: false
  • Objective Calls: 45
  • Gradient Calls: 45
  • Hessian Calls: 9
fieldnames(fun2)
13-element Array{Symbol,1}:
 :f       
 :df      
 :fdf     
 :h       
 :F       
 :DF      
 :H       
 :x_f     
 :x_df    
 :x_h     
 :f_calls 
 :df_calls
 :h_calls 

opt2.minimizer
3-element Array{Float64,1}:
  2.98627
  3.00654
 -1.11313
numerical_hessian = (fun2.H) #.H is the numerical Hessian
3×3 Array{Float64,2}:
 64.8715      -9.45045      0.000121521
 -0.14568     66.4507       0.0        
  1.87326e-6   4.10675e-9  44.7214     

Form here, one can take the inverse of the numerical Hessian and take the square root of the diagonal elements to get standard errors.

Again, thanks to all for a great learning experience. I really appreciate all of the help and replies.

2 Likes

Three comments on the Optim and *Differentiable interfaces, in case you were not aware.

  1. By default derivative information is calculated with finite differences. I would recommend trying automatic differentiation with ForwardDiff on your problems when you can:
df = TwiceDifferentiable(f, x0; autodiff=:forward)
  1. You can access the gradient and Hessian values using the NLSolversBase interface:
using NLSolversBase
gr = gradient(df)
hess = hessian(df) # instead of df.H
  1. I recommend that you call Optim.minimizer(opt2) rather than opt2.minimizer
3 Likes

Thank you very much for your comments as they are greatly appreciated.

I’m curious as to why you recommend Optim.minimizer(opt2) rather than opt2.minimizer as well as hess = hessian(df) versus df.H?

It appears that they give the same answer but there may be other differences that I’m not aware of at this point. I did see the hessian command but did not know how to access it via the df = TwiceDifferentiable(f, x0; autodiff=:forward) call or more to the point, I wasn’t sure how to use the TwiceDifferentiable command.

I’ve looked at the Optim documentation but I’ve obviously missed something in my reading, which I will try and remedy.

Incidentally for anyone who is interested, I have successfully optimized the likelihood function for a spatial autoregressive (SAR) model and will be working on the other spatial econometric models soon.

Thanks again to everyone for their help. Working mainly as a Bayesian (and having coded several spatial models), this is my first venture into maximum likelihood estimation in Julia and it’s been a pleasure to work with the language. Once you get the hang of the syntax, it’s pretty straightforward.

1 Like

Two reasons:

  • An API call may be doing some other things in addition to reading the property value. (See, for example, gradient!(df, x) in the NLSolversBase README
  • In case the internal code of a package changes.

If there are changes in the structs in a package, API calls, such as Optim.minimizer or hessian, can be deprecated so that your code doesn’t break. So if you try to access opt2.minimizer but we have for some reason decided to change the property to opt2.minimum_x, then Optim.minimizer can be updated to reflect this, but opt.minimizer will just disappear.

Great :slight_smile: I believe examples of maximum likelihood problems can be very valuable to other newcomers to Julia as well. If you are able to write up an example of any of the problems you work on we’ll add it to the Optim docs (or some other centralised tutorial repository).

2 Likes

I just realised another thing that is important when you want to extract the information of the Hessian.

When optimize has reached tolerance and says it has converged, it does not update the Hessian as it is only needed if we were to take another iteration step. Thus, the value of hessian(fun2) is actually the Hessian at the penultimate iteration.

So to get a better approximation of the Bayesian standard errors you need to do the following:

using Optim, NLSolversBase
# define df and x0 ...
res = optimize(df, x0, Newton())
xmin = Optim.minimizer(res)
hessian!(df, xmin) # Update the Hessian calculation for the minimizer `xmin`
numerical_hessian = hessian(df)
3 Likes

Thank you very much for the additional information. I was unaware of this issue however and I appreciate the help!

I’m also very willing to write all of this up for a tutorial for the Optim package as well. However, I’m not sure how to proceed so if you can kindly point me in the right direction, I’d be happy to contribute.

I’m not sure if this should be a separate discussion, but maybe a “Tutorial” section on the forum would be an addition that would be helpful? I’m envisioning a “one stop shop” for various topics where individuals can get information regarding various topics, although I’m happy to defer to the community regarding this issue.

Please do! One solution is to produce a Jupyter notebook, or even better, at document that uses Literate to generate a Jupyter notebook from a scripts.

Thank you for the information! Obviously, a Jupyter notebook is something I’m familiar with but I’m not familiar with Literate, which I’m assuming is a package that will take a script and produce documentation?

If someone could provide a tutorial link I’d appreciate it.

Than you!

It’s not too difficult. If it’s easier to produce a notebook we could try the reverse: produce the notebook and generate the literate scripts based on something you know (how to make notebooks)

2 Likes

In addition to the example in the Literate docs, here’s an example that I used to write a short tutorial/example for LineSearches.jl

I prefer writing my examples in an editor to writing them in a Jupyter notebook, but as @pkofod said we can turn your Jupyter notebook example into a Literate example if you would like to stick with the notebook :wink:

2 Likes

Thank you all for the information! I’m eager to learn how to use Literate so I will concentrate on that for the time being. Your examples look straightforward and very helpful. Amazing how it produces such nice output…

Also, the Hessian code you mentioned earlier (i.e. hessian!) matched the analytical VC matrix calculation perfectly so that was nice to see (the analytical solution breaks down with increasing N so that’s why I use the numerical hessian).

I’ll be in touch once the example is complete.

Thanks!