Maximum Likelihood Multivariate Model

I am trying to optimize a likelihood function for the multivariate case. This is the code I am using:

using Optim, LinearAlgebra, ForwardDiff

#observed covariance
cov_obs =   [1.3629    0.408729  0.581832
            0.408729  1.38639   0.452567
            0.581832  0.452567  1.27911]

# define SEM in RAM notation
function f(x)
      S =   [x[1] 0 0 0
            0 x[2] 0 0
            0 0 x[3] 0
            0 0 0 x[4]]

      F =   [1 0 0 0
            0 1 0 0
            0 0 1 0]

      A =  [0 0 0 1
            0 0 0 x[5]
            0 0 0 x[6]
            0 0 0 0]

      Cov_Exp =  F*inv(I-A)*S*transpose(inv(I-A))*transpose(F)
      return log(det(Cov_Exp)) + tr(cov_obs*inv(Cov_Exp)) - log(det(cov_obs)) - 3
end


###############################################################
### works without Gradient

#initial values
x0 = ones(6)

result = optimize(f, x0)


###############################################################
### does not work with gradient

func = TwiceDifferentiable(f, ones(6), autodiff = :forward)

# does not work
optimize(f, x0, LBFGS(), autodiff = :forward)

optimize(func, x0)

# works
ForwardDiff.gradient(f, x0)

I am not able to solve the problem using an algorithm that requires a gradient. This is the error I get:

DomainError with -0.004445959499469189:
log will only return a complex result if called with a complex argument. Try log(Complex(x)).
throw_complex_domainerror(::Symbol, ::Float64) at math.jl:32
log(::Float64) at log.jl:285
log at dual.jl:203 [inlined]
f(::Array{ForwardDiff.Dual{ForwardDiff.Tag{typeof(f),Float64},Float64,6},1}) at SEM.jl:40
vector_mode_gradient!(::DiffResults.MutableDiffResult{1,Float64,Tuple{Array{Float64,1}}}, ::typeof(f), ::Array{Float64,1}, ::ForwardDiff.GradientConfig{ForwardDiff.Tag{typeof(f),Float64},Float64,6,Array{ForwardDiff.Dual{ForwardDiff.Tag{typeof(f),Float64},Float64,6},1}}) at apiutils.jl:37
gradient! at gradient.jl:35 [inlined]
gradient! at gradient.jl:33 [inlined]
(::NLSolversBase.var"#14#18"{Float64,typeof(f),ForwardDiff.GradientConfig{ForwardDiff.Tag{typeof(f),Float64},Float64,6,Array{ForwardDiff.Dual{ForwardDiff.Tag{typeof(f),Float64},Float64,6},1}}})(::Array{Float64,1}, ::Array{Float64,1}) at oncedifferentiable.jl:69
value_gradient!!(::OnceDifferentiable{Float64,Array{Float64,1},Array{Float64,1}}, ::Array{Float64,1}) at interface.jl:82
value_gradient!(::OnceDifferentiable{Float64,Array{Float64,1},Array{Float64,1}}, ::Array{Float64,1}) at interface.jl:69
value_gradient!(::Optim.ManifoldObjective{OnceDifferentiable{Float64,Array{Float64,1},Array{Float64,1}}}, ::Array{Float64,1}) at Manifolds.jl:50
(::LineSearches.var"#ϕdϕ#6"{Optim.ManifoldObjective{OnceDifferentiable{Float64,Array{Float64,1},Array{Float64,1}}},Array{Float64,1},Array{Float64,1},Array{Float64,1}})(::Float64) at LineSearches.jl:84
(::LineSearches.HagerZhang{Float64,Base.RefValue{Bool}})(::Function, ::LineSearches.var"#ϕdϕ#6"{Optim.ManifoldObjective{OnceDifferentiable{Float64,Array{Float64,1},Array{Float64,1}}},Array{Float64,1},Array{Float64,1},Array{Float64,1}}, ::Float64, ::Float64, ::Float64) at hagerzhang.jl:227
HagerZhang at hagerzhang.jl:101 [inlined]
perform_linesearch!(::Optim.LBFGSState{Array{Float64,1},Array{Array{Float64,1},1},Array{Array{Float64,1},1},Float64,Array{Float64,1}}, ::LBFGS{Nothing,LineSearches.InitialStatic{Float64},LineSearches.HagerZhang{Float64,Base.RefValue{Bool}},Optim.var"#19#21"}, ::Optim.ManifoldObjective{OnceDifferentiable{Float64,Array{Float64,1},Array{Float64,1}}}) at perform_linesearch.jl:53
update_state!(::OnceDifferentiable{Float64,Array{Float64,1},Array{Float64,1}}, ::Optim.LBFGSState{Array{Float64,1},Array{Array{Float64,1},1},Array{Array{Float64,1},1},Float64,Array{Float64,1}}, ::LBFGS{Nothing,LineSearches.InitialStatic{Float64},LineSearches.HagerZhang{Float64,Base.RefValue{Bool}},Optim.var"#19#21"}) at l_bfgs.jl:198
optimize(::OnceDifferentiable{Float64,Array{Float64,1},Array{Float64,1}}, ::Array{Float64,1}, ::LBFGS{Nothing,LineSearches.InitialStatic{Float64},LineSearches.HagerZhang{Float64,Base.RefValue{Bool}},Optim.var"#19#21"}, ::Optim.Options{Float64,Nothing}, ::Optim.LBFGSState{Array{Float64,1},Array{Array{Float64,1},1},Array{Array{Float64,1},1},Float64,Array{Float64,1}}) at optimize.jl:57
optimize(::OnceDifferentiable{Float64,Array{Float64,1},Array{Float64,1}}, ::Array{Float64,1}, ::LBFGS{Nothing,LineSearches.InitialStatic{Float64},LineSearches.HagerZhang{Float64,Base.RefValue{Bool}},Optim.var"#19#21"}, ::Optim.Options{Float64,Nothing}) at optimize.jl:33
#optimize#93 at interface.jl:116 [inlined]
(::Optim.var"#kw##optimize")(::NamedTuple{(:autodiff,),Tuple{Symbol}}, ::typeof(optimize), ::Function, ::Array{Float64,1}, ::LBFGS{Nothing,LineSearches.InitialStatic{Float64},LineSearches.HagerZhang{Float64,Base.RefValue{Bool}},Optim.var"#19#21"}, ::Optim.Options{Float64,Nothing}) at none:0
(::Optim.var"#kw##optimize")(::NamedTuple{(:autodiff,),Tuple{Symbol}}, ::typeof(optimize), ::Function, ::Array{Float64,1}, ::LBFGS{Nothing,LineSearches.InitialStatic{Float64},LineSearches.HagerZhang{Float64,Base.RefValue{Bool}},Optim.var"#19#21"}) at none:0
top-level scope at SEM.jl:62

Thanks!

Maximilian

I guess the issue is that for some parameters your determinant becomes negative, so you can’t take the log anymore. Ideally you would bound your parameters so that this doesn’t happen, but that might be difficult in your case. Otherwise you can return a very bad likelihood value whenever the determinant is negative; that hopefully keeps the optimiser away from these regions, but it can also sometimes confuse it.
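A minimal sketch of that guard, applied to the fit function from the question (the name `f_guarded` is hypothetical, and `cov_obs` is passed as an argument here so the function is self-contained; the check `det > 0` is exactly what `log` needs):

```julia
using LinearAlgebra

# Hypothetical guarded version of the fit function above: if the
# determinant of the implied covariance is not positive, return Inf
# so the optimizer treats the point as infinitely bad instead of
# throwing a DomainError from log.
function f_guarded(x, cov_obs)
    S = Diagonal(x[1:4])
    F = [1 0 0 0; 0 1 0 0; 0 0 1 0]
    A = [0 0 0 1; 0 0 0 x[5]; 0 0 0 x[6]; 0 0 0 0]
    B = inv(I - A)
    Cov_Exp = F * B * S * transpose(B) * transpose(F)
    d = det(Cov_Exp)
    d > 0 || return Inf   # log(d) is undefined here: reject this point
    return log(d) + tr(cov_obs * inv(Cov_Exp)) - log(det(cov_obs)) - 3
end
```

Since `Inf` propagates through `ForwardDiff.Dual` values, this version can still be handed to `optimize(x -> f_guarded(x, cov_obs), x0, LBFGS(), autodiff = :forward)`, though as noted above a barrier of `Inf` can still confuse the line search.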

Thanks for the suggestion, that’s what I thought too. But I then solved the same problem without bounds in R using a similar optimizer, so I assume the error is not due to this specific case, but to something I did fundamentally wrong… Maybe the automatic differentiation does not work properly for my function, but my knowledge of Julia is too limited to really understand why.

I tried your code (not that I really understand what it does), and noticed that

optimize(f, x0, LBFGS(), autodiff = :forward)

works if you use a starting guess that is closer to what you get from

result = optimize(f, x0)

The solution seems to be close to

0.837     1.068     0.635     0.525     0.778     1.107

If you don’t want to bound a parameter, you can transform it by using exp(β) in your code. The β will then be in log units and hence unrestricted.

To get standard errors, you can use the delta method.
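A sketch of how that reparameterization and the delta method could look (the names `reparam` and `delta_se` are hypothetical; `obj` stands for any fit function like the `f` above that expects positive variances in positions 1:4):

```julia
# Optimize over unrestricted β instead of x: the first four entries
# (variances) are mapped through exp so they are always positive,
# the loadings in positions 5:6 stay unrestricted.
reparam(β) = vcat(exp.(β[1:4]), β[5:6])

# obj_β(β) = obj(reparam(β))  # hand this to optimize(...)

# Delta method for the back-transform x_i = exp(β_i):
# SE(x_i) ≈ |d exp(β_i)/dβ_i| * SE(β_i) = exp(β̂_i) * SE(β̂_i)
delta_se(β̂, se_β) = exp.(β̂) .* se_β
```

The standard errors of β come from the Hessian of the transformed objective at the optimum, and `delta_se` maps them back to the original variance scale.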

Thanks! Maybe it really was only a bad choice of starting values.

I tried to use the delta method, however

parameters = Optim.minimizer(optimize(func, x0))
 
se = sqrt.(diag(inv(hessian!(func, parameters))))

seems to give wrong estimates. The solution should be close to

 0.12
 0.1 
 0.13
 0.13
 0.14
 0.21

but it is

 1.45
 1.29
 1.59
 1.6
 1.73
 2.63

I’m not too familiar with SEM models, but if I understand your code correctly, I think S is supposed to be a covariance matrix? If that’s the case, then we know that x[1:4] should be non-negative.

Note that a lot of optimization software may work around this constraint even if you don’t explicitly provide it to the optimizer. For example, the optimizer may evaluate the log-likelihood with x[1] = -1, get NaN, and then half-step until the likelihood (a) is defined and (b) improves over the current values. You may also just get lucky and the optimizer never tries to evaluate negative values of x[1:4].

However, a lot of computation time will be saved if one simply tells the optimizer not to try negative values of x[1:4], so if you know the constraints, it’s typically better to provide them.

You are completely right, and I agree that this would be computationally more efficient. The reason I would like to test whether the optimization works in cases where I know it should work without constraints is that all SEM programs and packages I know of do not constrain parameter values by default. This is because in the SEM community, getting a negative value for one of your variances is often used as an indicator that you misspecified your model (it even has its own name, the “Heywood case”) or that your model is not appropriate for the data. My current project requires fitting many SEMs in parallel, and since there is no SEM package for Julia at the moment (that I know of), I thought about writing a small package afterwards that should behave as potential users expect.

For what it’s worth: if you constrain your parameters to be positive but still use the same starting value and LBFGS, you get the same result:

julia> result = optimize(f, 0, Inf, x0, Fminbox(LBFGS()));

julia> Optim.minimizer(result)
6-element Array{Float64,1}:
 0.8374272886760326
 1.0684678826164824
 0.6348739347327949
 0.5254726972531473
 0.7778310850049915
 1.107254474224332

Hmm, I see. I’m certainly not an expert on what the SEM community does, but there’s at least one key point to be made here.

When one relaxes the constraints and then checks that the estimator still respects them, we are relaxing the constraints on the estimator. However, during optimization we are trying to optimize a likelihood function to find the solution that gives us this estimator. There is no strong reason to believe that the constraints should be satisfied at the various proposals used by the optimizer, whereas MLE theory tells us that if (a) the true values are within (not on!) the boundary and (b) we have a lot of data, the unconstrained solution should be within the boundary with high probability. So it makes sense to check that the unconstrained solution respects the constraints, but it’s not a bad sign if one of the proposal values used by the optimizer doesn’t.

Finally, my guess is that Heywood cases happen when one is using something like a method-of-moments estimator rather than an ML estimator? Or do the variances in S not directly affect the likelihood function?

I agree with the argument from @pistacliffcho, and additionally want to point out that the constraints seem to hinder performance rather than improve it (at least in this case):

@benchmark result = optimize(f, x0, LBFGS(), autodiff = :forward)
BenchmarkTools.Trial: 
  memory estimate:  917.23 KiB
  allocs estimate:  4171
  --------------
  minimum time:     274.429 μs (0.00% GC)
  median time:      292.406 μs (0.00% GC)
  mean time:        377.829 μs (19.25% GC)
  maximum time:     64.559 ms (98.99% GC)
  --------------
  samples:          10000
  evals/sample:     1
@benchmark result = optimize(f, 0, Inf, x0, Fminbox(LBFGS()), autodiff = :forward)
BenchmarkTools.Trial: 
  memory estimate:  1.98 MiB
  allocs estimate:  10215
  --------------
  minimum time:     657.603 μs (0.00% GC)
  median time:      720.054 μs (0.00% GC)
  mean time:        899.320 μs (18.46% GC)
  maximum time:     65.088 ms (98.21% GC)
  --------------
  samples:          5545
  evals/sample:     1

You shouldn’t expect it to. It’s a primal log barrier method, so it’s solving a sequence of problems, but that is wasteful in this case because the optimum is actually interior here.

In general you cannot expect that providing constraints makes anything faster. Intuitively we might think that we’re “telling the optimizer more information about the problem”, but usually you then have to enforce that constraint (directly or indirectly), and that is costly.

Apropos telling the optimizer more about the problem: in these SEM problems, computing the Hessian is rather expensive. Is there anything to gain from giving LBFGS() the Hessian, e.g. for the first approximation?

In this case, the OP just has box constraints. Shouldn’t that be quick to enforce?

Depends on the method you use to enforce that constraint, but yes in general. Fminbox has been part of Optim for a long time, but in the soon-to-come remake, I plan to do better for box constraints. Stay tuned :slight_smile:
