I am writing a simple program to determine maximum-likelihood parameters for a non-linear function. I am using a multivariate normal because I need to calibrate two functions with correlated errors. The same code runs in R with no issues, but Julia seems to have trouble with the variance-covariance matrix and tells me, after several iterations, that the Cholesky factorization failed. This has happened to me in R before, but I switched to another package that uses an eigen factorization as a fallback and it was able to complete the task. Is there any other multivariate normal function, different from MvNormal, that can choose among factorizations?

[not an answer to the factorization question] It sounds like the iterations take you into a case where the covariance matrix is singular. Maybe you need to restrict/transform the parameters so that this cannot happen. Or try other settings in the optimizer and hope for luck…
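One generic mitigation along those lines is to add a tiny diagonal perturbation ("jitter" or "nugget") to Σ before factorizing, so the matrix stays numerically positive definite. A sketch (the example Σ and the `1e-8` jitter size are arbitrary choices for illustration):

```
using LinearAlgebra

Σ = [1.0 0.9999999; 0.9999999 1.0]  # hypothetical, numerically near-singular
Σⱼ = Σ + 1e-8 * I                   # tiny jitter on the diagonal
cholesky(Symmetric(Σⱼ))             # factorization goes through
```

This slightly changes the model, so it's a band-aid rather than a fix; constraining the parameters away from the singular region is cleaner.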

If I understand correctly, the eigen factorization will work on the matrix but the Cholesky will fail. Is that correct?

If so, I invite you to check the eigenvalues. One or more may be very small, almost 0, which would explain why the Cholesky factorization does not work.
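For example, a covariance matrix that is technically positive definite but nearly singular (here a hypothetical Σ with the correlation pushed toward 1) has a smallest eigenvalue close to zero, which is exactly the regime where Cholesky becomes fragile:

```
using LinearAlgebra

σ₁, σ₂, ρ = 1.0, 0.5, 0.999999  # nearly perfect correlation (made-up values)
Σ = [σ₁^2 ρ*σ₁*σ₂; ρ*σ₁*σ₂ σ₂^2]

λ = eigvals(Symmetric(Σ))
minimum(λ)  # nearly zero: the matrix is on the edge of losing positive definiteness
```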

I think some more details and perhaps an MWE would be necessary to give guidance that is definitely relevant, but here are two generic thoughts:

- Did you provide bound constraints to your optimizer? If it is a line-search-based optimizer, particularly one with an approximate Hessian like BFGS, it's possible that a backtracking line search is looking too far out and touching invalid parameters (in my experience, the first step of BFGS in `Ipopt` is massive and potentially problematic, for example). If you aren't super familiar with what I mean here, a simple thing to try is to tighten your bound constraints on valid parameters and see if the issue goes away.
- Are you by any chance assembling a covariance matrix with the squared-exponential covariance function and no nugget/measurement noise/other perturbation? If so, that kernel is analytic everywhere, in particular also at the origin, and so kernel matrices using that function are very often numerically rank-deficient and the Cholesky factorization can fail. An example:

```
using LinearAlgebra

pts = range(0.0, 1.0, length=1000)
M = Symmetric([exp(-abs2(x - y)) for x in pts, y in pts])
cholesky(M)  # throws a PosDefException, even though M is a valid covariance matrix
```

So instead you could try a Matérn covariance, which reduces to some very nice closed forms for half-integer smoothness orders. For ν = 1/2 or ν = 3/2 the numerics are much better and you probably wouldn't have this problem.
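To illustrate, here is the same grid as above with the ν = 1/2 Matérn, which reduces to the exponential kernel exp(-|x - y|). These matrices are far better conditioned, and the factorization goes through:

```
using LinearAlgebra

pts = range(0.0, 1.0, length=1000)
# Matérn with ν = 1/2 is the exponential kernel:
K = Symmetric([exp(-abs(x - y)) for x in pts, y in pts])
cholesky(K)  # succeeds, unlike the squared-exponential case
```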

Hard to say anything more specific than that given the amount of detail provided. But hopefully that's helpful.

Thanks for your reply. I do provide bounds for the variances to be positive and for the correlation parameter to be between 0 and 1. The function is really simple; it helps us calculate the form of trees from their relative heights and diameters. The model works fine as a single equation, but the multivariate version doesn't like what the optimization algorithm is throwing at it.

I see your point, but perhaps this could be a future improvement over the current MvNormal. mvtnorm in R doesn't have this problem (but I can't do automatic differentiation in R when the model has some "if" statements).

```
function taper_mb_multi(θ, Hrel)
    β₁₁ = θ[1]
    β₂₁ = θ[2]
    β₃₁ = θ[3]
    β₄₁ = θ[4]
    α₁₁ = exp(θ[5]) / (1 + exp(θ[5]))
    α₂₁ = exp(θ[6]) / (1 + exp(θ[6]))
    β₁₂ = θ[8]
    β₂₂ = θ[9]
    β₃₂ = θ[10]
    β₄₂ = θ[11]
    α₁₂ = exp(θ[12]) / (1 + exp(θ[12]))
    α₂₂ = exp(θ[13]) / (1 + exp(θ[13]))
    I₁₁ = (Hrel .<= α₁₁)
    I₂₁ = (Hrel .<= α₂₁)
    I₁₂ = (Hrel .<= α₁₂)
    I₂₂ = (Hrel .<= α₂₂)
    ŷᵢ = β₁₁ .* (Hrel .- 1) .+ β₂₁ .* (Hrel.^2 .- 1) .+ β₃₁ .* I₁₁ .* (α₁₁ .- Hrel).^2 .+ β₄₁ .* I₂₁ .* (α₂₁ .- Hrel).^2
    ŷₒ = β₁₂ .* (Hrel .- 1) .+ β₂₂ .* (Hrel.^2 .- 1) .+ β₃₂ .* I₁₂ .* (α₁₂ .- Hrel).^2 .+ β₄₂ .* I₂₂ .* (α₂₂ .- Hrel).^2
    return [ŷᵢ ŷₒ]
end
```

```
function loglik_mv(θ, RelHt, RelDib, RelDob)
    n = size(RelDob, 1)
    result = 0.0
    D̂ = taper_mb_multi(θ, RelHt)
    σᵢ = exp(θ[7])
    σₒ = exp(θ[14])
    ρ = exp(θ[15]) / (1 + exp(θ[15]))
    Σ = [σᵢ^2     ρ*σᵢ*σₒ
         ρ*σᵢ*σₒ  σₒ^2]
    dᵢ = D̂[:, 1] .- RelDib
    dₒ = D̂[:, 2] .- RelDob
    for i in 1:n
        result += logpdf(MvNormal(Σ), [dᵢ[i], dₒ[i]])
    end
    return result
end
```
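Since Σ here is only 2×2, one way to sidestep the factorization inside MvNormal entirely is to write the zero-mean bivariate normal log-density in closed form. This is a sketch, not the original code; `bvn_logpdf` is a hypothetical helper:

```
# Zero-mean bivariate normal log-density in closed form.
# No factorization of Σ is needed, so nothing throws as ρ approaches 1
# (though the log-likelihood itself still blows up there).
function bvn_logpdf(d₁, d₂, σ₁, σ₂, ρ)
    z = d₁^2 / σ₁^2 - 2ρ * d₁ * d₂ / (σ₁ * σ₂) + d₂^2 / σ₂^2
    return -log(2π * σ₁ * σ₂ * sqrt(1 - ρ^2)) - z / (2 * (1 - ρ^2))
end
```

The loop in `loglik_mv` would then accumulate `bvn_logpdf(dᵢ[i], dₒ[i], σᵢ, σₒ, ρ)` instead of calling `logpdf(MvNormal(Σ), …)`. Keeping θ[15] bounded so ρ stays away from 1 is still advisable, since the likelihood is genuinely degenerate there.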