Multivariate Maximum Likelihood

I am writing a simple program to determine maximum likelihood parameters for a non-linear function. For that I am using a multivariate normal, as the function needs to calibrate two functions with correlated errors. I run the same code in R with no issues, but Julia seem to have issues with the variance-covariance function and tells me, after several iterations, that the Cholesky factorization failed. That has happened to me in R before, but I changed to another package that uses eigen factorization as an alternative and it was able to complete the task. Is there any other multivariate normal function, different from the MvNormal that could choose different factorizations?

[not an answer to factorization question] It sounds like the iterations take you into a case where the covariance matrix is singular. Maybe you need to restrict/transform the parameters so that cannot happen. Or use other settings in the optimization and hope for luck…

If I understand correctly, the eigen factorization will work on the matrix but the Cholesky will fail. Is that correct?
If so, I invite you to check the eigen values. One or more may be very small, almost 0, explaining why the Cholesky factorization does not work.

I think some more details and perhaps a MWE would be necessary to give guidance that is definitely relevant, but here are two generic thoughts:

  • Did you provide bounds constraints to your optimizer? If it is a line-search based optimizer, particularly with an approximate Hessian like BFGS, it’s possible that a backtracing line search is looking too far out and touching invalid parameters (the first step of BFGS in Ipopt is massive and potentially problematic in my experience, for example). If you aren’t super familiar with what I mean here, a simple thing to try is to make your bounds constraints on valid parameters tighter and see if the issue goes away.
  • Are you by any chance assembling a covariance matrix with the squared exponential covariance function and no nugget/measurement noise/some other perturbation? If so, that kernel is analytic everywhere, in particular also at the origin, and so kernel matrices using that function are very often numerically rank deficient and the Cholesky factorization can fail. An example:
pts = range(0.0, 1.0, length=1000)
M   = Symmetric([exp(-abs2(x-y)) for x in pts, y in pts])
cholesky(M) # error, even though this is a valid covariance matrix

So instead you could try a Matern covariance, which reduces to some very nice forms for half-integer orders. For \nu=1/2 or \nu=3/2 the numerics are much better and you probably wouldn’t have this problem.

Hard to say something more specific than that based on the amount of detail you provide. But hopefully that’s helpful.

Thanks for your reply. I do provide bounds for variances to be positive and correlation parameters to be between 0 and 1. The function is really simple. And it help us calculate the form of trees based on their relative height and diameters. The model works fine for single equation, but it is the multivariate version that doesn’t like what the optimization algorithm is throwing at it.

I see your point, but probably this could be some future improvement over current MvNormal. mvtnorm in R doesn’t have the problem (but I can’t do automatic differentiation in R when the model has some “if” statements).

function taper_mb_multi(θ, Hrel)
β₁₁ = θ[1]
β₂₁ = θ[2]
β₃₁ = θ[3]
β₄₁ = θ[4]

α₁₁ = exp(θ[5]) / (1 + exp(θ[5]))
α₂₁ = exp(θ[6]) / (1 + exp(θ[6]))

β₁₂  = θ[8]
β₂₂  = θ[9]
β₃₂  = θ[10]
β₄₂  = θ[11]

α₁₂ = exp(θ[12]) / (1 + exp(θ[12]))
α₂₂ = exp(θ[13]) / (1 + exp(θ[13]))

I₁₁ = (Hrel .<= α₁₁) 
I₂₁ = (Hrel .<= α₂₁) 

I₁₂ = (Hrel .<= α₁₂)
I₂₂ = (Hrel .<= α₂₂)

ŷᵢ = β₁₁ .* (Hrel .-1) .+ β₂₁ .* (Hrel.^2 .-1)  .+ β₃₁ .* I₁₁ .* (α₁₁ .- Hrel).^2 .+ β₄₁ .* I₂₁ .* (α₂₁ .- Hrel).^2
ŷₒ = β₁₂ .* (Hrel .-1) .+ β₂₂ .* (Hrel.^2 .-1)  .+ β₃₂ .* I₁₂ .* (α₁₂ .- Hrel).^2 .+ β₄₂ .* I₂₂ .* (α₂₂ .- Hrel).^2

return [ŷᵢ  ŷₒ]


function loglik_mv(θ, RelHt, RelDib, RelDob)

n              = size(RelDob,1)
result         = 0.0

D̂              = taper_mb_multi(θ, RelHt)

σᵢ             = exp(θ[7])
σₒ             = exp(θ[14])
ρ              = exp(θ[15])/(1+exp(θ[15]));

Σ              = [σᵢ^2    ρ*σᵢ*σₒ
                  ρ*σᵢ*σₒ   σₒ^2]

dᵢ            = D̂[:,1] .- RelDib
dₒ            = D̂[:,2] .- RelDob

for i in 1:n
  result       += logpdf(MvNormal(Σ),[dᵢ[i], dₒ[i]])

return result