GLM.jl LogisticRegression errors: matrix is not positive definite; Cholesky factorization failed

The following code throws an error:

using DataFrames, GLM

df = DataFrame(x1=[1,2,3,4], x2=[1,2,3,4], y=[1,1,0,0])
mdl = glm(@formula(y ~ x1 + x2), df, Binomial(), LogitLink())
predict(mdl, df[:, [:x1, :x2]])

ERROR: LoadError: PosDefException: matrix is not positive definite; Cholesky factorization failed.

On the other hand, the same logistic regression runs successfully with ScikitLearn:

using DataFrames, ScikitLearn

@sk_import linear_model: LogisticRegression

df = DataFrame(x1=[1,2,3,4], x2=[1,2,3,4], y=[1,1,0,0])
mdl = fit!(LogisticRegression(), Matrix(df[:, [:x1, :x2]]), df.y)
ps = predict_proba(mdl, Matrix(df[:, [:x1, :x2]]))

Multicollinearity cannot be an issue for logistic regression, unlike for linear regression.

Then, any idea why GLM fails in this case?

Yes, collinearity can be a problem for nonlinear models, including logistic regression. The coefficients are simply not identified in your case, because x2 is an exact copy of x1. If ScikitLearn gives an answer, it must be eliminating either x1 or x2, or doing something similar to deal with the problem. If the index function is t(a*x1 + b*x2), you are probably getting an estimate of a + b, which may be misleading if you don't notice the collinearity problem. Or it might restrict a = b, which will also give a biased estimate. In cases like this, I prefer what GLM does: an error message that helps you find the problem.
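
To see the identification problem concretely with the data from the original post (a minimal sketch, nothing ScikitLearn-specific): because x2 is an exact copy of x1, the linear predictor depends only on the sum of the two coefficients, so infinitely many coefficient pairs give exactly the same fit.

x1 = [1, 2, 3, 4]
x2 = copy(x1)                               # exact duplicate of x1
η(a, b) = a .* x1 .+ b .* x2                # linear predictor for coefficients (a, b)
η(2.0, 0.0) == η(1.0, 1.0) == η(0.0, 2.0)   # true: only a + b is identified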

Scikit-learn's logistic regression isn't vanilla GLM. It is a GLM fit with an L2 penalty, with the magic number 1 as the hyperparameter.
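
For reference, a sketch of what that means with the ScikitLearn setup above: C is scikit-learn's inverse regularization strength, the default is C = 1.0, and weakening the penalty brings the identification problem back. The variable names below are my own; the behavior with a very large C is what I would expect, not something verified here.

# Same df as above; LogisticRegression() is equivalent to LogisticRegression(C=1.0).
mdl_default = fit!(LogisticRegression(C=1.0), Matrix(df[:, [:x1, :x2]]), df.y)
mdl_weak    = fit!(LogisticRegression(C=1e8), Matrix(df[:, [:x1, :x2]]), df.y)
# With an effectively unpenalized fit, expect much larger (and arbitrarily split)
# coefficients, possibly with a convergence warning.
mdl_default.coef_, mdl_weak.coef_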

Linear regression entails matrix inversion, and that is the mechanism through which collinearity affects linear regression: a collinear design makes the matrix singular.

However, in logistic regression the coefficients are estimated from a likelihood function rather than from the normal equations used in linear regression.

Since there is no closed-form expression for the solution, the estimation is done iteratively instead. There is no way for collinearity to kick in during that process.

I think you need https://github.com/JuliaStats/GLM.jl/pull/314.

Thanks for the information. But it is still unclear how logistic regression can suffer from collinearity, unless GLM's logistic regression algorithm somehow employs matrix computations like those in linear regression (which seems to be the case, since the error message mentions a Cholesky factorization).

This is mistaken. Estimation of nonlinear models does entail solving first order conditions, numerically, rather than analytically, as can be done for linear models. With perfect collinearity, there is more than one solution to the first order conditions, and the matrix of second derivatives is singular. Most iterative gradient-based optimization methods (e.g., BFGS) will not work. It may be the case that certain procedures can still produce an estimate (e.g., regularization methods, as noted in another response, or using a derivative free optimization method) but the estimate will not be consistent, in general.
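
To make the singular-Hessian point concrete, here is a minimal sketch with the data from the original post (evaluating at β = 0 is just a convenient choice): for logistic regression the matrix of second derivatives of the log-likelihood is -X'WX with W = diag(μ(1-μ)), and a duplicated column in X makes it singular.

using LinearAlgebra

X = [ones(4) [1, 2, 3, 4] [1, 2, 3, 4]]   # intercept plus a duplicated predictor
μ = fill(0.5, 4)                           # fitted probabilities at β = 0
H = X' * Diagonal(μ .* (1 .- μ)) * X       # negative Hessian of the log-likelihood
rank(H), size(H, 1)                        # (2, 3): singular, so Newton-type steps are undefined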

The algorithm for fitting generalized linear models in the GLM package is iteratively reweighted least squares (IRLS) which performs a weighted linear least squares fit at each iteration. Thus a rank-deficient model matrix means that the coefficients are unidentifiable in a GLM as well as in an LM.

The identifiability is a property of the linear predictor, Xβ, where X is the model matrix and β is the coefficient vector. It doesn’t really matter whether the linear predictor is also the fitted value, as in a linear model, or if an inverse link function is applied to it, as in logistic regression.
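
For anyone who wants to see where the PosDefException comes from, here is a small illustrative IRLS sketch for the logit case; irls_logit and its fixed iteration cap are my own toy code, not GLM.jl's implementation. Each iteration solves a weighted least-squares problem via a Cholesky factorization of X'WX, and that factorization is exactly what a rank-deficient model matrix breaks.

using LinearAlgebra

# Illustrative IRLS for logistic regression; not the GLM.jl implementation.
function irls_logit(X, y; iters = 25)
    β = zeros(size(X, 2))
    for _ in 1:iters
        η = X * β                          # linear predictor
        μ = @. 1 / (1 + exp(-η))           # inverse logit link
        w = @. μ * (1 - μ)                 # working weights
        z = η .+ (y .- μ) ./ w             # working response
        # Weighted least-squares step: breaks down when X'WX is singular,
        # e.g. with a duplicated column in X.
        β = cholesky(Symmetric(X' * Diagonal(w) * X)) \ (X' * Diagonal(w) * z)
    end
    return β
end

X = [ones(4) [1, 2, 3, 4] [1, 2, 3, 4]]    # rank-deficient model matrix
y = [1, 1, 0, 0]
irls_logit(X, y)   # typically throws PosDefException at the Cholesky step (or returns meaningless values)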

Possibly useful: https://github.com/timholy/PositiveFactorizations.jl
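
If I read that package's README correctly, its entry point is a Positive-tagged cholesky method, so a rough sketch of how it sidesteps the exception might look like this (the matrix below is just a stand-in for a singular X'WX):

using LinearAlgebra, PositiveFactorizations

A = [1.0 2.5 2.5; 2.5 7.5 7.5; 2.5 7.5 7.5]   # singular, positive semidefinite
# Plain cholesky(A) may throw PosDefException here; the Positive variant instead
# returns a factorization of a nearby positive-definite matrix.
F = cholesky(Positive, A)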

This is a great answer. For such situations (with the consistency caveat applied) I've used the Moore-Penrose inverse with success…
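
In case a concrete version helps, here is a minimal sketch of that idea applied to the first IRLS step for the data above (the specific step and numbers are my own illustration, not necessarily what was done): pinv returns the minimum-norm solution instead of erroring, which splits the coefficient equally between the duplicated columns.

using LinearAlgebra

X = [ones(4) [1, 2, 3, 4] [1, 2, 3, 4]]    # same rank-deficient model matrix as above
w = fill(0.25, 4)                           # working weights μ(1 - μ) at β = 0
z = [2.0, 2.0, -2.0, -2.0]                  # working response (y - μ) / w at β = 0
β = pinv(X' * Diagonal(w) * X) * (X' * Diagonal(w) * z)
# β[2] ≈ β[3]: the minimum-norm solution splits the effect of the duplicated
# predictor evenly between x1 and x2.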

Thanks a lot. It would be useful if GLM could produce p-values and confidence intervals along with the model coefficients when there is collinearity (as R does), rather than aborting, so that one can identify and drop the collinear predictors.