PosDefException: matrix is not positive definite; Cholesky factorization failed

pdeffebach · July 30, 2021, 4:25pm

You still haven’t provided a full MWE that reproduces the error you described. Here is one that does, along with a fix:

julia> function _onehot(df,symb)
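           out = copy(df)  # copy first, so the caller's DataFrame is not mutated
           for c in unique(out[!,symb])
               # one Bool indicator column per distinct value
               out[!,Symbol(c)] = out[!,symb] .== c
           end
           return out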
       end;

julia> begin
       using DataFrames, Chain, GLM  # GLM provides glm and @formula
       teams = ["Jazz", "Heat", "Hawks"]
       rank = ["first", "second", "third"]
       outcome = [true, false]
       df = DataFrame(Id = 1:50, team = rand(teams, 50), rank = rand(rank, 50), outcome = rand(outcome, 50))
       df2 = @chain df begin
           _onehot(:team)
           _onehot(:rank)
       end
       fm_bad = @formula(outcome ~ Jazz + Heat + Hawks + first + second + third)
       # will fail with PosDefException: together with the intercept, a full set of
       # dummies for each variable is perfectly collinear
       # logit_bad = glm(fm_bad, df2, Binomial(), ProbitLink())
       fm_good1 = @formula(outcome ~ Jazz + Heat + second + third)
       # will work, excluding one dummy from each
       logit_good1 = glm(fm_good1, df2, Binomial(),ProbitLink())
       fm_good2 = @formula(outcome ~ team + rank)
       # even better: the formula dummy-codes team and rank itself and drops one reference level from each
       logit_good2 = glm(fm_good2, df2, Binomial(), ProbitLink())
       end
StatsModels.TableRegressionModel{GeneralizedLinearModel{GLM.GlmResp{Vector{Float64}, Binomial{Float64}, ProbitLink}, GLM.DensePredChol{Float64, LinearAlgebra.Cholesky{Float64, Matrix{Float64}}}}, Matrix{Float64}}

outcome ~ 1 + team + rank

Coefficients:
──────────────────────────────────────────────────────────────────────────
                  Coef.  Std. Error      z  Pr(>|z|)  Lower 95%  Upper 95%
──────────────────────────────────────────────────────────────────────────
(Intercept)    0.128893    0.343977   0.37    0.7079   -0.54529  0.803075
team: Heat    -0.94062     0.511527  -1.84    0.0659   -1.94319  0.0619533
team: Jazz    -0.432373    0.465327  -0.93    0.3528   -1.3444   0.479652
rank: second  -0.502133    0.474511  -1.06    0.2900   -1.43216  0.427891
rank: third   -0.217286    0.464381  -0.47    0.6399   -1.12746  0.692884
──────────────────────────────────────────────────────────────────────────
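
To see why the first formula fails, here is a minimal sketch using only LinearAlgebra. The toy `team` vector is made up for illustration and is not the data above. With an intercept, the indicator columns for all levels sum to the intercept column, so the design matrix loses a rank and X'X is singular, which is what the Cholesky factorization trips over.

```julia
using LinearAlgebra

# Hypothetical toy data, for illustration only
team = ["Jazz", "Heat", "Hawks", "Jazz", "Heat", "Hawks"]
team_levels = unique(team)

# One 0/1 indicator column per level
dummies = hcat([Float64.(team .== l) for l in team_levels]...)

# Intercept plus ALL dummies: the dummy columns add up to the intercept column
X_bad = hcat(ones(length(team)), dummies)

# Intercept plus all but one dummy: the dropped level becomes the reference
X_good = X_bad[:, 1:end-1]

println("columns: ", size(X_bad, 2), ", rank: ", rank(X_bad))    # rank is below the column count
println("columns: ", size(X_good, 2), ", rank: ", rank(X_good))  # full column rank
```

Dropping one dummy per categorical variable (or letting the formula do it, as in `fm_good2`) restores full column rank.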
