What shape should data be in to perform a linear fit with GLM?

Okay I did something like this:

input = Array{Float64}(undef, num_points, 1+dims)
output = Array{Float64}(undef, num_points)

... write data to input and output

model = lm(input, output)

with one more dimension than I really have, having every point in that dimension set to 1, as it otherwise does not include an offset in the linear fit.

It sometimes failed with this error:

ERROR: PosDefException: matrix is not positive definite; Cholesky factorization failed.
Stacktrace:
 [1] checkpositivedefinite at /build/julia/src/julia-1.5.3/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/factorization.jl:18 [inlined]
 [2] cholesky!(::LinearAlgebra.Hermitian{Float64,Array{Float64,2}}, ::Val{false}; check::Bool) at /build/julia/src/julia-1.5.3/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/cholesky.jl:219
 [3] cholesky! at /build/julia/src/julia-1.5.3/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/cholesky.jl:218 [inlined] (repeats 2 times)
 [4] GLM.DensePredChol(::Array{Float64,2}, ::Bool) at /home/alice/.julia/packages/GLM/dbJFe/src/linpred.jl:107
 [5] cholpred at /home/alice/.julia/packages/GLM/dbJFe/src/linpred.jl:117 [inlined]
 [6] fit(::Type{GLM.LinearModel}, ::Array{Float64,2}, ::Array{Float64,1}, ::Bool; wts::Array{Float64,1}) at /home/alice/.julia/packages/GLM/dbJFe/src/lm.jl:127
 [7] fit at /home/alice/.julia/packages/GLM/dbJFe/src/lm.jl:127 [inlined]
 [8] #lm#2 at /home/alice/.julia/packages/GLM/dbJFe/src/lm.jl:144 [inlined]
 [9] lm at /home/alice/.julia/packages/GLM/dbJFe/src/lm.jl:144 [inlined] (repeats 2 times)
 [10] multistart_compute_clusters(::Main.DatasetMod.Dataset, ::Int64) at /home/alice/src/VCLR/src/clr.jl:527
 [11] optimize!(::Main.CLRSolver.CLRBranchAndBound) at /home/alice/src/VCLR/src/clr.jl:79
 [12] top-level scope at REPL[13]:1
 [13] run_repl(::REPL.AbstractREPL, ::Any) at /build/julia/src/julia-1.5.3/usr/share/julia/stdlib/v1.5/REPL/src/REPL.jl:288

This happens in the case I also mentioned in my original post:

Additionally, I assume the model will yell at me if the data does not have enough data points to uniquely determine the fit, but in this case I’d really prefer to just get any fit that minimizes the squared error.

However it appears to have been fixed by setting allowrankdeficient to true.

model = lm(input, output, true)

Thanks for the help!

2 Likes