How can I substitute NaN in a GLM model with zeroes?

bertulli · July 24, 2023, 7:44pm

Hi all!

I have a linear model fitted with GLM.jl. Because I interact several variables with a categorical one, many rows of the coefficient table contain NaN wherever the data cover only some of the categorical levels:

julia> model
StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Matrix{Float64}}

Base power mean (W) ~ 1 + mnemonic + APSR (s flag) + Is conditional + Dest reg == source reg + Barrel shift amount + Has barrel shift + Has immediate operand + mnemonic & Binary weight + Barrel shift amount & Has barrel shift + mnemonic & APSR (s flag) + mnemonic & Is conditional + mnemonic & Dest reg == source reg + mnemonic & Barrel shift amount + mnemonic & Has barrel shift + mnemonic & Has immediate operand + mnemonic & Barrel shift amount & Has barrel shift + mnemonic & Binary weight & APSR (s flag) + mnemonic & Binary weight & Is conditional + mnemonic & Binary weight & Dest reg == source reg + mnemonic & Binary weight & Barrel shift amount + mnemonic & Binary weight & Has barrel shift + mnemonic & Binary weight & Has immediate operand + mnemonic & Binary weight & Barrel shift amount & Has barrel shift

Coefficients:
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
                                                                                  Coef.     Std. Error       t  Pr(>|t|)      Lower 95%      Upper 95%
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
(Intercept)                                                                 0.0804049      0.0305935      2.63    0.0086    0.0204363      0.140373
mnemonic: add                                                              -0.0209991      0.0306201     -0.69    0.4929   -0.0810198      0.0390215
mnemonic: and                                                               0.000384739    0.0309979      0.01    0.9901   -0.0603766      0.0611461
mnemonic: asr                                                              -5.10902e-6     0.0385573     -0.00    0.9999   -0.0755842      0.0755739
mnemonic: b                                                                 0.00127954     0.00981586     0.13    0.8963   -0.0179612      0.0205203
mnemonic: bfc                                                               0.0          NaN            NaN       NaN     NaN            NaN
mnemonic: bfi                                                               0.0          NaN            NaN       NaN     NaN            NaN
mnemonic: bic                                                               0.0          NaN            NaN       NaN     NaN            NaN

This means that when I try to predict on inputs the model was not trained on, I get an error. Is there a way to substitute each NaN with 0.0, so that the model no longer throws an error and simply ignores the missing coefficients?
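To make the question concrete, here is a minimal sketch of the substitution I have in mind. It is plain Julia and does not touch GLM.jl internals: `zero_nans` is a hypothetical helper of my own, the numbers are illustrative only, and I am assuming the coefficient vector could be obtained with `coef(model)` and that a matching design row is available for the new observation.

```julia
using LinearAlgebra  # provides dot

# Hypothetical helper (not part of GLM.jl): return a copy of a coefficient
# vector in which every NaN entry is replaced by 0.0, so that the entry
# contributes nothing to a linear prediction.
zero_nans(beta::AbstractVector{<:Real}) = [isnan(b) ? 0.0 : float(b) for b in beta]

# Illustrative values only; in practice this might be `coef(model)`.
beta = [0.0804049, -0.0209991, NaN]

# One assumed design row: intercept, an indicator that is off, and an
# indicator for a level whose coefficient could not be estimated.
x = [1.0, 0.0, 1.0]

beta_clean = zero_nans(beta)

# Manual linear prediction; the NaN coefficient is now ignored.
yhat = dot(x, beta_clean)
```

With these values only the intercept contributes to `yhat`. This just computes the linear predictor by hand; it does not modify the fitted model, so `predict(model, ...)` would behave as before.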

Thanks!
