Hello, how do I check for VIF Variance inflation factor - Wikipedia in a GLM.jl lm
model? I went through the JuliaStats org repositories and package documentations and could not find anything.
Hey. The following works w/ GLM:
using DataFrames, GLM
# Make data
N=10_000;
X0 = ones(N); X1=randn(N); X2=randn(N); X3=randn(N); Īµ = randn(N);
Ī²_0 = 4.0; Ī²_1 = 2.0; Ī²_2 =10.5; Ī²_3 =-9.5; Ļ_Īµ =1.0;
y = Ī²_0*X0 .+ Ī²_1*X1 .+ Ī²_2*X2 .+ Ī²_3*X3 .+ Ļ_Īµ*Īµ
X = [one.(X1) X1 X2 X3]
d = DataFrame(X1=X1, X2=X2, X3=X3, y=y)
# linear algebra
Ī²_LS = X \ y
# GLM
m_y = lm(@formula(y ~ 1 + X1 + X2 + X3), d)
m_X1 = lm(@formula(X1 ~ 1 + X2 + X3), d)
m_X2 = lm(@formula(X2 ~ 1 + X1 + X3), d)
m_X3 = lm(@formula(X3 ~ 1 + X1 + X2), d)
vif_X1 = 1.0/(1.0-r2(m_X1))
vif_X2 = 1.0/(1.0-r2(m_X2))
vif_X3 = 1.0/(1.0-r2(m_X3))
1 Like
Ok, so it has to be done by handā¦
- Correct. GLM.jl currently does not have a
vif
function. - RegTools.jl, has it, but is no longer maintainedā¦
- Itās easy to do w/ linear algebra or to define your own function
using Statistics, LinearAlgebra
vifm = diag(inv(cor(X[:,2:end])))
vifm ā [vif_X1; vif_X2; vif_X3;] # verify result above
# own function
vif_GLM(mod) = diag(inv(cor(mod.model.pp.X[:,2:end])))
julia> vif_GLM(m_y) ā vifm ā [vif_X1; vif_X2; vif_X3;]
true
2 Likes
Try w/ ārealā data:
using DataFrames, GLM, Statistics, LinearAlgebra, RDatasets
airquality = rename(dataset("datasets", "airquality"), "Solar.R" => "Solar_R")
#
y = airquality.Wind
X1 = airquality.Temp
X2 = airquality.Solar_R
X3 = airquality.Ozone
d = DataFrame(X1=X1, X2=X2, X3=X3, y=y)
d = d[completecases(d), :]
X = [one.(d.X1) d.X1 d.X2 d.X3] .|> Float64
y = d.y .|> Float64
#
Ī²_LS = X \ y
vifm = diag(inv(cor(X[:,2:end])))
#
m_y = lm(@formula(y ~ 1 + X1 + X2 + X3), d)
m_X1 = lm(@formula(X1 ~ 1 + X2 + X3), d)
m_X2 = lm(@formula(X2 ~ 1 + X1 + X3), d)
m_X3 = lm(@formula(X3 ~ 1 + X1 + X2), d)
vif_X1 = 1.0/(1.0-r2(m_X1))
vif_X2 = 1.0/(1.0-r2(m_X2))
vif_X3 = 1.0/(1.0-r2(m_X3))
#
vifm ā [vif_X1; vif_X2; vif_X3;]
#
vif_GLM(mod) = diag(inv(cor(mod.model.pp.X[:,2:end])))
vif_GLM(m_y) ā vifm ā [vif_X1; vif_X2; vif_X3;]
Thank you!
1 Like
Wrap R function
using RCall
formula=@formula(Price~Baths+Beds+Size+Lot)
function computing_vif(data::AbstractDataFrame,formula::FormulaTerm)
@rput data;@rput formula
R"""
library(car)
model <- lm(formula, data = data)
vif_data=vif(model)
"""
return @rget vif_data
end
A pure-Julia solution is:
using MixedModelsExtras
vif(my_glm_model)
# or
gvif(my_glm_model, scale = true) # This produces GVIF^(1/(2*Df))
See discussion at Add VIF? Ā· Issue #428 Ā· JuliaStats/GLM.jl Ā· GitHub and linked issues.
2 Likes