Regression Variance Inflation Factor (VIF)

Hello, how do I check for VIF Variance inflation factor - Wikipedia in a GLM.jl lm model? I went through the JuliaStats org repositories and package documentations and could not find anything.

Hey. The following works w/ GLM:

using DataFrames, GLM

# Make data
N=10_000; 
X0 = ones(N); X1=randn(N); X2=randn(N); X3=randn(N); ε = randn(N);
β_0 = 4.0; β_1 = 2.0; β_2 =10.5; β_3 =-9.5; σ_ε =1.0;
y = β_0*X0 .+ β_1*X1 .+ β_2*X2 .+ β_3*X3 .+ σ_ε*ε
X = [one.(X1) X1 X2 X3] 
d = DataFrame(X1=X1, X2=X2, X3=X3, y=y)

# linear algebra
β_LS = X \ y

# GLM
m_y  = lm(@formula(y ~ 1 + X1 + X2 + X3), d)
m_X1 = lm(@formula(X1 ~ 1 + X2 + X3), d)
m_X2 = lm(@formula(X2 ~ 1 + X1 + X3), d)
m_X3 = lm(@formula(X3 ~ 1 + X1 + X2), d)

vif_X1 = 1.0/(1.0-r2(m_X1))
vif_X2 = 1.0/(1.0-r2(m_X2))
vif_X3 = 1.0/(1.0-r2(m_X3))

Ok, so it has to be done by hand…

  1. Correct. GLM.jl currently does not have a vif function.
  2. RegTools.jl, has it, but is no longer maintained…
  3. It’s easy to do w/ linear algebra or to define your own function
using Statistics, LinearAlgebra
vifm = diag(inv(cor(X[:,2:end])))
vifm ≈ [vif_X1; vif_X2; vif_X3;]  # verify result above

# own function 
vif_GLM(mod) = diag(inv(cor(mod.model.pp.X[:,2:end])))

julia> vif_GLM(m_y) ≈ vifm ≈ [vif_X1; vif_X2; vif_X3;]
true
2 Likes

Try w/ “real” data:

using DataFrames, GLM, Statistics, LinearAlgebra, RDatasets 
airquality = rename(dataset("datasets", "airquality"), "Solar.R" => "Solar_R")
#
y =  airquality.Wind
X1 = airquality.Temp    
X2 = airquality.Solar_R  
X3 = airquality.Ozone 
d = DataFrame(X1=X1, X2=X2, X3=X3, y=y)
d = d[completecases(d), :]
X = [one.(d.X1) d.X1 d.X2 d.X3]  .|> Float64
y = d.y .|> Float64
#
β_LS = X \ y
vifm = diag(inv(cor(X[:,2:end])))
#
m_y  = lm(@formula(y ~ 1 + X1 + X2 + X3), d)
m_X1 = lm(@formula(X1 ~ 1 + X2 + X3), d)
m_X2 = lm(@formula(X2 ~ 1 + X1 + X3), d)
m_X3 = lm(@formula(X3 ~ 1 + X1 + X2), d)

vif_X1 = 1.0/(1.0-r2(m_X1))
vif_X2 = 1.0/(1.0-r2(m_X2))
vif_X3 = 1.0/(1.0-r2(m_X3))
#
vifm ≈ [vif_X1; vif_X2; vif_X3;]
#
vif_GLM(mod) = diag(inv(cor(mod.model.pp.X[:,2:end])))
vif_GLM(m_y) ≈ vifm ≈ [vif_X1; vif_X2; vif_X3;]

Thank you!

1 Like