Regression Variance Inflation Factor (VIF)

Hello, how do I check for VIF Variance inflation factor - Wikipedia in a GLM.jl lm model? I went through the JuliaStats org repositories and package documentations and could not find anything.

Hey. The following works w/ GLM:

using DataFrames, GLM

# Make data
N=10_000; 
X0 = ones(N); X1=randn(N); X2=randn(N); X3=randn(N); Īµ = randn(N);
Ī²_0 = 4.0; Ī²_1 = 2.0; Ī²_2 =10.5; Ī²_3 =-9.5; Ļƒ_Īµ =1.0;
y = Ī²_0*X0 .+ Ī²_1*X1 .+ Ī²_2*X2 .+ Ī²_3*X3 .+ Ļƒ_Īµ*Īµ
X = [one.(X1) X1 X2 X3] 
d = DataFrame(X1=X1, X2=X2, X3=X3, y=y)

# linear algebra
Ī²_LS = X \ y

# GLM
m_y  = lm(@formula(y ~ 1 + X1 + X2 + X3), d)
m_X1 = lm(@formula(X1 ~ 1 + X2 + X3), d)
m_X2 = lm(@formula(X2 ~ 1 + X1 + X3), d)
m_X3 = lm(@formula(X3 ~ 1 + X1 + X2), d)

vif_X1 = 1.0/(1.0-r2(m_X1))
vif_X2 = 1.0/(1.0-r2(m_X2))
vif_X3 = 1.0/(1.0-r2(m_X3))
1 Like

Ok, so it has to be done by handā€¦

  1. Correct. GLM.jl currently does not have a vif function.
  2. RegTools.jl, has it, but is no longer maintainedā€¦
  3. Itā€™s easy to do w/ linear algebra or to define your own function
using Statistics, LinearAlgebra
vifm = diag(inv(cor(X[:,2:end])))
vifm ā‰ˆ [vif_X1; vif_X2; vif_X3;]  # verify result above

# own function 
vif_GLM(mod) = diag(inv(cor(mod.model.pp.X[:,2:end])))

julia> vif_GLM(m_y) ā‰ˆ vifm ā‰ˆ [vif_X1; vif_X2; vif_X3;]
true
2 Likes

Try w/ ā€œrealā€ data:

using DataFrames, GLM, Statistics, LinearAlgebra, RDatasets 
airquality = rename(dataset("datasets", "airquality"), "Solar.R" => "Solar_R")
#
y =  airquality.Wind
X1 = airquality.Temp    
X2 = airquality.Solar_R  
X3 = airquality.Ozone 
d = DataFrame(X1=X1, X2=X2, X3=X3, y=y)
d = d[completecases(d), :]
X = [one.(d.X1) d.X1 d.X2 d.X3]  .|> Float64
y = d.y .|> Float64
#
Ī²_LS = X \ y
vifm = diag(inv(cor(X[:,2:end])))
#
m_y  = lm(@formula(y ~ 1 + X1 + X2 + X3), d)
m_X1 = lm(@formula(X1 ~ 1 + X2 + X3), d)
m_X2 = lm(@formula(X2 ~ 1 + X1 + X3), d)
m_X3 = lm(@formula(X3 ~ 1 + X1 + X2), d)

vif_X1 = 1.0/(1.0-r2(m_X1))
vif_X2 = 1.0/(1.0-r2(m_X2))
vif_X3 = 1.0/(1.0-r2(m_X3))
#
vifm ā‰ˆ [vif_X1; vif_X2; vif_X3;]
#
vif_GLM(mod) = diag(inv(cor(mod.model.pp.X[:,2:end])))
vif_GLM(m_y) ā‰ˆ vifm ā‰ˆ [vif_X1; vif_X2; vif_X3;]

Thank you!

1 Like

Wrap R function

    using RCall
    formula=@formula(Price~Baths+Beds+Size+Lot)
     function computing_vif(data::AbstractDataFrame,formula::FormulaTerm)
        @rput data;@rput formula
        R"""
            library(car)
            model <- lm(formula, data = data)
            vif_data=vif(model)
        """
        return @rget  vif_data
     end

A pure-Julia solution is:

using MixedModelsExtras
vif(my_glm_model)
# or
gvif(my_glm_model, scale = true) # This produces GVIF^(1/(2*Df))

See discussion at Add VIF? Ā· Issue #428 Ā· JuliaStats/GLM.jl Ā· GitHub and linked issues.

2 Likes