Running counterfactuals on a regression with interactions

samerb · March 8, 2021, 2:43am

Let’s say I have a data frame with columns c1,c2,…,cn and I run a regression of c1 on c2, c3, c4 and c2.*c4:

sm = GLM.lm( @formula(c1 ~ c2 + c3 +c4 +c2&c4), df)

If I were in Python, I could then do

formula = 'c1 ~ c2 + c3 +c4 +c2&c4'
import patsy
y, X = patsy.dmatrices(formula, df)

This would give me X, the design matrix of the regression–i.e. columns corresponding to c2, c3, c4, and c2.*c4.
How can I get this matrix in Julia? I could probably do this manually, but if I had a large number of regressors and interactions, it might get tricky, so I’d like a systematic approach.

The reason I’m asking is that I eventually want to do counterfactuals for what would happen if I change c2 to 0 or to 1. You could throw this in as a bonus, but I think once I get the first part, I can figure that out.

tbeason · March 8, 2021, 2:53am

I think you are looking for modelmatrix(sm)

samerb · March 8, 2021, 2:57am

That doesn’t appear to be in the documentation. Just curious how you found it (not that important though)

tbeason · March 8, 2021, 2:58am

My brain

No idea when I found it.

It is in the StatsModels.jl docs. If something isn’t in GLM.jl docs you should look there.

https://juliastats.org/StatsModels.jl/stable/api/#StatsBase.modelmatrix

nilshg · March 8, 2021, 7:38am

Why do you need the matrix though if you want to do counterfactuals? predict does this for you automatically:

julia> using DataFrames, GLM

julia> df = DataFrame(a = 1:10, b = 6:15);

julia> df.y = 0.5 * df.a .+ 0.8*df.b .+ df.a .* df.b;

julia> sm = lm(@formula(y ~ a*b), df)
StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}}}}, Matrix{Float64}}

y ~ 1 + a + b + a & b

Coefficients:
────────────────────────────────────────────────────────────────────────────────────
             Coef.     Std. Error                  t  Pr(>|t|)  Lower 95%  Upper 95%
────────────────────────────────────────────────────────────────────────────────────
(Intercept)    0.0  NaN                       NaN       NaN         NaN        NaN
a              0.5    4.8306e-13    1035067731159.83    <1e-81        0.5        0.5
b              0.8    1.23567e-13   6474215637663.89    <1e-87        0.8        0.8
a & b          1.0    2.28609e-14  43742897935693.22    <1e-93        1.0        1.0
────────────────────────────────────────────────────────────────────────────────────

julia> round.(predict(sm, DataFrame(a = 1:10, b = 0)))
10-element Vector{Float64}:
 0.0
 1.0
 1.0
 2.0
 2.0
 3.0
 3.0
 4.0
 4.0
 5.0

Topic		Replies	Views
Julia equivalent of the R function `model.matrix` General Usage statistics	2	1012	October 31, 2020
Having trouble with StatsModels.jl `modelmatrix` to replicate R results Statistics	1	63	April 11, 2025
Linear regression without the intercept term Statistics question , regression , fit , glm	7	4147	March 8, 2023
GLM is slow on large datasets. Using OnlineStats for regressions? MixedModels? Performance glm	25	5092	November 26, 2018
The simplest linear fit with GLM Tooling glm	13	5287	November 11, 2021

Running counterfactuals on a regression with interactions

Related topics