Running counterfactuals on a regression with interactions

Let’s say I have a data frame with columns c1,c2,…,cn and I run a regression of c1 on c2, c3, c4 and c2.*c4:

sm = GLM.lm( @formula(c1 ~ c2 + c3 +c4 +c2&c4), df)

If I were in Python, I could then do

formula = 'c1 ~ c2 + c3 +c4 +c2&c4'
import patsy
y, X = patsy.dmatrices(formula, df)

This would give me X, the design matrix of the regression–i.e. columns corresponding to c2, c3, c4, and c2.*c4.
How can I get this matrix in Julia? I could probably do this manually, but if I had a large number of regressors and interactions, it might get tricky, so I’d like a systematic approach.

The reason I’m asking is that I eventually want to do counterfactuals for what would happen if I change c2 to 0 or to 1. You could throw this in as a bonus, but I think once I get the first part, I can figure that out.

I think you are looking for modelmatrix(sm)

1 Like

That doesn’t appear to be in the documentation. Just curious how you found it (not that important though)

My brain :slight_smile:

No idea when I found it.

It is in the StatsModels.jl docs. If something isn’t in GLM.jl docs you should look there.

https://juliastats.org/StatsModels.jl/stable/api/#StatsBase.modelmatrix

Why do you need the matrix though if you want to do counterfactuals? predict does this for you automatically:

julia> using DataFrames, GLM

julia> df = DataFrame(a = 1:10, b = 6:15);

julia> df.y = 0.5 * df.a .+ 0.8*df.b .+ df.a .* df.b;

julia> sm = lm(@formula(y ~ a*b), df)
StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}}}}, Matrix{Float64}}

y ~ 1 + a + b + a & b

Coefficients:
────────────────────────────────────────────────────────────────────────────────────
             Coef.     Std. Error                  t  Pr(>|t|)  Lower 95%  Upper 95%
────────────────────────────────────────────────────────────────────────────────────
(Intercept)    0.0  NaN                       NaN       NaN         NaN        NaN
a              0.5    4.8306e-13    1035067731159.83    <1e-81        0.5        0.5
b              0.8    1.23567e-13   6474215637663.89    <1e-87        0.8        0.8
a & b          1.0    2.28609e-14  43742897935693.22    <1e-93        1.0        1.0
────────────────────────────────────────────────────────────────────────────────────

julia> round.(predict(sm, DataFrame(a = 1:10, b = 0)))
10-element Vector{Float64}:
 0.0
 1.0
 1.0
 2.0
 2.0
 3.0
 3.0
 4.0
 4.0
 5.0
4 Likes