Running counterfactuals on a regression with interactions

Let’s say I have a data frame with columns c1,c2,…,cn and I run a regression of c1 on c2, c3, c4 and c2.*c4:

sm = GLM.lm( @formula(c1 ~ c2 + c3 +c4 +c2&c4), df)

If I were in Python, I could then do

formula = 'c1 ~ c2 + c3 +c4 +c2&c4'
import patsy
y, X = patsy.dmatrices(formula, df)

This would give me X, the design matrix of the regression–i.e. columns corresponding to c2, c3, c4, and c2.*c4.
How can I get this matrix in Julia? I could probably do this manually, but if I had a large number of regressors and interactions, it might get tricky, so I’d like a systematic approach.

The reason I’m asking is that I eventually want to do counterfactuals for what would happen if I change c2 to 0 or to 1. You could throw this in as a bonus, but I think once I get the first part, I can figure that out.

I think you are looking for modelmatrix(sm)

1 Like

That doesn’t appear to be in the documentation. Just curious how you found it (not that important though)

My brain :slight_smile:

No idea when I found it.

It is in the StatsModels.jl docs. If something isn’t in GLM.jl docs you should look there.

Why do you need the matrix though if you want to do counterfactuals? predict does this for you automatically:

julia> using DataFrames, GLM

julia> df = DataFrame(a = 1:10, b = 6:15);

julia> df.y = 0.5 * df.a .+ 0.8*df.b .+ df.a .* df.b;

julia> sm = lm(@formula(y ~ a*b), df)
StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}}}}, Matrix{Float64}}

y ~ 1 + a + b + a & b

             Coef.     Std. Error                  t  Pr(>|t|)  Lower 95%  Upper 95%
(Intercept)    0.0  NaN                       NaN       NaN         NaN        NaN
a              0.5    4.8306e-13    1035067731159.83    <1e-81        0.5        0.5
b              0.8    1.23567e-13   6474215637663.89    <1e-87        0.8        0.8
a & b          1.0    2.28609e-14  43742897935693.22    <1e-93        1.0        1.0

julia> round.(predict(sm, DataFrame(a = 1:10, b = 0)))
10-element Vector{Float64}: