Regression implementation using FixedEffectModels.jl, InteractiveFixedEffectModels.jl, GLFixedEffectModels.jl packages

Hi guys,
I need to implement a regression like
Y ~ A : B : C
or
Y ~ A : B_C

Here the main effects are omitted and only the interaction enters the model. A is a numeric variable, and B and C are categorical variables (factors). In the second case, B_C is a new categorical variable that combines B and C. The intercept must be present.

Can someone please help me implement these formulas using the following packages:

  • FixedEffectModels.jl,
  • InteractiveFixedEffectModels.jl,
  • GLFixedEffectModels.jl?

I’ve tried different syntax and options, but I keep getting errors, for example this one with the FixedEffectModels.jl package:
ERROR: MethodError: no method matching getindex(::DataFrame, ::Expr)

Thank you so much!!

It’s hard to see where your error is coming from without an MWE. Are you actually using the colon : operator to construct your formulas? That might be where the Expr error comes from, since interactions inside @formula are written with &.

I think you might be looking for something like this:

using DataFrames, FixedEffectModels

data = DataFrame(Y = rand(100), A = rand(100), B = rand('a':'z', 100), C = rand('a':'z', 100))

data[!, :B_C] = data.B .* data.C

reg(data, @formula(Y ~ A&fe(B_C)))

I haven’t used InteractiveFixedEffectModels before, but the readme suggests that you can do something like

ife(B, C, 2)

instead of creating the combination of B and C manually.

Hi @nilshg, thanks a lot for the help!
When I try your example, I get the following error:

julia> data[!, :B_C] = data.B .* data.C
ERROR: MethodError: no method matching setindex!(::DataFrame, ::Array{String,1}, ::typeof(!), ::Symbol)
Closest candidates are:
  setindex!(::DataFrame, ::AbstractArray{T,1} where T, ::AbstractArray{Bool,1}, ::Union{Signed, Symbol, Unsigned}) at /home/antonina_kliuieva/.julia/packages/DataFrames/0Em9Q/src/dataframe/dataframe.jl:561
  setindex!(::DataFrame, ::AbstractArray{T,1} where T, ::AbstractArray{#s68,1} where #s68<:Real, ::Union{Signed, Symbol, Unsigned}) at /home/antonina_kliuieva/.julia/packages/DataFrames/0Em9Q/src/dataframe/dataframe.jl:567
  setindex!(::DataFrame, ::Any, ::Colon, ::Any) at /home/antonina_kliuieva/.julia/packages/DataFrames/0Em9Q/src/dataframe/dataframe.jl:692
  ...
Stacktrace:
 [1] top-level scope at none:0

Which version of DataFrames are you on? This shouldn’t error.

DataFrames v0.18.4

Okay that’s quite out of date - current version is 0.21.8, so you might want to ]up

I tried to install a newer version of DataFrames, but it looks like there are some problems with dependencies:

(v1.1) pkg> add DataFrames@0.21.8
 Resolving package versions...
ERROR: Unsatisfiable requirements detected for package DataFrames [a93c6f00]:
 DataFrames [a93c6f00] log:
 ├─possible versions are: [0.11.7, 0.12.0, 0.13.0-0.13.1, 0.14.0-0.14.1, 0.15.0-0.15.2, 0.16.0, 0.17.0-0.17.1, 0.18.0-0.18.4, 0.19.0-0.19.4, 0.20.0-0.20.2, 0.21.0-0.21.8] or uninstalled
 ├─restricted to versions 0.21.8 by an explicit requirement, leaving only versions 0.21.8
 └─restricted by compatibility requirements with RCall [6f49c342] to versions: [0.19.0-0.19.4, 0.20.0-0.20.2] — no versions left
   └─RCall [6f49c342] log:
     ├─possible versions are: [0.12.0-0.12.1, 0.13.0-0.13.9] or uninstalled
     └─restricted to versions 0.13.6 by an explicit requirement, leaving only versions 0.13.6

Same error when I just use ] up

RCall definitely supports DataFrames 0.21.8. Your problem seems to be that you are on Julia 1.1, while the current release is 1.5.2, so all sorts of packages might be held back by that. Best to update Julia to 1.5.2 and then install DataFrames and the other packages you need in a fresh environment.
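
For reference, a fresh environment can be set up roughly like this once the newer Julia is installed. This is only a sketch: the directory name "fe-regressions" is a made-up example, and the package list is just the packages mentioned in this thread.

```julia
using Pkg

# Create (or reuse) a project environment in a directory of its own.
# "fe-regressions" is a hypothetical name; any path works.
Pkg.activate("fe-regressions")

# Install only what this project needs, so that unrelated packages
# cannot hold the versions back.
Pkg.add(["DataFrames", "FixedEffectModels"])

# Show the versions the resolver actually picked.
Pkg.status()
```

If an older package such as RCall is needed as well, adding it to the same environment afterwards shows right away whether it restricts the DataFrames version.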

Thanks a lot, I’ll try this!

@nilshg, thank you so much for the recommendations! :blush:
It looks like the problem was with outdated versions of Julia and the packages. My code is working now. Thanks a lot!!

Could you please also suggest which functions I should use to extract the model coefficients (names of the effects, estimates, standard errors, p-values, etc.), as well as the residuals and predictions, for the FixedEffectModels.jl package?
I’ll be very grateful for this, as I cannot find good documentation for this package:(

It’s mentioned in the Readme here.

I agree that the docs are a bit sparse, but that’s because the package builds on functionality from other packages (e.g. the syntax for interaction effects and the @formula macro come from StatsModels), so you might learn something from reading those docs as well.
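
As a rough illustration of what that looks like in practice, here is a minimal sketch using the generic accessor functions from the statistics ecosystem. It assumes a recent FixedEffectModels version that re-exports these functions. The data and the formula Y ~ A + fe(B_C) are invented for the example, chosen so that there is one estimated coefficient to inspect.

```julia
using DataFrames, FixedEffectModels, Random

Random.seed!(42)
n = 200
# Made-up data, only for illustration.
data = DataFrame(A = rand(n),
                 B = rand(["b1", "b2", "b3"], n),
                 C = rand(["c1", "c2"], n))
data.B_C = string.(data.B, "_", data.C)    # combined categorical variable
data.Y = 1 .+ 2 .* data.A .+ randn(n)

# Slope on A, with B_C absorbed as a fixed effect.
m = reg(data, @formula(Y ~ A + fe(B_C)))

coefnames(m)   # names of the estimated coefficients
coef(m)        # point estimates
stderror(m)    # standard errors
confint(m)     # confidence intervals
nobs(m)        # number of observations used
coeftable(m)   # full table, including t statistics and p-values
```

Residuals and the estimated fixed effects are, as far as I can tell from the readme, only stored when asked for through the save keyword of reg, and the exact form of that keyword has changed between versions, so check the readme for the version you have installed.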

Great, thanks a lot! :blush:

In case you’re not absolutely committed to using FixedEffectModels, this should be possible using "vanilla" GLM/StatsModels as well:

using DataFrames, GLM
data = DataFrame(a = rand(10), b = repeat('a':'b', inner=5), c = repeat('x':'y', outer=5))
data.y = 1 .+ data.a .+ (data.b .== 'b').*2 .+ (data.c .== 'y').*3 .+ randn(10)
lm(@formula(y ~ a & b & c), data)

should give you something like

StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}

y ~ 1 + a & b & c

Coefficients:
─────────────────────────────────────────────────────────────────────────────
                    Coef.  Std. Error      t  Pr(>|t|)   Lower 95%  Upper 95%
─────────────────────────────────────────────────────────────────────────────
(Intercept)       4.73999     0.64628   7.33    0.0007    3.07868    6.40131
a & b: a & c: x  -4.92912     1.16859  -4.22    0.0083   -7.93306   -1.92518
a & b: b & c: x  -4.72008     2.07218  -2.28    0.0717  -10.0468     0.606636
a & b: a & c: y  -1.39266     3.13547  -0.44    0.6755   -9.45266    6.66733
a & b: b & c: y   2.27275     1.21276   1.87    0.1198   -0.844756   5.39026
─────────────────────────────────────────────────────────────────────────────

Ah I see from your other post that you tried GLM already and ran into trouble with the number of levels in b and c, so ignore my suggestion :slight_smile:

@dave.f.kleinschmidt, thank you very much for your suggestion!
You are right, my initial model is already written in GLM.jl :wink:

What’s the difference between these three packages?

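The clue is in the name - FixedEffectModels is for standard linear FE/IV models, InteractiveFixedEffectModels is for models with interactive fixed effects (factor models, where unit-specific loadings multiply time-specific factors), and GLFixedEffectModels is for generalized linear models with fixed effects, e.g. logit or Poisson models.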

With GLM you can also fit models with categorical variables if you code them as dummies. It also accepts interactions and logit models.
Why do we need to use the packages named in this thread?

I think you’re alluding to the problem in your post - coding fixed effects as dummies quickly becomes computationally infeasible (as every level adds a column to your model matrix), so these packages implement fast and efficient estimators that allow you to estimate high dimensional fixed effects.
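
To make the point concrete, here is a small sketch using only the standard library. It is not how these packages are implemented internally. It only illustrates, for a single fixed effect on simulated data, that subtracting group means (the within transformation) gives the same slope as the dummy-variable regression without ever building the wide dummy matrix. All names and sizes are made up for the example.

```julia
using LinearAlgebra, Random

Random.seed!(1)
nlevels = 50
g = repeat(1:nlevels, inner = 20)      # group id (the fixed effect), 20 obs each
n = length(g)                          # 1000 observations
x = randn(n) .+ 0.1 .* g               # regressor correlated with the group
alpha = randn(nlevels)                 # simulated group effects
y = 2.0 .* x .+ alpha[g] .+ 0.1 .* randn(n)

# Approach 1: one dummy column per level, giving an n x (1 + nlevels) matrix.
D = zeros(n, nlevels)
for i in 1:n
    D[i, g[i]] = 1.0
end
X_dummy = hcat(x, D)
beta_dummy = (X_dummy \ y)[1]          # least-squares slope on x

# Approach 2: subtract group means, then run a one-column regression.
function demean(v, g, nlevels)
    sums = zeros(nlevels)
    counts = zeros(Int, nlevels)
    for i in eachindex(v)
        sums[g[i]] += v[i]
        counts[g[i]] += 1
    end
    return v .- sums[g] ./ counts[g]
end

x_w = demean(x, g, nlevels)
y_w = demean(y, g, nlevels)
beta_within = dot(x_w, y_w) / dot(x_w, x_w)

println(size(X_dummy))                 # width grows with the number of levels
println(isapprox(beta_dummy, beta_within; atol = 1e-8))
```

With 50 levels the dummy matrix is harmless, but its width grows with the number of levels, while the demeaning step only needs one pass over the data per variable.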
