Hi guys,
I need to implement a regression like Y ~ A : B : C
or Y ~ A : B_C
where the main effects are omitted. In the interactions, A is a numeric variable and B and C are categorical variables (factors); in the second case, B_C is a new categorical variable that combines B and C. The intercept must be present.
Can someone please help me implement these formulas using the following packages:
FixedEffectModels.jl,
InteractiveFixedEffectModels.jl,
GLFixedEffectModels.jl?
I've tried different syntax and options, but I keep getting different errors, for example ERROR: MethodError: no method matching getindex(::DataFrame, ::Expr) for the FixedEffectModels.jl package.
It's hard to see where your error is coming from without an MWE. Are you actually using the colon : operator to construct your formulas (which is where the Expr error might come from)?
I think you might be looking for something like this:
using DataFrames, FixedEffectModels
data = DataFrame(Y = rand(100), A = rand(100), B = rand('a':'z', 100), C = rand('a':'z', 100))
data[!, :B_C] = data.B .* data.C
reg(data, @formula(Y ~ A&fe(B_C)))
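One detail worth checking when building B_C by hand: with single-character levels, as in the data above, plain concatenation is unambiguous, but with longer string levels two different (B, C) pairs can collapse into the same label. A minimal sketch of the issue, using made-up vectors (the values and the "_" separator are assumptions for illustration, not from the original data):

```julia
# Hypothetical example data: two categorical variables stored as strings.
B = ["a", "ab", "a", "ab"]
C = ["bc", "c", "c", "bc"]

# Plain concatenation merges ("a", "bc") and ("ab", "c") into the same label "abc".
naive = B .* C

# A separator that occurs in neither variable keeps every (B, C) pair distinct.
B_C = string.(B, "_", C)

println(unique(naive))
println(unique(B_C))
```

If your levels are longer strings, something like string.(data.B, "_", data.C) is probably the safer way to build the combined column.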
I haven't used InteractiveFixedEffectModels before, but the readme suggests that you can do something like
ife(B, C, 2)
instead of creating the combination of B and C manually.
RCall definitely supports DataFrames 0.21.8. Your problem seems to be that you are on Julia version 1.1, while the current release is 1.5.2, so all sorts of packages might be held back by that. It would be best to update Julia to 1.5.2 and then install DataFrames and the other packages you need in a fresh environment.
@nilshg, thank you so much for the recommendations!
It looks like the problem was with outdated versions of Julia and the packages. My code is working now. Thanks a lot!!
Could you please also suggest which functions I should use to extract the model coefficients (effect names, estimates, standard errors, p-values, etc.), as well as the residuals and predictions (for the FixedEffectModels.jl package)?
I'd be very grateful, as I cannot find good documentation for this package :(
I agree that the docs are a bit sparse, but that's because it builds on functionality from other packages (e.g. the syntax for interaction effects and the @formula macro are from StatsModels), so you might learn something from reading those docs as well.
In case you're not absolutely committed to using FixedEffectModels, this should be possible using "vanilla" GLM/StatsModels as well:
using DataFrames, GLM
data = DataFrame(a = rand(10), b = repeat('a':'b', inner=5), c = repeat('x':'y', outer=5))
data.y = 1 .+ data.a .+ (data.b .== 'b').*2 .+ (data.c .== 'y').*3 .+ randn(10)
lm(@formula(y ~ a & b & c), data)
should give you something like
StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}
y ~ 1 + a & b & c
Coefficients:
───────────────────────────────────────────────────────────────────────────────
                     Coef.  Std. Error      t  Pr(>|t|)   Lower 95%   Upper 95%
───────────────────────────────────────────────────────────────────────────────
(Intercept)        4.73999     0.64628   7.33    0.0007    3.07868     6.40131
a & b: a & c: x   -4.92912     1.16859  -4.22    0.0083   -7.93306    -1.92518
a & b: b & c: x   -4.72008     2.07218  -2.28    0.0717  -10.0468      0.606636
a & b: a & c: y   -1.39266     3.13547  -0.44    0.6755   -9.45266     6.66733
a & b: b & c: y    2.27275     1.21276   1.87    0.1198   -0.844756    5.39026
───────────────────────────────────────────────────────────────────────────────
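On the question of pulling out coefficients, residuals and predictions: for a GLM fit like the one above, the generic accessors can be used roughly as sketched below. The data here is simulated just for the example (column names mirror the snippet above, strings are used instead of Chars, and the seed and sample size are arbitrary). As far as I know, FixedEffectModels implements the same coef, coefnames, stderror and coeftable generics for its results, but residuals there have to be requested at fit time through the save keyword, and the details have changed between versions, so please check the README of the version you have installed rather than taking this as its exact API.

```julia
using DataFrames, GLM, Random

Random.seed!(42)
n = 40
# Simulated data for illustration only: numeric a, two-level factors b and c.
data = DataFrame(a = rand(n),
                 b = repeat(["a", "b"], inner = n ÷ 2),
                 c = repeat(["x", "y"], outer = n ÷ 2))
data.y = 1 .+ data.a .+ 2 .* (data.b .== "b") .+ 3 .* (data.c .== "y") .+ randn(n)

m = lm(@formula(y ~ a & b & c), data)

names_ = coefnames(m)              # effect names
est    = coef(m)                   # point estimates
se     = stderror(m)               # standard errors
ct     = coeftable(m)              # full table: estimates, t, p-values, CIs
pvals  = ct.cols[ct.pvalcol]       # p-value column of that table
res    = residuals(m)              # observed y minus fitted values
fitted = predict(m)                # in-sample predictions
newfit = predict(m, data[1:3, :])  # predictions for a table of new rows

# Collect everything into one table of results.
println(DataFrame(term = names_, estimate = est, stderr = se, p_value = pvals))
```

Collecting the pieces into a DataFrame like this is just one convenient layout; coeftable(m) on its own already prints the same information.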
The clue is in the name: FixedEffectModels is for standard FE/IV models, InteractiveFixedEffectModels is for models with interactive fixed effects (a factor structure), and GLFixedEffectModels is for generalized linear models with fixed effects, e.g. logit models.
With GLM you can also fit models with categorical variables if you code them as dummies. It also accepts interactions and logit models.
Why do we need to use the packages named in this thread?
I think you're alluding to the problem in your post: coding fixed effects as dummies quickly becomes computationally infeasible (as every level adds a column to your model matrix), so these packages implement fast and efficient estimators that allow you to estimate high-dimensional fixed effects.
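To make the point about model-matrix size concrete, here is a small sketch using only the standard library. It fits the same one-way fixed-effect model twice: once with one dummy column per level, and once by subtracting group means first (the "within" transformation). This is a simplified illustration with simulated data and a single fixed effect; it is not a description of how the packages above are implemented internally, and all names and sizes are made up for the example.

```julia
using LinearAlgebra, Random

Random.seed!(1)
n, k = 2_000, 200                      # observations, fixed-effect levels
g = repeat(1:k, outer = n ÷ k)         # group id per observation (10 per group)
x = randn(n)
alpha = randn(k)                       # one intercept per group
y = 2.0 .* x .+ alpha[g] .+ 0.1 .* randn(n)

# Approach 1: one dummy column per level, so the design matrix is n x (k + 1)
# (the regressor x plus k dummies; the dummies already span the intercept).
D = zeros(n, k)
for i in 1:n
    D[i, g[i]] = 1.0
end
X = hcat(x, D)
beta_dummies = (X \ y)[1]

# Approach 2: subtract group means from x and y, then regress on a single column.
function demean(v, g, k)
    sums = zeros(k)
    counts = zeros(Int, k)
    for i in eachindex(v)
        sums[g[i]] += v[i]
        counts[g[i]] += 1
    end
    return [v[i] - sums[g[i]] / counts[g[i]] for i in eachindex(v)]
end
xd, yd = demean(x, g, k), demean(y, g, k)
beta_within = dot(xd, yd) / dot(xd, xd)

println(size(X))
println((beta_dummies, beta_within))
```

Both approaches give the same slope on x, but the first needs a matrix whose width grows with the number of levels, while the second only ever works with vectors of length n. With hundreds of thousands of levels, or several fixed effects at once, the dummy matrix is what stops being feasible.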