Extract model coefficients with FixedEffectModels.jl package

Hi everyone!
In my regression model (Y ~ A:B), a numeric variable (A) interacts with a categorical variable (B). Since the categorical variable has a lot of unique levels, fitting the model using GLM.jl package consumes a lot of RAM. I used the FixedEffectModels.jl package and it looks much better! However, I have difficulties in extracting model coefficients such as name of the effects, estimates, stderr, p_value, t_value, etc. and also with residuals and predict. I read the documentation here and here, but I didnโ€™t see the functions I needed :confused:
When building the model, I selected the option save = true, which saves the residuals and fixed effects estimates. When I extract fe(model), I get a table like this:

julia> fe(model)
444378ร—1 DataFrame
โ”‚ Row    โ”‚ fe_B&A โ”‚
โ”‚        โ”‚ Float64                โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚ 1      โ”‚ -1.47252               โ”‚
โ”‚ 2      โ”‚ -1.47252               โ”‚
โ‹ฎ
โ”‚ 444376 โ”‚ -0.610356              โ”‚
โ”‚ 444377 โ”‚ -0.610356              โ”‚
โ”‚ 444378 โ”‚ -0.610356              โ”‚

How can I also get names of the effects to match them with the fixed effects estimates? (because, I see only row numbers)
Also, how can I get other coefficients (stderr, p_value, t_value, etc)?

Thanks a lot for your help!

1 Like

All of the functions here should be implemented: https://juliastats.org/StatsBase.jl/stable/statmodels/

2 Likes

I also wasnโ€™t able to find the FE names.
I donโ€™t think itโ€™s currently possible.

@matthieu?

Ps: When you have High dimensional FE, the FEs become hard to interpret anyway, users rarely try.

1 Like

Thanks for the advice!
I have already tried most of these functions, but did not get the acceptable results.
For example:

  1. coefnames()
julia> coefnames(model)
1-element Array{String,1}:
 "(Intercept)"

but I need names of fixed effects too (not Intercept only).

  1. coef()
 julia> coef(model)
1-element Array{Float64,1}:
 2.821248194859784

does not return coefficients for fixed effects.

  1. stderror()
julia> stderror(model)
1-element Array{Float64,1}:
 0.027292973948148374

returns result for intercept only (fixed effects needed as well).

  1. residuals()
julia> residuals(model)
444378-element Array{Union{Missing, Float64},1}:
 -0.4949565332014614
 -0.38919178650364045
 -0.4186431448593252
 -0.2590869635253943
 -0.17627291925168315
  โ‹ฎ
 -0.34583852183630376
 -1.5652955707355511
 -1.1542833700855635
 -2.1144655963560104

how can I match values โ€‹โ€‹with coefficient names?

  1. predict()
julia> predict(fit)
ERROR: predict is not defined for FixedEffectModel.
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] predict(::FixedEffectModel) at /home/antonina_kliuieva/.julia/packages/StatsBase/EA8Mh/src/statmodels.jl:368
 [3] top-level scope at REPL[19]:1

have error :frowning:
etc.

So, my question is still openโ€ฆ
Sorry, Iโ€™m new to Julia, maybe Iโ€™m missing something :confused:

1 Like

No, you are not missing anything unfortunately. There is an issue here filed last week to make prediction easier with fixed effects.

With regards to the values of the fixed effects, I think that the only solution is to

  1. hcat the vector of intercepts with your data
  2. Keep just the columns of interest, i.e. B and A
  3. Call unique on the data frame
  4. Work with the results of that to match combinations of :B and :A to intercepts

It would be nice if obtaining the value of the intercepts were as easy as in reghdfe in Stata or fixest in R. Hopefully we can build more UX polish into the package in the future.

Ummm, the package created to evaluate high-dimensional fixed effect variables doesnโ€™t provide an easy way to extract estimates for these fixed effects?
Maybe you know - the order of fe definition in fe(model) DataFrame is the same as the original data set? ะกan I expect, for example, that estimate -1.47252 (first row from the dataframe below) corresponds to effect that is in the first row of the input data?
That is, can I join the fe dataframe with input data by Row?

julia> fe(model)
444378ร—1 DataFrame
โ”‚ Row    โ”‚ fe_B&A โ”‚
โ”‚        โ”‚ Float64                โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚ 1      โ”‚ -1.47252               โ”‚
โ”‚ 2      โ”‚ -1.47252               โ”‚
โ‹ฎ
โ”‚ 444376 โ”‚ -0.610356              โ”‚
โ”‚ 444377 โ”‚ -0.610356              โ”‚
โ”‚ 444378 โ”‚ -0.610356              โ”‚

Thank you!

There should be a โ€œsave=trueโ€ option. Add that at the end of the regression function and then you should be able to do everything. Something like this:

reg_res = reg(df, @formula(X ~ Y + Z + fe(K) ),save=true);
residuals(reg_res)

I have the save=true option enabled, but most functions donโ€™t give the same result as GLM (please, see my reply to @pdeffebach above).

For example, I get an array like this:

julia> residuals(model)
444378-element Array{Union{Missing, Float64},1}:
 -0.4949565332014614
 -0.38919178650364045
 -0.4186431448593252
 -0.2590869635253943
 -0.17627291925168315
  โ‹ฎ
 -0.34583852183630376
 -1.5652955707355511
 -1.1542833700855635
 -2.1144655963560104

How can I match each value with the name of effect for which it is calculated?

Yes.
Stata xtreg reports estimates & stats for fe
STATA reghdfe does not for โ€œabsorbed FEโ€

yes, in general economists donโ€™t care about the values of the fixed effects. We just want to use โ€œwithin-unit variationโ€, hence the lack of emphasis on analyzing these fixed effects.

yes, just use hcat(df, fe(model)). That will work.

1 Like

Thanks a lot @pdeffebach, Iโ€™ll try this.
If I understand correctly, at the moment there is no way to obtain stderr, t_value, p_value for A:fe(B)?

No. My understanding is that this is why it converges so fast. By not calculating (or even materializing) the FE, you can estimate things faster and with less memory.

Ok, thank you!

One more question (maybe someone knows) - will this issue (with predict function) be taken into account in the near future? I just would like to understand whether it is value to wait or is it better to focus on another package.
Thanks!

You could extend my hack in the issue to take account of the interactions in your model :smiley: (you might have noticed if you tried that why I put in the issue comment breaks down if there are interactions in the model)

Overall it might be helpful to understand a bit better what you are trying to achieve - as Peter says, fixed effects models are heavily used by economists (and the author of FixedEffectsModels is an economist as well) who mostly use them to get closer causal identification of marginal effects (by controlling for unobserved, time-invariant heterogeneity), and because of that donโ€™t actually care about the fixed effects themselves. It appears that your use case is different, so a different approach might be warranted.

2 Likes