Hi @mthelm85, delighted that you have found Econometrics.jl useful!
There are different levels of depth at which to answer that question, but I will try to answer it at a level that should help you understand how to use and interpret it. The absorbed terms give the same parameter estimates for the (non-intercept) features as if you had included them as "fixed effects" (econ talk), which is the same as the dummy variable model where you have an indicator for each value.
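To make that concrete, here is a minimal sketch (the column names :y, :x, and :id are hypothetical, and I am assuming :id is stored as a categorical or string column so it gets dummy coded in the first model):

```julia
using Econometrics, DataFrames, CSV

df = CSV.read("panel.csv", DataFrame)  # hypothetical data with columns :y, :x, :id

# Dummy-variable model: one indicator per value of :id is estimated explicitly
dummies = fit(EconometricModel, @formula(y ~ x + id), df)

# Absorbed model: the :id indicators are projected out rather than estimated,
# but the estimate for :x matches the one from the dummy-variable model
absorbed = fit(EconometricModel, @formula(y ~ x + absorb(id)), df)
```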
So, what’s the difference between including the features as indicators or absorbing them? The first difference is that if you include them as normal features you get a parameter estimate for each (technically, you can add some extra code to recover those after absorbing). It is an application of the Frisch–Waugh–Lovell theorem. However, in some instances you should not actually believe those estimates due to the curse of dimensionality. In other words, they cannot be consistently estimated because increasing your sample size also increases the number of parameters. For example, you include a household indicator, but in order to get more observations you would need to include more households and thus more household parameters. In that case, the estimator does not yield a consistent estimate of those partial effects (i.e., it does not converge in probability as the sample size increases). The other case is when there are so many parameters along that dimension that the estimation procedure becomes infeasible (e.g., you would have to operate on a huge matrix). Absorbing those features is a very efficient way of estimating the model without having to create and operate on those huge matrices.
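To illustrate the Frisch–Waugh–Lovell idea behind absorbing, here is a toy demonstration with simulated data and plain linear algebra (not how the package implements it): demeaning y and x within groups reproduces the slope from the full dummy-variable regression.

```julia
using Random, Statistics

Random.seed!(1)
n, g = 1_000, 50
id = rand(1:g, n)          # group (e.g., household) identifier
x  = randn(n)
fe = randn(g)              # unobserved group effects
y  = 2.0 .* x .+ fe[id] .+ randn(n)

# Dummy-variable regression: design matrix [x  D] with one indicator column per group
D = [Float64(id[i] == j) for i in 1:n, j in 1:g]
β_dummy = ([x D] \ y)[1]

# Absorbed / within regression: demean y and x within groups, then regress
groupmean(v) = [mean(v[id .== j]) for j in 1:g]
x_demeaned = x .- groupmean(x)[id]
y_demeaned = y .- groupmean(y)[id]
β_within = (hcat(x_demeaned) \ y_demeaned)[1]

β_dummy ≈ β_within   # true: the slope on x is identical in both approaches
```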
The explanation thus far has been about the point estimates. Those ideas about how the covariates are included, depending on how the sample increases, also affect the second moment (e.g., the variance-covariance matrix / standard errors / confidence intervals). However, that analysis depends on the sampling design assumed and is not something confined to the estimation per se. As @pdeffebach mentioned, Sergio’s work goes into many of those considerations, including how to properly account for singletons in the degrees-of-freedom corrections for inference, among other issues. Additional considerations come in when selecting cluster-robust covariance estimators of various kinds (see A Practitioner’s Guide to Cluster-Robust Inference). There has also been further work by Athey, Imbens, Jeff, and others in a series of papers on other considerations.
Mathieu’s https://github.com/FixedEffects/FixedEffectModels.jl is another package that implements efficient estimation with high-dimensional categorical variables (its backend uses a different approach than the more basic method of alternating projections, but it offers similar functionality in the case of the panel data within estimator). The within estimator is a common name for that class of models and is explained in detail in 10.18637/jss.v027.i02 (I found the descriptions in the plm article very informative when I was first developing the package).
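If you end up trying that package, the equivalent call looks roughly like this (again a sketch with hypothetical column names); it also shows how to request the cluster-robust standard errors mentioned above:

```julia
using FixedEffectModels, DataFrames, CSV

df = CSV.read("panel.csv", DataFrame)   # hypothetical data with columns :y, :x, :id

# fe(id) absorbs the household indicators; Vcov.cluster(:id) requests
# standard errors clustered at that same level.
model = reg(df, @formula(y ~ x + fe(id)), Vcov.cluster(:id))
```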
Hope that helps!