ANN: RegressionTables.jl produces publication-quality regression tables

announcement

#21

The work in progress is the following:

  • A separate Regression / Econometrics environment that builds on the IO (CSV/Feather) + DataFrames + StatsBase + StatsModels + GLM
  • It will have a suite of various packages that provide more functionality
    • A utility package for various transformations and helpers: generalized within transformation (absorbs fixed effects and handles singletons), first-difference transformation, between estimator, two stage estimators, subset linear independent predictors, etc.
    • Intermediate package for computing the distance and kernel auto-tuners for correlation structures which will then be used to provide Sandwich estimators (multi-way clustering, SHAC, HAC, HC, etc.)
    • The covariance matrices package for sandwich estimators and bootstrapping
    • Regression Hypothesis Tests and Diagnostics: StatsBase will host Wald test, LR, and score tests. Hypothesis testing for various tests will construct the according hypothesis test (Wald test, robust Hausman, etc.)

DataFrames / StatsBase / StatsModels / GLM have been updated so now is a matter of unifying the various packages and finish the implementation of the missing features.

A few comments: in the future DataFrameRegressionModel will probably be depreciated in favor of inheritance from StatsBase.RegressionModel. Covariance matrices will be able to work with all RegressionModel rather than GLM.GeneralizedLinearModel with minimal effort.


#22

Could you give me more information about RegressionModel? Is that the default form of output that we hope to be standard for all regression code? As in the same syntax for getting the covariance matrix, coefficients, etc?


#23

See Abstraction for Statistical Models for details about the inheritance and supported methods. StatsBase provides a hierarchy to inherit such that regression models in any package can be defined as

mutable struct MyRegressionModel <: StatsBase.RegressionModel
    ...
end

Therefore, for any <:RegressionModel struct if one can extract the variance covariance estimate using vcov(model<:RegressionModel). Users will not interact with the covariance estimator directly for most applications. For example,

model = RegressionModel(StatsModel::Formula,
                        data::DataFrames.AbstractDataFrame,
                        options = @options(vce = HC3))

after fitting the model with

StatsBase.fit!(model)

or making a call that triggers fit! (e.g., RegressionModel or coeftable), one can extract the variance covariance with vcov(model). For multiway clustering the clusters will be able to be specified in the Formula à la

formula = @formula(response ~ exogenous +
                  (endogenous ~ instruments) +
                  (absorb = fixedeffect1 + fixedeffect2) +
                  (cluster = PID + TID))
model = RegressionModel(formula, data,
                        options = @options(vce = HC1))

For HAC estimators, there will be a suite that will compute distance metrics based on temporal distance (periods, Date, DateTime) and spatial (based on coordinates, datum, distance metric, etc.). Given the distance matrices and selected options the kernels will be auto-tuned or use a provided kernel and the multidimensional weights will be mapped to a single correlation structure to be passed to the CovarianceMatrices package.

That’s the main idea, but certain aspects might be modified from now until release. What’s your opinion on the approach?


#24

That’s great. Thanks for all this work! So much better than R, as long as we all work to maintain that standard for new estimation packages.


#25

As for estimators, most if not all are transformations to the linear predictor and the response. For first-difference, the implementation matches that of Stata (dropping gaps, and smart frequency identification), but allows to handle categorical variables by not differentiating these. The between estimator will be available for both PID and TID (Stata can be “hacked” for TID by specifying xtset TID). The fixed effects work à la reghdfe which is better than the standard xtreg, areg or xtivregress. The random effects currently supports the GLS harmonic mean estimator (same formula for 2SLS which matches Stata’s xtreg default, but is slightly different for xtivregress). The variance covariance estimators use the same finite sample adjustment as reghdfe for most cases. For all models, linear dependent predictors are dropped. GLM will depend on GLM.jl so all benefits to making GLM robust will be spread (e.g., finally getting the marginal effects worked out, etc.)


#26

@IljaK91 That package looks great. At the same time, I feel that the updated vcov matrix should go into the DataFrameRegressionModel (or StatsBase.RegressionModel in the future).


#27

@Nosferican It would be great if your package could use FixedEffectModels.jl for high dimensional fixed effects in linear models. IMHO this is where Julia has an edge over other econometrics packages (as well as that it’s much easier to write nonlinear estimation code using JuMP, but it might be hard to pack that into your package – this here is an example).


#28

For the variance covariance a common practice would be

struct mutable MyModel <: RegressionModel
    ...
    vcov::Hermitian{Float64}
    ...
end
function fit!(model::MyModel)
    ...
    setfield!(model, :vcov, CovarianceMatrices.vcov(MyModel))
    ....
end
vcov(model::MyModel) = getfield(model, :vcov)

As for absorption, the generalized within transformation uses the Method of Alternating Projections similar to R’s fle, reghdfe, etc. lsmr is currently broken in Julia 0.7-Dev, but other methods could be added eventually. The implementation is here and is quite fast.

Something I am pushing hard for is to have utilities in different packages such that they can be used by other packages very easily (dependent only in sharing basic inheritance). Eventually FixedEffectsModels would be able to just call that utility rather than having to maintain it in-house. Any further development will propagate across all implementations and less code duplication.


#29

Just to clarify, is it possible to change the vcov matrix in DataFrameRegressionModel to the one produced by CovarianceMatrices.jl right now?


#30

Would just have to update the package and minor declarations. No big issue. However, it would only work for DataFrameRegressionModel which is an undesirable feature. For a quick solution, why not wrap the model fitting and overwrite the vcov in a wraper?


#31

If I knew how to do that, I’d be a happy man :grin:


#32

Thank you for your suggestion. Maybe it is not a very advanced thing to do, but I don’t have any idea how to do this.

Let’s say I have an adjusted vcov from CovarianceMatrices.jl also a DataFrameRegressionModel from GLM.jl. How do I pass the adjusted vcov matrix to regtable() then? An explanation involving FixedEffectModels.jl would be just as good.

I am working on an empirical paper right now and printing regression tables automatically would save be a considerable amount of time and also decrease the likelihood of errors on my behalf.

Thanks a lot!


#33

Just create a new RegressionResultFE with your adjusted VCov matrix, e.g.

using RegressionTables, DataFrames, FixedEffectModels, RDatasets

df = dataset("datasets", "iris")
df[:SpeciesDummy] = pool(df[:Species])

rr1 = reg(df, @model(SepalLength ~ SepalWidth   , fe = SpeciesDummy))

mynewvcovmatrix = Array{Float64,2}(1,1)
mynewvcovmatrix[1,1]=1.0

myrr = RegressionResultFE(rr1.coef, mynewvcovmatrix, rr1.esample, rr1.augmentdf, rr1.coefnames, rr1.yname, rr1.formula, rr1.feformula, rr1.nobs, rr1.df_residual, rr1.r2, rr1.r2_a, rr1.r2_within, rr1.F, rr1.p, rr1.iterations, rr1.converged)

regtable(myrr)

or similarly with whatever result struct you’re using.


#34

I am sorry that I am bothering you again with this, but it looks like CovarianceMatrices.jl is not working with the regression results of RegressionResultFE.

My code is:

using RegressionTables, DataFrames, FixedEffectModels, RDatasets, CovarianceMatrices

df = dataset("datasets", "iris")
df[:SpeciesDummy] = pool(df[:Species])

rr1 = reg(df, @model(SepalLength ~ SepalWidth   , fe = SpeciesDummy))

mynewvcovmatrix = vcov(rr1, QuadraticSpectralKernel(NeweyWest))

myrr = RegressionResultFE(rr1.coef, mynewvcovmatrix, rr1.esample, rr1.augmentdf, rr1.coefnames, rr1.yname, rr1.formula, rr1.feformula, rr1.nobs, rr1.df_residual, rr1.r2, rr1.r2_a, rr1.r2_within, rr1.F, rr1.p, rr1.iterations, rr1.converged)

regtable(myrr)

The error message I get is


MethodError: no method matching vcov(::FixedEffectModels.RegressionResultFE, ::CovarianceMatrices.QuadraticSpectralKernel{CovarianceMatrices.Optimal{CovarianceMatrices.NeweyWest},CovarianceMatrices.#k_qs})
Closest candidates are:
  vcov(!Matched::DataFrames.DataFrameRegressionModel, ::CovarianceMatrices.HAC{CovarianceMatrices.Optimal{T<:CovarianceMatrices.OptimalBandwidth}}; args...) where T<:CovarianceMatrices.OptimalBandwidth at C:\Users\Ilja\.julia\v0.6\CovarianceMatrices\src\HAC.jl:235
  vcov(!Matched::Union{DataFrames.DataFrameRegressionModel, DataFrames.DataFrameStatisticalModel}, ::Any...; kwargs...) at C:\Users\Ilja\.julia\v0.6\DataFrames\src\statsmodels\statsmodel.jl:27
  vcov(::FixedEffectModels.AbstractRegressionResult) at C:\Users\Ilja\.julia\v0.6\FixedEffectModels\src\RegressionResult.jl:13
  ...

#35

Estimate your model and construct your adjusted covariance matrix in any way that works for you. Then fill it into a structure that is supported by RegressionTables, i.e. the structures from FixedEffectModels.jl (which you can find in src/RegressionResult.jl) or GLM.jl. In my example above the structure is RegressionResultFE, but there are several others.

PM or email me if it’s still not clear.