ANN: RegressionTables.jl produces publication-quality regression tables

#14

Have you run Pkg.update() recently @Yifan_Liu ?


#15

That’s not incompatible. You could return a RegressionTable object by default, which the show method prints in the most appropriate format for the current display, but still allow choosing a different output via an argument, in which case you’d just print the requested representation directly.

EDIT: To give more context, a situation where automatically choosing the default output format is really useful is IJulia notebooks. When I use stargazer in RStudio notebooks, I find it annoying that I need to specify whether to output HTML or LaTeX, and if I get it wrong instead of a nice table I get a wall of text. Then if you decide to compile your notebook to HTML rather than to LaTeX, you need to change the code. That doesn’t sound like a correctly designed system.
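
To make the idea concrete, here is a minimal sketch of display-dependent output, written for Julia 1.x. The SimpleTable type and the render helper are hypothetical names for illustration and are not part of RegressionTables.jl; only Base.show, the MIME types, and repr are standard Julia.

```julia
# Hypothetical table type for illustration; not the RegressionTables.jl API.
struct SimpleTable
    header::Vector{String}
    rows::Vector{Vector{String}}
end

# Plain-text representation, used by the REPL.
function Base.show(io::IO, ::MIME"text/plain", t::SimpleTable)
    println(io, join(t.header, " | "))
    for row in t.rows
        println(io, join(row, " | "))
    end
end

# HTML representation, which a notebook front end can pick up.
function Base.show(io::IO, ::MIME"text/html", t::SimpleTable)
    print(io, "<table><tr>")
    for h in t.header
        print(io, "<th>", h, "</th>")
    end
    print(io, "</tr>")
    for row in t.rows
        print(io, "<tr>")
        for cell in row
            print(io, "<td>", cell, "</td>")
        end
        print(io, "</tr>")
    end
    print(io, "</table>")
end

# LaTeX representation, for front ends or exports that ask for text/latex.
function Base.show(io::IO, ::MIME"text/latex", t::SimpleTable)
    println(io, "\\begin{tabular}{", "l"^length(t.header), "}")
    println(io, join(t.header, " & "), " \\\\")
    for row in t.rows
        println(io, join(row, " & "), " \\\\")
    end
    print(io, "\\end{tabular}")
end

# Explicit override (hypothetical helper): request one representation as a String.
render(t::SimpleTable, mime::AbstractString) = repr(mime, t)

t = SimpleTable(["", "(1)"], [["SepalWidth", "0.804"]])
html = render(t, "text/html")
latex = render(t, "text/latex")
```

The display picks whichever show method it supports, so the calling code does not change when the notebook is compiled to a different target, while render still lets the user force a format.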


#16

After updating all packages, I was able to install the package, but could not load it and got the error message:

Failed to precompile RegressionTables to 
C:\Users\user\.julia\lib\v0.6\RegressionTables.ji.
compilecache(::String) at loading.jl:710
_require(::Symbol) at loading.jl:497
require(::Symbol) at loading.jl:405
include_string(::String, ::String) at loading.jl:522
eval(::Module, ::Any) at boot.jl:235
(::Atom.##61#64)() at eval.jl:102
withpath(::Atom.##61#64, ::Void) at utils.jl:30
withpath(::Function, ::Void) at eval.jl:38
macro expansion at eval.jl:101 [inlined]
(::Atom.##60#63{Dict{String,Any}})() at task.jl:80

#17

@nalimilan Fair enough. I can put that into the next version.


#18

Thanks a lot for writing the package! I have one question: Is it possible to choose a different set of standard errors to show? In many cases the standard homoskedastic standard errors from a fit(LinearModel,…) regression using GLM are not appropriate, and I would like to use HC or HAC standard errors instead. Is there a way to have them included automatically in your output?


#19

@IljaK91 regtable() prints the square root of the diagonal of vcov(dfrm::DataFrameRegressionModel) (for GLM.jl’s output), or of the vcov field of an AbstractRegressionResult (if you use FixedEffectModels.jl). As long as these objects carry the adjusted vcov matrix when you pass them to regtable(), the standard errors should print correctly.
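
In other words, the reported standard errors are just a function of whatever vcov matrix the result object holds. A tiny sketch with a made-up 2x2 matrix (the numbers are illustrative, not from any fitted model):

```julia
using LinearAlgebra

# Made-up variance-covariance matrix for two coefficients.
V = [0.04 0.01;
     0.01 0.09]

# Standard errors: square root of the diagonal of the vcov matrix.
se = sqrt.(diag(V))
```

Swapping V for a robust estimate changes se and nothing else, which is why passing an object with the adjusted matrix is enough.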

Are you adjusting the standard errors yourself, or are you using a package to do it for you? If it’s a package, it would be great to support it.


#20

It would be great if https://github.com/gragusa/CovarianceMatrices.jl was supported directly. I use that package to get robust standard errors. There should also be a bit of information at the bottom of the table about which standard errors were used (HC, HAC, and which type).

I don’t know if GLM.jl somehow supports providing the adjusted SEs directly, in which case this request would be obsolete. So far I run regressions with GLM.fit() and adjust the SEs afterwards using CovarianceMatrices.jl. This is a bit tiresome if you have many regressions.
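
For readers unsure what the adjustment amounts to: the HC family are sandwich estimators. Below is a minimal hand-rolled sketch of HC0/HC1 for OLS using only LinearAlgebra. The function name hc_vcov, its dof_adjust keyword, and the toy data are made up for illustration; this is not the CovarianceMatrices.jl API.

```julia
using LinearAlgebra

# Heteroskedasticity-consistent vcov for OLS (hypothetical helper).
# dof_adjust = false gives HC0; dof_adjust = true gives HC1 (scaled by n / (n - k)).
function hc_vcov(X::AbstractMatrix, y::AbstractVector; dof_adjust::Bool = true)
    n, k = size(X)
    beta = X \ y                            # OLS coefficients
    resid = y - X * beta                    # residuals
    bread = inv(X' * X)                     # (X'X)^-1
    meat = X' * Diagonal(resid .^ 2) * X    # X' diag(e^2) X
    V = bread * meat * bread                # sandwich
    return dof_adjust ? V * n / (n - k) : V
end

# Toy data: intercept plus one regressor.
X = [ones(6) [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]]
y = [1.1, 1.9, 3.2, 3.9, 5.3, 5.8]

V_hc1 = hc_vcov(X, y)
se_hc1 = sqrt.(diag(V_hc1))
```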


#21

The work in progress is the following:

  • A separate Regression / Econometrics environment that builds on the IO (CSV/Feather) + DataFrames + StatsBase + StatsModels + GLM
  • It will have a suite of various packages that provide more functionality
    • A utility package for various transformations and helpers: generalized within transformation (absorbs fixed effects and handles singletons), first-difference transformation, between estimator, two stage estimators, subset linear independent predictors, etc.
    • Intermediate package for computing the distance and kernel auto-tuners for correlation structures which will then be used to provide Sandwich estimators (multi-way clustering, SHAC, HAC, HC, etc.)
    • The covariance matrices package for sandwich estimators and bootstrapping
    • Regression Hypothesis Tests and Diagnostics: StatsBase will host Wald, LR, and score tests. The hypothesis testing package will construct the corresponding hypothesis test (Wald test, robust Hausman, etc.)

DataFrames / StatsBase / StatsModels / GLM have been updated, so now it is a matter of unifying the various packages and finishing the implementation of the missing features.

A few comments: in the future DataFrameRegressionModel will probably be deprecated in favor of inheritance from StatsBase.RegressionModel. Covariance matrices will be able to work with any RegressionModel rather than only GLM.GeneralizedLinearModel with minimal effort.


#22

Could you give me more information about RegressionModel? Is that the default form of output that we hope to be standard for all regression code? As in the same syntax for getting the covariance matrix, coefficients, etc?


#23

See Abstraction for Statistical Models for details about the inheritance and supported methods. StatsBase provides a hierarchy to inherit such that regression models in any package can be defined as

mutable struct MyRegressionModel <: StatsBase.RegressionModel
    ...
end

Therefore, for any <:RegressionModel struct, one can extract the variance covariance estimate using vcov(model). Users will not interact with the covariance estimator directly for most applications. For example,

model = RegressionModel(StatsModel::Formula,
                        data::DataFrames.AbstractDataFrame,
                        options = @options(vce = HC3))

after fitting the model with

StatsBase.fit!(model)

or making a call that triggers fit! (e.g., RegressionModel or coeftable), one can extract the variance covariance with vcov(model). For multiway clustering the clusters will be able to be specified in the Formula à la

formula = @formula(response ~ exogenous +
                  (endogenous ~ instruments) +
                  (absorb = fixedeffect1 + fixedeffect2) +
                  (cluster = PID + TID))
model = RegressionModel(formula, data,
                        options = @options(vce = HC1))

For HAC estimators, there will be a suite that will compute distance metrics based on temporal distance (periods, Date, DateTime) and spatial (based on coordinates, datum, distance metric, etc.). Given the distance matrices and selected options the kernels will be auto-tuned or use a provided kernel and the multidimensional weights will be mapped to a single correlation structure to be passed to the CovarianceMatrices package.

That’s the main idea, but certain aspects might be modified from now until release. What’s your opinion on the approach?


#24

That’s great. Thanks for all this work! So much better than R, as long as we all work to maintain that standard for new estimation packages.


#25

As for estimators, most if not all are transformations of the linear predictor and the response. For first-difference, the implementation matches that of Stata (dropping gaps, and smart frequency identification), but can also handle categorical variables by not differencing them. The between estimator will be available for both PID and TID (Stata can be “hacked” for TID by specifying xtset TID). The fixed effects work à la reghdfe, which is better than the standard xtreg, areg or xtivregress. The random effects estimator currently supports the GLS harmonic mean estimator (same formula for 2SLS which matches Stata’s xtreg default, but is slightly different for xtivregress). The variance covariance estimators use the same finite sample adjustment as reghdfe for most cases. For all models, linearly dependent predictors are dropped. GLM will depend on GLM.jl, so any work on making GLM more robust will benefit both (e.g., finally getting the marginal effects worked out, etc.)


#26

@IljaK91 That package looks great. At the same time, I feel that the updated vcov matrix should go into the DataFrameRegressionModel (or StatsBase.RegressionModel in the future).


#27

@Nosferican It would be great if your package could use FixedEffectModels.jl for high dimensional fixed effects in linear models. IMHO this is where Julia has an edge over other econometrics packages (as well as that it’s much easier to write nonlinear estimation code using JuMP, but it might be hard to pack that into your package – this here is an example).


#28

For the variance covariance a common practice would be

mutable struct MyModel <: RegressionModel
    ...
    vcov::Hermitian{Float64}
    ...
end
function fit!(model::MyModel)
    ...
    setfield!(model, :vcov, CovarianceMatrices.vcov(model))
    ...
end
vcov(model::MyModel) = getfield(model, :vcov)

As for absorption, the generalized within transformation uses the Method of Alternating Projections, similar to R’s lfe, reghdfe, etc. lsmr is currently broken in Julia 0.7-Dev, but other methods could be added eventually. The implementation is here and is quite fast.
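
For anyone unfamiliar with the method, a rough sketch of alternating projections for absorbing several sets of fixed effects: demean by each grouping in turn and repeat until the vector stops changing. The names demean! and absorb, the tolerance, and the toy panel are assumptions for illustration; this is not the implementation referred to above.

```julia
# Subtract the group means of x in place for one grouping variable.
function demean!(x::Vector{Float64}, groups::AbstractVector{<:Integer})
    sums = Dict{Int,Float64}()
    counts = Dict{Int,Int}()
    for (xi, g) in zip(x, groups)
        sums[g] = get(sums, g, 0.0) + xi
        counts[g] = get(counts, g, 0) + 1
    end
    for i in eachindex(x)
        g = groups[i]
        x[i] -= sums[g] / counts[g]
    end
    return x
end

# Alternate the demeaning over all groupings until a full sweep changes
# no entry by more than tol (or maxiter sweeps have been made).
function absorb(x::AbstractVector{<:Real}, fes::AbstractVector{<:Integer}...;
                tol::Float64 = 1e-10, maxiter::Int = 10_000)
    r = Vector{Float64}(x)
    for _ in 1:maxiter
        prev = copy(r)
        for fe in fes
            demean!(r, fe)
        end
        maximum(abs, r .- prev) < tol && break
    end
    return r
end

# Toy unbalanced panel: three units (pid) observed over three periods (tid).
x = [1.0, 2.0, 4.0, 3.0, 7.0, 5.0]
pid = [1, 1, 2, 2, 3, 3]
tid = [1, 2, 1, 2, 1, 3]

x_within = absorb(x, pid, tid)
```

The same transformation would be applied to the response and to each predictor before running OLS on the transformed data.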

Something I am pushing hard for is to have utilities in separate packages so that other packages can use them very easily (depending only on sharing basic inheritance). Eventually FixedEffectModels would be able to just call that utility rather than having to maintain it in-house. Any further development would then propagate across all implementations, with less code duplication.


#29

Just to clarify, is it possible to change the vcov matrix in DataFrameRegressionModel to the one produced by CovarianceMatrices.jl right now?


#30

Would just have to update the package and minor declarations. No big issue. However, it would only work for DataFrameRegressionModel, which is an undesirable feature. For a quick solution, why not wrap the model fitting and overwrite the vcov in a wrapper?
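
A rough sketch of what such a wrapper could look like, using only the standard library. AbstractFit stands in for StatsBase.RegressionModel, and ToyOLS, WithVcov, and the local coef / vcov / stderror functions are all hypothetical names; in real code you would subtype StatsBase.RegressionModel and extend the StatsBase functions instead. Whether regtable() accepts such a wrapper is not something this sketch establishes.

```julia
using LinearAlgebra

# Stand-in for StatsBase.RegressionModel (assumption, to keep this self-contained).
abstract type AbstractFit end

# Toy fitted model holding coefficients and the default vcov.
struct ToyOLS <: AbstractFit
    coef::Vector{Float64}
    vcov::Matrix{Float64}
end
coef(m::ToyOLS) = m.coef
vcov(m::ToyOLS) = m.vcov

# Wrapper that keeps the fitted model but reports a user-supplied vcov.
struct WithVcov{M<:AbstractFit} <: AbstractFit
    model::M
    vcov::Matrix{Float64}
    function WithVcov(model::M, V::AbstractMatrix) where {M<:AbstractFit}
        k = length(coef(model))
        size(V) == (k, k) || throw(DimensionMismatch("vcov must be $k x $k"))
        return new{M}(model, Matrix{Float64}(V))
    end
end
coef(w::WithVcov) = coef(w.model)   # forwarded to the wrapped model
vcov(w::WithVcov) = w.vcov          # overridden by the wrapper

# Anything computed from vcov picks up the override automatically.
stderror(m::AbstractFit) = sqrt.(diag(vcov(m)))

ols = ToyOLS([1.5, -0.3], [0.04 0.0; 0.0 0.01])
robust = WithVcov(ols, [0.09 0.0; 0.0 0.04])
```

The design choice is that the wrapper forwards everything except vcov, so the original fit is left untouched and the same wrapper works for any model type that implements coef.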


#31

If I knew how to do that, I’d be a happy man :grin:


#32

Thank you for your suggestion. Maybe it is not a very advanced thing to do, but I don’t have any idea how to do this.

Let’s say I have an adjusted vcov from CovarianceMatrices.jl and a DataFrameRegressionModel from GLM.jl. How do I then pass the adjusted vcov matrix to regtable()? An explanation involving FixedEffectModels.jl would be just as good.

I am working on an empirical paper right now, and printing regression tables automatically would save me a considerable amount of time and also decrease the likelihood of errors on my part.

Thanks a lot!


#33

Just create a new RegressionResultFE with your adjusted VCov matrix, e.g.

using RegressionTables, DataFrames, FixedEffectModels, RDatasets

df = dataset("datasets", "iris")
df[:SpeciesDummy] = pool(df[:Species])

rr1 = reg(df, @model(SepalLength ~ SepalWidth, fe = SpeciesDummy))

# Placeholder 1x1 matrix (the model has one coefficient); put your adjusted vcov here
mynewvcovmatrix = Array{Float64,2}(1,1)
mynewvcovmatrix[1,1]=1.0

myrr = RegressionResultFE(rr1.coef, mynewvcovmatrix, rr1.esample, rr1.augmentdf, rr1.coefnames, rr1.yname, rr1.formula, rr1.feformula, rr1.nobs, rr1.df_residual, rr1.r2, rr1.r2_a, rr1.r2_within, rr1.F, rr1.p, rr1.iterations, rr1.converged)

regtable(myrr)

or similarly with whatever result struct you’re using.