ANN: RegressionTables.jl produces publication-quality regression tables

#1

Hi all,

I’ve been working on RegressionTables.jl, a package that produces regression tables as seen in scientific journals (similar to R’s stargazer and Stata’s esttab).

It currently works with output from the absolutely terrific FixedEffectModels.jl. Version 0.0.2, which also supports output from GLM.jl, is tagged and will hopefully go live tomorrow.

Quick demo:

using RegressionTables, DataFrames, FixedEffectModels, RDatasets

df = dataset("datasets", "iris")
df[:SpeciesDummy] = pool(df[:Species])

rr1 = reg(df, @model(SepalLength ~ SepalWidth, fe = SpeciesDummy))
rr2 = reg(df, @model(SepalLength ~ SepalWidth + PetalLength, fe = SpeciesDummy))
rr3 = reg(df, @model(SepalLength ~ SepalWidth + PetalLength + PetalWidth, fe = SpeciesDummy))
rr4 = reg(df, @model(SepalWidth ~ SepalLength + PetalLength + PetalWidth, fe = SpeciesDummy))

regtable(rr1,rr2,rr3,rr4; renderSettings = asciiOutput())

produces

----------------------------------------------------------
                         SepalLength            SepalWidth
               ------------------------------   ----------
                    (1)        (2)        (3)          (4)
----------------------------------------------------------
SepalWidth     0.804***   0.432***   0.496***             
                (0.106)    (0.081)    (0.086)             
PetalLength               0.776***   0.829***      -0.188*
                           (0.064)    (0.069)      (0.083)
PetalWidth                            -0.315*     0.626***
                                      (0.151)      (0.123)
SepalLength                                       0.378***
                                                   (0.066)
----------------------------------------------------------
SpeciesDummy        Yes        Yes        Yes          Yes
----------------------------------------------------------
Estimator           OLS        OLS        OLS          OLS
----------------------------------------------------------
N                   150        150        150          150
R2                0.726      0.863      0.867        0.635
----------------------------------------------------------

LaTeX output can be generated by using renderSettings = latexOutput(). See the readme for the full list of arguments and options.

I look forward to your comments and feedback.


Markdown tables
#2

Nice!!
So I noticed you created your own rendering functions for the different outputs (ASCII and LaTeX) of the table. I’m not sure what the current situation is, but it would be great if we could centralize the effort of converting text, particularly tables, into different formats and printing them. Converting an array into a Markdown, LaTeX, or HTML table is very useful.
In your case it might not be so useful to decouple the frontend (formatting and printing the table) from the backend (extracting all the values in the table from the regression model), but I thought your work on those differently formatted outputs could be of use to a larger, more general package.
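As a rough illustration of the kind of generic, model-agnostic renderer suggested here, a minimal Julia sketch that turns a header and a matrix of cells into a Markdown pipe table. `markdown_table` is a hypothetical helper, not part of RegressionTables.jl or any existing package; the cell values are copied from the demo table in the first post.

```julia
# Hypothetical helper (not part of RegressionTables.jl): render a header
# vector and a matrix of cells as a Markdown pipe table.
function markdown_table(header::AbstractVector, cells::AbstractMatrix)
    size(cells, 2) == length(header) ||
        throw(ArgumentError("header length must match the number of columns"))
    # One table row: cells separated by pipes.
    row(xs) = "| " * join(string.(xs), " | ") * " |"
    lines = [row(header), row(fill("---", length(header)))]
    for i in axes(cells, 1)
        push!(lines, row(cells[i, :]))
    end
    return join(lines, "\n")
end

# Usage: a fragment of the demo regression table.
println(markdown_table(["", "(1)", "(2)"],
                       ["SepalWidth" "0.804***" "0.432***";
                        "PetalLength" "" "0.776***"]))
```

The same row-by-row structure would carry over to LaTeX or HTML by swapping the separators, which is the main argument for sharing this code across packages.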


#3

There’s some related discussion in the thread Pretty printing of tables.


#4

That’s really a useful package, thanks for writing it!

Regarding the rendering, the Julian approach is to implement various show methods for your RegressionTable type, one for each backend (plain text, LaTeX, Markdown, HTML, CSV…). That way you don’t need to define your own render method, nor require the user to explicitly pass the expected type of output via an argument (except of course when s/he wants to): it’s automatically chosen depending on the current interface, so it will look good in the REPL, IJulia, Atom, etc. For example, asciiOutput would be replaced with MIME"text/plain" and so on. Julia is much more powerful than R in that regard. See the manual for details.
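A minimal sketch of the `show`-based approach, assuming a hypothetical `SimpleTable` type (RegressionTables.jl’s real type and fields may differ). Each backend is one `Base.show` method for a MIME type: the REPL uses `text/plain`, while front ends such as IJulia can choose a richer type they support. No column alignment or HTML escaping is attempted here.

```julia
# Hypothetical table type standing in for a real RegressionTable type.
struct SimpleTable
    header::Vector{String}
    rows::Vector{Vector{String}}
end

# Plain-text backend (used by the REPL); columns are not aligned.
function Base.show(io::IO, ::MIME"text/plain", t::SimpleTable)
    println(io, join(t.header, "  "))
    for r in t.rows
        println(io, join(r, "  "))
    end
end

# LaTeX backend: a bare tabular environment.
function Base.show(io::IO, ::MIME"text/latex", t::SimpleTable)
    println(io, "\\begin{tabular}{", "l"^length(t.header), "}")
    println(io, join(t.header, " & "), " \\\\")
    for r in t.rows
        println(io, join(r, " & "), " \\\\")
    end
    print(io, "\\end{tabular}")
end

# HTML backend (cell contents are not escaped in this sketch).
function Base.show(io::IO, ::MIME"text/html", t::SimpleTable)
    print(io, "<table><tr>")
    foreach(h -> print(io, "<th>", h, "</th>"), t.header)
    print(io, "</tr>")
    for r in t.rows
        print(io, "<tr>")
        foreach(c -> print(io, "<td>", c, "</td>"), r)
        print(io, "</tr>")
    end
    print(io, "</table>")
end

t = SimpleTable(["", "(1)"], [["SepalWidth", "0.804***"]])
display(t)                          # REPL picks text/plain
latex = repr(MIME"text/latex"(), t) # explicit request for LaTeX
```

`showable(MIME"text/latex"(), t)` returns `true` once the method exists, which is how a display decides what it can render.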


`show` is too low-level
#5

Thanks for the work! I’m excited to try it out.

I agree with this a ton. Being able to output an arbitrary matrix to a LaTeX table with multi-level column titles is a godsend for my job. R’s fragmented regression landscape is a case in point: writers of packages like Stargazer have to keep up with every new model object that people produce. It would be better if the Julia ecosystem were centered around users putting their results in a matrix or data-frame object, and then giving them options for producing nice tables from there.

Of course, being able to pretty print a latex table like this is an incredible tool and is the very first step for any analysis. If the table isn’t made in the code, it doesn’t exist!
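A rough sketch of the matrix-to-LaTeX idea with a two-level column header, using booktabs’ `\multicolumn` and `\cmidrule`. `latex_grouped_table` is hypothetical; it assumes the LaTeX document loads the booktabs package and that the group spans add up to the number of data columns.

```julia
# Hypothetical helper: render a matrix as a LaTeX tabular whose columns are
# grouped under higher-level titles (e.g. dependent variables).
# `groups` is a vector of (title, span) pairs. Needs \usepackage{booktabs}.
function latex_grouped_table(groups, colnames, rownames, cells::AbstractMatrix)
    ncol = size(cells, 2)
    sum(last.(groups)) == ncol == length(colnames) ||
        throw(ArgumentError("group spans, column names and cells disagree"))
    io = IOBuffer()
    println(io, "\\begin{tabular}{l", "r"^ncol, "}")
    println(io, "\\toprule")
    # First header level: one \multicolumn per group.
    heads = ["\\multicolumn{$n}{c}{$title}" for (title, n) in groups]
    println(io, " & ", join(heads, " & "), " \\\\")
    # A rule under each group; column 1 holds the row names.
    start = 2
    rules = String[]
    for (_, n) in groups
        push!(rules, "\\cmidrule(lr){$start-$(start + n - 1)}")
        start += n
    end
    println(io, join(rules, " "))
    # Second header level: the individual column names.
    println(io, " & ", join(colnames, " & "), " \\\\")
    println(io, "\\midrule")
    for i in axes(cells, 1)
        println(io, rownames[i], " & ", join(string.(cells[i, :]), " & "), " \\\\")
    end
    println(io, "\\bottomrule")
    print(io, "\\end{tabular}")
    return String(take!(io))
end

# Usage: the column grouping from the demo table in the first post.
println(latex_grouped_table([("SepalLength", 3), ("SepalWidth", 1)],
                            ["(1)", "(2)", "(3)", "(4)"],
                            ["N"], [150 150 150 150]))
```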


#6

Additionally, an idea that would make this package blow Stargazer out of the water would be the inclusion of a Dict as an optional argument, which would then perform a string replace on variable names.

I imagine the functionality might look like this:

dict = Dict("SepalWidth" => "Sepal Width", "SepalLength" => "Sepal Length")
regtable(rr1,rr2,rr3,rr4; renderSettings = asciiOutput(), labels = dict)

It could then print “Sepal Width” wherever it sees the string “SepalWidth”.

Could this happen in the render command that prints the table itself?
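A minimal sketch of the label lookup, assuming labels are matched against whole variable names. This swaps in an exact-name lookup for the string replace proposed above, which avoids rewriting names that merely contain a labeled name. `relabel` is a hypothetical helper, not RegressionTables.jl’s implementation.

```julia
# Hypothetical helper: apply a user-supplied label dictionary to a list of
# variable names; names without a label are left unchanged.
relabel(names, labels::AbstractDict) = [get(labels, n, n) for n in names]

labels = Dict("SepalWidth" => "Sepal Width", "SepalLength" => "Sepal Length")
relabel(["SepalWidth", "PetalLength", "SepalLength"], labels)
# → ["Sepal Width", "PetalLength", "Sepal Length"]
```

Doing this in the rendering step keeps the fitted models untouched, so the same models can be printed with different labels.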


#7

Great! I have been using Stargazer for a long time, and it is really useful for academic writing.


#8

Actually, I’ve been thinking for some time that this should be handled at the root, by specifying “variable labels” on data frame columns, which would be passed to the models so that they can be used for printing. See this issue. That way you only need to specify the information once, and often survey datasets come with this kind of information already.


#9

As far as I understand, the concern of the DataFrames developers is that a) a lot of core functionality still needs work, and b) making it work as well as in Stata would require many changes to plotting and table packages. It would have to become a standard that every package interfacing with DataFrames supports.

However I think column metadata is crucial for reproducibility. It’s important for the names of rows and columns in tables to match the code itself, and for a user to be able to look at column metadata to find out the original source of variables.


#10

@genauguy I don’t have access to julia right now, but the following should work:

using RegressionTables, DataFrames, FixedEffectModels, RDatasets

df = dataset("datasets", "iris")
df[:SpeciesDummy] = pool(df[:Species])

rr1 = reg(df, @model(SepalLength ~ SepalWidth, fe = SpeciesDummy))
rr2 = reg(df, @model(SepalLength ~ SepalWidth + PetalLength, fe = SpeciesDummy))
rr3 = reg(df, @model(SepalLength ~ SepalWidth + PetalLength + PetalWidth, fe = SpeciesDummy))
rr4 = reg(df, @model(SepalWidth ~ SepalLength + PetalLength + PetalWidth, fe = SpeciesDummy))

regtable(rr1, rr2, rr3, rr4; renderSettings = asciiOutput(),
         labels = Dict("SepalLength" => "My dependent variable: SepalLength",
                       "PetalLength" => "Length of Petal",
                       "PetalWidth" => "Width of Petal",
                       "(Intercept)" => "Const.",
                       "isSmall" => "isSmall Dummies",
                       "SpeciesDummy" => "Species Dummies"))

#11

Fantastic! I didn’t notice that feature at first. This is really great.


#12

@nalimilan I see your point. On the other hand, I feel that some backends, like LaTeX, will most likely not be called from an interface that requests LaTeX output. I could have used something like

tab = regtable(rr1,rr2)
asLaTeX(tab)

but decided against it so that everything fits in one line (I don’t expect users to do much with RegressionTables other than rendering).


#13

I received this error message when I tried to install your package:

unknown package RegressionTables
macro expansion at .\pkg\entry.jl:53 [inlined]
(::Base.Pkg.Entry.##1#3{String,Base.Pkg.Types.VersionSet})() at .\task.jl:335
sync_end() at task.jl:287
macro expansion at task.jl:303 [inlined]
add(::String, ::Base.Pkg.Types.VersionSet) at entry.jl:51
(::Base.Pkg.Dir.##4#7{Array{Any,1},Base.Pkg.Entry.#add,Tuple{String}})() at dir.jl:36
cd(::Base.Pkg.Dir.##4#7{Array{Any,1},Base.Pkg.Entry.#add,Tuple{String}}, ::String) at file.jl:59
#cd#1(::Array{Any,1}, ::Function, ::Function, ::String, ::Vararg{String,N} where N) at dir.jl:36
add(::String) at pkg.jl:117
include_string(::String, ::String) at loading.jl:522
eval(::Module, ::Any) at boot.jl:235
(::Atom.##61#64)() at eval.jl:102
withpath(::Atom.##61#64, ::Void) at utils.jl:30
withpath(::Function, ::Void) at eval.jl:38
macro expansion at eval.jl:101 [inlined]
(::Atom.##60#63{Dict{String,Any}})() at task.jl:80

#14

Have you run Pkg.update() recently @Yifan_Liu ?


#15

That’s not incompatible. You could return a RegressionTable object by default, which is printed by the show method in the most appropriate format for the current display, but still allow choosing a different output via an argument, in which case you’d just print the requested representation directly.

EDIT: To give more context, a situation where automatically choosing the default output format is really useful is IJulia notebooks. When I use stargazer in RStudio notebooks, I find it annoying that I need to specify whether to output HTML or LaTeX, and if I get it wrong, I get a wall of text instead of a nice table. Then if you decide to compile your notebook to HTML rather than to LaTeX, you need to change the code. That doesn’t sound like a correctly designed system.
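A minimal sketch of that design, with hypothetical names (`TinyTable`, `maketable`, and its `output` and `io` keywords are illustrations, not RegressionTables.jl’s API): the function returns a table object by default and lets the display system choose the format; passing an explicit MIME type prints that representation immediately.

```julia
# Hypothetical table type: one (label, value) pair per row.
struct TinyTable
    rows::Vector{Pair{String,String}}
end

function Base.show(io::IO, ::MIME"text/plain", t::TinyTable)
    for (k, v) in t.rows
        println(io, k, "  ", v)
    end
end

function Base.show(io::IO, ::MIME"text/latex", t::TinyTable)
    for (k, v) in t.rows
        println(io, k, " & ", v, " \\\\")
    end
end

# `output = nothing` means "return the object and let the display decide";
# any MIME value means "print that representation now".
function maketable(rows::Pair{String,String}...;
                   output::Union{Nothing,MIME} = nothing, io::IO = stdout)
    t = TinyTable(collect(rows))
    output === nothing && return t
    show(io, output, t)
    return nothing
end

t = maketable("SepalWidth" => "0.804")  # REPL or IJulia picks the format
maketable("SepalWidth" => "0.804"; output = MIME"text/latex"())  # forced LaTeX
```

A notebook then renders whatever the front end prefers, and switching the export target does not require changing the code unless the user explicitly asks for one format.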


#16

After updating all packages, I was able to install the package, but could not load it and got the error message:

Failed to precompile RegressionTables to C:\Users\user\.julia\lib\v0.6\RegressionTables.ji.
compilecache(::String) at loading.jl:710
_require(::Symbol) at loading.jl:497
require(::Symbol) at loading.jl:405
include_string(::String, ::String) at loading.jl:522
eval(::Module, ::Any) at boot.jl:235
(::Atom.##61#64)() at eval.jl:102
withpath(::Atom.##61#64, ::Void) at utils.jl:30
withpath(::Function, ::Void) at eval.jl:38
macro expansion at eval.jl:101 [inlined]
(::Atom.##60#63{Dict{String,Any}})() at task.jl:80

#17

@nalimilan Fair enough. I can put that into the next version.


#18

Thanks a lot for writing the package! I have one question: is it possible to choose a different set of standard errors to show? In many cases the standard homoskedastic standard errors from a fit(LinearModel,…) regression using GLM are not appropriate, and I would like to use HC or HAC standard errors instead. Is there a way to have them included automatically in your output?


#19

@IljaK91 regtable() prints the square root of the diagonal of vcov(dfrm::DataFrameRegressionModel) (in the case of GLM.jl’s output), or of the vcov field of an AbstractRegressionResult (if you use FixedEffectModels.jl). As long as you pass these objects to regtable() with the adjusted vcov matrix, the standard errors should print correctly.

Are you adjusting the standard errors yourself, or are you using a package to do it for you? If the latter, it would be great to support that package.
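Since regtable() takes the standard errors as the square roots of the diagonal of the covariance matrix, any adjusted matrix works the same way. A minimal sketch, using only the LinearAlgebra standard library, of computing the classical OLS covariance and White’s HC0 robust covariance by hand. The helper names are hypothetical and this is not CovarianceMatrices.jl’s API; HC1 to HC3 and HAC estimators need further corrections not shown here.

```julia
using LinearAlgebra

# Standard errors: square roots of the diagonal of a covariance matrix.
stderrors(V::AbstractMatrix) = sqrt.(diag(V))

# Classical OLS covariance: s^2 (X'X)^{-1}, with s^2 = e'e / (n - k).
function ols_vcov(X::AbstractMatrix, y::AbstractVector)
    β = X \ y                  # least-squares coefficients
    e = y - X * β              # residuals
    n, k = size(X)
    return (dot(e, e) / (n - k)) * inv(X' * X)
end

# White / HC0 heteroskedasticity-robust "sandwich" covariance:
# (X'X)^{-1} X' diag(e.^2) X (X'X)^{-1}.
function hc0_vcov(X::AbstractMatrix, y::AbstractVector)
    β = X \ y
    e = y - X * β
    bread = inv(X' * X)
    meat = X' * Diagonal(e .^ 2) * X
    return bread * meat * bread
end

# Usage: intercept plus one regressor, four observations.
X = [ones(4) [1.0, 2.0, 3.0, 4.0]]
y = [1.0, 3.0, 2.0, 5.0]
stderrors(ols_vcov(X, y)), stderrors(hc0_vcov(X, y))
```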


#20

It would be great if https://github.com/gragusa/CovarianceMatrices.jl were supported directly. I use that package to get robust standard errors. There should also be a note at the bottom of the table about which standard errors were used (HC or HAC, and which type).

I don’t know whether GLM.jl supports providing the adjusted SEs directly, in which case this request would be obsolete. So far I run regressions with GLM.fit() and adjust the SEs afterwards using CovarianceMatrices.jl. This is a bit tiresome if you have many regressions.