[ANN] TexTables.jl for Building LaTeX tables in Julia

#1

Hi Everyone,

I have been working recently on a new package for building and printing LaTeX tables with multi-level row and column indices. It’s still in Beta and unregistered (for now, although I hope to register it officially in the very near future), but I would really appreciate any feedback.

It is designed for building all sorts of statistical tables in a very modular fashion and for quickly displaying them in the REPL or exporting them to LaTeX. It’s quite extensible, and I think the most important use cases will be for people who want to make their own custom tables, but I have implemented support for some basic regression tables, cross-tabulations, and summary statistics as a proof of concept.

For example, it makes it extremely easy to put together regression tables:

julia> using TexTables, StatsModels, GLM, RDatasets
julia> df = dataset("datasets", "attitude");
julia> m1 = lm(@formula( Rating ~ 1 + Raises ), df);
julia> m2 = lm(@formula( Rating ~ 1 + Raises + Learning), df);
julia> m3 = lm(@formula( Rating ~ 1 + Raises + Learning + Privileges), df);
julia> m4 = lm(@formula( Rating ~ 1 + Raises + Learning + Privileges
                                    + Complaints), df);
julia> m5 = lm(@formula( Rating ~ 1 + Raises + Learning + Privileges
                                    + Complaints + Critical), df);
julia> table = regtable(m1, m2, m3, m4, m5)
            |   (1)    |   (2)    |   (3)    |   (4)    |   (5)
-------------------------------------------------------------------
(Intercept) |  19.978* |   15.809 |   14.167 |   11.834 |   11.011
            | (11.688) | (11.084) | (11.519) |  (8.535) | (11.704)
     Raises | 0.691*** |   0.379* |    0.352 |   -0.026 |   -0.033
            |  (0.179) |  (0.217) |  (0.224) |  (0.184) |  (0.202)
   Learning |          |  0.432** |   0.394* |    0.246 |    0.249
            |          |  (0.193) |  (0.204) |  (0.154) |  (0.160)
 Privileges |          |          |    0.105 |   -0.103 |   -0.104
            |          |          |  (0.168) |  (0.132) |  (0.135)
 Complaints |          |          |          | 0.691*** | 0.692***
            |          |          |          |  (0.146) |  (0.149)
   Critical |          |          |          |          |    0.015
            |          |          |          |          |  (0.147)
-------------------------------------------------------------------
          N |       30 |       30 |       30 |       30 |       30
      $R^2$ |    0.348 |    0.451 |    0.459 |    0.715 |    0.715

It is equally easy to group the columns according to some headings:

julia> grouped_table = regtable( "Group 1"=>(m1,m2,m3),
                                 "Group 2"=>(m4, m5))
            |            Group 1             |      Group 2
            |   (1)    |   (2)    |   (3)    |   (4)   |   (5)
------------------------------------------------------------------
(Intercept) |   19.978 |   15.809 |   14.167 |  11.834 |   11.011
            | (11.688) | (11.084) | (11.519) | (8.535) | (11.704)
     Raises |    0.691 |    0.379 |    0.352 |  -0.026 |   -0.033
            |  (0.179) |  (0.217) |  (0.224) | (0.184) |  (0.202)
   Learning |          |    0.432 |    0.394 |   0.246 |    0.249
            |          |  (0.193) |  (0.204) | (0.154) |  (0.160)
 Privileges |          |          |    0.105 |  -0.103 |   -0.104
            |          |          |  (0.168) | (0.132) |  (0.135)
 Complaints |          |          |          |   0.691 |    0.692
            |          |          |          | (0.146) |  (0.149)
   Critical |          |          |          |         |    0.015
            |          |          |          |         |  (0.147)
------------------------------------------------------------------
          N |       30 |       30 |       30 |      30 |       30
      $R^2$ |    0.348 |    0.451 |    0.459 |   0.715 |    0.715

When printing to LaTeX, multi-column and multi-row groupings are handled automatically with the LaTeX \multicolumn and \multirow commands. TexTables.jl also supports tables of summary statistics and cross-tabulations, including grouped summary tables:

julia> using RDatasets, TexTables

julia> df = dataset("datasets", "iris");
julia> c1 = summarize_by(df, :Species, [:SepalLength, :SepalWidth])
           |             | Obs | Mean  | Std. Dev. |  Min  |  Max
-------------------------------------------------------------------
    setosa | SepalLength |  50 | 5.006 |     0.352 | 4.300 | 5.800
           |  SepalWidth |  50 | 3.428 |     0.379 | 2.300 | 4.400
-------------------------------------------------------------------
versicolor | SepalLength |  50 | 5.936 |     0.516 | 4.900 | 7.000
           |  SepalWidth |  50 | 2.770 |     0.314 | 2.000 | 3.400
-------------------------------------------------------------------
 virginica | SepalLength |  50 | 6.588 |     0.636 | 4.900 | 7.900
           |  SepalWidth |  50 | 2.974 |     0.322 | 2.200 | 3.800

For more details, see the documentation in the README. Please let me know what you think. I’m very eager to make improvements to make this as useful as possible.


#2

Is it possible to create a double underline in the LaTeX table?


#3

Do you mean using \hline \hline in place of \bottomrule? I haven’t implemented that yet, but it would be very easy to add as a display option. I think it looks a little weird just at the bottom, though. Would you want to replace \toprule as well?


#4

I can’t speak for other people, but I prefer regression tables with double lines under the caption.


#5

That is good feedback. I will definitely add this as a feature, since I think it’s easy enough to do. Thank you!
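Until such an option exists, one possible workaround is to post-process the exported LaTeX text. The following is a minimal sketch in plain Julia, assuming the table has already been written out and read back as a string; the table text here is a shortened, hand-written stand-in, and no TexTables functions are used.

```julia
# Hand-written stand-in for an exported table (an assumption for illustration).
# raw"""...""" keeps the LaTeX backslashes literal.
tex = raw"""
\begin{tabular}{r|cc}
\toprule
  & A  & B     \\ \hline
1 &  5 & 1.200 \\
\bottomrule
\end{tabular}
"""

# Swap the bottom rule for a double line; the pattern is matched literally.
double_ruled = replace(tex, raw"\bottomrule" => raw"\hline \hline")
print(double_ruled)
```

The same call with raw"\toprule" as the pattern would change the top rule as well.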
Any other ideas or things that you’d like to see?


#6

I am currently using stargazer; hopefully TexTables will eventually cover all of its functionality.


#7

I think the trouble is that the Julia package ecosystem for estimation is a bit more fragmented, so it would be very difficult to provide comprehensive support for combining regression tables across different packages all under one roof, especially because I wouldn’t want to make assumptions or decisions about which statistics or other model-specific information to include. Stargazer is really quite comprehensive, but if you want to use a regression package it doesn’t support, you’re just out of luck. That’s not as much of a problem in R, because people are mostly constrained to use the official packages for reasons of speed. But it’s a real problem if you want to roll your own estimators and still have the regression tables play nicely with the ones from more official packages.

For instance, in economics, if you’re doing structural estimation you might well want a table that compares the results from simpler linear models to the results from a more complicated estimation routine that you wrote yourself. It would be really cool if there were a standard table format/API where, if you implement 10 lines of code for your fitted model, you get a nice tabular output that can be merged easily with the output tables from completely different packages.


#8

Hi,
I have tried to use the package to export a DataFrame to LaTeX, as I particularly like the ability to group the data, but I have not found a convenient way to pass in a DataFrame.


#9

With coefficients represented by an <: AbstractDict, you could merge the results of various regressions and have empty cells for missing ones. The ordering may need to be specified manually.

That said, having an easy-to-use API for just constructing tables is almost as good.
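The dictionary idea can be sketched in plain Julia as follows. This is only an illustration of the approach, not TexTables code: the coefficient values are copied from columns (2) and (4) of the regression table in the first post, and the names `row_order`, `cell`, and `rows` are made up for the example.

```julia
# Coefficients keyed by term name (values from columns (2) and (4) in the OP).
m2 = Dict("(Intercept)" => 15.809, "Raises" => 0.379, "Learning" => 0.432)
m4 = Dict("(Intercept)" => 11.834, "Raises" => -0.026, "Learning" => 0.246,
          "Privileges" => -0.103, "Complaints" => 0.691)

# A Dict has no guaranteed iteration order, so the row order is given by hand.
row_order = ["(Intercept)", "Raises", "Learning", "Privileges", "Complaints"]

# A term that is missing from a model becomes an empty cell.
cell(model, term) = haskey(model, term) ? string(model[term]) : ""
rows = [(term, cell(m2, term), cell(m4, term)) for term in row_order]

for (term, a, b) in rows
    println(rpad(term, 12), " | ", lpad(a, 7), " | ", lpad(b, 7))
end
```

An ordered dictionary type would remove the need for `row_order` as long as the insertion order already matches the order you want in the table.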


#10

How does your package differ from https://github.com/jmboehm/RegressionTables.jl?


#11

Currently there’s no built-in method for exporting a DataFrame, since I haven’t yet decided how the default printing method should be designed, but it should be fairly easy to do yourself with code like the following:

using DataFrames, TexTables

# Here's some data
df   = DataFrame(A=[5, 7, 10, 2], B=[1.2, 2.7, 4.9, 1.2])

# Get the size of the dataframe
n    = size(df, 1)

# Loop through the columns and convert them one at a time.
cols = []
for var in names(df)
    # If you want a different representation for missing values,
    # just change the second argument of coalesce
    push!(cols, TableCol(var, collect(1:n), coalesce.(df[!, var], "")))
end

# Assemble and print the table
table = hcat(cols...) 
write("myfile.tex", table)

# output to file:
\begin{tabular}{r|cc}
\toprule
  & A  & B     \\ \hline
1 &  5 & 1.200 \\
2 &  7 & 2.700 \\
3 & 10 & 4.900 \\
4 &  2 & 1.200 \\
\bottomrule
\end{tabular}

Note: If you want to export a DataFrame to LaTeX, you’ll need to have row numbers: each column is represented internally as an OrderedDict, so its rows need unique identifiers. At some point in the future I could probably add an option to suppress printing the row indices in the output file.

If you’re interested in other ways to construct TexTable objects, feel free to ask. Or, there’s a whole section in the documentation under Advanced Usage on constructing tables from scratch.

I hope this starts to answer the question. The examples that I have put together mostly showcase functionality similar to RegressionTables.jl, since that is what I think is the most pressing need for a lot of people. However, the real goal of TexTables is to provide a framework that users can build on, one that does not require me or some other package developer to have designed the specific table that they are outputting (even if some common ones are built in).


#12

This is actually almost exactly how I handled the underlying implementation (using an OrderedDict so that people wouldn’t have to re-specify the order with a clunky syntax). :slight_smile: The additional machinery wrapped around it is there because I found it tricky to encode the fact that in a lot of tables, what you actually have is two or more subtables in each row/column that need to be merged together while respecting sub-table boundaries.

For instance, think about a regression table with statistics at the bottom (say, with m2 and m4 from my example in the OP). How do you encode the fact that when you merge those two columns together, the coefficients from m4 (some of which are not in m2) need to be inserted above the fit statistics from both models? In my first pass at implementing this, merging the tables inserted the coefficients from m4 that did not appear in m2 below the fit statistics unless you manually re-specified the order. In TexTables, I use a multi-index to encode the difference between the two blocks: the labels on the highest level are set to "" for both of them, and the table printer knows not to print a column of the row index when all of its labels are empty.
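The block idea can be illustrated with a small sketch in plain Julia. It is not the TexTables implementation: here each row key is a hypothetical `(block, label)` tuple with an integer block number, whereas TexTables uses its own multi-index type with empty top-level labels. The point is only that sorting on the block level keeps new coefficients above the fit statistics.

```julia
# Row keys are (block, label): block 1 = coefficients, block 2 = fit statistics.
# Values are taken from columns (2) and (4) of the table in the OP.
m2 = [(1, "(Intercept)") => "15.809", (1, "Raises") => "0.379",
      (1, "Learning") => "0.432", (2, "N") => "30", (2, "R^2") => "0.451"]
m4 = [(1, "(Intercept)") => "11.834", (1, "Raises") => "-0.026",
      (1, "Learning") => "0.246", (1, "Privileges") => "-0.103",
      (1, "Complaints") => "0.691", (2, "N") => "30", (2, "R^2") => "0.715"]

# Collect keys in first-seen order, then sort stably on the block level only,
# so the within-block order is preserved and no manual reordering is needed.
function merged_keys(cols...)
    keys_seen = Tuple{Int,String}[]
    for col in cols, (key, _) in col
        key in keys_seen || push!(keys_seen, key)
    end
    return sort(keys_seen; by = first, alg = MergeSort)
end

row_keys = merged_keys(m2, m4)
foreach(println, row_keys)
```

Without the block level, "Privileges" and "Complaints" would be appended after "R^2", which is the ordering problem described above.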


#13

I think the solution to this is to use the StatsModels RegressionModel object type and make sure you define coef etc. for your subtype.
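A minimal sketch of that suggestion, assuming the `RegressionModel` abstract type and the generic accessors exported by StatsBase. The struct `MyEstimate` and its fields are hypothetical, and which accessors a given table package actually calls is not specified in this thread.

```julia
using StatsBase  # assumed to provide RegressionModel, coef, coefnames, stderror, nobs

# Hypothetical container for results from a hand-written estimator.
struct MyEstimate <: StatsBase.RegressionModel
    terms::Vector{String}   # coefficient labels
    beta::Vector{Float64}   # point estimates
    se::Vector{Float64}     # standard errors
    n::Int                  # number of observations
end

# Extend the generic accessors for the new type.
StatsBase.coef(m::MyEstimate)      = m.beta
StatsBase.coefnames(m::MyEstimate) = m.terms
StatsBase.stderror(m::MyEstimate)  = m.se
StatsBase.nobs(m::MyEstimate)      = m.n

# Made-up numbers, purely for illustration.
est = MyEstimate(["(Intercept)", "x"], [1.5, -0.2], [0.4, 0.1], 100)
```

Any code written against these generic functions can then treat `est` like a model from an established package.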


#14

That helps a lot. But it doesn’t solve the problem that you might want to put more detailed information in your output table (which type of regression you ran, which sets of instruments, whether you had fixed effects, etc.). There are a lot of custom tables that people want, where they have a very particular way that they want things to appear. My hope is that if building the tables is really easy in the first place, then we wouldn’t have to rely as heavily on one or two centralized functions that have a lot of obscure display options that are hard to understand as a user.

Also, if anyone could point me to the full documentation of the RegressionModel API, I would be most appreciative. I think I saw it once somewhere (possibly imagined) but I’ve never been able to find it again.


#15

BTW: I have been waiting for someone to write a stargazer-like package for Julia. It’s very cool that you are doing that now @jacobadenbaum :blush:


#16

Quick Update: TexTables is now registered, so you should be able to install it with

using Pkg
Pkg.add("TexTables")

I’ve made a few changes for the first official release, and now it should work with any RegressionModel object that satisfies the API in StatsBase.

If you want to add additional metadata (like a row that describes whether or not you included fixed effects, or that describes which estimator you used) to your regression columns, use the setmeta! method. See the documentation for more details.

Please let me know if you have any problems/suggestions or if there are any additional features that you would like added.