Current state and the future of PrettyTables.jl

Hi!

I started writing PrettyTables.jl six years ago. My idea was to make a very simple package to print tabular data, just like the output of the ASCII Table Generator.

Time passed, and PrettyTables.jl became much larger than I anticipated. We now have four back ends (text, HTML, LaTeX, and Markdown) and a huge set of features.

PrettyTables.jl has been very stable so far! I cannot remember the last issue about an actual bug (and not a feature request). However, the code is a mess: it is very difficult to understand and full of questionable decisions. For example, the back ends do not share much code, leading to duplication everywhere. I think this is discouraging contributions from the community.

Hence, I decided to use this stability to focus on a full rewrite of the package. Previously, I tried to make a system so adaptable that it would be possible to add as many columns as the user wants to print information. After all these years, we are still using only two additional columns (number and labels), and there have been no feature requests for more.

I was reading the amazing analysis of tables in the documentation of R’s gt package (https://gt.rstudio.com), and I decided to use the same approach with some tuning. Hence, PrettyTables.jl v3 will be based on R’s gt package, including some of the names (notice that I am not a native speaker, and many of the keyword names I used were pretty bad :smiley:, pun intended). Hopefully, this new design will make it much easier to add more back ends and features that were long missing (like merging header cells).

I will also remove many features that I have not seen any package using, to make the code base simpler and more maintainable. Thus, I need some feedback about what I cannot remove, and also some patience, because v3 will be completely breaking. It will be a bumpy ride :slight_smile:

(@bkamins) As a starting point, all features DataFrames.jl is using will continue working exactly the same with only API changes. My idea is to provide the same output we have today for the feature set used by DataFrames.jl.

We :heart: PrettyTables.jl and use it a lot to display GeoTables.jl tables.

Looking forward to spanner column labels, footnotes, and source notes.

Please retain the feature that allows the display of rows of columns without materialization. We can’t materialize columns sometimes (big data requirements).

Good! The spanner column label (I will implement it with a different API, but it will achieve the same result) is the reason why I started this rewrite.

Sure! This is a top priority. I will not materialize anything that is not really necessary. For example, the summary cell will be a function f(data, j) that must return the summary for the jth column. Hence, I only evaluate it for the cells that are actually displayed.
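To illustrate the idea, here is a minimal sketch in plain Julia. It is not the PrettyTables.jl implementation: the names `displayed_columns` and `evaluated` are made up for this example, and the cropping of columns 3 and 4 is only assumed.

```julia
# Hypothetical sketch, not PrettyTables.jl internals.
# The summary is a function of (data, j), so nothing is computed up front.
summary_cell = (data, j) -> sum(@view data[:, j])

data = reshape(1:20, 4, 5)       # 4×5 matrix; column j holds 4(j-1)+1 : 4j
displayed_columns = [1, 2, 5]    # assume the display cropped columns 3 and 4

evaluated = Int[]                # records which columns were actually summarized
summaries = map(displayed_columns) do j
    push!(evaluated, j)
    summary_cell(data, j)
end
```

Only the three displayed columns are ever passed to `summary_cell`, which is what keeps a lazily computed or very large column from being materialized just to print a cropped table.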

My first question is: is anyone using the standalone = true option in the HTML back end? I think I will drop it so that I do not have to maintain the CSS inside PrettyTables.jl. If anyone still wants a complete page, they can wrap the output inside the rest of the HTML body themselves.
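A rough sketch of what such a wrapper could look like on the user side. The function `wrap_html` and its keywords are hypothetical (they are not part of PrettyTables.jl), and the table fragment is written by hand here instead of being produced by the package.

```julia
# Hypothetical helper, not part of PrettyTables.jl: wraps an HTML table
# fragment in a minimal page that carries user-supplied CSS.
function wrap_html(fragment::AbstractString; title = "Table", css = "")
    return """
    <!DOCTYPE html>
    <html>
    <head>
    <meta charset="utf-8">
    <title>$title</title>
    <style>
    $css
    </style>
    </head>
    <body>
    $fragment
    </body>
    </html>
    """
end

# Stand-in for the table markup that the HTML back end would return.
fragment = "<table><tr><td>1</td></tr></table>"
page = wrap_html(fragment; css = "td { text-align: right; }")
```

With something like this, the CSS lives in the user's code, and the package only has to emit the table element.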

Excited to see this re-write!!!

I have long advocated for multi-column headers. In LaTeX, I use this a lot.

\multicolumn{2}{c}{\uline{A multi-column header}}

I’ve always thought gt had a bit too much magic and didn’t produce pretty enough LaTeX tables. However, I think multi-row spans are not necessary, and I would never use them. I would be happy if this feature stayed forgotten for now, until the rest of the package is written.

You just read my thoughts! The main purpose is to allow merging columns in the headers (now called column labels). gt allows you to add “spanner labels”. PrettyTables will allow you to merge column label cells. Hence, for now we will not have multi-row merging (problematic for the text back end), but we will have multi-column merging.

The upside is that we will allow merging cells at the bottom of the header, which does not seem possible in gt and has some good use cases.

We could also think about implementing some gt-like frontend on top of SummaryTables.jl which already has support for column and row merging, but is (at least currently) more opinionated about how tables look. Might be less work than rewriting PrettyTables?

I have no idea. Nevertheless, I need the rewrite to clean up the code and fix bad decisions. Otherwise, the package will become almost unmaintainable if I continue to add features.

Ok makes sense, then the focus is less on becoming gt-like than on increasing maintainability, right? Just wanted to see if there’s some synergy to be had in this effort :slight_smile:

Yes! But, in the process, I will try to get very close to gt. The major problem here is text-based tables, which gt does not support, but there I can rely on the current code, which is at least stable.

I agree. Isn’t gt basically a grammar of graphics, but for tables?

I am happy to make a table by myself and then use pretty_table to print it. Just like I’m happy to make a vector myself and have Makie draw it.

I also find it convenient to use AlgebraOfGraphics when I want to use Grammar of Graphics. But I’m happy that AoG and Makie are separate packages.

I use multi row spans in tables a lot, but if it makes things complicated I think it’s fine to leave this out.

It would be nice if I could replace the table1 R package (or more frequently, hand-constructed tables) with this rewrite! And if the back end code is easier to deal with, maybe someone can make a Typst one?!? :drooling_face:

Any chance that version 3 could be called NeoPrettyTables.jl?
This way, we can have both the old PrettyTables.jl and NeoPrettyTables.jl, and people can choose which one they want.

Have you tried the table one functionality with Typst support in SummaryTables.jl? Just trying to gauge from this thread what is missing in the table landscape, I know it’s not about that package but you don’t really need to wait for a new PrettyTables it seems to me :slightly_smiling_face:

I just want to second this! For packages on which popular packages (like DataFrames) depend, major versions can be really disruptive. They essentially split all the package versions downstream into two worlds, and as a user I then have to opt into all the new or all the old versions, which can be super painful.

If the rewrite was called PrettyTables2.jl, or NeoPrettyTables.jl or something entirely different, that situation would not arise at all. The old PrettyTables.jl could just stay around forever as is.

This suggestion does of course show that we really should get “private” dependencies at some point where different packages can take a dependency on different versions of PrettyTables, so that this problem doesn’t even arise. Not the topic of this thread, though :slight_smile:

No - I wasn’t aware of SummaryTables.jl until your earlier comment :confounded:, and only glanced at the docs after. I will certainly take a look!

Yes, it should be straightforward. I just finished the re-implementation of the HTML back end and it was relatively easy!

I understand the reasons to create a new package. However, IMHO, this is not the way to go. We should not create a new package every time a breaking change happens. Semantic versioning should take care of it.

Maintaining a package that prints things is complicated. Output changes are not considered breaking changes. On multiple occasions, PrettyTables.jl broke due to printing output changes in new Julia releases. I do not have enough time to track those things in two packages. Maybe we can fork the current PrettyTables.jl and call it CompatPrettyTables.jl or OldPrettyTables.jl if someone is willing to maintain it. Thus, packages that do not want to move to the new version can change the dependency.

Anyway, unless you are using something really complicated, the adaptation should be very easy to perform. For example, in the past we had:

pretty_table(matrix; header = (["Position", "Velocity"], ["m", "m/s"]), formatters = ((v, i, j) -> @sprintf("%6.2f", v)))

Now we will have:

pretty_table(matrix; column_labels = [["Position", "Velocity"], ["m", "m/s"]], formatters = ((v, i, j) -> @sprintf("%6.2f", v)))

Handling breaking changes in PrettyTables.jl is much easier than usual because most of the time you just call one function at the end to print your table.
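For call sites that build the header programmatically, a small conversion function may be enough. This is only a sketch: `to_column_labels` is a hypothetical helper (not part of PrettyTables.jl), and it assumes the old header is either a tuple of vectors or a single vector, and that the new keyword takes a vector of vectors as in the call above.

```julia
# Hypothetical migration helper, not part of PrettyTables.jl.
# Old style: a tuple with one vector per header row.
to_column_labels(header::Tuple) = [collect(row) for row in header]
# Old style: a single vector, treated here as one row of column labels.
to_column_labels(header::AbstractVector) = [collect(header)]

old_header = (["Position", "Velocity"], ["m", "m/s"])
column_labels = to_column_labels(old_header)
```

The result can then be passed with the `column_labels` keyword instead of `header`.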

Update!

The HTML back end is almost finished. I chose to begin with it because it is the easiest one for testing the API and freezing the features.

Here is the current API version for some features:

data = randn(10, 6)

column_labels = [
    ["" for _ in 1:6],
    ["Test $i" for i in 1:6]
]

row_labels = collect(1:1:10)

stubhead_label = "Row"

merge_cells = [
    MergeCells(1, 1, 2, "Group #1")
    MergeCells(1, 5, 2, "Group #2")
]

footnotes = [
    (:column_label, 1, 1) => "This is the first group."
    (:column_label, 1, 5) => "This is the second group."
]

source_notes = "This is random data to test PrettyTables.jl v3.\nThe HTML back end is almost finished!"

summary_cell = (data, i) -> sum(@views data[:, i])

highlighters = [
    HtmlHighlighter((data, i, j) -> data[i, j] < 0, ["color" => "red"])
]

pretty_table(
    data;
    column_labels,
    footnotes,
    highlighters,
    merge_cells,
    line_breaks = true,
    row_labels,
    source_notes,
    stand_alone = true,
    stubhead_label,
    summary_cell,
    summary_row_label = "Sum"
)

which leads to:

I already see some complications regarding the gt specification. Hence, I need to trim it a little bit to make it feasible given my availability:

  1. The summary rows will be available only at the end of the table.
  2. The source notes will be only one row. If you want more than one row, use line breaks.
  3. There will be no support for merging rows.
  4. We will support merging cells only in the column labels.
  5. I have no idea yet how to treat the decoration of summary rows…
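To make item 4 concrete, here is a rough sketch of how one row of merged column labels could map to HTML. It is not the actual back end code: `LabelSpan` and `header_row_html` are hypothetical names, and reading the arguments of `MergeCells(1, 1, 2, "Group #1")` as row, starting column, number of merged columns, and label is my assumption for this example.

```julia
# Hypothetical stand-in for one merged column label (single header row only).
struct LabelSpan
    column::Int    # first column covered by the merged cell
    span::Int      # number of columns covered
    label::String
end

# Render one header row with `ncols` columns; unmerged cells stay empty.
function header_row_html(ncols::Int, spans::Vector{LabelSpan})
    by_column = Dict(s.column => s for s in spans)
    io = IOBuffer()
    print(io, "<tr>")
    j = 1
    while j <= ncols
        if haskey(by_column, j)
            s = by_column[j]
            print(io, "<th colspan=\"", s.span, "\">", s.label, "</th>")
            j += s.span          # skip the columns swallowed by the merge
        else
            print(io, "<th></th>")
            j += 1
        end
    end
    print(io, "</tr>")
    return String(take!(io))
end

row = header_row_html(6, [LabelSpan(1, 2, "Group #1"), LabelSpan(5, 2, "Group #2")])
```

Because merging happens only along a row, each back end just needs a "skip the next n columns" rule. That is much simpler than row merging, where the text back end would have to track cells that stay open across several lines.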

Well, at least the development is faster than I anticipated :smiley: However, the biggest problem will be the text back end.

EDIT: I forgot the title and subtitle.

Question: Do you think it would be useful to have a summary column at the end of the table to show information about each row?

Is there support for nested groupings in the rows, just like in the columns?

If so, there could be multiple margin summary rows at the bottom and multiple margin summary columns on the right.