Configure DataFrame show

I need a way to configure the show method of DataFrames.jl.
In the past I hacked the show method for the DataFrame type using the internal DataFrames API, but of course this breaks whenever an update comes along.

What I would like is to do the following globally:

  • specify the number of rows printed while keeping the width adaptable,
  • be able to print all columns, with the columns that do not fit on the same line printed in a new table, and
  • be able to specify the maximum string width.

In the past I have managed to hack something like this:

julia> dfshow(10, 20)
julia> df = DataFrame([Symbol(a)=>["sadlfhasdkfjhaasdfsdfsdfsadfasdfasdfasdfasdfa" for _ in 1:2000] for a in 'a':'f'])
2000×6 DataFrame
│ Row  │ a                       │ b                       │ c                       │
│      │ String                  │ String                  │ String                  │
├──────┼─────────────────────────┼─────────────────────────┼─────────────────────────┤
│ 1    │ sadlfhasdkfjhaasdfsd... │ sadlfhasdkfjhaasdfsd... │ sadlfhasdkfjhaasdfsd... │
│ 2    │ sadlfhasdkfjhaasdfsd... │ sadlfhasdkfjhaasdfsd... │ sadlfhasdkfjhaasdfsd... │
│ 3    │ sadlfhasdkfjhaasdfsd... │ sadlfhasdkfjhaasdfsd... │ sadlfhasdkfjhaasdfsd... │
│ 4    │ sadlfhasdkfjhaasdfsd... │ sadlfhasdkfjhaasdfsd... │ sadlfhasdkfjhaasdfsd... │
⋮
│ 1996 │ sadlfhasdkfjhaasdfsd... │ sadlfhasdkfjhaasdfsd... │ sadlfhasdkfjhaasdfsd... │
│ 1997 │ sadlfhasdkfjhaasdfsd... │ sadlfhasdkfjhaasdfsd... │ sadlfhasdkfjhaasdfsd... │
│ 1998 │ sadlfhasdkfjhaasdfsd... │ sadlfhasdkfjhaasdfsd... │ sadlfhasdkfjhaasdfsd... │
│ 1999 │ sadlfhasdkfjhaasdfsd... │ sadlfhasdkfjhaasdfsd... │ sadlfhasdkfjhaasdfsd... │
│ 2000 │ sadlfhasdkfjhaasdfsd... │ sadlfhasdkfjhaasdfsd... │ sadlfhasdkfjhaasdfsd... │

│ Row  │ d                       │ e                       │ f                       │
│      │ String                  │ String                  │ String                  │
├──────┼─────────────────────────┼─────────────────────────┼─────────────────────────┤
│ 1    │ sadlfhasdkfjhaasdfsd... │ sadlfhasdkfjhaasdfsd... │ sadlfhasdkfjhaasdfsd... │
│ 2    │ sadlfhasdkfjhaasdfsd... │ sadlfhasdkfjhaasdfsd... │ sadlfhasdkfjhaasdfsd... │
│ 3    │ sadlfhasdkfjhaasdfsd... │ sadlfhasdkfjhaasdfsd... │ sadlfhasdkfjhaasdfsd... │
│ 4    │ sadlfhasdkfjhaasdfsd... │ sadlfhasdkfjhaasdfsd... │ sadlfhasdkfjhaasdfsd... │
⋮
│ 1996 │ sadlfhasdkfjhaasdfsd... │ sadlfhasdkfjhaasdfsd... │ sadlfhasdkfjhaasdfsd... │
│ 1997 │ sadlfhasdkfjhaasdfsd... │ sadlfhasdkfjhaasdfsd... │ sadlfhasdkfjhaasdfsd... │
│ 1998 │ sadlfhasdkfjhaasdfsd... │ sadlfhasdkfjhaasdfsd... │ sadlfhasdkfjhaasdfsd... │
│ 1999 │ sadlfhasdkfjhaasdfsd... │ sadlfhasdkfjhaasdfsd... │ sadlfhasdkfjhaasdfsd... │
│ 2000 │ sadlfhasdkfjhaasdfsd... │ sadlfhasdkfjhaasdfsd... │ sadlfhasdkfjhaasdfsd... │

Before opening an issue or trying to write a pull request, I thought I would ask in case I just can’t find the right documentation.

By the way, the following (using the newest DataFrames version, which prints with PrettyTables.jl):

julia> Base.active_repl.options.iocontext[:displaysize] = (20,displaysize(stdout)[2])
julia> df = DataFrame([Symbol(a)=>["sadlfhasdkfjhaasdfsdfsdfsadfasdfasdfasdfasdfa" for _ in 1:2000] for a in 'a':'f'])
2000×6 DataFrame
  Row │ a                                  b                           ⋯
      │ String                             String                      ⋯
──────┼─────────────────────────────────────────────────────────────────
    1 │ sadlfhasdkfjhaasdfsdfsdfsadfasdf…  sadlfhasdkfjhaasdfsdfsdfsad ⋯
    2 │ sadlfhasdkfjhaasdfsdfsdfsadfasdf…  sadlfhasdkfjhaasdfsdfsdfsad
    3 │ sadlfhasdkfjhaasdfsdfsdfsadfasdf…  sadlfhasdkfjhaasdfsdfsdfsad
    4 │ sadlfhasdkfjhaasdfsdfsdfsadfasdf…  sadlfhasdkfjhaasdfsdfsdfsad
    5 │ sadlfhasdkfjhaasdfsdfsdfsadfasdf…  sadlfhasdkfjhaasdfsdfsdfsad ⋯
    6 │ sadlfhasdkfjhaasdfsdfsdfsadfasdf…  sadlfhasdkfjhaasdfsdfsdfsad
  ⋮   │                 ⋮                                  ⋮           ⋱
 1996 │ sadlfhasdkfjhaasdfsdfsdfsadfasdf…  sadlfhasdkfjhaasdfsdfsdfsad
 1997 │ sadlfhasdkfjhaasdfsdfsdfsadfasdf…  sadlfhasdkfjhaasdfsdfsdfsad
 1998 │ sadlfhasdkfjhaasdfsdfsdfsadfasdf…  sadlfhasdkfjhaasdfsdfsdfsad ⋯
 1999 │ sadlfhasdkfjhaasdfsdfsdfsadfasdf…  sadlfhasdkfjhaasdfsdfsdfsad
 2000 │ sadlfhasdkfjhaasdfsdfsdfsadfasdf…  sadlfhasdkfjhaasdfsdfsdfsad
                                         5 columns and 1989 rows omitted

already helps, but isn’t quite what I want, since I can’t see all columns.

After the 0.22 release you can expect the API not to change. The output might still change, though, since printing changes are generally considered non-breaking in Julia: even if nothing changes in DataFrames.jl, Julia Base may start displaying things differently (this actually happens with every minor release of Julia).

@Ronis_BR can probably help you best with this, but AFAICT:

  1. For the number of rows: you can make a view (it is cheap) containing the rows you want and then ask show to print all of its rows (allrows = true).
  2. For printing all columns in several tables: this is not feasible to do cleanly in general, but an issue for adding a solution similar to what you ask for is open, and it should be added at some point.
  3. For the maximum string width: there is a keyword argument for this called truncate.
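Points 1 and 3 above can be combined in a single call. A minimal sketch, assuming DataFrames.jl 1.x, where show accepts the allrows, allcols and truncate keyword arguments; the 10-row cut and the width of 20 are just example values. Note that allcols = true prints all columns in one wide table rather than splitting them into several tables, which is why point 2 is still open.

```julia
using DataFrames

# The example data frame from the question: 2000 rows, 6 long string columns.
df = DataFrame([Symbol(a) => ["sadlfhasdkfjhaasdfsdfsdfsadfasdfasdfasdfasdfa" for _ in 1:2000] for a in 'a':'f'])

# A view does not copy the data; keep only the first 10 rows.
sub = view(df, 1:10, :)

# Print every row of the view and every column, truncating strings to about 20 characters.
show(sub; allrows = true, allcols = true, truncate = 20)
```

The row labels shown are positions within the view (1 to 10), not the row numbers of the parent data frame.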

Since I was using internal DataFrames methods for my hack, it was completely fine and expected that this could stop working in an update.

What I am looking for, however, is a way to set the display options for the entire REPL session. Something similar to Base.active_repl.options.iocontext, I suppose, or some global settings file.

DataFrames.jl intentionally does not have such global settings, as it is not safe to have them. But of course you can add such settings on your own; it should not be a problem.
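Adding your own settings can be as small as a global Ref holding default values, kept in user code such as ~/.julia/config/startup.jl. A minimal sketch; DFSHOW_DEFAULTS and set_dfshow_defaults! are hypothetical names, not part of DataFrames.jl, and the settings themselves need no packages:

```julia
# Hypothetical user-level defaults: DataFrames.jl has no global display settings,
# so these live in your own code (e.g. ~/.julia/config/startup.jl).
const DFSHOW_DEFAULTS = Ref((rows = 10, truncate = 20))

# Update one or more stored defaults, e.g. set_dfshow_defaults!(rows = 25).
# Unknown keys are rejected so that typos do not pass silently.
function set_dfshow_defaults!(; kwargs...)
    for k in keys(kwargs)
        haskey(DFSHOW_DEFAULTS[], k) || throw(ArgumentError("unknown dfshow setting: $k"))
    end
    DFSHOW_DEFAULTS[] = merge(DFSHOW_DEFAULTS[], values(kwargs))
    return DFSHOW_DEFAULTS[]
end
```

A printing helper such as the dfshow function posted below could then be called as dfshow(df, DFSHOW_DEFAULTS[].rows, DFSHOW_DEFAULTS[].truncate), so changing the defaults once affects the rest of the session.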

This is not ideal, but I think you can tweak the following code to meet your needs:

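using DataFrames

# Print `df` as several tables, each holding as many columns as fit in the
# terminal width, showing roughly `num_printed_rows` rows and cutting strings
# to `max_str` characters.
function dfshow(df, num_printed_rows, max_str)
    rows, cols = size(df)
    _, display_cols = displaysize(stdout)

    # Compute the size of the `Row` column: digits of the largest row number
    # plus 2 characters of padding (`max` avoids log10(0) for an empty data frame).
    row_size = ndigits(max(rows, 1)) + 2

    # Estimate how many columns of width `max_str + 2` fit next to it (at least 1).
    num_cols_per_line = max(1, floor(Int, (display_cols - row_size) / (max_str + 2)))

    col_beg = 1
    col_end = min(num_cols_per_line, cols)

    while true
        # Print the current block of columns. `display_size` is handled by
        # PrettyTables.jl; the extra 8 lines roughly cover the header and summary.
        show(view(df, :, col_beg:col_end),
             truncate = max_str,
             display_size = (num_printed_rows + 8, display_cols))
        println()

        col_end == cols && break

        col_beg = col_end + 1
        col_end = min(col_end + num_cols_per_line, cols)
    end
end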
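The column-splitting arithmetic in dfshow can be pulled out into a pure function, which makes it easy to check without printing anything. A sketch; column_chunks is a hypothetical helper name, and it uses the same width estimate as dfshow (every data column assumed to take max_str + 2 characters):

```julia
# Split columns 1:ncols into consecutive ranges that fit in `display_cols`
# characters, assuming each data column is `max_str + 2` characters wide.
function column_chunks(nrows::Integer, ncols::Integer, display_cols::Integer, max_str::Integer)
    # Width of the `Row` label column: digits of the largest row number plus padding.
    row_size = ndigits(max(nrows, 1)) + 2
    # Columns per table; always at least one, even on a very narrow display.
    per_line = max(1, fld(display_cols - row_size, max_str + 2))
    return [c:min(c + per_line - 1, ncols) for c in 1:per_line:ncols]
end
```

With this helper, the while loop in dfshow reduces to iterating over column_chunks(nrow(df), ncol(df), displaysize(stdout)[2], max_str) and showing view(df, :, cols) for each range. For the 2000×6 example on an 80-column terminal with max_str = 20, it yields the two tables a–c and d–f.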

Just three notes:

  1. There are probably 100 ways to write better code than I did 😃
  2. The complicated part is computing the size of the printed table. I think I need to add a function to PrettyTables that returns such sizes; it would help a lot.
  3. If a column is narrower than max_str, there will be a lot of empty space in the line.
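One way to reduce the empty space from note 3 is to estimate each column's actual width instead of assuming max_str for all of them, and then pack columns greedily into tables. A rough sketch: estimate_width and pack_columns are hypothetical helpers, the 2 characters of padding per column is an assumption about the table layout, and the width of the element-type row (e.g. String) and any truncation marker are ignored:

```julia
# Estimated printed width of one column: the header name, or the longest cell
# text capped at `max_str` (longer cells get truncated), whichever is wider.
function estimate_width(name, col, max_str::Integer)
    longest = maximum(x -> textwidth(string(x)), col; init = 0)
    return max(textwidth(string(name)), min(longest, max_str))
end

# Greedily group columns into consecutive ranges whose total width, with an
# assumed 2 characters of padding per column, fits in `available` characters.
# A column wider than `available` still gets a range of its own.
function pack_columns(widths::AbstractVector{<:Integer}, available::Integer)
    chunks = UnitRange{Int}[]
    start, used = 1, 0
    for (j, w) in enumerate(widths)
        cost = w + 2
        if j > start && used + cost > available
            push!(chunks, start:j-1)  # close the current table
            start, used = j, 0
        end
        used += cost
    end
    isempty(widths) || push!(chunks, start:length(widths))
    return chunks
end
```

For a data frame df, [estimate_width(n, df[!, n], 20) for n in names(df)] gives the per-column estimates, and each range returned by pack_columns(widths, displaysize(stdout)[2] - row_size) can be printed with show(view(df, :, r); truncate = 20), where row_size is computed as in dfshow. Narrow columns then share a table instead of each reserving max_str characters.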

Very cool! Thanks a lot! I’ll play around with it.