Formatting DataFrames with PrettyTables

alexgr · September 30, 2022, 11:57am

I’d like to format some numeric columns with different decimal positions. The approach I am following is this:

using Random
using DataFrames
using PrettyTables

df = DataFrame(rand(5, 3), :auto)
pretty_table(df, formatters = ft_printf(["%.1f", "%.2f", "%.3f"], [1, 2, 3]))

┌─────────┬─────────┬─────────┐
│      x1 │      x2 │      x3 │
│ Float64 │ Float64 │ Float64 │
├─────────┼─────────┼─────────┤
│     0.1 │    0.78 │   0.750 │
│     0.2 │    0.69 │   0.366 │
│     0.4 │    0.15 │   0.127 │
│     0.4 │    0.84 │   0.096 │
│     0.1 │    0.88 │   0.107 │
└─────────┴─────────┴─────────┘

How would you go about specifying the format by column symbols instead of the position?
Something like…

pretty_table(df, formatters = Dict(:x1 => "%.1f", :x2 => "%.2f", :x3 =>"%.3f"))

alexgr · September 30, 2022, 4:15pm

Maybe an idea…

using Random
using DataFrames
using PrettyTables
using Formatting

function formatters(dataframe, formatters...)
    _dataframe = dataframe
    _formatters = Dict(formatters)

    return function (v, i, j) 
        column = propertynames(_dataframe)[j]
        if haskey(_formatters, column)
            return sprintf1(_formatters[column], v)
        end
        v
    end
end

df = DataFrame(rand(5, 3), :auto)

pretty_table(df, formatters = formatters(df, :x1 => "%.1f", :x2 => "%.2f", :x3 => "%.3f"))

rafael.guerra · September 30, 2022, 5:48pm

Could a reverse Dict be easier?

using Random, DataFrames, PrettyTables

df = DataFrame(rand(5, 3), [:x3, :x1, :x2])
d = Dict(Symbol.(names(df)) .=> 1:ncol(df))
pretty_table(df, formatters = ft_printf(["%.1f", "%.2f", "%.3f"], [d[:x1], d[:x2], d[:x3]]))

Ronis_BR · October 2, 2022, 3:55am

Natively it is not possible because PrettyTables does not “understand” the column names as in DataFrames. I think the best approach is the one proposed by @rafael.guerra

alexgr · October 2, 2022, 9:25am

Thank you both @rafael.guerra and @Ronis_BR.

I will use the closure approach (post #2) because it keeps the column and format tied together, do not rely on order and I find more ergonomic to type the column symbol than the quoted raw string. Also it allows to layout the code in rows which is nice to comment and uncomment specific columns.


fmt = formatters(df, 
    :x1 => "%.1f", 
    :x2 => "%.2f", 
    :x3 => "%.3f", 
)

pretty_table(df, formatters = fmt)

rafael.guerra · October 2, 2022, 9:53am

Alrigth. The same approach could be adapted to your use case:

using Random, DataFrames, PrettyTables

df = DataFrame(rand(5, 3), [:x3, :x1, :x2])

function formatters(df, fmt...)
    d = Dict(Symbol.(names(df)) .=> 1:ncol(df))
    keys = [first.(fmt)...]
    values = [last.(fmt)...]
    ft_printf(values, [d[s] for s in keys])
end

fmt = formatters(df, 
    :x1 => "%.1f", 
    :x2 => "%.2f", 
    :x3 => "%.3f",
)

pretty_table(df, formatters = fmt)

┌─────────┬─────────┬─────────┐
│      x3 │      x1 │      x2 │
│ Float64 │ Float64 │ Float64 │
├─────────┼─────────┼─────────┤
│   0.209 │     0.4 │    0.58 │
│   0.102 │     1.0 │    0.60 │
│   0.454 │     0.9 │    0.83 │
│   0.135 │     0.9 │    0.48 │
│   0.063 │     0.4 │    0.99 │
└─────────┴─────────┴─────────┘

alexgr · October 2, 2022, 10:35am

# Added this line to allow defaults for columns frequently used but not necessary present in df
 fmt = filter(pair -> haskey(d, pair.first), fmt)

using Random, DataFrames, PrettyTables

df = DataFrame(rand(5, 4), [:x3, :x1, :x2, :Price])

function formatters(df, fmt...)
    d = Dict(Symbol.(names(df)) .=> 1:ncol(df))
    fmt = filter(pair -> haskey(d, pair.first), fmt)
    keys = [first.(fmt)...]
    values = [last.(fmt)...]
    ft_printf(values, [d[s] for s in keys])
end

fmt = formatters(df, 
    :x1 => "%.1f", 
    # :x2 => "%.2f",
    :x3 => "%.3f",

    # defaults for columns frequently used but not necessary present in df
    :Count => "%.0f",
    :Price => "%.2f",
)

pretty_table(df, formatters = fmt)

┌─────────┬─────────┬──────────┬─────────┐
│      x3 │      x1 │       x2 │   Price │
│ Float64 │ Float64 │  Float64 │ Float64 │
├─────────┼─────────┼──────────┼─────────┤
│   0.155 │     0.1 │ 0.312756 │    0.01 │
│   0.527 │     0.6 │ 0.605655 │    0.80 │
│   0.688 │     0.1 │ 0.190063 │    0.17 │
│   0.899 │     0.3 │ 0.718441 │    0.71 │
│   0.100 │     0.1 │ 0.525271 │    0.94 │
└─────────┴─────────┴──────────┴─────────┘

Also

show(df, formatters = fmt)

5×4 DataFrame
 Row │ x3       x1       x2         Price   
     │ Float64  Float64  Float64    Float64 
─────┼──────────────────────────────────────
   1 │   0.152      0.2  0.121693      0.17
   2 │   0.782      0.7  0.352422      0.38
   3 │   0.546      0.9  0.0599519     0.03
   4 │   0.048      0.6  0.539459      0.31
   5 │   0.469      0.5  0.974332      0.37

Functions · DataFrames.jl

mwsohn · September 18, 2023, 9:08pm

Do you know if there is a way of suppressing the subheader (data types) from printing? Same data types such as CategoricalValues tends to be long and take up too much space.

rafael.guerra · September 18, 2023, 10:32pm

Following the example above, add the nosubheader keyword:

pretty_table(df, formatters=fmt, nosubheader=true)

mwsohn · September 19, 2023, 1:02am

Thank you. It works.

Topic		Replies	Views
Is there a better way to format each individual columns in a dataframe? General Usage dataframes , prettytables	6	671	March 16, 2024
Custom formatter for a DataFrame column? Data	1	1051	December 15, 2017
Format Dataframe General Usage dataframes , formatting	3	558	July 14, 2023
Write fixed-width table General Usage	7	1145	May 3, 2023
Display more decimals in DataFrame General Usage dataframes , prettytables	9	3689	June 21, 2021

Formatting DataFrames with PrettyTables

Related topics