Formatting DataFrames with PrettyTables

I’d like to format some numeric columns with different decimal positions. The approach I am following is this:

using Random
using DataFrames
using PrettyTables

df = DataFrame(rand(5, 3), :auto)
pretty_table(df, formatters = ft_printf(["%.1f", "%.2f", "%.3f"], [1, 2, 3]))
┌─────────┬─────────┬─────────┐
│      x1 │      x2 │      x3 │
│ Float64 │ Float64 │ Float64 │
├─────────┼─────────┼─────────┤
│     0.1 │    0.78 │   0.750 │
│     0.2 │    0.69 │   0.366 │
│     0.4 │    0.15 │   0.127 │
│     0.4 │    0.84 │   0.096 │
│     0.1 │    0.88 │   0.107 │
└─────────┴─────────┴─────────┘

How would you go about specifying the format by column symbols instead of the position?
Something like…

pretty_table(df, formatters = Dict(:x1 => "%.1f", :x2 => "%.2f", :x3 =>"%.3f"))
2 Likes

Maybe an idea…

using Random
using DataFrames
using PrettyTables
using Formatting

function formatters(dataframe, formatters...)
    _dataframe = dataframe
    _formatters = Dict(formatters)

    return function (v, i, j) 
        column = propertynames(_dataframe)[j]
        if haskey(_formatters, column)
            return sprintf1(_formatters[column], v)
        end
        v
    end
end

df = DataFrame(rand(5, 3), :auto)

pretty_table(df, formatters = formatters(df, :x1 => "%.1f", :x2 => "%.2f", :x3 => "%.3f"))

Could a reverse Dict be easier?

using Random, DataFrames, PrettyTables

df = DataFrame(rand(5, 3), [:x3, :x1, :x2])
d = Dict(Symbol.(names(df)) .=> 1:ncol(df))
pretty_table(df, formatters = ft_printf(["%.1f", "%.2f", "%.3f"], [d[:x1], d[:x2], d[:x3]]))

Natively it is not possible because PrettyTables does not “understand” the column names as in DataFrames. I think the best approach is the one proposed by @rafael.guerra

Thank you both @rafael.guerra and @Ronis_BR.

I will use the closure approach (post #2) because it keeps the column and format tied together, do not rely on order and I find more ergonomic to type the column symbol than the quoted raw string. Also it allows to layout the code in rows which is nice to comment and uncomment specific columns.


fmt = formatters(df, 
    :x1 => "%.1f", 
    :x2 => "%.2f", 
    :x3 => "%.3f", 
)

pretty_table(df, formatters = fmt)

Alrigth. The same approach could be adapted to your use case:

using Random, DataFrames, PrettyTables

df = DataFrame(rand(5, 3), [:x3, :x1, :x2])

function formatters(df, fmt...)
    d = Dict(Symbol.(names(df)) .=> 1:ncol(df))
    keys = [first.(fmt)...]
    values = [last.(fmt)...]
    ft_printf(values, [d[s] for s in keys])
end

fmt = formatters(df, 
    :x1 => "%.1f", 
    :x2 => "%.2f", 
    :x3 => "%.3f",
)

pretty_table(df, formatters = fmt)

┌─────────┬─────────┬─────────┐
│      x3 │      x1 │      x2 │
│ Float64 │ Float64 │ Float64 │
├─────────┼─────────┼─────────┤
│   0.209 │     0.4 │    0.58 │
│   0.102 │     1.0 │    0.60 │
│   0.454 │     0.9 │    0.83 │
│   0.135 │     0.9 │    0.48 │
│   0.063 │     0.4 │    0.99 │
└─────────┴─────────┴─────────┘
1 Like
# Added this line to allow defaults for columns frequently used but not necessary present in df
 fmt = filter(pair -> haskey(d, pair.first), fmt)
using Random, DataFrames, PrettyTables

df = DataFrame(rand(5, 4), [:x3, :x1, :x2, :Price])

function formatters(df, fmt...)
    d = Dict(Symbol.(names(df)) .=> 1:ncol(df))
    fmt = filter(pair -> haskey(d, pair.first), fmt)
    keys = [first.(fmt)...]
    values = [last.(fmt)...]
    ft_printf(values, [d[s] for s in keys])
end

fmt = formatters(df, 
    :x1 => "%.1f", 
    # :x2 => "%.2f",
    :x3 => "%.3f",

    # defaults for columns frequently used but not necessary present in df
    :Count => "%.0f",
    :Price => "%.2f",
)

pretty_table(df, formatters = fmt)

┌─────────┬─────────┬──────────┬─────────┐
│      x3 │      x1 │       x2 │   Price │
│ Float64 │ Float64 │  Float64 │ Float64 │
├─────────┼─────────┼──────────┼─────────┤
│   0.155 │     0.1 │ 0.312756 │    0.01 │
│   0.527 │     0.6 │ 0.605655 │    0.80 │
│   0.688 │     0.1 │ 0.190063 │    0.17 │
│   0.899 │     0.3 │ 0.718441 │    0.71 │
│   0.100 │     0.1 │ 0.525271 │    0.94 │
└─────────┴─────────┴──────────┴─────────┘

Also

show(df, formatters = fmt)

5×4 DataFrame
 Row │ x3       x1       x2         Price   
     │ Float64  Float64  Float64    Float64 
─────┼──────────────────────────────────────
   1 │   0.152      0.2  0.121693      0.17
   2 │   0.782      0.7  0.352422      0.38
   3 │   0.546      0.9  0.0599519     0.03
   4 │   0.048      0.6  0.539459      0.31
   5 │   0.469      0.5  0.974332      0.37

Functions · DataFrames.jl

1 Like

Do you know if there is a way of suppressing the subheader (data types) from printing? Same data types such as CategoricalValues tends to be long and take up too much space.

Following the example above, add the nosubheader keyword:

pretty_table(df, formatters=fmt, nosubheader=true)
2 Likes

Thank you. It works.