Is there a better way to format each individual column in a DataFrame?

I wrote a function to format individual columns of a dataframe and I’m wondering if that’s a good idea or not.

Is there a better way to format the columns of a DataFrame before printing it as Markdown? I found the formatters in PrettyTables difficult to use, so I wrote this function:

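```julia
using DataStructures: DefaultDict
using Format
using DataFrames

# Return a copy of `table` with every column converted to strings.
# Columns named in `formats` use that Format.jl format string; all others fall back to "{}".
function format_columns(table; formats...)
    formats = DefaultDict("{}", formats...)
    return DataFrame(
        [k => format.(formats[Symbol(k)], table[!, k]) for k in names(table)]
    )
end
```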

So I could do this:

```julia
format_columns(table,
    foo = "{:0.1f}",
    bar = "{:0.2f}",
    baz = "{:0.2f} dB",
    qux = "{:0.1%}",
)
```

and get

```
5×4 DataFrame
 Row │ foo     bar     baz       qux
     │ String  String  String    String
─────┼──────────────────────────────────
   1 │ 1.0     0.00    0.00 dB   100.0%
   2 │ 1.1     0.05    -0.01 dB  99.8%
   3 │ 1.2     0.09    -0.04 dB  99.2%
   4 │ 1.3     0.13    -0.07 dB  98.3%
   5 │ 1.4     0.17    -0.12 dB  97.2%
```

Is there an easier way to do this?
Would this be useful to anyone as a package?
Given that I can’t find any other way to do this, I feel like either this isn’t a thing people need to do or my approach is subtly wrong.
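For comparison, the `DefaultDict` in the function above only supplies a fallback format, and keyword arguments already support `get` with a default. Below is a minimal sketch of the same helper with that change. It assumes Printf-style format strings (`%.1f`) instead of Format.jl's `{:0.1f}` syntax, so that only the `Printf` standard library is needed; `format_columns_printf` is a hypothetical name. Note that printf has no direct counterpart to `{:0.1%}`, which also multiplies the value by 100.

```julia
using DataFrames, Printf

# Hypothetical helper: format selected columns with printf-style strings passed as keywords.
# Columns without a format are converted with `string`, so every output column is a String.
function format_columns_printf(table::AbstractDataFrame; formats...)
    function fmtcol(col)
        spec = get(formats, Symbol(col), nothing)  # keyword arguments support `get` with a default
        return spec === nothing ? string.(table[!, col]) :
               Printf.format.(Ref(Printf.Format(spec)), table[!, col])
    end
    return DataFrame([col => fmtcol(col) for col in names(table)])
end

df = DataFrame(foo = [1.0, 1.1], bar = [0.0, 0.05], n = [1, 2])
out = format_columns_printf(df; foo = "%.1f", bar = "%.2f dB")  # column :n falls back to `string`
```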


First of all, it occurs to me that this may be within the scope of the PrettyTables.jl package.
DataFrame columns are easier and more efficient to transform or combine if they keep their appropriate type, so it is usually better to leave the data numeric and apply the formatting only when printing.
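To illustrate the point about keeping types, here is a sketch that builds a separate, display-only table with `select` and `ByRow`, while the original DataFrame keeps its `Float64` columns. The column names and values are made up for the example.

```julia
using DataFrames, Printf

df = DataFrame(foo = [1.0, 1.1, 1.2], qux = [1.0, 0.998, 0.992])

# Build a separate table of strings for display; `df` itself is not modified.
shown = select(df,
    :foo => ByRow(x -> @sprintf("%.1f", x)) => :foo,
    :qux => ByRow(x -> @sprintf("%.1f%%", 100x)) => :qux)

eltype(df.qux)     # → Float64, still usable for arithmetic
eltype(shown.qux)  # → String
```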

With PrettyTables, it may be written like this:

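```julia
using DataFrames, PrettyTables

df = DataFrame(rand(5, 4), [:foo, :bar, :baz, :qux])

# printf-style format string for each column
dic = Dict([:foo, :bar, :baz, :qux] .=> ["%0.1f", "%0.2f", "%0.2f dB", "%0.1f%%"])
# column name => column index, because ft_printf expects column numbers
d = Dict(Symbol.(names(df)) .=> 1:ncol(df))

# values(dic) and keys(dic) iterate in the same order, so each format lines up with its column
pretty_table(df, formatters = ft_printf(collect(values(dic)), getindex.(Ref(d), keys(dic))))
```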

And an alternative way using Printf:

```julia
using DataFrames, Printf

df = DataFrame(rand(5, 4), [:foo, :bar, :baz, :qux])

dic = Dict([:foo, :bar, :baz, :qux] .=> ["%0.1f", "%0.2f", "%0.2f dB", "%0.1f%%"])
dg = string.(df)  # all-String copy; df keeps its Float64 columns
for (key, val) in dic
    fmt = Printf.Format(val)  # parse each format string once
    dg[!, key] = Printf.format.(Ref(fmt), df[!, key])
end
dg
```
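One difference from the original `{:0.1%}` format: the printf string `%0.1f%%` only appends a percent sign and does not multiply by 100. A sketch that stores a formatting function per column instead of a format string handles that case; the `formatters` dictionary is a hypothetical convention, not a library API.

```julia
using DataFrames, Printf

df = DataFrame(foo = [1.0, 1.1], bar = [0.0, 0.05], baz = [0.0, -0.01], qux = [1.0, 0.998])

# One formatting function per column; functions can rescale values, which a printf string cannot.
formatters = Dict(
    :foo => x -> @sprintf("%.1f", x),
    :bar => x -> @sprintf("%.2f", x),
    :baz => x -> @sprintf("%.2f dB", x),
    :qux => x -> @sprintf("%.1f%%", 100x),  # 0.998 is shown as "99.8%"
)

dg = string.(df)                 # start from an all-String copy
for (col, f) in formatters
    dg[!, col] = f.(df[!, col])  # overwrite each formatted column; `df` is untouched
end
dg
```

Using functions also allows arbitrary per-column logic, at the cost of slightly more verbose definitions than plain format strings.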

As far as you know, does PrettyTables only change how the table is displayed (by redefining `show`, for example)? In other words, do the DataFrame's columns keep their original type?

Yes

The code for formatting columns with PrettyTables or Printf is quite verbose, so I wrote a small package that does this: GitHub - GHTaarn/PrettyDataFrames.jl: DataFrames with custom formatting of each column of data

I would have liked it to be a subtype of AbstractDataFrame, but this appears to be a lot of work, so that will have to be a future project if this feature does not get into DataFrames itself.
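Short of subtyping `AbstractDataFrame`, one lighter option is a wrapper that holds the typed DataFrame and applies the formatting only inside its `show` method. This is a hypothetical sketch (`FormattedTable` is an invented name), not how PrettyDataFrames.jl is implemented.

```julia
using DataFrames, Printf

# Hypothetical wrapper: keeps the typed data plus per-column formatting functions.
struct FormattedTable
    df::DataFrame
    formatters::Dict{Symbol,Function}
end

# Build a string copy only when the wrapper is displayed; columns without a formatter use `string`.
function Base.show(io::IO, mime::MIME"text/plain", t::FormattedTable)
    shown = select(t.df, [col => ByRow(get(t.formatters, Symbol(col), string)) => col
                          for col in names(t.df)])
    show(io, mime, shown)
end

t = FormattedTable(DataFrame(foo = [1.0, 1.1], qux = [1.0, 0.998]),
                   Dict{Symbol,Function}(:qux => x -> @sprintf("%.1f%%", 100x)))
display(t)
```

Because the underlying `t.df` is never modified, it can still be filtered or joined normally; the trade-off is that the wrapper does not support the DataFrames API directly.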
