I wrote a function to format individual columns of a dataframe and I’m wondering if that’s a good idea or not.
Is there a better way to format the columns of a dataframe before printing to markdown? I found the formatters in PrettyTables difficult to use, so I wrote this function:
using DataStructures: DefaultDict
using Format
using DataFrames
function format_columns(table; formats...)
    formats = DefaultDict("{}", formats...)
    return DataFrame(
        [k=>format.(formats[Symbol(k)], table[!,k]) for k in names(table)]
    )
end
So I could do this:
format_columns(table,
    foo="{:0.1f}",
    bar="{:0.2f}",
    baz="{:0.2f} dB",
    qux="{:0.1%}"
)
and get
5×4 DataFrame
 Row │ foo     bar     baz       qux
     │ String  String  String    String
─────┼──────────────────────────────────
   1 │ 1.0     0.00    0.00 dB   100.0%
   2 │ 1.1     0.05    -0.01 dB  99.8%
   3 │ 1.2     0.09    -0.04 dB  99.2%
   4 │ 1.3     0.13    -0.07 dB  98.3%
   5 │ 1.4     0.17    -0.12 dB  97.2%
Is there an easier way to do this?
Would this be useful to anyone as a package?
Given that I can’t find any other way to do this, I feel like either this isn’t a thing people need to do or my approach is subtly wrong.
First of all, it occurs to me that this kind of formatting may be within the scope of the PrettyTables.jl package.
Dataframe columns are more easily/efficiently transformed/combined if they maintain the appropriate type.
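To illustrate the point about column types, here is a minimal sketch using only DataFrames and the Printf standard library. The column names and values are made up for the example, and `as_text` is just an illustrative name. It shows that a numeric column can be summed directly, while the formatted version holds strings:

```julia
using DataFrames, Printf

# Illustrative data; not taken from the original post.
df = DataFrame(foo = [1.0, 1.1, 1.2], bar = [0.0, 0.05, 0.09])

# Numeric columns support arithmetic directly.
total = sum(df.bar)

# Formatting produces strings, so keep the result separate from df.
as_text = DataFrame(bar = [@sprintf("%0.2f", x) for x in df.bar])

eltype(df.bar)       # Float64
eltype(as_text.bar)  # String
```

Keeping the numeric dataframe for computation and producing a formatted copy only at print time avoids having to parse the strings back later.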
With PrettyTables, it may be written like this:
using DataFrames, PrettyTables
df = DataFrame(rand(5,4), [:foo, :bar, :baz, :qux])
dic = Dict([:foo, :bar, :baz, :qux] .=> ["%0.1f", "%0.2f", "%0.2f dB", "%0.1f%%"])
d = Dict(Symbol.(names(df)) .=> 1:ncol(df))
pretty_table(df, formatters = ft_printf(collect(values(dic)), getindex.(Ref(d), keys(dic))))
And an alternative way using Printf:
using DataFrames, Printf
df = DataFrame(rand(5,4), [:foo, :bar, :baz, :qux])
dic = Dict([:foo, :bar, :baz, :qux] .=> ["%0.1f", "%0.2f", "%0.2f dB", "%0.1f%%"])
dg = string.(df)
for (key, val) in dic
    fmt = Printf.Format(val)
    dg[!, key] = Printf.format.(Ref(fmt), df[!, key])
end
dg
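The loop above could also be wrapped in a function with the same keyword-argument interface as the `format_columns` from the first post. This is only a sketch: the name `format_columns_printf` is hypothetical, and unlike the original it leaves columns without a format string untouched instead of converting them to strings.

```julia
using DataFrames, Printf

# Hypothetical helper: returns a formatted copy and leaves `table` unchanged.
function format_columns_printf(table::AbstractDataFrame; formats...)
    out = copy(table)
    for (name, spec) in pairs(formats)
        fmt = Printf.Format(spec)
        # Replace the column in the copy with its formatted strings.
        out[!, name] = Printf.format.(Ref(fmt), table[!, name])
    end
    return out
end

# Illustrative data; qux is already expressed in percent here.
df = DataFrame(foo = [1.0, 1.1], qux = [50.0, 99.8])
shown = format_columns_printf(df, foo = "%0.1f", qux = "%0.1f%%")
```

Note that in a printf-style format `%%` only emits a literal percent sign and does not rescale the value, so fractions such as 0.998 would need to be multiplied by 100 before formatting.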
As far as you know, does PrettyTables act only on the displayed output (by redefining show, for example), so that the columns of the dataframe retain their original type?
The code for formatting columns with PrettyTables and Printf is quite a lot, so I wrote a little package that can do this: GitHub - GHTaarn/PrettyDataFrames.jl: DataFrames with custom formatting of each column of data
I would have liked it to be a subtype of AbstractDataFrame, but this appears to be a lot of work, so that will have to be a future project if this feature does not get into DataFrames itself.