Cropping dataframe table data left or center

bertlhf · February 17, 2024, 1:57pm

Dataframe table data in Julia can easily be cropped on the right with the columns_width keyword of PrettyTables.jl. For some of my columns I need the data cropped on the left or in the center instead. I used to do this in R with the str_trunc function from the stringr package. I could not find this functionality in Julia; did I overlook it? How can I do this in Julia?

rafael.guerra · February 17, 2024, 2:39pm

What happens if you use columns_width together with alignment?

bertlhf · February 17, 2024, 2:54pm

Thanks for the quick reply. Setting alignment to either :l or :r doesn’t change the truncation; it still happens on the right.

rafael.guerra · February 17, 2024, 6:13pm

Please check the following proposal (*edited):

using DataFrames

# function by @StefanKarpinski (https://discourse.julialang.org/t/truncate-string/27978/14)
slice(s, n, m) = s[max(1, nextind(s, 0, n)):min(end, nextind(s, 0, m))]

lstr(s) = slice(s, 1, colwidth)
rstr(s) = slice(s, max(1, length(s) - colwidth + 1), length(s))
cstr(s) = slice(s, max(1, length(s)÷2 - colwidth÷2 + 1), min(length(s), length(s)÷2 + colwidth÷2))

df = DataFrame(strings = [join('α':'ω'), join('a':'z')])
df.length = length.(df.strings)

colwidth = 11
colswidths = [colwidth, 7]
textcols = names(df, String)
show(transform(df, textcols .=> ByRow(lstr); renamecols=false), alignment=:l, columns_width=colswidths)
show(transform(df, textcols .=> ByRow(rstr); renamecols=false), alignment=:r, columns_width=colswidths)
show(transform(df, textcols .=> ByRow(cstr); renamecols=false), alignment=:c, columns_width=colswidths)
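
A small illustration, using only Base Julia, of why the slice helper goes through nextind rather than indexing the string directly: string indices in Julia are byte offsets, and non-ASCII characters such as the Greek letters in the example occupy more than one byte each. The test string below is made up for illustration.

```julia
# Same helper as in the post above: it converts character positions
# (n-th to m-th character) into valid byte indices before slicing.
slice(s, n, m) = s[max(1, nextind(s, 0, n)):min(end, nextind(s, 0, m))]

s = "αβγδε"          # illustrative string; each letter takes 2 bytes in UTF-8

sizeof(s)            # 10 (bytes)
length(s)            # 5  (characters)
nextind(s, 0, 3)     # 5  (byte index where the 3rd character starts)

slice(s, 2, 4)       # "βγδ"

# Direct byte indexing fails, because byte 2 lies inside 'α':
# s[2:4]             # throws StringIndexError
```

This is also why the helper functions call length(s), which counts characters, rather than sizeof(s) or lastindex(s).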

bertlhf · February 18, 2024, 9:49am

Thanks a lot; almost there, I think.

Your suggestion

df = DataFrame(strings = ["0123456789", join('a':'z'), "short"])
#=
3×1 DataFrame
 Row │ strings
     │ String
─────┼────────────────────────────
   1 │ 0123456789
   2 │ abcdefghijklmnopqrstuvwxyz
   3 │ short
=#

colwidth = 10

pretty_table(df, formatters=leftstr,   alignment=:l, columns_width=colwidth)
#=
┌────────────┐
│ strings    │
│ String     │
├────────────┤
│ 0123456789 │
│ abcdefghij │
│ short      │
└────────────┘
=#

pretty_table(df, formatters=rightstr,  alignment=:r, columns_width=colwidth)
#=
┌────────────┐
│    strings │
│     String │
├────────────┤
│ 0123456789 │
│ qrstuvwxyz │
│      short │
└────────────┘
=#

pretty_table(df, formatters=centerstr, alignment=:c, columns_width=colwidth)
#=
┌────────────┐
│  strings   │
│   String   │
├────────────┤
│ 0123456789 │
│ ijklmnopqr │
│   short    │
└────────────┘
=#

Solution I’m looking for

colwidth = 10

pretty_table(df, formatters=righttrunc, alignment=:l, columns_width=colwidth)
#=
┌────────────┐
│ strings    │
│ String     │
├────────────┤
│ 0123456789 │
│ abcdefghi… │
│ short      │
└────────────┘
=#

# alignment remains :l
pretty_table(df, formatters=lefttrunc, alignment=:l, columns_width=colwidth)
#=
┌────────────┐
│ strings    │
│ String     │
├────────────┤
│ 0123456789 │
│ …rstuvwxyz │
│ short      │
└────────────┘
=#

# alignment remains :l
pretty_table(df, formatters=centertrunc, alignment=:l, columns_width=colwidth)
#=
┌────────────┐
│ strings    │
│ String     │
├────────────┤
│ 0123456789 │
│ abcd…wxyz  │
│ short      │
└────────────┘
=#

colwidth = 11

pretty_table(df, formatters=centertrunc, alignment=:l, columns_width=colwidth)
#=
┌─────────────┐
│ strings     │
│ String      │
├─────────────┤
│ 0123456789  │
│ abcde…vwxyz │
│ short       │
└─────────────┘
=#

rafael.guerra · February 18, 2024, 10:07am

Note that in my edited post, I got rid of formatters because I could not find an option in PrettyTables.jl to apply them only to text columns when the dataframe is of mixed types.

The code in the edited post looks for dataframe columns of type String and applies the formatting only to those. It uses transform() to create a new dataframe with the string columns reformatted, while leaving the others as is.

You could use the same logic to get the sought results.

rafael.guerra · February 18, 2024, 10:28am

See below a modification of the formatting functions following your template.
I added your ASCII example and also my previous Unicode example with a mixed-type dataframe.

Revised code

using DataFrames

# function by @StefanKarpinski (https://discourse.julialang.org/t/truncate-string/27978/14)
slice(s, n, m) = s[max(1, nextind(s, 0, n)):min(end, nextind(s, 0, m))]

function lstr(s)
    ls = length(s)
    ls ≤ colwidth && return s
    return slice(s, 1, colwidth) * "…"
end

function rstr(s)
    ls = length(s)
    ls ≤ colwidth && return s
    return "…" * slice(s, max(1, ls - colwidth + 1), ls)
end

function cstr(s)
    ls = length(s)
    ls ≤ colwidth && return s
    r1 = slice(s, 1, colwidth÷2)
    r2 = colwidth÷2 ≥ ls ? "" : slice(s, max(1, ls - colwidth÷2 + 1), ls)
    return r1 * "…" * r2
end


# Example#1
df = DataFrame(strings = ["0123456789", join('a':'z'), "short"])
colwidth = 10
colswidths = [colwidth + 3]
textcols = names(df, String)
show(transform(df, textcols .=> ByRow(lstr); renamecols=false), alignment=:l, columns_width=colswidths)
show(transform(df, textcols .=> ByRow(rstr); renamecols=false), alignment=:l, columns_width=colswidths)
show(transform(df, textcols .=> ByRow(cstr); renamecols=false), alignment=:l, columns_width=colswidths)


# Example#2
df = DataFrame(strings = [join('α':'ω'), join('a':'z')])
df.length = length.(df.strings)
colwidth = 11     # try also: 30
colswidths = [colwidth + 3, 7]
textcols = names(df, String)
show(transform(df, textcols .=> ByRow(lstr); renamecols=false), alignment=:l, columns_width=colswidths)
show(transform(df, textcols .=> ByRow(rstr); renamecols=false), alignment=:r, columns_width=colswidths)
show(transform(df, textcols .=> ByRow(cstr); renamecols=false), alignment=:c, columns_width=colswidths)
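
As a possible variation on the functions above, here is a minimal sketch of a single helper that takes the width as an argument instead of reading a global colwidth, and that counts the ellipsis as part of the width, as in the "Solution I'm looking for" tables. The name trunc_str and its side and ellipsis keywords are hypothetical, not part of PrettyTables.jl or DataFrames.jl, and the function is not claimed to reproduce stringr's str_trunc exactly. It counts characters with length, so it ignores grapheme clusters and display width.

```julia
# Hypothetical helper (not a library function).
# `width` is the maximum number of characters in the result, ellipsis included.
# `side` says where the text is cut: :right, :left, or :center.
function trunc_str(s::AbstractString, width::Integer;
                   side::Symbol = :right, ellipsis::AbstractString = "…")
    length(s) <= width && return String(s)        # short strings pass through
    keep = width - length(ellipsis)               # characters left for the text itself
    keep >= 0 || throw(ArgumentError("width is smaller than the ellipsis"))
    if side === :right
        return first(s, keep) * ellipsis
    elseif side === :left
        return ellipsis * last(s, keep)
    elseif side === :center
        half = keep ÷ 2                           # same number of characters on both sides
        return first(s, half) * ellipsis * last(s, half)
    else
        throw(ArgumentError("side must be :right, :left, or :center"))
    end
end

abc = join('a':'z')
trunc_str(abc, 10)                   # "abcdefghi…"
trunc_str(abc, 10; side = :left)     # "…rstuvwxyz"
trunc_str(abc, 10; side = :center)   # "abcd…wxyz"
trunc_str(abc, 11; side = :center)   # "abcde…vwxyz"
trunc_str("short", 10)               # "short"
```

Because first(s, n) and last(s, n) work on characters rather than bytes, the same function also handles the Greek example string without a separate slice helper.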

bertlhf · February 18, 2024, 11:17am

Thanks again. Two down, one to go, I think (the last one, cstr).

Example#1

# I changed "..." to "…" in your str functions

df = DataFrame(strings = ["0123456789", join('a':'z'), "short"])
#=
3×1 DataFrame
 Row │ strings
     │ String
─────┼────────────────────────────
   1 │ 0123456789
   2 │ abcdefghijklmnopqrstuvwxyz
   3 │ short
=#

show(transform(df, textcols .=> ByRow(lstr); renamecols=false), alignment=:l, columns_width=colswidths)
#=
3×1 DataFrame
 Row │ strings
     │ String
─────┼────────────────
   1 │ 0123456789
   2 │ abcdefghij…
   3 │ short
=#

# note: alignment=:l
show(transform(df, textcols .=> ByRow(rstr); renamecols=false), alignment=:l, columns_width=colswidths)
#=
3×1 DataFrame
 Row │ strings
     │ String
─────┼────────────────
   1 │ 0123456789
   2 │ …qrstuvwxyz
   3 │ short
=#

# note: alignment=:l
show(transform(df, textcols .=> ByRow(cstr); renamecols=false), alignment=:l, columns_width=colswidths)
#=
3×1 DataFrame
 Row │ strings
     │ String
─────┼────────────────
   1 │ 01234456789
   2 │ ijklm…mnopqr
   3 │ shhort
=#

The last one should become something like

#=
3×1 DataFrame
 Row │ strings
     │ String
─────┼────────────────
   1 │ 0123456789
   2 │ abcde…vwxyz
   3 │ short
=#

bertlhf · February 18, 2024, 12:30pm

Just ran your last revised code. Everything is fine now, except that in the last table the “short” string is mysteriously duplicated:

show(transform(df, textcols .=> ByRow(cstr); renamecols=false), alignment=:l, columns_width=colswidths)
#=
3×1 DataFrame
 Row │ strings       
     │ String        
─────┼───────────────
   1 │ 0123456789
   2 │ abcde…vwxyz
   3 │ shortshort
=#

Dan · February 18, 2024, 12:42pm

The slice function should be replaced by a grapheme-based slice, which is more appropriate because some code points combine into a single grapheme. Rewriting it is simple thanks to the Unicode standard library:

using Unicode

slice(s, n, m) = graphemes(s, max(n,1):min(m,length(graphemes(s))))
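
To make the difference concrete, here is a short sketch comparing character-based and grapheme-based slicing. It assumes Julia 1.9 or later, where the two-argument form graphemes(s, m:n) is available; the sample string is made up for illustration.

```julia
using Unicode

# "café noir" with the é written as 'e' followed by the combining acute accent U+0301
s = "cafe\u0301 noir"

length(s)               # 10 code points
length(graphemes(s))    # 9 user-perceived characters

first(s, 4)             # "cafe": the accent is cut off
graphemes(s, 1:4)       # "café": the accent stays with its base letter
```

Neither count is the same as the display width in a terminal, so for wide characters a width-based measure such as textwidth may still be needed.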

rafael.guerra · February 18, 2024, 12:55pm

Fixed in the revised code in the post above.

bertlhf · February 18, 2024, 2:33pm

Thanks, @rafael.guerra. For what it is worth, below is an example of how I now use this solution in my workflow.

function tbk_trunc(trunc::Function, df::AbstractDataFrame, field::String, colwidth::Int64)
    transform(df, field .=> ByRow(trunc); renamecols=false)
end

@pipe first(tbk_triple(), 5) |> tbk_trunc(ctrunc, _, "OBJECT", 10) |> tbk_page(_)

The first part of the pipe is a dataframe produced in some way or another from the more than 110,000 SUBJECT PREDICATE OBJECT triples of my personal “knowledge” base.
The second part of the pipe is your solution, slightly adapted to my own taste.
The third part of the pipe invokes PrettyTables and the less function; this way I can easily browse large dataframes with the less pager in my terminal.
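
In tbk_trunc as posted, the colwidth argument is accepted but not forwarded to trunc, so the width presumably still comes from a global variable. One way to pass it explicitly is to wrap the call in an anonymous function. This is only a sketch: trunc_column and the two-argument ctrunc below are hypothetical stand-ins, not the poster's actual functions, and the small dataframe is made up.

```julia
using DataFrames

# Hypothetical stand-in for the poster's `ctrunc`; it takes the width explicitly
# and counts the ellipsis as part of the width.
function ctrunc(s::AbstractString, w::Integer)
    length(s) <= w && return s
    half = (w - 1) ÷ 2
    return first(s, half) * "…" * last(s, half)
end

# Hypothetical variant of `tbk_trunc` that forwards `colwidth` to `trunc`.
function trunc_column(trunc::Function, df::AbstractDataFrame,
                      field::AbstractString, colwidth::Integer)
    transform(df, field => ByRow(s -> trunc(s, colwidth)); renamecols = false)
end

df = DataFrame(OBJECT = ["0123456789", join('a':'z'), "short"], N = 1:3)
out = trunc_column(ctrunc, df, "OBJECT", 10)
# out.OBJECT is ["0123456789", "abcd…wxyz", "short"]; column N is unchanged
```

With this shape the same function can be reused in one pipe with different widths for different columns.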

rafael.guerra · February 19, 2024, 2:26pm

Just in case, are you aware of the TerminalPager Julia package?

bertlhf · February 19, 2024, 2:39pm

I became aware of TerminalPager.jl after I implemented the tbk_page function using, among other things, the less function. Since I’m completely happy with it as it runs, I did not bother to look into TerminalPager.

Thanks anyway.
