Cropping dataframe table data left or center

DataFrame table data in Julia can easily be cropped on the right with the columns_width keyword of PrettyTables.jl. For some of my columns I need the data cropped on the left or in the center instead. I used to do this in R with the str_trunc function from the stringr package. I could not find this functionality in Julia; did I overlook it? How can I do this in Julia?

What happens if you use columns_width together with alignment?

Thanks for the quick reply. Setting alignment to either :l or :r doesn’t change the right truncation.

Please check the following proposal (*edited):

using DataFrames

# function by @StefanKarpinski (https://discourse.julialang.org/t/truncate-string/27978/14)
slice(s, n, m) = s[max(1, nextind(s, 0, n)):min(end, nextind(s, 0, m))]

lstr(s) = slice(s, 1, colwidth)
rstr(s) = slice(s, max(1, length(s) - colwidth + 1), length(s))
cstr(s) = slice(s, max(1, length(s)÷2 - colwidth÷2 + 1), min(length(s), length(s)÷2 + colwidth÷2))

df = DataFrame(strings = [join('α':'ω'), join('a':'z')])
df.length = length.(df.strings)

colwidth = 11
colswidths = [colwidth, 7]
textcols = names(df, String)
show(transform(df, textcols .=> ByRow(lstr); renamecols=false), alignment=:l, columns_width=colswidths)
show(transform(df, textcols .=> ByRow(rstr); renamecols=false), alignment=:r, columns_width=colswidths)
show(transform(df, textcols .=> ByRow(cstr); renamecols=false), alignment=:c, columns_width=colswidths)

Thanks a lot; almost there, I think

Your suggestion

df = DataFrame(strings = ["0123456789", join('a':'z'), "short"])
#=
3×1 DataFrame
 Row │ strings
     │ String
─────┼────────────────────────────
   1 │ 0123456789
   2 │ abcdefghijklmnopqrstuvwxyz
   3 │ short
=#

colwidth = 10

pretty_table(df, formatters=leftstr,   alignment=:l, columns_width=colwidth)
#=
┌────────────┐
│ strings    │
│ String     │
├────────────┤
│ 0123456789 │
│ abcdefghij │
│ short      │
└────────────┘
=#

pretty_table(df, formatters=rightstr,  alignment=:r, columns_width=colwidth)
#=
┌────────────┐
│    strings │
│     String │
├────────────┤
│ 0123456789 │
│ qrstuvwxyz │
│      short │
└────────────┘
=#

pretty_table(df, formatters=centerstr, alignment=:c, columns_width=colwidth)
#=
┌────────────┐
│  strings   │
│   String   │
├────────────┤
│ 0123456789 │
│ ijklmnopqr │
│   short    │
└────────────┘
=#

Solution I’m looking for

colwidth = 10

pretty_table(df, formatters=righttrunc, alignment=:l, columns_width=colwidth)
#=
┌────────────┐
│ strings    │
│ String     │
├────────────┤
│ 0123456789 │
│ abcdefghi… │
│ short      │
└────────────┘
=#

# alignment remains :l
pretty_table(df, formatters=lefttrunc, alignment=:l, columns_width=colwidth)
#=
┌────────────┐
│ strings    │
│ String     │
├────────────┤
│ 0123456789 │
│ …rstuvwxyz │
│ short      │
└────────────┘
=#

# alignment remains :l
pretty_table(df, formatters=centertrunc, alignment=:l, columns_width=colwidth)
#=
┌────────────┐
│ strings    │
│ String     │
├────────────┤
│ 0123456789 │
│ abcd…wxyz  │
│ short      │
└────────────┘
=#

colwidth = 11

pretty_table(df, formatters=centertrunc, alignment=:l, columns_width=colwidth)
#=
┌─────────────┐
│ strings     │
│ String      │
├─────────────┤
│ 0123456789  │
│ abcde…vwxyz │
│ short       │
└─────────────┘
=#

Note that in my edited post, I got rid of formatters because I could not find an option in PrettyTables.jl to apply them only to text columns when the dataframe is of mixed types.

The code in the edited post looks for dataframe columns of type String and applies the formatting only to those. It uses transform() to create a new dataframe with the string columns reformatted, while leaving the others as is.

You could use the same logic to get the sought results.

See below for a modification of the formatting functions following your template.
I added your ASCII example as well as my previous Unicode example with a mixed-type dataframe.

revised code
using DataFrames

# function by @StefanKarpinski (https://discourse.julialang.org/t/truncate-string/27978/14)
slice(s, n, m) = s[max(1, nextind(s, 0, n)):min(end, nextind(s, 0, m))]

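# keep the first colwidth characters and mark the cut on the right
function lstr(s)
    ls = length(s)
    ls ≤ colwidth && return s
    return slice(s, 1, colwidth) * "…"
end

# keep the last colwidth characters and mark the cut on the left
function rstr(s)
    ls = length(s)
    ls ≤ colwidth && return s
    return "…" * slice(s, max(1, ls - colwidth + 1), ls)
end

# keep colwidth÷2 characters from each end and mark the cut in the middle
function cstr(s)
    ls = length(s)
    ls ≤ colwidth && return s
    r1 = slice(s, 1, colwidth÷2)
    r2 = colwidth÷2 ≥ ls ? "" : slice(s, max(1, ls - colwidth÷2 + 1), ls)
    return r1 * "…" * r2
end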


# Example#1
df = DataFrame(strings = ["0123456789", join('a':'z'), "short"])
colwidth = 10
colswidths = [colwidth + 3]
textcols = names(df, String)
show(transform(df, textcols .=> ByRow(lstr); renamecols=false), alignment=:l, columns_width=colswidths)
show(transform(df, textcols .=> ByRow(rstr); renamecols=false), alignment=:l, columns_width=colswidths)
show(transform(df, textcols .=> ByRow(cstr); renamecols=false), alignment=:l, columns_width=colswidths)


# Example#2
df = DataFrame(strings = [join('α':'ω'), join('a':'z')])
df.length = length.(df.strings)
colwidth = 11     # try also: 30
colswidths = [colwidth + 3, 7]
textcols = names(df, String)
show(transform(df, textcols .=> ByRow(lstr); renamecols=false), alignment=:l, columns_width=colswidths)
show(transform(df, textcols .=> ByRow(rstr); renamecols=false), alignment=:r, columns_width=colswidths)
show(transform(df, textcols .=> ByRow(cstr); renamecols=false), alignment=:c, columns_width=colswidths)
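
The functions above keep colwidth characters and then add the ellipsis. For readers who want the ellipsis to count toward the width, as in the desired output shown earlier, here is a minimal sketch. It assumes Julia 1.9 or later for graphemes(s, range). The name strtrunc and its side and ellipsis keywords are hypothetical, loosely modelled on stringr's str_trunc, and are not part of PrettyTables.jl or DataFrames.jl.

```julia
using Unicode  # standard library; graphemes(s, range) needs Julia 1.9 or later

# Hypothetical helper, not part of PrettyTables.jl or DataFrames.jl.
# Truncate `s` to at most `width` graphemes, with the ellipsis included in that budget.
function strtrunc(s::AbstractString, width::Integer; side::Symbol = :right, ellipsis = "…")
    n = length(graphemes(s))
    n <= width && return String(s)               # short strings pass through unchanged
    keep = width - length(graphemes(ellipsis))   # graphemes left for the text itself
    keep <= 0 && return String(graphemes(ellipsis, 1:width))
    if side === :right
        return graphemes(s, 1:keep) * ellipsis
    elseif side === :left
        return ellipsis * graphemes(s, n-keep+1:n)
    elseif side === :center
        head = cld(keep, 2)    # if keep is odd, the extra grapheme goes to the head
        tail = keep - head
        return graphemes(s, 1:head) * ellipsis * graphemes(s, n-tail+1:n)
    else
        throw(ArgumentError("side must be :right, :left or :center"))
    end
end

alphabet = join('a':'z')
println(strtrunc(alphabet, 10))                   # cut on the right
println(strtrunc(alphabet, 10; side = :left))     # cut on the left
println(strtrunc(alphabet, 11; side = :center))   # cut in the middle
```

It should be usable in the transform calls above, for example with ByRow(s -> strtrunc(s, colwidth; side = :center)) in place of ByRow(cstr), in which case the extra slack in colswidths is probably no longer needed.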

Thanks again. Two down, one to go, I think (the last one, cstr).

Example#1

# I changed "..." to "…" in your str functions

df = DataFrame(strings = ["0123456789", join('a':'z'), "short"])
#=
3×1 DataFrame
 Row │ strings
     │ String
─────┼────────────────────────────
   1 │ 0123456789
   2 │ abcdefghijklmnopqrstuvwxyz
   3 │ short
=#

show(transform(df, textcols .=> ByRow(lstr); renamecols=false), alignment=:l, columns_width=colswidths)
#=
3×1 DataFrame
 Row │ strings
     │ String
─────┼────────────────
   1 │ 0123456789
   2 │ abcdefghij…
   3 │ short
=#

# note: alignment=:l
show(transform(df, textcols .=> ByRow(rstr); renamecols=false), alignment=:l, columns_width=colswidths)
#=
3×1 DataFrame
 Row │ strings
     │ String
─────┼────────────────
   1 │ 0123456789
   2 │ …qrstuvwxyz
   3 │ short
=#

# note: alignment=:l
show(transform(df, textcols .=> ByRow(cstr); renamecols=false), alignment=:l, columns_width=colswidths)
#=
3×1 DataFrame
 Row │ strings
     │ String
─────┼────────────────
   1 │ 01234456789
   2 │ ijklm…mnopqr
   3 │ shhort
=#

The last one should become something like

#=
3×1 DataFrame
 Row │ strings
     │ String
─────┼────────────────
   1 │ 0123456789
   2 │ abcde…vwxyz
   3 │ short
=#

Just ran your last revised code. Everything is fine now, except that the last call shows the “short” string mysteriously duplicated:

show(transform(df, textcols .=> ByRow(cstr); renamecols=false), alignment=:l, columns_width=colswidths)
#=
3×1 DataFrame
 Row │ strings
     │ String
─────┼───────────────
   1 │ 0123456789
   2 │ abcde…vwxyz
   3 │ shortshort
=#

The slice function above should be replaced by a grapheme-based version, which is more appropriate (some code points combine into a single… grapheme). Rewriting it is simple thanks to the Unicode package:

using Unicode

slice(s, n, m) = graphemes(s, max(n,1):min(m,length(graphemes(s))))
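
To see why the grapheme-based version matters, here is a small sketch using only the Unicode standard library. The sample string is made up for illustration, and graphemes(s, range) assumes Julia 1.9 or later.

```julia
using Unicode  # standard library

# "été" written with combining acute accents (e + U+0301), an illustrative sample
s = "e\u0301te\u0301"

n_codepoints = length(s)             # length counts code points
n_graphemes  = length(graphemes(s))  # graphemes are user-perceived characters

# Taking the first code point separates the letter from its accent ...
by_codepoint = first(s, 1)
# ... while taking the first grapheme keeps letter and accent together.
by_grapheme = graphemes(s, 1:1)

println((n_codepoints, n_graphemes))
println(by_codepoint == "e", " ", by_grapheme == "e\u0301")
```

A truncation based on length and code point indices can therefore cut between a base letter and its combining mark, which the grapheme-based slice avoids.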

Fixed in the revised code in the post above.

Thanks, @rafael.guerra. For what it is worth, below is an example of how I now use this solution in my workflow.

function tbk_trunc(trunc::Function, df::AbstractDataFrame, field::String, colwidth::Int64)
    transform(df, field .=> ByRow(trunc); renamecols=false)
end

@pipe first(tbk_triple(), 5) |> tbk_trunc(ctrunc, _, "OBJECT", 10) |> tbk_page(_)
  • first part of the pipe is a dataframe produced in some way or another from the more than 110,000 SUBJECT PREDICATE OBJECT triples of my personal “knowledge” base
  • second part of the pipe is your solution a little bit adapted to my own taste
  • third part of the pipe invokes PrettyTables and the less function; in this way I can easily browse large dataframes with the less pager in my terminal

Just in case, are you aware of the TerminalPager Julia package?


I became aware of TerminalPager.jl after I had implemented the tbk_page function using, among other things, the less function. Since I’m completely happy with how it runs, I did not bother to look into TerminalPager.

Thanks anyway.