DataFrame table data in Julia can easily be truncated on the right with the columns_width keyword of PrettyTables.jl. For some of my columns I need the data truncated on the left or in the center. I used to do this in R with the str_trunc function from the stringr package. I could not find this functionality in Julia; did I overlook it? How can I do this in Julia?
What happens if you use column_width together with alignment?
Thanks for the quick reply. Setting alignment to either :l or :r doesn't change the right truncation.
Please check the following proposal (*edited):
using DataFrames
# function by @StefanKarpinski (https://discourse.julialang.org/t/truncate-string/27978/14)
slice(s, n, m) = s[max(1, nextind(s, 0, n)):min(end, nextind(s, 0, m))]
lstr(s) = slice(s, 1, colwidth)
rstr(s) = slice(s, max(1, length(s) - colwidth + 1), length(s))
cstr(s) = slice(s, max(1, length(s)÷2 - colwidth÷2 + 1), min(length(s), length(s)÷2 + colwidth÷2))
df = DataFrame(strings = [join('α':'ω'), join('a':'z')])
df.length = length.(df.strings)
colwidth = 11
colswidths = [colwidth, 7]
textcols = names(df, String)
show(transform(df, textcols .=> ByRow(lstr); renamecols=false), alignment=:l, columns_width=colswidths)
show(transform(df, textcols .=> ByRow(rstr); renamecols=false), alignment=:r, columns_width=colswidths)
show(transform(df, textcols .=> ByRow(cstr); renamecols=false), alignment=:c, columns_width=colswidths)
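The slice function above relies on nextind to turn character positions into byte indices, which matters because Julia strings are UTF-8 and the Greek letters take two bytes each. A minimal sketch of that idea, using only Base; the names charindex and charslice are hypothetical helpers for illustration and are not part of the thread's code:

```julia
# Character positions vs. byte indices in a UTF-8 Julia String.
s = join('α':'ω')          # Greek letters, two bytes each

# Byte index at which the k-th character of s starts.
charindex(s, k) = nextind(s, 0, k)

# Hypothetical helper: characters n through m of s, clamped to the string.
function charslice(s::AbstractString, n::Integer, m::Integer)
    n = max(n, 1)
    m = min(m, length(s))
    n > m && return ""
    return s[charindex(s, n):charindex(s, m)]
end

println(length(s), " characters, ", ncodeunits(s), " bytes")
println(charslice(s, 1, 5))                      # first five characters
println(charslice(s, length(s) - 4, length(s)))  # last five characters
```

Indexing with plain character counts, such as s[1:5], would raise a StringIndexError here as soon as an index falls in the middle of a two-byte character, which is why the positions are mapped first.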
Thanks a lot; almost there, I think
Your suggestion
df = DataFrame(strings = ["0123456789", join('a':'z'), "short"])
#=
3×1 DataFrame
 Row │ strings
     │ String
─────┼────────────────────────────
   1 │ 0123456789
   2 │ abcdefghijklmnopqrstuvwxyz
   3 │ short
=#
colwidth = 10
pretty_table(df, formatters=leftstr, alignment=:l, columns_width=colwidth)
#=
┌────────────┐
│ strings    │
│ String     │
├────────────┤
│ 0123456789 │
│ abcdefghij │
│ short      │
└────────────┘
=#
pretty_table(df, formatters=rightstr, alignment=:r, columns_width=colwidth)
#=
┌────────────┐
│    strings │
│     String │
├────────────┤
│ 0123456789 │
│ qrstuvwxyz │
│      short │
└────────────┘
=#
pretty_table(df, formatters=centerstr, alignment=:c, columns_width=colwidth)
#=
┌────────────┐
│  strings   │
│   String   │
├────────────┤
│ 0123456789 │
│ ijklmnopqr │
│   short    │
└────────────┘
=#
Solution I'm looking for
colwidth = 10
pretty_table(df, formatters=righttrunc, alignment=:l, columns_width=colwidth)
#=
┌────────────┐
│ strings    │
│ String     │
├────────────┤
│ 0123456789 │
│ abcdefghi… │
│ short      │
└────────────┘
=#
# alignment remains :l
pretty_table(df, formatters=lefttrunc, alignment=:l, columns_width=colwidth)
#=
┌────────────┐
│ strings    │
│ String     │
├────────────┤
│ 0123456789 │
│ …rstuvwxyz │
│ short      │
└────────────┘
=#
# alignment remains :l
pretty_table(df, formatters=centertrunc, alignment=:l, columns_width=colwidth)
#=
┌────────────┐
│ strings    │
│ String     │
├────────────┤
│ 0123456789 │
│ abcd…wxyz  │
│ short      │
└────────────┘
=#
colwidth = 11
pretty_table(df, formatters=centertrunc, alignment=:l, columns_width=colwidth)
#=
┌─────────────┐
│ strings     │
│ String      │
├─────────────┤
│ 0123456789  │
│ abcde…vwxyz │
│ short       │
└─────────────┘
=#
Note that in my edited post I got rid of formatters, because I could not find an option in PrettyTables.jl to apply them only to text columns when the dataframe has mixed types.
The code in the edited post looks for dataframe columns of type String and applies the formatting only to those. It uses transform() to create a new dataframe with the string columns reformatted, while leaving the others as they are.
You could use the same logic to get the results you are after.
Below is a modification of the formatting functions that follows your template.
I added your ASCII example and also my previous Unicode example with a mixed-type dataframe.
revised code
using DataFrames
# function by @StefanKarpinski (https://discourse.julialang.org/t/truncate-string/27978/14)
slice(s, n, m) = s[max(1, nextind(s, 0, n)):min(end, nextind(s, 0, m))]
function lstr(s)
    ls = length(s)
    ls ≤ colwidth && return s
    return slice(s, 1, colwidth) * "…"
end
function rstr(s)
    ls = length(s)
    ls ≤ colwidth && return s
    return "…" * slice(s, max(1, ls - colwidth + 1), ls)
end
function cstr(s)
    ls = length(s)
    ls ≤ colwidth && return s
    r1 = slice(s, 1, colwidth÷2)
    r2 = colwidth÷2 ≥ ls ? "" : slice(s, max(1, ls - colwidth÷2 + 1), ls)
    return r1 * "…" * r2
end
# Example#1
df = DataFrame(strings = ["0123456789", join('a':'z'), "short"])
colwidth = 10
colswidths = [colwidth + 3]
textcols = names(df, String)
show(transform(df, textcols .=> ByRow(lstr); renamecols=false), alignment=:l, columns_width=colswidths)
show(transform(df, textcols .=> ByRow(rstr); renamecols=false), alignment=:l, columns_width=colswidths)
show(transform(df, textcols .=> ByRow(cstr); renamecols=false), alignment=:l, columns_width=colswidths)
# Example#2
df = DataFrame(strings = [join('α':'ω'), join('a':'z')])
df.length = length.(df.strings)
colwidth = 11 # try also: 30
colswidths = [colwidth + 3, 7]
textcols = names(df, String)
show(transform(df, textcols .=> ByRow(lstr); renamecols=false), alignment=:l, columns_width=colswidths)
show(transform(df, textcols .=> ByRow(rstr); renamecols=false), alignment=:r, columns_width=colswidths)
show(transform(df, textcols .=> ByRow(cstr); renamecols=false), alignment=:c, columns_width=colswidths)
Thanks again. Two down, one to go, I think (the last one, cstr).
Example#1
# I changed "..." to "…" in your str functions
df = DataFrame(strings = ["0123456789", join('a':'z'), "short"])
#=
3×1 DataFrame
 Row │ strings
     │ String
─────┼────────────────────────────
   1 │ 0123456789
   2 │ abcdefghijklmnopqrstuvwxyz
   3 │ short
=#
show(transform(df, textcols .=> ByRow(lstr); renamecols=false), alignment=:l, columns_width=colswidths)
#=
3×1 DataFrame
 Row │ strings
     │ String
─────┼───────────────
   1 │ 0123456789
   2 │ abcdefghij…
   3 │ short
=#
# note: alignment=:l
show(transform(df, textcols .=> ByRow(rstr); renamecols=false), alignment=:l, columns_width=colswidths)
#=
3×1 DataFrame
 Row │ strings
     │ String
─────┼───────────────
   1 │ 0123456789
   2 │ …qrstuvwxyz
   3 │ short
=#
# note: alignment=:l
show(transform(df, textcols .=> ByRow(cstr); renamecols=false), alignment=:l, columns_width=colswidths)
#=
3×1 DataFrame
 Row │ strings
     │ String
─────┼───────────────
   1 │ 01234456789
   2 │ ijklm…mnopqr
   3 │ shhort
=#
The last one should become something like
#=
3×1 DataFrame
 Row │ strings
     │ String
─────┼───────────────
   1 │ 01234456789
   2 │ abcde…vwxyz
   3 │ short
=#
I just ran your latest revised code. Everything is fine now, except that the last call shows the "short" string mysteriously duplicated:
show(transform(df, textcols .=> ByRow(cstr); renamecols=false), alignment=:l, columns_width=colswidths)
#=
3×1 DataFrame
 Row │ strings
     │ String
─────┼──────────────
   1 │ 0123456789
   2 │ abcde…vwxyz
   3 │ shortshort
=#
The codepoint-based slice function should be replaced by a grapheme-based one, which is more appropriate (some codepoints combine into a single grapheme). Rewriting it is simple thanks to the Unicode standard library:
using Unicode
slice(s, n, m) = graphemes(s, max(n,1):min(m,length(graphemes(s))))
Fixed in the revised code in the post above.
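For readers who want a single function in the spirit of stringr's str_trunc, the grapheme-based idea can be sketched as follows. This is only an illustration under stated assumptions: the name gtrunc and its keywords side and ellipsis are hypothetical, and, unlike the lstr/rstr/cstr functions in this thread, the width here counts the ellipsis as part of the result.

```julia
using Unicode

# Hypothetical helper, loosely modeled on stringr::str_trunc from R.
# `width` is the maximum number of graphemes in the result, ellipsis included.
function gtrunc(s::AbstractString, width::Integer;
                side::Symbol = :right, ellipsis::AbstractString = "…")
    g = collect(graphemes(s))                    # vector of grapheme substrings
    length(g) <= width && return String(s)       # short strings pass through unchanged
    keep = width - length(graphemes(ellipsis))   # graphemes left for the text itself
    keep < 0 && throw(ArgumentError("width is smaller than the ellipsis"))
    if side === :right
        return join(g[1:keep]) * ellipsis
    elseif side === :left
        return ellipsis * join(g[end-keep+1:end])
    elseif side === :center
        head = cld(keep, 2)                      # head gets the extra grapheme when keep is odd
        tail = keep - head
        return join(g[1:head]) * ellipsis * join(g[end-tail+1:end])
    else
        throw(ArgumentError("side must be :left, :right or :center"))
    end
end

println(gtrunc(join('a':'z'), 10))
println(gtrunc(join('a':'z'), 10; side = :left))
println(gtrunc(join('a':'z'), 11; side = :center))
```

Because the function works on collected graphemes, a base letter followed by a combining accent is kept or dropped as one unit. Terminal display width is a separate matter (wide characters can still occupy two cells), which this sketch does not address.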
Thanks, @rafael.guerra. For what it is worth, below is an example of how I now use this solution in my workflow.
function tbk_trunc(trunc::Function, df::AbstractDataFrame, field::String, colwidth::Int64)
    transform(df, field .=> ByRow(trunc); renamecols=false)
end
@pipe first(tbk_triple(), 5) |> tbk_trunc(ctrunc, _, "OBJECT", 10) |> tbk_page(_)
- first part of the pipe is a dataframe produced in some way or another from the more than 110,000 SUBJECT PREDICATE OBJECT triples of my personal "knowledge" base
- second part of the pipe is your solution a little bit adapted to my own taste
- third part of the pipe invokes PrettyTables and the less function; in this way I can easily browse large dataframes with the less pager in my terminal
Just in case, are you aware of the TerminalPager Julia package?
I became aware of TerminalPager.jl after I had implemented the tbk_page function using, among other things, the less function. Since I'm completely happy with how it runs, I did not bother to look into TerminalPager.
Thanks anyway.