Cthulhu has an internal TextWidthLimiter<:IO which lets you print text while attempting (not always successfully) to limit the output to a certain number of characters. I’m considering splitting it out into its own package so that it can be used more broadly (and hopefully be made more robust).
One point I’m unsure of is how to handle the distinction between graphemes and Chars: the issue is that single graphemes at least sometimes take up the space of two Chars on my screen. This seems to introduce some inconsistencies in terminal manipulations, and I’m unsure of whether there is even a way to handle this robustly.
Here’s a demo which walks through some of the issues I’ve discovered. Note that here on discourse “éé” prints with no space between the "é"s, but when I try it in my terminal there is a space between them.
julia> using Unicode
julia> str = "exposé"
"exposé"
julia> collect(str) # collect will treat the é as two Chars
7-element Vector{Char}:
'e': ASCII/Unicode U+0065 (category Ll: Letter, lowercase)
'x': ASCII/Unicode U+0078 (category Ll: Letter, lowercase)
'p': ASCII/Unicode U+0070 (category Ll: Letter, lowercase)
'o': ASCII/Unicode U+006F (category Ll: Letter, lowercase)
's': ASCII/Unicode U+0073 (category Ll: Letter, lowercase)
'e': ASCII/Unicode U+0065 (category Ll: Letter, lowercase)
'́': Unicode U+0301 (category Mn: Mark, nonspacing)
julia> g = collect(graphemes(str)) # graphemes treats the é as a single entity
6-element Vector{SubString{String}}:
"e"
"x"
"p"
"o"
"s"
"é"
julia> c = g[end]
"é"
Now let’s see what happens when we mix printing c with terminal manipulations. “\e[$(n)D” means “go back n columns” and “\e[K” means “kill to the end of the line”. Below, killstr gets built to print a character n times and then go backwards n columns, followed by killing to the end of the line; if each grapheme (despite appearances) really has width 1, this should leave a blank line in all cases:
julia> n = displaysize(stdout)[2] # current width of my terminal window
119
julia> killstr = repeat('x', n) * "\e[$(n)D\e[K"
"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\e[119D\e[K"
julia> print(killstr) # works as expected
julia> killstr = repeat(c, n) * "\e[$(n)D\e[K"
"ééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééééé\e[119D\e[K"
julia> print(killstr) # does not work as expected
ééééééééééééééééééééééééééééééééééééééééééééééééééééééééééée
So it works as expected when printing 'x' but not é. Amusingly, note that the final character is 'e' and not é, indicating that it stripped the accent mark.
Therefore, this is also a lie:
julia> textwidth(c)
1
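To see where that 1 comes from, here is a small sketch that breaks the grapheme into its Chars. It assumes the é is the decomposed form shown by collect above ('e' followed by U+0301), written here with an explicit escape; n = 119 is just the terminal width from my session.

```julia
# Decomposed é: 'e' followed by the combining acute accent U+0301
# (an assumption matching the collect(str) output earlier in this post).
c = "e\u0301"

# Width Julia reports for each Char of the grapheme.
widths = [textwidth(ch) for ch in c]

n = 119                  # terminal width from the session above
line = repeat(c, n)

@show widths textwidth(c) length(c)
@show textwidth(line) length(line)
```

Julia reports width 0 for the combining mark, so textwidth(line) is n while length(line) is 2n. The wrapped output above looks consistent with my terminal advancing one column per Char instead, though I haven't confirmed that this is what it actually does.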
This makes me think that when it comes to width-limited output, Char-iteration is to be preferred over graphemes, despite the current internal implementation of TextWidthLimiter. However, if this is a Julia bug (or terminal setting issue) that should be fixed, it might be better to correct it first and then write the package with the correct implementation in mind.
I’d love any insights anyone wants to share.