“Character” (= codepoint) doesn’t mean what you think in Unicode, and your solution for this is buggy. Some Unicode codepoints occupy 0 columns when displayed, and some occupy 2. For example, "föó" (written out as "fo\u0308o\u0301") is 5 codepoints, 2 of which are “modifier” (combining) characters of 0 width, which add accents to the preceding character. (Moreover, the canonically equivalent string "föó" (written out as "f\u00f6\u00f3") has 3 codepoints! See here if this confuses you.)
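To make the codepoint counts concrete, here is a small sketch. The variable names `decomposed` and `composed` are mine, and I spell the strings with escapes so that copy-paste cannot silently normalize them:

```julia
using Unicode  # standard library; provides Unicode.normalize

# "föó" spelled with combining accents: f, o, U+0308, o, U+0301
decomposed = "fo\u0308o\u0301"
# Same text after canonical composition (NFC): f, U+00F6, U+00F3
composed = Unicode.normalize(decomposed, :NFC)

println(length(decomposed))       # 5 codepoints
println(length(composed))         # 3 codepoints
println(ncodeunits(decomposed))   # 7 bytes in UTF-8
println(ncodeunits(composed))     # 5 bytes in UTF-8
println(decomposed == composed)   # false: == compares the encoded data
```

Both strings should render identically, yet neither the codepoint count nor the byte count agrees between them.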
A better solution would be to use textwidth, which measures the width of strings and characters (approximately, because in some cases this depends on the font and the terminal). For example:
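# Return the longest prefix of `s` whose display width is at most `maxwidth`.
function clipwidth(s::AbstractString, maxwidth::Integer)
    width = 0
    for (i, c) in pairs(s)
        width += textwidth(c)
        # `i` is the index of the first character that no longer fits,
        # so cut just before it.
        width > maxwidth && return s[1:prevind(s, i)]
    end
    return s
end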
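Here is a quick usage sketch. I repeat the function so the snippet runs on its own, and the sample strings are just my own illustrations:

```julia
# Same function as above, repeated so this snippet is self-contained.
function clipwidth(s::AbstractString, maxwidth::Integer)
    width = 0
    for (i, c) in pairs(s)
        width += textwidth(c)
        width > maxwidth && return s[1:prevind(s, i)]
    end
    return s
end

accented = "fo\u0308o\u0301"   # "föó" with combining accents (5 codepoints)
wide = "日本語"                 # each of these is 2 columns wide

clipped_accented = clipwidth(accented, 2)  # keeps 'f', 'o' and the combining U+0308
clipped_wide = clipwidth(wide, 3)          # only the first character fits in 3 columns

println(length(clipped_accented))     # 3 codepoints
println(textwidth(clipped_accented))  # 2 columns
println(clipped_wide)                 # 日
```

Note that the zero-width accent stays attached to its base character, because it adds nothing to the running width.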
In general, thinking of “character indices” in Unicode is very often a sign of misunderstanding Unicode, and in that sense Julia’s string indexing has the helpful side effect of catching a lot of bugs.
PS. That being said, you can get the index of the n-th character of a string s with nextind(s,0,n), so you can do text[1:nextind(text,0,n)] to obtain the first n Unicode characters (codepoints) if that is really what you want.
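A small sketch of what that looks like in practice (the sample string and the names `a`, `b`, `c` are mine). Base also has `first(s, n)`, which gives the same result for a string:

```julia
text = "fo\u0308o\u0301"   # "föó" with combining accents

n = 2
a = text[1:nextind(text, 0, n)]   # first 2 codepoints: 'f' and a bare 'o'
b = first(text, n)                # same thing via Base.first

println(a == b)                   # true
println(length(a))                # 2

# Taking one more codepoint picks up the combining diaeresis again.
c = text[1:nextind(text, 0, 3)]
println(length(c))                # 3
```

With n = 2 the accent gets cut off from its base letter, which is exactly the kind of surprise that counting codepoints tends to produce.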
PPS. Also, I think @printf got it wrong here: julia#41068.