How to iterate over Unicode characters with multiple code points

You can use Unicode.graphemes to iterate over graphemes (“user-perceived characters” in Unicode), regardless of how many code points each one is encoded with:

julia> using Unicode

julia> graphemes("Héllo World")
length-11 GraphemeIterator{String} for "Héllo World"

julia> graphemes("Héllo World") |> collect
11-element Array{SubString{String},1}:
 "H"
 "é"
 "l"
 "l"
 "o"
 " "
 "W"
 "o"
 "r"
 "l"
 "d"

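Note that the second element of this array is a string of 2 code points (“2 characters” in the terminology of the Julia docs): for that to be the case, the “é” here must be written as “e” followed by the combining acute accent U+0301, yet it is still iterated as a single grapheme.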
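
To make the difference between code points and graphemes visible, here is a small sketch. It assumes the “é” is spelled explicitly in decomposed form with the `\u0301` escape (my choice for illustration, so the example does not depend on how your editor encodes the character); the variable names are hypothetical.

```julia
using Unicode

# "e" followed by the combining acute accent U+0301 (decomposed "é")
s = "He\u0301llo World"

# Each grapheme is a SubString; length() on it counts code points (Chars)
for g in graphemes(s)
    println(repr(g), " has ", length(g), " code point(s)")
end

n_codepoints = length(s)             # 12: iterating the String yields Chars
n_graphemes  = length(graphemes(s))  # 11: user-perceived characters
second       = collect(graphemes(s))[2]  # the 2-code-point "é"
```

So `for c in s` would visit the “e” and the accent separately, while `for g in graphemes(s)` keeps them together.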
