The i-th codepoint is given by s[nextind(s, 0, i)]:
julia> s = "cão"
"cão"
julia> [s[nextind(s, 0, i)] for i = 1:3]
3-element Array{Char,1}:
'c'
'ã'
'o'
However, be aware that finding the i-th codepoint takes O(i) (linear) time in UTF-8, or in any other variable-width encoding, because the string has to be scanned from the start.
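To illustrate the cost, here is a minimal sketch (not the only way to do it) that walks the string once instead of calling nextind from the start for every i, and that collects the characters into a vector when repeated random access really is needed. The variable names are just for illustration, and the ã is written as the escape \u00e3 so that the precomposed form is unambiguous.

```julia
s = "c\u00e3o"   # "cão" with a precomposed ã (U+00E3)

# One pass over the string: pairs(s) yields (byte index, character),
# so visiting all codepoints costs O(n) in total.
for (i, c) in pairs(s)
    println("byte index ", i, " holds ", c)
end

# Valid byte indices are not contiguous, because ã occupies two bytes in UTF-8.
idx = collect(eachindex(s))

# If many random lookups are needed, pay O(n) once and index in O(1) afterwards.
chars = collect(s)   # a vector of Char
third = chars[3]
```

The trade-off is memory: each Char in the collected vector takes 4 bytes, regardless of how compactly it was encoded in the original string.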
The real question is why you want the i-th codepoint. Usually, random positions in strings arise from other processing, e.g. searches, in which the index is already computed as a byproduct.
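As a rough sketch of what "index as a byproduct" looks like in practice, a search with findfirst already hands back a valid byte index (or range of byte indices), which can be used directly or stepped from with prevind and nextind. The search targets here are arbitrary examples.

```julia
s = "c\u00e3o"   # "cão" with a precomposed ã (U+00E3)

# Searching for a substring returns a range of byte indices (or nothing).
r = findfirst("o", s)
found = s[r]

# Searching with a predicate returns a single byte index (or nothing).
i = findfirst(==('o'), s)

# From a known valid index, neighbours are reachable without counting from the start.
before = s[prevind(s, i)]
```

No codepoint counting is involved: the index returned by the search is already the one that string indexing expects.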
As @johnmyleswhite alluded to, the notion of a “character” in Unicode might not be what you expect. The strings s = "cão" and s2 = "cão" may look the same, and are canonically equivalent, but s2 actually has 4 Unicode codepoints (“characters”) even though it has 3 graphemes (what most users would consider “characters”), because in s2 the ã is made from an ASCII a followed by a U+0303 “combining tilde”. So, thinking in terms of the i-th “position” in a string may indicate a conceptual misunderstanding of Unicode.
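The difference between the two strings can be checked directly. The following is a small sketch using the Unicode standard library; both strings are written with escapes so that the precomposed and decomposed forms cannot be confused by the editor or the font.

```julia
using Unicode

s  = "c\u00e3o"    # precomposed ã (U+00E3)
s2 = "ca\u0303o"   # ASCII a followed by U+0303 combining tilde

n_codepoints_s  = length(s)                      # codepoints in s
n_codepoints_s2 = length(s2)                     # codepoints in s2
n_graphemes_s2  = length(Unicode.graphemes(s2))  # user-perceived characters in s2

# The strings differ codepoint by codepoint...
same_raw = (s == s2)

# ...but compare equal once both are in the same normalization form (here NFC).
same_nfc = (Unicode.normalize(s2, :NFC) == s)
```

Here length counts codepoints, so it reports 3 for s and 4 for s2, while the grapheme count for s2 is 3. If the comparison of user-visible text matters, normalizing both sides first is one common approach.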