String indexing bug?

str = "(θ,)"
str[3]

results in

ERROR: StringIndexError: invalid index [3], valid nearby indices [2]=>'θ', [4]=>','
Stacktrace:
 [1] string_index_err(s::String, i::Int64)
   @ Base ./strings/string.jl:12
 [2] getindex_continued(s::String, i::Int64, u::UInt32)
   @ Base ./strings/string.jl:235
 [3] getindex(s::String, i::Int64)
   @ Base ./strings/string.jl:228
 [4] top-level scope
   @ REPL[11]:1

I’m guessing this is not a bug but something Unicode-related. Still, it feels unexpected to me. Is this the expected behaviour?

See the manual section on Unicode and UTF-8.
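In short, a Julia String is indexed by byte (code unit), not by character, and 'θ' occupies two bytes in UTF-8, so index 3 falls in the middle of that character. A minimal sketch of how to inspect this and reach the third character safely, using only Base functions (the variable names are just for illustration):

```julia
str = "(θ,)"

nbytes = ncodeunits(str)        # number of bytes: 'θ' takes two of them
nchars = length(str)            # number of characters
valid  = collect(eachindex(str))  # byte indices where a character starts

# Index 3 is the second byte of 'θ', so it is not a valid character index.
ok3 = isvalid(str, 3)

# Step from a valid index to the next valid one instead of adding 1.
i = nextind(str, 2)             # index of the character after 'θ'
c = str[i]

# Or ask directly for the index of the 3rd character, counting from the start.
j = nextind(str, 0, 3)

# If character-based indexing is really what you want, collect into a Vector{Char}.
chars = collect(str)
third = chars[3]
```

Note that `collect` allocates a new array, so for long strings iterating with `eachindex` or `nextind` is usually the cheaper choice.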


When I passed a filename containing θ to ffmpeg, ffmpeg rejected it because of that character.
Some Unicode characters are not fully supported yet, and it seems to be a deep issue that hasn't been fixed after decades.

Thank you for pointing me to the relevant part of the manual.

There have been several threads on this subject; one recently prompted this PR: graphemes(s, m:n) substring slicing by stevengj · Pull Request #44266 · JuliaLang/julia · GitHub
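For reference, a rough sketch of what that method looks like in use, assuming a Julia version that already includes the PR (as far as I know that means 1.9 or later; on older versions this call will throw a MethodError):

```julia
using Unicode  # standard library; provides `graphemes`

str = "(θ,)"

# Slice by grapheme position rather than by byte index.
# Assumption: the two-argument method from the PR is available.
sub   = graphemes(str, 2:3)   # graphemes 2 through 3
third = graphemes(str, 3:3)   # just the third grapheme, as a substring
```

The result is a substring of the original string rather than a Char, so compare it against "," and not ','.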
