Weird string slicing in Korean

I'm having trouble slicing a Korean string:

julia> julia = "julia"
"julia"

julia> julia[1:2]
"ju"

julia> julia = "줄리아"
"줄리아"

julia> julia[1:2]
ERROR: StringIndexError: invalid index [2], valid nearby indices [1]=>'줄', [4]=>'리'
Stacktrace:
 [1] string_index_err(s::String, i::Int64)
   @ Base .\strings\string.jl:12
 [2] getindex(s::String, r::UnitRange{Int64})
   @ Base .\strings\string.jl:267
 [3] top-level scope
   @ c:\Users\rmsms\OneDrive\lab\Rt\preprocess.jl:33

julia> julia[1:4]
"줄리"

I can guess why it works this way (maybe because each Korean character takes 2 bytes?), and I can work around it now. But is this intentional?

P.S. Japanese and Chinese characters have the same issue:

julia> julia = "ジュリア"
"ジュリア"

julia> julia[1:2]
ERROR: StringIndexError: invalid index [2], valid nearby indices [1]=>'ジ', [4]=>'ュ'
Stacktrace:
 [1] string_index_err(s::String, i::Int64)
   @ Base .\strings\string.jl:12
 [2] getindex(s::String, r::UnitRange{Int64})
   @ Base .\strings\string.jl:267
 [3] top-level scope
   @ c:\Users\rmsms\OneDrive\lab\Rt\preprocess.jl:37

julia> julia[1:4]
"ジュ"

julia> julia = "酒利兒"
"酒利兒"

julia> julia[1:2]
ERROR: StringIndexError: invalid index [2], valid nearby indices [1]=>'酒', [4]=>'利'
Stacktrace:
 [1] string_index_err(s::String, i::Int64)
   @ Base .\strings\string.jl:12
 [2] getindex(s::String, r::UnitRange{Int64})
   @ Base .\strings\string.jl:267
 [3] top-level scope
   @ c:\Users\rmsms\OneDrive\lab\Rt\preprocess.jl:41

julia> julia[1:4]
"酒利"

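This is intentional. Strings are indexed by bytes (UTF-8 code units) for performance reasons. Each of the characters above takes 3 bytes in UTF-8 (not 2), which is why the error message lists 1 and 4 as the valid nearby indices.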
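
To make the byte layout concrete, here is a small sketch (assuming Julia 1.x and the same string as in the question; the variable name `s` is just for illustration) that compares the character count with the byte count and lists the indices at which a character starts:

```julia
s = "줄리아"

# Number of characters versus number of UTF-8 code units (bytes).
println(length(s))               # 3 characters
println(ncodeunits(s))           # 9 bytes, 3 per character here

# Byte indices at which a character starts.
println(collect(eachindex(s)))   # [1, 4, 7]

# isvalid reports whether a byte index is the start of a character.
println(isvalid(s, 2))           # false, index 2 is inside '줄'
println(isvalid(s, 4))           # true, '리' starts here
```

This is why `julia[1:2]` throws while `julia[1:4]` returns the first two characters: a range is accepted only if both endpoints are character start indices.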

If you don’t have pressing performance concerns, you could do

julia> String(collect("줄리아")[1:2])
"줄리"

which takes about 70 ns.
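
If you would rather not collect into a `Vector{Char}`, a possible alternative is to let Base count characters for you. The sketch below assumes Julia 1.x; `charslice` is a hypothetical helper written for this post, not a Base function, and I have not benchmarked it against the `collect` version.

```julia
s = "줄리아"

# first/last take a character count, not a byte index.
println(first(s, 2))   # 줄리
println(last(s, 2))    # 리아

# charslice is a hypothetical helper (not part of Base): it slices by
# character positions i:j by translating them to byte indices with nextind.
function charslice(s::AbstractString, i::Integer, j::Integer)
    start = nextind(s, 0, i)   # byte index of the i-th character
    stop = nextind(s, 0, j)    # byte index of the j-th character
    return s[start:stop]
end

println(charslice(s, 1, 2))   # 줄리
println(charslice(s, 2, 3))   # 리아
```

`nextind(s, 0, n)` walks forward `n` characters from before the start of the string, so it returns the byte index of the `n`-th character.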

See also this discussion:
