Substring function?

Why not? Idiosyncrasy just means “a peculiarity of constitution or temperament: an individualizing characteristic or quality” (Idiosyncrasy Definition & Meaning - Merriam-Webster); if Julia is different in this respect from most languages, then the word fits.

1 Like

Because it is hardly an individualizing characteristic - lots of languages have indexing into their strings that’s not based on codepoints. I also remember a discussion some time back about indexing based only on graphemes, which would come with a whole different set of problems.

I think the idiosyncrasy mentioned is the mix of both?

Not efficiently.

The basic argument is that a variable-width encoding with non-consecutive codepoint indices is a good tradeoff to make (memory efficiency + speed, at the cost of less-intuitive indexing) because “give me the m-th codepoint” or “give me the substring from codepoints m to n” is extremely uncommon in (correct) string-handling code, as opposed to “give me the substring at opaque indices I found in a previous search/loop”.
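
For instance, a minimal sketch of the “opaque indices” workflow (the string and delimiter here are made up for illustration):

```julia
s = "αβ,γδ"                      # non-ASCII, so codepoint counts ≠ byte positions
i = findfirst(',', s)            # byte index of the delimiter (5 here, not 3)
before = s[1:prevind(s, i)]      # slice using string indices, not codepoint counts
after  = s[nextind(s, i):end]
# before == "αβ", after == "γδ"
```

The indices `i`, `prevind(s, i)`, and `nextind(s, i)` came from a search, so the code never needed to count codepoints at all.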

That’s why, in my previous post, I asked you where your m and n indices come from. You still haven’t given any use case for your substring(s, m, n) function. In what realistic application would someone say “give me the 12th to the 17th characters of this string”, where the numbers 12 and 17 just fell out of the sky (not from a previous search/iteration on the string)?

(One option we’ve discussed is literally making string indices an opaque type, so that it no longer resembles a consecutive array index, which is the main source of confusion.)

With UTF-16 as in Java, you have exactly the same issue, except that bugs are harder to catch because surrogate pairs are less common. With UTF-8, you catch the indexing bugs in your code the first time anyone passes you a Unicode string.

5 Likes

I am parsing Unicode text in columns. Like Unix cut.

When you parse, you are looping over the string or using findnext or similar. Hence you can use the actual string index from that loop/search to subsequently extract substrings — you neither need nor want the codepoint count.

(If you are parsing with ASCII delimiters, you often don’t even need to look at Unicode characters…you literally don’t care what comes between the delimiters when parsing. In this case you can alternatively loop over the raw codeunits/byte array, provided by the codeunits(str) function. This still gives you indices that you can use to slice the original string. That’s what e.g. JSON parsing does IIRC.)
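
A sketch of that byte-level approach (the string here is my own example; since the delimiter `,` is ASCII, its byte value `0x2c` can never appear inside a multi-byte UTF-8 character):

```julia
s = "αβ,γδ,ε"
cu = codeunits(s)                     # raw bytes; ASCII ',' is always the byte 0x2c
commas = [i for i in eachindex(cu) if cu[i] == UInt8(',')]
# the byte positions of the delimiters work directly as string indices for slicing:
fields = [s[a:prevind(s, b)] for (a, b) in
          zip([1; commas .+ 1], [commas; ncodeunits(s) + 1])]
# fields == ["αβ", "γδ", "ε"]
```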

3 Likes

No, I didn’t get that far, as I am just experimenting with the language. I imagine the final implementation would use regexp matching for parsing, as the text also contains tags. But I wanted to try a few things out first… like, hey, let’s just cut out this field and sum it up. My substring function works fine; I was just curious about why you have chop instead.

Yes, that’s the problem.

You were probably trying to test things out by coming up with your indices “visually” on some test strings, and noticed that the indices did not coincide with your visual expectation for Unicode strings. But this would be a problem even if Julia used vectors of codepoints, as my "ü" example showed.

For example, if I gave you the string "ü,é,â,ỳ" and asked you to cut out the 3rd and 4th comma-separated fields, analogous to cut -d, -f3-4, what “characters” (codepoints) do you think that corresponds to? Probably you would guess “the 5th to 7th characters”. But no, look what your substring function returns:

julia> substring("ü,é,â,ỳ", 5,7)
"́,a"

Whoops, is your substring function buggy? No, it’s just that “codepoints” in Unicode don’t necessarily correspond to what a human reader thinks of as a “character”.

Whereas if you actually implemented a cut function, you would do a sequence of searches for the delimiter (e.g. with findnext), which would yield a sequence of string indices (≠ codepoint counts), slicing would work just fine, and it would be absolutely irrelevant how many codepoints occurred between one delimiter and the next.
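
For instance, a minimal sketch of such a cut (the name cutfields and its signature are made up for illustration, not an existing API):

```julia
# Split on a delimiter by repeated findnext searches, collecting string
# indices; codepoint counts are never needed.
function cutfields(s::AbstractString, delim::Char, range::UnitRange{Int})
    start = firstindex(s)
    fields = String[]
    while true
        i = findnext(delim, s, start)
        push!(fields, i === nothing ? s[start:end] : s[start:prevind(s, i)])
        i === nothing && break
        start = nextind(s, i)
    end
    return join(fields[range], delim)
end

cutfields("ü,é,â,ỳ", ',', 3:4)   # == "â,ỳ"
```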

But because you hadn’t gotten that far, you jumped to the conclusion that Julia’s string handling is broken and we are missing extremely basic functionality like extracting substrings.

(The UTF-8 encoding that Julia employs is not unusual! It’s taking over most of the internet, it’s used in other modern languages like Go and Swift, and it’s been the subject of many, many discussions and revisions in Julia itself. This is not something we picked out of a hat because we hadn’t thought through basic functionality.)

8 Likes

Hey, thanks for the reply. I didn’t mean to criticise Julia, or suggest that Julia’s unicode implementation is broken.

1 Like
julia> SubString("äöü",2,3)
ERROR: StringIndexError: invalid index [2], valid nearby indices [1]=>'ä', [3]=>'ö'

Reading this discussion and I see the reasons. Also knowing about Unicode/UTF-8/… difficulties.
But…

I would prefer having just

julia> SubString("äöü",2,3)
"öü"

even if it would be not efficient.
(not discussing function substring or view-like SubString here, doesn’t matter for me)

I don’t think expecting working substring functionality is that uncommon, or even wrong, as in “you are using substring, so you are probably doing it wrong”.

In my opinion, the best reasoning doesn’t help if people validly expect the feature anyway. Expecting some kind of substring function is, in my opinion, valid for any higher-level programming language (though of course not for assembler).

Julia has a working substring functionality, which works on string indices.

The proposed character-indexing method does not eliminate the complexities of Unicode. Consider:

julia> substring(str, start, stop) = str[nextind(str, 0, start):nextind(str, 0, stop)]

julia> substring("äöü", 2,3) # NFC normalized string
"öü"

julia> substring("äöü", 2,3) # NFD normalized string
"̈o"

The supposed simplicity of “character indexing” is an illusion.
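
You can see the two visually identical strings diverge directly (a small sketch using the Unicode stdlib’s normalize):

```julia
using Unicode

nfc = Unicode.normalize("äöü", :NFC)   # 3 codepoints: ä ö ü (precomposed)
nfd = Unicode.normalize("äöü", :NFD)   # 6 codepoints: a + ̈, o + ̈, u + ̈
length(nfc)   # == 3
length(nfd)   # == 6
nfc == nfd    # false, although both render identically
```

So any “character-indexed” substring gives different answers depending on a normalization choice the user may not even know was made.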

You pay a big price in performance to reduce apparent confusion on people’s first few days of using strings in Julia, but only postpone your Unicode bugs (because “characters” don’t mean what you think), and you get zero benefits in the long run (because indexing codepoint counts is not actually necessary for realistic string processing).

6 Likes

A significant difference from UTF-8 is that there is no such thing as malformed UTF-16, only unpaired surrogate code units, which still have code points, just not ones that correspond to valid Unicode characters. So every UTF-16 string can be iterated as code points, you might just get the code point for an unpaired surrogate if the string is invalid. Compare that with UTF-8 where some byte sequences just don’t follow the right structure at all.

(Another way to put this is that every UTF-16 sequence, valid or invalid, can be represented as a sequence of code points using WTF-8, which is an extension of UTF-8 allowing surrogate pair code points.)

4 Likes

We could certainly add a graphemes(str, m, n) method that returns a substring of the m-th to n-th graphemes, which is close to a user-perceived character slice.

But is this actual useful functionality in any real application? (When was the last time you used the graphemes(str) iterator, for that matter? At least grapheme iteration has some practical uses, e.g. cursor movement, but random grapheme access seems basically useful only for string demos.)

As far as I can tell, its sole practical utility would be to end discussions like this one. (Which might be worth a ≈25-line function in the Unicode stdlib, I suppose.)
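
A rough sketch of what such a function could look like (the name grapheme_slice is hypothetical; later Julia versions gained a graphemes(s, m:n) method in the Unicode stdlib, but this shows the idea: O(n) iteration, collecting the m-th through n-th graphemes):

```julia
using Unicode

function grapheme_slice(s::AbstractString, m::Integer, n::Integer)
    io = IOBuffer()
    for (k, g) in enumerate(graphemes(s))
        k > n && break                 # past the requested range: stop
        k >= m && write(io, g)         # inside the range: collect the grapheme
    end
    return String(take!(io))
end

grapheme_slice("x̂y", 1, 1)   # == "x̂" (one grapheme, two codepoints)
```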

https://github.com/JuliaLang/julia/pull/44266

4 Likes

To Steve:

using Unicode
graphemes(('\U1F44D'*'\U0352')^100)

Isn’t grapheme slicing the way to generate an icon like this?

[image]

No, you just need to call first on the graphemes iterator to get the “first letter”.
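
For example (a small sketch with a made-up string whose first grapheme spans two codepoints):

```julia
using Unicode

s = "x̂yz"                # first grapheme is x + combining circumflex
first(graphemes(s))      # == "x̂": the first user-perceived character
```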

In contrast, general slicing means (simulated) random access. No one has so far given any practical use-case for this in which you don’t already have (or can easily get) string indices.

5 Likes

But then, what is the real application of byte-indexing a substring? It looks pretty low-level to me—I understand you might need such indexing at the lowest level, manipulating bytes in doing the encoding/decoding, but I suppose this could also be done by a convert to Vector{UInt8} or something similar.

And sure, there is the NFC/NFD normalization choice, that users of substring/graphemes should be aware of. In my real applications I sometimes use NFD, when I am interested in diacritics, and otherwise NFC. And sure, I have burnt myself working with un-normalized Unicode strings. Copying from your example "äöü" != "äöü", which is also somewhat counter intuitive.

I can also imagine that modern programming paradigms discourage people from doing any kind of indexing of strings, e.g. by providing the most amazing set of string manipulation and iteration functions.

Iterating over the letters (code points) of a string gives you byte indices, I believe (just not necessarily consecutive ones), so at some (lower-level, yes) point you need byte indexing, e.g. for SubStrings. I would have liked indexing (byte-based or otherwise) not to be defined on Strings (by default), but by now that would be a breaking change. That’s my plan for a string type I’m developing…
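
That is easy to check (a small sketch; the string is my own example):

```julia
s = "αβc"
collect(eachindex(s))            # == [1, 3, 5]: valid byte indices, not consecutive
[i => c for (i, c) in pairs(s)]  # iteration hands back index => character pairs
s[3:end]                         # == "βc": slicing with an index from the iteration
```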

One other application is that strings in Julia do not have to hold UTF-8 data (though you can optionally validate them as such). A string could hold e.g. byte-based ISO-8859-1, any kind of binary data (even UTF-16), or malformed UTF-8. So I suppose if you know you’re dealing with a byte-based encoding you might want byte indexing (as opposed to the provided iterator)… or every-other-byte indexing for UTF-16 (for code points, not letters; recall UTF-16 is also variable-length because of surrogates).

The reason byte indices are useful is that they are the thing that can be looked up in O(1), so if you search for graphemes and get back byte indices, it is easy to parse quickly through a string.

1 Like

In particular, you should think of the byte (codeunit) index as simply an opaque “pointer” into the string which is returned by iteration, searching, etcetera. The user doesn’t need to understand how it relates to the encoding. The important thing is that once you have this opaque pointer, you can jump to that position quickly (O(1)).

Put another way, why have indices at all? The reason for them is to save locations in a string. But you don’t necessarily need to know what those locations “mean” as long as you can do various things with them (e.g. reading at a location, comparing two locations by ≤ or <, grabbing a substring between two locations, …).
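
A short sketch of treating positions as opaque (the string and characters are made up):

```julia
s = "déjà vu"
i = findfirst('j', s)      # an opaque position returned by a search
j = findfirst('v', s)
i < j                      # positions compare in string order
s[i:prevind(s, j)]         # == "jà ": the substring between the two saved locations
```

Nothing here depends on what `i` and `j` “mean” in terms of the encoding.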

There is a common misconception lurking — normalization is not sufficient to merge all graphemes into single characters. Unicode has “merged” characters that include the accent only for a small subset of characters and diacritical marks; everything else will use a combining character even in NFC.

For example, the string "x̂" contains only one grapheme (one “user-visible” character) but is two characters (two Unicode codepoints) in any normalization.
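
You can verify this with the Unicode stdlib (a small sketch; there is no precomposed x-with-circumflex codepoint, so NFC cannot merge it):

```julia
using Unicode

s = Unicode.normalize("x̂", :NFC)   # still x + U+0302 after NFC
length(s)                           # == 2 codepoints
length(graphemes(s))                # == 1 user-perceived character
```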

3 Likes