Substring function?

stevengj · February 18, 2022, 2:42pm

Just thought a substring function would be useful out of the box, with documentation about how it differs from the string[start:stop] form in terms of unicode and performance.

Slicing a[m:n] always makes a copy in Julia (at least, with the built-in types), whether for arrays or strings. If you want to use a view (i.e. create a SubString object), the easiest way is to use @views on a block of code, e.g.

julia> s = "αβγł€đŧŧŋ"
"αβγł€đŧŧŋ"

julia> @views s[1:5]
"αβγ"

julia> typeof(ans)
SubString{String}

Slicing with @views works just fine for this.
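To make the copy/view distinction concrete, here is a minimal sketch using only Base functionality. The variable names are illustrative, and it assumes a Julia version where @views applies to strings, as in the session above.

```julia
s = "αβγł€đŧŧŋ"

copy_slice = s[1:5]              # plain slicing allocates a new String
view_slice = @views s[1:5]       # a SubString that shares memory with s
same_view  = SubString(s, 1, 5)  # explicit constructor for the same view

typeof(copy_slice)   # String
typeof(view_slice)   # SubString{String}
```

All three values compare equal with ==; only the type and the memory behavior differ.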

The real question is: where are you getting the character indices that you want to pass to your substring function? Usually, indices into a string come from some previous iteration over it, either your own loop or something like a findnext call, and these are code unit indices that you can pass to s[m:n] directly.
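As a rough illustration of that workflow, the sketch below takes indices from findfirst and findnext and feeds them straight back into slicing. The search patterns and variable names are made up for the example.

```julia
s = "αβγł€đŧŧŋ"

# findfirst with a string pattern returns a range of code unit indices.
r = findfirst("γł", s)         # 5:7 (γ starts at byte 5, ł at byte 7)
hit = @views s[r]              # "γł", without copying

# To move by whole characters, step with prevind/nextind
# instead of subtracting or adding 1.
i = findnext('€', s, 1)        # code unit index where € starts
before = s[1:prevind(s, i)]    # "αβγł"
after  = s[nextind(s, i):end]  # "đŧŧŋ"
```

No codepoint counting is involved anywhere: every index is produced by a search or by stepping from a known valid index.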

If you are counting codepoints as “characters”, e.g. you want the “first 3 characters” in a string, then the odds are high that you are making a mistake. For example, "ü" can be two codepoints (length("u\u0308") == 2) when it is stored as u followed by the combining character U+0308. See also this explanation: Myth: Counting coded characters or code points is important.
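The difference between codepoints and user-perceived characters can be checked with the Unicode standard library. This is a small sketch; the escape "u\u0308" is used so that the decomposed form is unambiguous regardless of how the text is rendered.

```julia
using Unicode  # standard library

decomposed  = "u\u0308"                            # u + combining diaeresis
precomposed = Unicode.normalize(decomposed, :NFC)  # single codepoint U+00FC

length(decomposed)             # 2 codepoints
length(precomposed)            # 1 codepoint
length(graphemes(decomposed))  # 1 user-perceived character
decomposed == precomposed      # false, although both render as ü
```

Taking the “first character” of the decomposed string by codepoint would return a bare u and silently drop the diaeresis.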

Because of Unicode’s complexity, wanting a substring from the m-th codepoint (“character”) to the n-th codepoint, as opposed to between two string indices (= code units), is actually an extremely uncommon operation (in non-buggy code). This is why it’s not built-in.
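For the rare case where a codepoint-based slice really is what you need, it can be built from nextind. The function below is a hypothetical helper, not something provided by Base, and it does no special handling of out-of-range arguments; treat it as a sketch under those assumptions.

```julia
# Hypothetical helper (not part of Base): a view from the m-th to the
# n-th codepoint, inclusive. Each nextind call scans from the start of
# the string, so the cost grows with n rather than being constant.
function codepoint_substring(s::AbstractString, m::Integer, n::Integer)
    i = nextind(s, 0, m)   # code unit index of the m-th codepoint
    j = nextind(s, 0, n)   # code unit index of the n-th codepoint
    return SubString(s, i, j)
end

s = "αβγł€đŧŧŋ"
codepoint_substring(s, 2, 4)   # "βγł"
first(s, 3)                    # "αβγ", Base already covers the prefix case
```

The linear scan is the performance difference from s[m:n], which only has to validate the two code unit indices it is given.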
