I found first(str), last(str), and chop(str), but couldn’t find anything for getting a substring where multibyte unicode characters are involved. Something like: substring(str, start, stop) = str[nextind(str, 0, start):nextind(str, 0, stop)] Not sure if anyone else found this surprising. I suppose…
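To make the difference concrete, here is a minimal sketch of the helper proposed above, next to plain byte indexing. The name `substring` comes from the post; the sample string and positions are only illustrative, and the helper assumes `start >= 1`.

```julia
# Character-indexed substring, as proposed in the post above.
# nextind(str, 0, n) returns the byte index of the n-th character (n >= 1).
substring(str, start, stop) = str[nextind(str, 0, start):nextind(str, 0, stop)]

s = "αβγł€đŧŧŋ"

println(length(s))           # number of characters
println(ncodeunits(s))       # number of bytes (code units) in UTF-8
println(first(s, 3))         # first three characters
println(last(s, 2))          # last two characters
println(substring(s, 2, 4))  # characters 2 through 4

# Plain indexing uses byte indices, so an index that falls inside a
# multibyte character raises an error instead of returning text.
try
    s[2:4]
catch err
    println(typeof(err))
end
```

The two `nextind` calls each scan from the start of the string, so the cost grows with the character position rather than being constant as with byte indexing.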

Yeh, been there. Just thought a substring function would be useful out of the box, with documentation about how it differs from the string[start:stop] form in terms of unicode and performance.

You can use the SubString constructor directly.

fredrikekre: "You can use the SubString constructor directly." Which, btw, is in one of Steve’s responses in the linked thread.

Unless I’m reading the docs incorrectly, SubString uses byte indexes.
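A short sketch of that point, assuming the same sample string used elsewhere in this thread: `SubString(s, i, j)` takes byte indices, so character positions have to be converted first (here with `nextind`).

```julia
s = "αβγł€đŧŧŋ"

# Byte indices 3 and 7 are where the 2nd and 4th characters start.
sub = SubString(s, 3, 7)
println(sub)
println(typeof(sub))

# Converting character positions 2 and 4 to byte indices gives the same result.
i = nextind(s, 0, 2)
j = nextind(s, 0, 4)
println(SubString(s, i, j) == sub)

# A byte index that falls inside a multibyte character is rejected.
threw = try
    SubString(s, 2, 4)
    false
catch
    true
end
println(threw)
```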

Are you looking for a built-in function that outputs as in the example below? substring(s,n) = join([s[c] for (i,c) in enumerate(eachindex(s)) if i ∈ n])

Your implementation is interesting, but a little inefficient. Compare: substring(s,n) = join([s[c] for (i,c) in enumerate(eachindex(s)) if i ∈ n]) substring(str, start, stop) = str[nextind(str, 0, start):nextind(str, 0, stop)] and after they’ve both been warmed up… julia> s = "αβγł€đŧŧŋ" "αβγł€đŧ…
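One rough way to reproduce such a comparison without extra packages is sketched below. Both definitions are the ones quoted above; the `_join` and `_nextind` suffixes are added here only so the two can coexist. The allocation figures depend on the Julia version and on running at global scope, so treat them as indicative only.

```julia
# The two implementations from the posts above, renamed so both can be defined.
substring_join(s, n) = join([s[c] for (i, c) in enumerate(eachindex(s)) if i ∈ n])
substring_nextind(str, start, stop) = str[nextind(str, 0, start):nextind(str, 0, stop)]

s = "αβγł€đŧŧŋ"

# Warm up both functions so compilation is not counted.
substring_join(s, 2:4)
substring_nextind(s, 2, 4)

# Both should select the same characters.
println(substring_join(s, 2:4) == substring_nextind(s, 2, 4))

# Report bytes allocated by one call of each version.
println("join version:    ", @allocated(substring_join(s, 2:4)), " bytes allocated")
println("nextind version: ", @allocated(substring_nextind(s, 2, 4)), " bytes allocated")
```

The `join` version visits every index of the string and builds an intermediate vector of characters, while the `nextind` version stops scanning once it reaches the requested positions.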

You can get 0-allocations by using a view: substring(str, start, stop) = view(str, nextind(str, 0, start):nextind(str, 0, stop))
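A small usage sketch of the view-based definition, with an illustrative sample string. The result is a `SubString` that refers to the original string instead of a freshly copied `String`.

```julia
# View-based variant from the post above: no copy of the string data is made.
substring(str, start, stop) = view(str, nextind(str, 0, start):nextind(str, 0, stop))

s = "αβγł€đŧŧŋ"

v = substring(s, 2, 4)
println(v)
println(typeof(v))   # a SubString that references s

# Materialize an independent String only when one is really needed.
copy_of_v = String(v)
println(typeof(copy_of_v))
```

Because the view keeps a reference to its parent, the whole original string stays alive for as long as the `SubString` does.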

"Just thought a substring function would be useful out of the box, with documentation about how it differs from the string[start:stop] form in terms of unicode and performance." Slicing a[m:n] always makes a copy in Julia (at least, with the built-in types), whether for arrays or strings. If you w…
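The copy-versus-view behaviour can be checked directly. This is a minimal sketch with made-up values: writing to a slice leaves the original array alone, while writing through a view does not.

```julia
a = [1, 2, 3, 4, 5]

b = a[2:4]          # slicing copies the selected elements
b[1] = 20
println(a)          # a is unchanged by the write to b

v = view(a, 2:4)    # a view shares memory with a
v[1] = 20
println(a)          # the write through v is visible in a

s = "αβγ"
t = s[1:3]          # byte range 1:3 covers the first two characters
println(typeof(t))             # a new String
println(typeof(view(s, 1:3)))  # a SubString that references s
```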

Maybe you will find this useful: "Subsetting strings in Julia using character indexing" (blog by Bogumił Kamiński).

Substring function?


rafael.guerra February 18, 2022, 7:58am 2

You may wanna check this related thread.

