StringIndex idea (Julia 2.0)

stevengj · February 5, 2024, 1:00pm

I disagree with your premise that random codepoint indexing is “working with strings properly”. String algorithms that involve random access should be operating on codeunits (= bytes in UTF-8), not codepoints (Chars), and codeunits already provide O(1) access.
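To make the code unit point concrete, here is a minimal sketch using only functions from Base. The sample string and the helper `count_byte` are made up for illustration; they are not from the thread.

```julia
s = "αβγ, hello"

# Code units are the raw UTF-8 bytes of the string; indexing them is O(1).
cu = codeunits(s)

nbytes = ncodeunits(s)   # 13: α, β, γ take 2 bytes each, the rest 1 byte each
nchars = length(s)       # 10: counting characters requires scanning the string
first_byte = cu[1]       # 0xce, the first byte of 'α'

# Hypothetical helper: a byte-oriented algorithm that counts occurrences of an
# ASCII byte without decoding a single character.
count_byte(str::String, b::UInt8) = count(==(b), codeunits(str))

ncommas = count_byte(s, UInt8(','))   # 1
```

This works because UTF-8 is self-synchronizing: an ASCII byte such as `,` never appears inside the encoding of a multi-byte character, so scanning bytes is safe here.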

So far, in all the discussions of strings in Julia, to my recollection there has not been a single example of a practical string algorithm that requires random-access codepoint indexing (as opposed to a “pointer” to a previously traversed location, as in a search result).
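As a rough illustration of what a "pointer" to a previously traversed location looks like, the sketch below uses `findfirst` from Base, which returns a range of code unit indices. The sample string is an assumption chosen for illustration.

```julia
s = "naïve café"

# The search returns a range of code unit (byte) indices, not character counts.
r = findfirst("café", s)

# The indices can be used directly, in O(1), to slice or to resume a scan.
hit  = s[r]                  # "café"
tail = s[first(r):end]       # everything from the match onward

# SubString takes the same indices and avoids copying the data.
view_of_hit = SubString(s, first(r), last(r))
```

Nothing here needs to know that the match starts at the seventh character; the byte index handed back by the search is enough.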

The discussions on this thread about making the current string (codeunit) indices opaque and adding other index types have mostly been about making things more intuitive for new users, and about preventing bugs caused by misuse of `s[i+1]` instead of `s[nextind(s, i)]` and similar. They have not been about an algorithmic need for fast random character indexing.
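For readers who have not hit this bug, a small sketch of the difference, assuming a string that starts with a two-byte character:

```julia
s = "αβ"                     # each character occupies 2 bytes in UTF-8
i = firstindex(s)            # 1

j = nextind(s, i)            # 3: the index where the next character starts
second = s[j]                # 'β'

# i + 1 points into the middle of 'α', so it is not a valid character index.
valid = isvalid(s, i + 1)    # false

# Indexing there throws instead of silently returning garbage.
threw = try
    s[i + 1]
    false
catch err
    err isa StringIndexError
end

# Iterating with eachindex sidesteps manual index arithmetic entirely.
starts = collect(eachindex(s))   # [1, 3]
```

With an ASCII-only test string `s[i+1]` happens to work, which is why this kind of bug tends to surface only later, on non-ASCII input.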

(AFAIK, no mainstream programming language other than Python 3 tries to guarantee O(1) random codepoint access for its default string type, and Python 3’s idiosyncratic approach comes with a lot of disadvantages.)
