I think that’s getting to the bottom of the issue. Iterators use codepoints, whereas ranges use byte indexes. I see the need for byte indexing, but being new to the language, this idiosyncrasy tripped me up. I expected to be working with codepoints all the way down unless explicitly converting into bytes. For example, in Java:
jshell> "αβγł€đŧŧŋ".substring(0,5)
$1 ==> "αβγł€"
jshell> "αβγł€đŧŧŋ".getBytes()
$2 ==> byte[19] { -50, -79, -50, -78, -50, -77, -59, -126, -30, -126, -84, -60, -111, -59, -89, -59, -89, -59, -117 }
It might have been possible to implement strings as a vector of codepoints and still achieve O(1) performance for indexing, using a lookup table of indices or (to save memory) an isascii property. Too late for that now, though.
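Just to make the idea concrete, here is a rough Java sketch of what I mean by a codepoint-vector string with O(1) indexing plus an isascii flag. CodepointString and its methods are hypothetical names, not any real API, and a production version would obviously need much more (a real one might store UTF-8 bytes plus a lookup table instead of decoding eagerly):

```java
// Hypothetical sketch: store the codepoints up front so that character
// indexing is O(1) no matter how many bytes each character takes in UTF-8.
// An isAscii flag marks strings where byte index == codepoint index,
// which a memory-conscious implementation could use as a fast path.
public class CodepointString {
    private final int[] codepoints;  // one entry per codepoint
    private final boolean isAscii;   // true => byte index == codepoint index

    public CodepointString(String s) {
        this.codepoints = s.codePoints().toArray();
        this.isAscii = s.codePoints().allMatch(c -> c < 128);
    }

    // O(1) codepoint access, unlike scanning variable-width UTF-8
    public int codepointAt(int i) {
        return codepoints[i];
    }

    public int length() {
        return codepoints.length;
    }

    public boolean isAscii() {
        return isAscii;
    }

    // Substring in codepoint coordinates, like Java's substring
    public String substring(int from, int to) {
        return new String(codepoints, from, to - from);
    }

    public static void main(String[] args) {
        CodepointString s = new CodepointString("αβγł€đŧŧŋ");
        System.out.println(s.length());         // 9 codepoints, not 19 bytes
        System.out.println(s.substring(0, 5));  // αβγł€
    }
}
```

With something like this, ranges and iterators would agree on units, at the cost of eager decoding and extra memory for non-ASCII strings.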
Note, I’m not saying that Julia should be “like Java”, but I do agree that the API should be consistent. Looks to me like the LegacyStrings implementation might have been better.