StringIndex idea (Julia 2.0)

I wouldn’t want to conflate ordinal syntax with StringIndex, although I think you’re on to something with using juxtaposition for StringIndex types.

There are two reasons. The first is that indexing by codepoint isn't an ordinal vs. cardinal distinction. "If you use 4, it's a byte index; if you use 4th, it's a codepoint index" would just be one more thing to memorize, because it's not at all obvious that 4th means the fourth codepoint rather than the fourth codeunit.
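For reference, here is a minimal illustration of the byte/codepoint split in current Julia. A plain integer indexes a `String` by code unit (byte), and `nextind(s, 0, n)` is the existing way to find where the n-th codepoint starts. The string "añejo" is just an arbitrary example with one two-byte character.

```julia
s = "añejo"   # 'ñ' (U+00F1) is two bytes in UTF-8, at byte indices 2 and 3

@show s[4]                 # 'e': the character that starts at byte 4
k = nextind(s, 0, 4)       # byte index at which the 4th codepoint starts
@show k                    # 5
@show s[k]                 # 'j': the 4th codepoint
@show collect(s)[4]        # 'j' again, via an O(n) Vector{Char}

try
    s[3]                   # byte 3 is the second byte of 'ñ'
catch err
    @show err isa StringIndexError   # true
end
```

So `s[4]` and "the 4th character of s" already disagree on any string containing non-ASCII text.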

The other reason is that there are at least four ways one might want to index a string: by codeunits, codepoints, graphemes, and display width (textwidth). I would argue that most of the time a developer is tempted to use codepoints, what they actually want is graphemes. Indexing by codepoint avoids invalid-index errors, but it doesn't prevent splitting up 👍🏼 or 🇦🇺 or  (I don't have a good way to type Latin characters with combining codepoints, so just pretend that one is composed).
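To make the four measures concrete, here is a small sketch using only Base and the Unicode stdlib. The string is an arbitrary example chosen so that all four counts differ: an "e" followed by a combining acute accent, then two CJK ideographs that occupy two terminal columns each.

```julia
using Unicode   # stdlib; provides `graphemes`

s = "e\u0301日本"   # 'e' + combining acute accent, then two CJK ideographs

@show ncodeunits(s)              # 9: UTF-8 bytes (1 + 2 + 3 + 3)
@show length(s)                  # 4: codepoints
@show length(graphemes(s))       # 3: user-perceived characters
@show textwidth(s)               # 5: terminal columns (1 + 0 + 2 + 2)

# Taking the first codepoint separates the letter from its accent:
@show first(s, 1)                # "e", without the accent
# Taking the first grapheme keeps them together:
@show first(graphemes(s))        # "e\u0301", letter plus accent
```

The codepoint-based `first(s, 1)` never throws, yet it still produces a result no user would call "the first character".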

But a syntax such as string[1ch:5ch], string[1gr:5gr], or string[1wc:5wc] would be a nicely compact way to construct those kinds of indices. I chose wc after the wcwidth function, which never quite made it into the C standard; it could instead be 1tw, to remind users that it uses the textwidth function under the hood. I think this fits because the various approaches to indexing are effectively units, this looks like a unit, and units are a place where highly abbreviated names are considered acceptable: no one insists that 1m be spelled 1meter.
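Julia already parses a numeric literal followed by an identifier as multiplication, so `1ch:5ch` means `(1*ch):(5*ch)` today. The following is a minimal sketch of how such units could be prototyped in current Julia. All names here (`ch`, `gr`, `CodepointIndex`, `GraphemeRange`, and so on) are hypothetical and do not exist in Base. Codepoint ranges are resolved with `nextind`, and grapheme ranges with `Unicode.graphemes`.

```julia
using Unicode  # stdlib; provides `graphemes`

# Hypothetical unit singletons: `3ch` is parsed as `3 * ch`.
struct CodepointUnit end
struct GraphemeUnit end
const ch = CodepointUnit()
const gr = GraphemeUnit()

# Hypothetical typed indices produced by multiplying an integer by a unit.
struct CodepointIndex
    n::Int
end
struct GraphemeIndex
    n::Int
end
Base.:*(n::Integer, ::CodepointUnit) = CodepointIndex(n)
Base.:*(n::Integer, ::GraphemeUnit) = GraphemeIndex(n)

# Hypothetical ranges: `1ch:5ch` is parsed as `(1*ch):(5*ch)`.
struct CodepointRange
    start::Int
    stop::Int
end
struct GraphemeRange
    start::Int
    stop::Int
end
Base.:(:)(a::CodepointIndex, b::CodepointIndex) = CodepointRange(a.n, b.n)
Base.:(:)(a::GraphemeIndex, b::GraphemeIndex) = GraphemeRange(a.n, b.n)

# Codepoint slicing: translate codepoint positions to byte indices, then slice.
function Base.getindex(s::String, r::CodepointRange)
    i = nextind(s, 0, r.start)  # byte index where codepoint r.start begins
    j = nextind(s, 0, r.stop)   # byte index where codepoint r.stop begins
    return s[i:j]               # a String byte range includes the whole char at j
end

# Grapheme slicing: split into grapheme clusters, then rejoin the selected ones.
function Base.getindex(s::String, r::GraphemeRange)
    clusters = collect(graphemes(s))  # Vector of SubStrings, one per cluster
    return join(clusters[r.start:r.stop])
end

s = "e\u0301xyz"   # 'e' + combining acute accent, then "xyz"
s[1ch:2ch]         # "e\u0301": the first two codepoints
s[1ch:1ch]         # "e": codepoint slicing strips the accent
s[1gr:2gr]         # "e\u0301x": the first two graphemes
```

A textwidth-based wc (or tw) unit could follow the same pattern. It is left out of this sketch because mapping a column range onto characters needs a policy for wide characters that straddle the range boundary.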

If this system also had CharString, GraphemeString, etc., the set should include 1cu (or maybe just 1u?) so that those types can still be addressed by codeunit, since each would have its own default interpretation of an Int index. In fact, maybe that is the place to unify with ordinal indexing, since it carries the intended meaning of "index this in the normal way, whether or not that differs from whatever weird expectation the indexable type has for a number".
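Here is a minimal sketch of that idea. It assumes a hypothetical `CharString` wrapper (not a Base type) whose plain `Int` indexing counts codepoints, plus a hypothetical `cu` unit that restores today's codeunit semantics.

```julia
# Hypothetical codeunit unit: `4cu` is parsed as `4 * cu`.
struct CodeunitUnit end
const cu = CodeunitUnit()
struct CodeunitIndex
    n::Int
end
Base.:*(n::Integer, ::CodeunitUnit) = CodeunitIndex(n)

# Hypothetical wrapper whose plain-Int indexing counts codepoints.
struct CharString
    s::String
end

# Default interpretation of an Int: the n-th codepoint (O(n) scan via nextind).
Base.getindex(c::CharString, n::Int) = c.s[nextind(c.s, 0, n)]

# Explicit codeunit addressing: same semantics as indexing a String today.
Base.getindex(c::CharString, i::CodeunitIndex) = c.s[i.n]

c = CharString("añejo")  # 'ñ' occupies bytes 2 and 3
c[4]     # 'j': the fourth codepoint
c[4cu]   # 'e': the character starting at byte 4
```

Because `cu` means the same thing for every wrapper type, it gives a uniform escape hatch regardless of what a bare Int means for a particular string type.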
