Just thought a substring function would be useful out of the box, with documentation about how it differs from the `string[start:stop]` form in terms of Unicode and performance.
Slicing `a[m:n]` always makes a copy in Julia (at least with the built-in types), whether for arrays or strings. If you want a view instead (i.e. a `SubString` object that refers to the original string's memory rather than copying it), the easiest way is to apply the `@views` macro to a block of code.
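Here is a minimal sketch of the difference. The sample string and indices are made up for illustration. `α`, `β` and `γ` each take 2 bytes in UTF-8, so `hello` occupies code units 8 to 12.

```julia
s = "αβγ hello"

copied = s[8:12]          # ordinary slicing allocates a new String

@views begin
    viewed = s[8:12]      # inside @views, the same slice becomes a SubString
end

println(typeof(copied))   # String
println(typeof(viewed))   # SubString{String}
println(copied == viewed) # true: same contents, different representation
```

For a single slice, calling `SubString(s, 8, 12)` directly gives the same view without the macro.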
The real question is: where are you getting the character indices that you want to pass to your substring function? Usually you get the indices of a substring from some previous iteration over the string, either in your own loop or from something like a `findnext` call, and these give you code-unit indices that you can pass to `s[m:n]` directly.
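For example, indices returned by a search or by iterating over `eachindex` can be used for slicing as-is. This is a small sketch, and the sample string is an arbitrary choice:

```julia
s = "αβγ hello"

r = findfirst("hello", s)        # a range of code-unit indices
println(r)                       # 8:12
println(s[r])                    # hello

i = findnext(isspace, s, 1)      # code-unit index of the first space (7)
println(s[1:prevind(s, i)])      # αβγ, i.e. everything before the space

# eachindex(s) iterates over valid code-unit indices, not 1, 2, 3, ...
for k in eachindex(s)
    s[k] == 'γ' && println(k)    # 5 (not 3: α and β each take 2 code units)
end
```

Passing an index that falls in the middle of a character, such as `s[1:6]`, throws a `StringIndexError` rather than silently returning garbage.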
Because of Unicode’s complexity, wanting a substring from the m-th codepoint (“character”) to the n-th codepoint, as opposed to between two string indices (i.e. code-unit offsets), is actually an extremely uncommon operation in non-buggy code. This is why it’s not built in.
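If you really do need codepoint positions, you can write a helper as a thin layer on top of `nextind`. This is a minimal sketch. `char_substring` is a hypothetical name and not a Base function. It counts codepoints, not graphemes, and walks from the start of the string, so each call is O(n) in the string length.

```julia
# Hypothetical helper (not in Base): the substring from the m-th to the n-th
# codepoint of s, returned as a SubString view (no copy).
function char_substring(s::AbstractString, m::Integer, n::Integer)
    i = nextind(s, 0, m)   # code-unit index where the m-th codepoint starts
    j = nextind(s, 0, n)   # code-unit index where the n-th codepoint starts
    return SubString(s, i, j)  # SubString includes the whole codepoint at j
end

s = "αβγ hello"
println(char_substring(s, 2, 3))   # βγ
println(s[3:5])                    # βγ (the same slice in code-unit indices)
```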
`first` and `last` are defined for any iterator. Since string iteration is over codepoints, `first(s, n)` and `last(s, n)` have to count codepoints to be consistent with that, but I agree that they need to be used with care.
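For instance, `first(s, n)` and `last(s, n)` count codepoints, and a codepoint is not always what a reader would call a character. Here is a small illustration with made-up sample strings. `\u0301` is a combining acute accent.

```julia
s = "αβγ hello"
println(first(s, 2))    # αβ
println(last(s, 3))     # llo

# "é" written as 'e' followed by a combining accent is 2 codepoints:
t = "e\u0301tude"
println(length(t))      # 6
println(first(t, 1))    # e  (the accent is cut off)
```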
As for `chop`, as far as I can tell it’s used to chop off a known suffix (usually an ASCII suffix, so there are no issues with Unicode normalization), like a file extension, which is safe enough. (However, starting in Julia 1.8 it will often be better to use the new `chopprefix` and `chopsuffix` functions, which only remove the prefix/suffix if it is present and which may be more efficient because they can avoid decoding the UTF-8.)
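This sketch makes the difference concrete. It assumes Julia 1.8 or later for `chopprefix`/`chopsuffix`, and the file names are made up.

```julia
name = "data.csv"

# chop removes a fixed number of characters, whether or not they match:
println(chop(name; tail = 4))            # data
println(chop("notes.md"; tail = 4))      # note  (wrong suffix, still chopped)

# chopsuffix/chopprefix only remove the given text if it is actually present:
println(chopsuffix(name, ".csv"))        # data
println(chopsuffix("notes.md", ".csv"))  # notes.md  (unchanged)
println(chopprefix("v1.8.0", "v"))       # 1.8.0
```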