I know that a lot has been written about safely slicing UTF-8 strings, in which a single character can take up several bytes (code units). We have thisind, prevind, nextind and the search functions as ways to get valid indices. I don’t want to stir up anything here. There are many times when it is conceptually easier to think of a string as an array of n characters (length(s) == n, where length counts Chars). We also know that lastindex(s) is often greater than length(s), because string indices are byte offsets rather than character counts.
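For reference, here is how those index functions behave on a small string with one two-byte character. The values in the comments apply to this particular string only. Indexing at a position inside a multibyte character, such as `s[2]` here, throws a `StringIndexError`, which is why these helpers exist.

```julia
s = "αfoo"             # 'α' is 2 bytes in UTF-8, so s has 5 bytes but 4 Chars

length(s)              # 4: number of Chars
ncodeunits(s)          # 5: number of bytes
lastindex(s)           # 5: index where the last Char starts
collect(eachindex(s))  # [1, 3, 4, 5]: the valid indices
thisind(s, 2)          # 1: index 2 falls inside 'α', so it maps back to 1
nextind(s, 1)          # 3: start of the Char after 'α'
prevind(s, 3)          # 1: start of the Char before index 3
nextind(s, 0, 3)       # 4: start of the 3rd Char
```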
Here is a simple way to slice strings as if they were arrays of characters, so that the actual valid index positions stay hidden from the caller:
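```julia
# Return characters `from` through `to` of s, counted in Chars rather than byte indices
function cutstr(s::AbstractString, from::Int, to::Int)
    from < 1 && error("Character number $from out of bounds")
    to > length(s) && error("Character number $to out of bounds")
    to < from && error("to character number must be greater than or equal to from character number")
    # first(s, to) keeps the first `to` characters; last(..., n) then keeps the final n of those
    last(first(s, to), to - from + 1)
end
```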
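Note that cutstr counts Chars (Unicode code points), which is not always what is displayed as one character: an accented letter written as a base letter plus a combining mark is two Chars. If "printable characters" should mean user-perceived characters (grapheme clusters), one hedged option is to slice by graphemes with `graphemes` from the Unicode standard library. `cutgraphemes` is a hypothetical name, and collecting every grapheme means it allocates in proportion to the string's length.

```julia
using Unicode  # standard library; provides graphemes

# Hypothetical helper: slice by grapheme clusters instead of Chars
function cutgraphemes(s::AbstractString, from::Int, to::Int)
    gs = collect(graphemes(s))   # one SubString per grapheme cluster
    1 <= from <= to <= length(gs) || throw(BoundsError(s, from:to))
    return join(gs[from:to])     # concatenate the selected clusters into a String
end

s = "e\u0301foo"              # 'e' + combining acute accent, displayed as "éfoo"
length(s)                      # 5 Chars
length(collect(graphemes(s)))  # 4 graphemes
cutgraphemes(s, 1, 2)          # "e\u0301f", displayed as "éf"
```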
You could dispense with my error checking, but first and last will not catch every bad range on their own: asked for more characters than the string has, they simply return the whole string. The extra checks cost very little time.
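Without the guards, some bad arguments produce a wrong slice instead of an error, because first and last clamp rather than throw when the count exceeds the string's length. A quick check with a hypothetical unchecked variant, `cutstr_unchecked`, which is just the one-line body of cutstr:

```julia
# Hypothetical: cutstr without its argument checks, for illustration only
cutstr_unchecked(s::AbstractString, from::Int, to::Int) = last(first(s, to), to - from + 1)

s = "αfoo"
first(s, 10)                # "αfoo": more Chars requested than exist, so the whole string
last(s, 10)                 # "αfoo"
cutstr_unchecked(s, 2, 10)  # "αfoo": no error, and not "foo"
cutstr_unchecked(s, 0, 2)   # "αf": from = 0 is silently accepted
```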
A trivial example:
```julia
julia> using BenchmarkTools

julia> a = "α" * "foo"   # α entered as \alpha then TAB
"αfoo"

julia> @btime cutstr(a, 2, 4)
  170.718 ns (2 allocations: 64 bytes)
"foo"

julia> @btime cutstr(a^5, 2, 14)
  474.153 ns (3 allocations: 144 bytes)
"fooαfooαfooαf"
```
It’s not genius, but it is short and obvious. It is short enough that you don’t really need to wrap it in its own function; just use it inline. I tried writing a function that found the correct index for the starting character and then looped with nextind to concatenate the remaining characters. Probably my bad coding, but it was slower and certainly too clumsy to use inline.
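A loop-free variant of the nextind idea is possible, assuming the same Char-based semantics as cutstr: `nextind(s, 0, n)` returns the index where the n-th character starts, so both ends of the slice can be located directly and the string indexed once. This is a sketch, not a benchmarked replacement; `cutstr_idx` is a hypothetical name.

```julia
# Hypothetical alternative to cutstr: locate both ends with nextind, then slice once.
# Returns a SubString view into s (no copy); wrap it in String(...) if a copy is needed.
function cutstr_idx(s::AbstractString, from::Int, to::Int)
    1 <= from <= to <= length(s) || throw(BoundsError(s, from:to))
    i = nextind(s, 0, from)    # index where the from-th character starts
    j = nextind(s, 0, to)      # index where the to-th character starts
    return SubString(s, i, j)  # includes the whole character starting at j
end

a = "α" * "foo"
cutstr_idx(a, 2, 4)      # "foo"
cutstr_idx(a^5, 2, 14)   # "fooαfooαfooαf"
```

Because it returns a view, it skips the intermediate string that `first(s, to)` builds; whether that gives a measurable speedup would need checking with `@btime`.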
Has anyone found something else, maybe even simpler and faster?