I mean indices that you can actually use to index into the string, i.e. an index i where s1[i] != s2[i] is valid, so you can use it for subsequent processing. Yes, technically this is a codeunit index (a byte index for String).
For example, this implementation is both faster than anything posted so far and is correct for Unicode (in that it returns a valid index or nothing), though it doesn’t take Unicode normalization into account:
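    const UTF8String = Union{String,SubString{String}}

    function firstdiff_index(s1::UTF8String, s2::UTF8String)
        # Compare raw UTF-8 bytes, which is much cheaper than decoding characters.
        c1, c2 = codeunits(s1), codeunits(s2)
        @inbounds for i in 1:min(length(c1), length(c2))
            # On the first differing byte, back up to the start of the character containing it.
            c1[i] != c2[i] && return thisind(s1, i)
        end
        # No difference in the common prefix (this includes one string being a prefix of the other).
        return nothing
    end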
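To illustrate why the thisind step matters, here is a small sketch using only Base functions. The sample strings are made up for the example. The first differing byte falls in the middle of a multi-byte character, so the raw byte position is not a valid index, while the index returned by thisind is.

```julia
# Hypothetical sample strings; each Greek letter occupies 2 bytes in UTF-8.
s1, s2 = "αβγ", "αβδ"
c1, c2 = codeunits(s1), codeunits(s2)

# Raw position of the first differing byte (the second byte of the third character).
raw = findfirst(k -> c1[k] != c2[k], 1:min(length(c1), length(c2)))

isvalid(s1, raw)        # false: `raw` points into the middle of 'γ', so s1[raw] would throw
i = thisind(s1, raw)    # start of the character that contains byte `raw`
s1[i] != s2[i]          # works, because `i` is a valid index into both strings
```

Since the two strings share every byte before the differing one, the character start found in s1 is a valid index into s2 as well (assuming both hold valid UTF-8).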
What would the user do with a grapheme index?