Find the position of a single non-matching character between two strings

I mean indices that you can actually use to index into the string, i.e. an index i for which s1[i] != s2[i] is a valid expression, so you can use it for subsequent processing. Yes, technically this is a code-unit index (a byte index for String).
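
To make the distinction concrete: in a Julia String, valid indices are the byte offsets at which a character starts, so not every integer in 1:ncodeunits(s) can be used with s[i]. A small sketch (the example string is purely illustrative):

```julia
# Code-unit (byte) indices vs. character positions in a Julia String.
s = "aαb"                 # 'α' (U+03B1) takes two bytes in UTF-8

ncodeunits(s)             # 4 bytes
length(s)                 # 3 characters

# Valid indices are those that start a character: 1, 2 and 4.
collect(eachindex(s))     # [1, 2, 4]

isvalid(s, 3)             # false: byte 3 is the second byte of 'α'
thisind(s, 3)             # 2: start of the character containing byte 3
s[thisind(s, 3)]          # 'α'
```

This is why the function below calls thisind: a raw byte position may land in the middle of a multi-byte character, and thisind moves it back to an index that s[i] accepts.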

For example, this implementation is both faster than anything posted so far and correct for Unicode (in that it returns either a valid index or nothing), though it doesn’t take Unicode normalization into account:

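```julia
# String types whose code units are UTF-8 bytes
const UTF8String = Union{String,SubString{String}}

# Return the index of the character where s1 and s2 first differ, or
# `nothing` if they agree over their common length (so a string that is
# a prefix of the other also gives `nothing`). For valid UTF-8 input the
# result is a valid index into both strings, since their bytes match up to it.
function firstdiff_index(s1::UTF8String, s2::UTF8String)
    c1, c2 = codeunits(s1), codeunits(s2)  # byte views, no copying
    @inbounds for i in 1:min(length(c1), length(c2))
        # At the first differing byte, back up to the start of its character
        c1[i] != c2[i] && return thisind(s1, i)
    end
    return nothing
end
```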
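
To illustrate the normalization caveat: two strings can render as the same text yet differ in their code units, so any byte-level comparison reports a difference. A short sketch using the Unicode standard library; normalizing both strings first is one possible workaround, not something the function above does:

```julia
using Unicode

a = "caf\u00e9"    # 'é' as one precomposed code point, U+00E9 (2 bytes)
b = "cafe\u0301"   # 'e' plus combining acute accent U+0301 (1 + 2 bytes)

a == b                                   # false: code units differ from byte 4 on
na, nb = Unicode.normalize(a, :NFC), Unicode.normalize(b, :NFC)
na == nb                                 # true: both are now "café" in NFC
ncodeunits(b), ncodeunits(nb)            # (6, 5): normalization changed the bytes
```

For this pair, firstdiff_index(a, b) as defined above returns 4. Keep in mind that indices computed on na and nb refer to the normalized strings, not to the originals.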

What would the user do with a grapheme index?
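
Part of the problem is that a grapheme index cannot be fed back into s[i]. A small sketch using graphemes from the Unicode standard library (the example string is illustrative):

```julia
using Unicode

s = "e\u0301x"                 # 'e' + combining acute accent, then 'x'

ncodeunits(s)                  # 4 code units (bytes)
length(s)                      # 3 characters (code points)
length(graphemes(s))           # 2 user-perceived characters

# The second grapheme is "x", but 2 is not a usable index for it:
collect(graphemes(s))[2]       # "x"
s[2]                           # '\u0301': the combining accent, not 'x'
s[4]                           # 'x': its actual code-unit index
```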
