Maybe the definition of simple can be discussed. If plain old for loops are an option, then the following code works, is readable and doesn’t allocate. (But it is not a one-liner.)
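function firstdiff(s1, s2)
    # Lengths differ: return the position just past the shorter string
    if length(s1) != length(s2)
        return min(length(s1), length(s2)) + 1
    end
    # Same length: return the position of the first mismatching character
    for (i,(c1,c2)) in enumerate(zip(s1,s2))
        if c1 != c2
            return i
        end
    end
    return 0  # no mismatch found
end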
(Oh, I just noticed your profile name, so it wasn’t a beginner question. I’m sure that solution was obvious to you anyway.)
julia> findfirst( a!=b for (a,b) in zip(s1,s2) )
ERROR: MethodError: no method matching keys(::Base.Iterators.Zip{Tuple{String, String}})
Closest candidates are:
keys(::IndexStyle, ::AbstractArray, ::AbstractArray...) at ~/julia/1.7.3/share/julia/base/abstractarray.jl:350
keys(::Tuple) at ~/julia/1.7.3/share/julia/base/tuple.jl:72
keys(::Tuple, ::Tuple...) at ~/julia/1.7.3/share/julia/base/tuple.jl:77
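The error above arises because `findfirst` needs a collection with `keys`, which a generator over `zip` does not provide. One possible workaround, sketched here as an assumption rather than a recommendation, is to materialize the comparison into a `Vector{Bool}` first. The example strings and the names `mask` and `idx` are made up for illustration. Note that this allocates and returns a character position, not a string index.

```julia
s1, s2 = "julia", "jolia"   # illustrative inputs, not from the thread

# A comprehension over zip(s1, s2) produces a Vector{Bool}, which has keys,
# so findfirst can be applied to it. This allocates one Bool per character pair.
mask = [a != b for (a, b) in zip(s1, s2)]

# Position of the first `true`, or `nothing` if the compared characters all match.
idx = findfirst(mask)   # 2
```

For long strings the explicit loop avoids both the allocation and the full scan.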
Note that for String you can probably do better (performance-wise) by comparing bytes in the codeunits(s1) and codeunits(s2) arrays, then converting the resulting byte index back to a string index with thisind.
That does the trick and works for Unicode strings too. It is basically a shortened version of your original function, without the length check. That also means that when the lengths differ and one string is a prefix of the other, e.g. “julia” and “julialang”, it returns nothing. Not sure if that’s okay for @rafael.guerra’s use case.
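If the prefix case should count as a difference, a character-wise scan can report the position just past the common prefix. This is only a sketch: `firstdiff_chars` is a hypothetical name, not a function from this thread, and it returns character positions rather than string indices.

```julia
# Hypothetical helper, not from the thread: scan character pairs and
# treat "one string is a prefix of the other" as a difference.
function firstdiff_chars(s1::AbstractString, s2::AbstractString)
    n = 0
    for (c1, c2) in zip(s1, s2)
        n += 1
        c1 != c2 && return n
    end
    # zip stops at the shorter string. If the lengths differ, the first
    # difference is the position just past the common prefix.
    return length(s1) == length(s2) ? nothing : n + 1
end

firstdiff_chars("julia", "julialang")  # 6
firstdiff_chars("julia", "jolia")      # 2
firstdiff_chars("julia", "julia")      # nothing
```

The trailing `length` calls each walk their string once more, so this trades a little speed for handling the prefix case.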
(Tangential, but I couldn’t find this use of return inside a for loop documented in the manual section on loops or in the REPL docstrings. Does it only work in global scope? Where can I find more about it?)
Oh yeah, I was gonna mention that in a now-abandoned post. By string indices you mean byte indices I presume? If it’s something user-facing, graphemes may also be the thing to consider.
I mean indices that you can actually use to index into the string, i.e. an index i where s1[i] != s2[i] is valid, so you can use it for subsequent processing. Yes, technically this is a codeunit index (a byte index for String).
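To illustrate the distinction, here is a small sketch using a made-up example string in which every character takes two bytes in UTF-8, so not every integer is a valid index:

```julia
s = "αβγ"   # illustrative string: each character is 2 bytes in UTF-8

# Valid string indices are the byte offsets where a character starts.
valid = collect(eachindex(s))   # [1, 3, 5]

# Byte 2 falls in the middle of 'α', so s[2] would throw a StringIndexError.
ok = isvalid(s, 2)              # false

# thisind maps any byte position to the start of the character containing it.
start = thisind(s, 2)           # 1
last_char = s[thisind(s, 6)]    # 'γ'
```

A character count such as the one produced by `enumerate` would give 3 for 'γ', which here happens to be the index of 'β', not 'γ'.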
For example, this implementation is both faster than anything posted so far and is correct for Unicode (in that it returns a valid index or nothing), though it doesn’t take Unicode normalization into account:
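# String types whose code units are UTF-8 bytes
const UTF8String = Union{String,SubString{String}}
function firstdiff_index(s1::UTF8String, s2::UTF8String)
    c1, c2 = codeunits(s1), codeunits(s2)
    # Compare raw bytes over the common length
    @inbounds for i in 1:min(length(c1),length(c2))
        # Map the mismatching byte back to the start of its character
        c1[i] != c2[i] && return thisind(s1, i)
    end
    return nothing
end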
If I’m not mistaken, @SteffenPL got the simplest solution for ASCII strings in post#8, but only @stevengj’s solution provides correct answers for Unicode strings. Thanks to all.
This is more than 50× slower than a loop, and is also somewhat different from the other solutions in that it fails if s1 and s2 differ in more than a single character, instead of returning the first mismatch.
Note that if you want something that works for arbitrary AbstractString subtypes (not just UTF-8 encodings), you could use:
function firstdiff_indices(s1::AbstractString, s2::AbstractString)
    for ((i1, c1), (i2, c2)) in zip(pairs(s1), pairs(s2))
        c1 != c2 && return (i1, i2)
    end
    return nothing
end
(Note that in this case you need to return two indices in general, since s1 and s2 might have different indexing schemes.) It’s non-allocating, but is about 5x slower than the byte-scan method for String.