Find the position of a single non-matching character between two strings

Would you mind a footnote on the above? :sweat:

Graphemes are what (end-)users care about in most cases. If the goal were to show “the filenames differ at position N” to an average, non-developer user, then N being in terms of graphemes would make the most intuitive sense. (For example, I originally tested the code segments here with strings in my native language, and was myself confused for a moment at the result, until I counted how many codepoints there were in each of the graphemes up to the difference.)
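To make the grapheme-versus-codepoint distinction concrete, here is a minimal sketch that reports the position of the first differing grapheme. The name `first_diff_grapheme` is made up for illustration; it relies only on `graphemes` from the Unicode stdlib, and it assumes both strings have the same number of graphemes (if one is a prefix of the other it simply returns `nothing`).

```julia
using Unicode

# Hypothetical helper (not part of any library): position, counted in
# graphemes, of the first place where `a` and `b` differ.
function first_diff_grapheme(a::AbstractString, b::AbstractString)
    for (n, (ga, gb)) in enumerate(zip(graphemes(a), graphemes(b)))
        ga == gb || return n
    end
    return nothing  # no difference found within the shorter string
end

# "ë" written as 'e' followed by a combining diaeresis (U+0308)
s1, s2 = "noe\u0308l", "noe\u0308x"

length(s1)                   # 5 codepoints
length(graphemes(s1))        # 4 graphemes
first_diff_grapheme(s1, s2)  # 4, which is what a reader would count
```

A codepoint-based count would report position 5 for the same pair, which is the kind of mismatch with intuition described above.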

I agree, if it’s a non-programming interface (as opposed to, say, an exception string) where the user will never take the position and use it to index into a Julia string. (Though Julia 1.9 will include grapheme slicing.)

(It might be reasonable to implement a graphemeind(s, i) function that returns the index of the grapheme containing s[i] in the Unicode stdlib. I haven’t seen that functionality in other languages, though, e.g. it doesn’t seem to be in the Python grapheme library.)
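For what it's worth, a rough sketch of what such a function could look like in user code follows. To be clear, `graphemeind` does not exist in the Unicode stdlib; this is only an illustration. It assumes a `String` (or `SubString{String}`), where indices are code-unit offsets, and it walks the graphemes while accumulating their sizes, so it is O(n).

```julia
using Unicode

# Hypothetical sketch of the suggested function: returns the ordinal number
# of the grapheme that contains code unit `i` of `s`.
function graphemeind(s::AbstractString, i::Integer)
    1 <= i <= ncodeunits(s) || throw(BoundsError(s, i))
    lastunit = 0  # code-unit index where the current grapheme ends
    for (n, g) in enumerate(graphemes(s))
        lastunit += ncodeunits(g)
        i <= lastunit && return n
    end
    return nothing  # not reached for an in-bounds index
end

s = "noe\u0308l"   # 'e' + combining diaeresis form one grapheme
graphemeind(s, 3)  # 3: the 'e'
graphemeind(s, 4)  # 3: the combining mark belongs to the same grapheme
graphemeind(s, 6)  # 4: the 'l'
```

Unlike `s[i]`, this sketch does not reject an index that falls in the middle of a multi-byte character; it just reports the grapheme that contains that code unit.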


The intention was to offer a proposal different from the many already posted.
If I had tried to find a better-performing solution, it would have taken 50 times as long as it took me to find this one.
Regarding correctness, I point out that what was asked for was a “simpler” solution to the problem of finding the position of a SINGLE non-matching character, not the first of many non-matching characters.
I used findfirst because a “simple” find did not come to mind (and I don’t know whether one exists).
But the same function works with findlast (or even findall), for the given problem.

It also fails this test:

julia> s1, s2 = "zzzz", "zzzx"
("zzzz", "zzzx")

julia> findfirst(==(only(setdiff(s1,s2))), s1)
ERROR: ArgumentError: Collection is empty, must contain exactly 1 element
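
The failure comes from setdiff treating each string as a set of characters: 'z' occurs somewhere in s2, so every 'z' is removed from s1 and nothing is left for only to return. A position-wise comparison avoids this. The following is just a sketch with a made-up name (`single_mismatch`), and it assumes the two strings have the same number of characters.

```julia
# Hypothetical helper: compare the strings character by character and return
# the position (counted in characters) of the first mismatch, or `nothing`.
function single_mismatch(a::AbstractString, b::AbstractString)
    for (i, (ca, cb)) in enumerate(zip(a, b))
        ca == cb || return i
    end
    return nothing
end

single_mismatch("zzzz", "zzzx")  # 4
single_mismatch("zzzz", "zzzz")  # nothing
```

Note that the result is a character count, not a string index, so for non-ASCII strings it cannot be used directly as `s[i]`.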

@rocco_sprmnt21, your proposal seems very simple and elegant. I was looking for a function to subtract strings, which doesn’t seem to exist, probably for a good reason… But your setdiff (nearly) does the trick, although slowly compared to the rest.

It can be useful, but you have to be careful with corner cases, as @stevengj shows.

If you like setdiff, here are some amended versions:

# for ASCII strings
setdiff(enumerate(s1),enumerate(s2))


# for Unicode strings
first(setdiff(pairs(s1),pairs(s2)))
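
As an illustration of why the two variants are labelled differently, here is a small comparison on a non-ASCII pair of strings. The variable names are arbitrary, and `collect` is used only to make the element types explicit; it should not change the result. With enumerate the first component is a character count, while with pairs it is a valid string index.

```julia
s1, s2 = "αβγ", "αβδ"   # each of these characters takes 2 bytes in UTF-8

# (count, char) tuples: positions are counted in characters
by_count = setdiff(collect(enumerate(s1)), collect(enumerate(s2)))

# index => char pairs: positions are real string indices
by_index = setdiff(collect(pairs(s1)), collect(pairs(s2)))

first(by_count)   # (3, 'γ')
first(by_index)   # 5 => 'γ'
s1[5]             # 'γ'; s1[3] would be 'β', not the mismatching character

# Corner case: identical strings give an empty result, so check before `first`
isempty(setdiff(collect(pairs(s1)), collect(pairs(s1))))  # true
```

Because each element carries its position, a string like "zzzz" versus "zzzx" no longer collapses to an empty set.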
