Find the position of a single non-matching character between two strings

rafael.guerra · August 30, 2022, 7:24pm

Would you mind a footnote on the above?

digital_carver · August 30, 2022, 7:39pm

Graphemes are what (end-)users care about in most cases. If the goal were to show “the filenames differ at position N” to an average, non-developer user, then N counted in graphemes would make the most intuitive sense. (E.g., I originally tested the code segments here with strings in my native language, and was myself confused for a moment by the result, until I counted how many codepoints there were in each of the graphemes up to the difference.)
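
To make the grapheme/codepoint gap concrete, here is a minimal sketch. The helper name `first_diff_grapheme` and the sample strings are made up for illustration; only `graphemes` from the Unicode stdlib is an existing function.

```julia
using Unicode  # stdlib; provides graphemes

# Hypothetical helper (not from the thread): 1-based grapheme position of the
# first difference, or nothing if the overlapping graphemes all match.
function first_diff_grapheme(a::AbstractString, b::AbstractString)
    for (n, (ga, gb)) in enumerate(zip(graphemes(a), graphemes(b)))
        ga == gb || return n
    end
    return nothing
end

# "e\u0301" is 'e' plus a combining acute accent: two codepoints, one grapheme.
a = "e\u0301e\u0301x"
b = "e\u0301e\u0301y"

first_diff_grapheme(a, b)                             # 3, counted in graphemes
findfirst(((x, y),) -> x != y, collect(zip(a, b)))    # 5, counted in codepoints
```

A non-developer would most likely expect the answer 3 here, since only three characters are visible on screen.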

stevengj · August 30, 2022, 7:52pm

I agree, if it’s a non-programming interface (as opposed to, say, an exception string) where the user will never take the position and use it to index into a Julia string. (Though Julia 1.9 will include grapheme slicing.)

(It might be reasonable to implement a graphemeind(s, i) function that returns the index of the grapheme containing s[i] in the Unicode stdlib. I haven’t seen that functionality in other languages, though, e.g. it doesn’t seem to be in the Python grapheme library.)
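
A rough sketch of what such a `graphemeind(s, i)` could look like, built only on the existing `graphemes` iterator. This is a hypothetical implementation, not an actual stdlib function, and it makes a linear pass over the string:

```julia
using Unicode  # stdlib; provides graphemes

# Hypothetical sketch: number of the grapheme containing the character at
# string index i. Walks the graphemes and tracks the code units covered so far.
function graphemeind(s::AbstractString, i::Integer)
    isvalid(s, i) || throw(ArgumentError("index $i is not a valid character index"))
    stop = 0  # last code unit covered by the graphemes seen so far
    for (n, g) in enumerate(graphemes(s))
        stop += ncodeunits(g)
        i <= stop && return n
    end
    throw(BoundsError(s, i))  # safety net; not expected for a valid index
end

s = "e\u0301e\u0301x"   # 'e' + combining accent, twice, then 'x'
graphemeind(s, 2)       # 1: the combining accent belongs to the first grapheme
graphemeind(s, 7)       # 3: 'x' starts at code unit 7
```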

rocco_sprmnt21 · August 30, 2022, 8:00pm

The intention was to offer a proposal different from the many already posted.
If I had tried to find a better-performing solution, it would have taken me 50 times as long as finding this one.
Regarding correctness, I point out that the request was for a “simpler” solution to the problem of finding the position of a SINGLE non-matching character, not the first of many non-matching characters.
I used findfirst because a plain “find” did not come to mind (and I don’t know whether one exists).
But the same approach works with findlast (or even findall) for the given problem.

stevengj · August 30, 2022, 8:02pm

It also fails this test:

julia> s1, s2 = "zzzz", "zzzx"
("zzzz", "zzzx")

julia> findfirst(==(only(setdiff(s1,s2))), s1)
ERROR: ArgumentError: Collection is empty, must contain exactly 1 element
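
The failure comes from setdiff treating the strings as sets of characters: every character of s1 ('z') also occurs somewhere in s2, so the difference is empty and only throws. A position-wise comparison avoids this. The sketch below uses a made-up helper name, `first_mismatch`, and returns a string index into s1:

```julia
# Hypothetical helper: string index in s1 of the first position where the two
# strings differ, or nothing if the overlapping part matches everywhere.
function first_mismatch(s1::AbstractString, s2::AbstractString)
    for ((i, c1), c2) in zip(pairs(s1), s2)
        c1 == c2 || return i
    end
    return nothing
end

s1, s2 = "zzzz", "zzzx"
isempty(setdiff(s1, s2))   # true: as sets of characters, {z} minus {z, x} is empty
first_mismatch(s1, s2)     # 4
```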

rafael.guerra · August 30, 2022, 8:03pm

@rocco_sprmnt21, your proposal seems very simple and elegant. I was looking for a function to subtract strings, which doesn’t seem to exist, probably for a good reason… But your setdiff (nearly) does the trick, although slowly compared to the rest.

rocco_sprmnt21 · August 30, 2022, 8:06pm

It can be useful, but you have to be careful with corner cases, as @stevengj shows.

rocco_sprmnt21 · August 30, 2022, 9:47pm

If you like setdiff, here are some amended versions:

# for ASCII strings
setdiff(enumerate(s1),enumerate(s2))


# for Unicode strings
first(setdiff(pairs(s1),pairs(s2)))
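
For illustration, here is what the two variants return on sample inputs (the strings are my own, not from the thread). Pairing each character with its position is what fixes the "zzzz"/"zzzx" case; the difference between the two is that enumerate counts characters while pairs yields string indices, which only coincide for ASCII.

```julia
s1, s2 = "zzzz", "zzzx"
setdiff(enumerate(s1), enumerate(s2))      # [(4, 'z')]
first(setdiff(pairs(s1), pairs(s2)))       # 4 => 'z'

# With multi-byte characters the two disagree on the position:
u1, u2 = "αβγ", "αβδ"
first(setdiff(enumerate(u1), enumerate(u2)))   # (3, 'γ'): third character
first(setdiff(pairs(u1), pairs(u2)))           # 5 => 'γ': valid index, u1[5] == 'γ'
```

Both variants still assume the strings have the same length and differ in a single position, as in the original problem.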
