Would you mind a footnote on the above?
Graphemes are what (end-)users care about in most cases. If the goal was to show "the filenames differ at position N" to an average, non-developer user, then N being in terms of graphemes would make the most intuitive sense. (For example, I originally tested the code segments here with strings in my native language, and was myself confused for a moment at the result, until I counted how many codepoints there were in each of the graphemes up to the difference.)
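To make the grapheme/codepoint gap concrete, here is a minimal sketch that reports the first difference in graphemes. The name first_grapheme_diff is hypothetical (it is not in Base or the Unicode stdlib); only graphemes from the Unicode stdlib is assumed.

```julia
using Unicode  # stdlib; provides graphemes

# Hypothetical helper: 1-based position (counted in graphemes) of the first
# difference, or nothing if both strings consist of the same graphemes.
function first_grapheme_diff(a::AbstractString, b::AbstractString)
    ga, gb = graphemes(a), graphemes(b)
    for (n, (x, y)) in enumerate(zip(ga, gb))
        x == y || return n
    end
    # One string may be a grapheme-wise prefix of the other.
    la, lb = length(ga), length(gb)
    return la == lb ? nothing : min(la, lb) + 1
end

a = "cafe\u0301 A"   # "é" written as 'e' plus a combining acute accent
b = "cafe\u0301 B"
first_grapheme_diff(a, b)   # 6, counted in graphemes
length(a)                   # 7 codepoints; the differing character is the 7th
```

The user sees six "characters" up to and including the difference, while a codepoint count gives seven.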
I agree, if it's a non-programming interface (as opposed to, say, an exception string) where the user will never take the position and use it to index into a Julia string. (Though Julia 1.9 will include grapheme slicing.)
(It might be reasonable to implement a graphemeind(s, i) function that returns the index of the grapheme containing s[i] in the Unicode stdlib. I haven't seen that functionality in other languages, though, e.g. it doesn't seem to be in the Python grapheme library.)
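For illustration, a rough sketch of what such a function could look like. To be clear, graphemeind is hypothetical and does not exist in the Unicode stdlib; the sketch assumes i is a code-unit index into s and simply walks the graphemes, so it is O(n) per call.

```julia
using Unicode  # stdlib; provides graphemes

# Hypothetical graphemeind: index of the grapheme that contains code unit i of s.
function graphemeind(s::AbstractString, i::Integer)
    1 <= i <= ncodeunits(s) || throw(BoundsError(s, i))
    last_unit = 0   # code-unit index at which the current grapheme ends
    for (n, g) in enumerate(graphemes(s))
        last_unit += ncodeunits(g)
        i <= last_unit && return n
    end
end

s = "cafe\u0301 A"   # "é" written as 'e' plus a combining acute accent
graphemeind(s, 8)    # 6: the 'A' at string index 8 is the sixth grapheme
graphemeind(s, 5)    # 4: index 5 is the combining accent, part of "é"
```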
The intention was to offer a proposal different from the many already posted.
Finding a better-performing solution would have taken me 50 times as long as finding this one.
Regarding correctness, I point out that a "simpler" solution was requested for the problem of finding the position of a SINGLE non-matching character, not the first of many non-matching characters.
I used findfirst because a "simple" find did not come to mind (and I don't know if one exists).
But the same function works with findlast (or even findall), for the given problem.
It also fails this test:
julia> s1, s2 = "zzzz", "zzzx"
("zzzz", "zzzx")
julia> findfirst(==(only(setdiff(s1,s2))), s1)
ERROR: ArgumentError: Collection is empty, must contain exactly 1 element
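The error arises because setdiff treats the strings as sets of characters: the only character of s1, 'z', also occurs in s2, so the difference is empty. A positional comparison avoids this. A minimal sketch, with the hypothetical name firstdiff, that only compares the common prefix (so a pure length difference returns nothing):

```julia
# Hypothetical helper: string index of the first position where s1 and s2
# differ, or nothing if they agree over their common length.
function firstdiff(s1::AbstractString, s2::AbstractString)
    for ((i, c1), c2) in zip(pairs(s1), s2)
        c1 == c2 || return i
    end
    return nothing
end

firstdiff("zzzz", "zzzx")   # 4
```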
@rocco_sprmnt21, your proposal seems very simple and elegant. I was looking for a function to subtract strings, which doesn't seem to exist, probably for a good reason… But your setdiff (nearly) does the trick, although slowly compared to the rest.
It can be useful, but you have to be careful with corner cases, as @stevengj shows.
If you like setdiff, here are some amended versions:
# for ASCII strings
setdiff(enumerate(s1),enumerate(s2))
# for Unicode strings
first(setdiff(pairs(s1),pairs(s2)))
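A small illustration of why the two variants differ, using example strings of my own choosing: enumerate counts characters, whereas pairs yields valid string indices, and the two disagree as soon as a multi-byte character appears.

```julia
s1, s2 = "αβγ", "αβδ"   # each Greek letter occupies 2 bytes in UTF-8

by_count = first(setdiff(enumerate(s1), enumerate(s2)))   # (3, 'γ')
by_index = first(setdiff(pairs(s1), pairs(s2)))           # 5 => 'γ'

s1[first(by_index)]   # 'γ': 5 is the string index of the differing character
s1[first(by_count)]   # 'β': 3 is a character count, not the index of 'γ'
```

Note that first throws on an empty collection, so both forms still need a guard when the strings are identical.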