Weird behaviour of "setdiff"

I just came across some weird behaviour of “setdiff” in Julia 1.8.2, as can be seen in the following screenshot:
[Image: WeChat screenshot 20221002144818]

I think the “correct” result should be '[', '1', '1', ']'. Such a minor change leads to unexpected behaviour in all my old code.

It returns distinct values, though it could be spelled out more in the docstring.

Construct the set of elements in s but not in any of the iterables in itrs. Maintain order with arrays.

julia> setdiff("aaa","b")
1-element Vector{Char}:
 'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)

You can use filter instead:

julia> filter(≠('_'), "_[11]")
"[11]"
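
If more than one character has to be dropped, the same filter idea extends naturally. The following is only a minimal sketch: the helper name remove_chars is made up for illustration and is not part of Base. Unlike setdiff, it keeps repeated characters and returns a String.

```julia
# Hypothetical helper (not in Base): drop every character of `s` that
# appears in `unwanted`, keeping the original order and any duplicates.
remove_chars(s::AbstractString, unwanted) = filter(c -> c ∉ unwanted, s)

remove_chars("_[11]", "_")          # "[11]"
remove_chars("_[1-1]", ('_', '-'))  # "[11]"
```

Here `unwanted` can be any collection that supports `in`, such as a String, a tuple, or a Set of Chars.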

The behaviour of setdiff seems correct to me. As per the documentation

setdiff(s, itrs...)
Construct the set of elements in s but not in any of the iterables in itrs. Maintain order with arrays.

Note that a String is an iterable of Chars [1], so setdiff returns the set of characters present in the first string but not in the second. A set, by its mathematical definition, does not contain repeated values. For this reason, the return value is correct.

(1)

julia> for c in "hello"
           println(c)
       end
h
e
l
l
o
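
As a small illustration of the point above, here is a sketch that contrasts iterating over all the characters of a string with collecting only the distinct ones. The input string is assumed from the filter example earlier in the thread; the comment on setdiff describes the behaviour reported here for Julia 1.8.2.

```julia
s = "_[11]"

all_chars      = collect(s)       # every Char, duplicates kept: '_', '[', '1', '1', ']'
distinct_chars = unique(s)        # distinct Chars in first-seen order: '_', '[', '1', ']'
as_set         = Set(s)           # the same distinct Chars as an unordered Set{Char}
difference     = setdiff(s, "_")  # distinct Chars of s that do not occur in "_"
```

If duplicates matter, the string has to be treated as a sequence (collect, filter, replace) rather than as a set.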

I guess this is not the case, because a set is well defined in computer science and mathematics as a collection of unique elements (skipping a lot of details, really). But if you think the docs could be improved, feel free to file an issue on GitHub or make a pull request. :)


All right! But then I have to replace it in my package and reimplement my desired behaviour using filter!

It seems that “setdiff” is not meant for string manipulation; it makes sense in the mathematical sense.

If you just need to remove the character from the string, you can use replace instead. It returns a string. If you do need it to be a Vector{Char}, you can collect that too.

julia> s = "_[11]"
"_[11]"

julia> replace(s, "_" => "")
"[11]"

julia> collect(replace(s, "_" => ""))
4-element Vector{Char}:
 '[': ASCII/Unicode U+005B (category Ps: Punctuation, open)
 '1': ASCII/Unicode U+0031 (category Nd: Number, decimal digit)
 '1': ASCII/Unicode U+0031 (category Nd: Number, decimal digit)
 ']': ASCII/Unicode U+005D (category Pe: Punctuation, close)

Yes, it is also a good choice!

There is another aspect here that I find weird. As mentioned already, the documentation says that setdiff returns a set. In this example, however, it returns a Vector{Char}, and that is not a subtype of Set.

It returns a set, not a Set; that is, it returns a set in the mathematical sense of the word. That the set (a collection of unique values) returned is not a Set (the data structure) is not really relevant. I guess it would be clearer if the documentation explicitly said something about it, maybe along the lines of:

setdiff(s, itrs...)
Construct an Array containing the set of elements in s but not in any of the iterables in itrs. Maintain order with arrays.

I see. I agree that it would be a good idea to state explicitly in the documentation that setdiff does not only work with Sets. The return type seems to depend on the first argument. (It’s not always an Array.) I’ve just seen that for union this is already documented.
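
To make the dependence on the first argument concrete, here is a short sketch with a few container types. The inputs are chosen only for illustration; the String case reflects what was reported in this thread for Julia 1.8.2.

```julia
a = setdiff(Set([1, 2, 3]), [2])     # first argument is a Set    -> Set{Int}
b = setdiff([1, 2, 2, 3], [3])       # first argument is a Vector -> Vector{Int}, order kept
c = setdiff(BitSet([1, 2, 3]), [2])  # first argument is a BitSet -> BitSet
d = setdiff("aab", "b")              # first argument is a String -> Vector{Char}

# Inspect the concrete return types side by side.
typeof.((a, b, c, d))
```

In every case the result holds distinct elements only; what changes is the container used to hold them.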