I just noticed some weird behaviour of “setdiff” in Julia 1.8.2, as can be seen in the following figure:
I think the “correct” result should be '[', '1', '1', ']'. Such a minor change leads to unexpected behaviour in all my old code.
It returns distinct values, though it could be spelled out more in the docstring.
Construct the set of elements in s but not in any of the iterables in itrs. Maintain order with arrays.
julia> setdiff("aaa","b")
1-element Vector{Char}:
'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)
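To tie this back to the string from the original post, here is a minimal sketch (assuming Julia 1.8 or later, where setdiff accepts strings as in the example above). The variable names are only for illustration.

```julia
s = "_[11]"

# setdiff iterates both strings as collections of Char and keeps only
# the distinct characters of s that do not occur in "_", in order
result = setdiff(s, "_")

println(result)          # a Vector{Char} with '[', '1', ']'
println(length(result))  # 3, because the repeated '1' is kept only once
```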
You can use filter instead:
julia> filter(≠('_'), "_[11]")
"[11]"
The behaviour of setdiff seems correct to me. As per the documentation:
setdiff(s, itrs...)
Construct the set of elements in s but not in any of the iterables in itrs. Maintain order with arrays.
If you notice, a String is an iterable of Chars [1], so it will return the set of chars present in the first string, but not in the second. A set by definition (mathematics) does not repeat values. For this reason, the return value is correct.
[1]
julia> for c in "hello"
           println(c)
       end
h
e
l
l
o
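As a small sketch of the same point, the following shows what iterating a String yields and how the repeated character disappears once the characters are treated as a set. The variable names are hypothetical.

```julia
word = "hello"

println(eltype(word))       # element type produced when iterating a String

chars = collect(word)       # every character, repeats included
distinct = unique(chars)    # first occurrence of each character, in order
as_set = Set(word)          # the Set data structure (unordered)

println(chars)
println(distinct)
println(length(as_set))     # 4, since 'l' is stored only once
```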
I guess this is not the case because a set is well-defined in Computer Science and Mathematics as a collection of unique elements (skipping a lot of details, really). But if you think the docs could be improved, feel free to file an issue on GitHub or make a Pull Request.
All right! But I have to reload it in my package, and redefine my desired behaviour using filter!
It seems that “setdiff” is not oriented to string manipulation, it makes sense in Mathematics.
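For what it's worth, a filter-based replacement could look roughly like the sketch below. The name drop_chars is hypothetical and not part of Base; it is only one way to get "remove these characters but keep duplicates", and it returns a String rather than a Vector{Char}.

```julia
# Hypothetical helper (not part of Base): remove every character of `s`
# that occurs in any of `itrs`, keeping repeated characters.
function drop_chars(s::AbstractString, itrs...)
    unwanted = Set{Char}()
    for itr in itrs
        union!(unwanted, itr)   # collect all characters to remove
    end
    return filter(c -> c ∉ unwanted, s)   # filter on a string returns a String
end

println(drop_chars("_[11]", "_"))            # [11]
println(collect(drop_chars("_[11]", "_")))   # the same characters as a Vector{Char}
```

Defining a separate function like this avoids changing what setdiff means for strings everywhere else.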
If you just need to remove the character from the string, you can use replace instead. It returns a string. If you do need it to be a Vector{Char}, you can collect that too.
julia> s = "_[11]"
"_[11]"
julia> replace(s, "_" => "")
"[11]"
julia> collect(replace(s, "_" => ""))
4-element Vector{Char}:
'[': ASCII/Unicode U+005B (category Ps: Punctuation, open)
'1': ASCII/Unicode U+0031 (category Nd: Number, decimal digit)
'1': ASCII/Unicode U+0031 (category Nd: Number, decimal digit)
']': ASCII/Unicode U+005D (category Pe: Punctuation, close)
Yes, it is also a good choice!
There is another aspect here that I find weird. As mentioned already, the documentation says that setdiff returns a set. In this example, however, it returns a Vector{Char}, and that is not a subtype of Set.
It returns a set, not a Set; that means it returns a set in the mathematical meaning of the word. That the set (collection of unique values) returned is not a Set (data structure) is not really relevant. I guess it would be clearer if the documentation explicitly said something about it, maybe along the lines of:
setdiff(s, itrs...)
Construct an Array containing the set of elements in s but not in any of the iterables in itrs. Maintain order with arrays.
I see. I agree that it would be a good idea to state explicitly in the documentation that setdiff does not only work with Sets. The return type seems to depend on the first argument. (It’s not always an Array.) I’ve just seen that for union this is already documented.
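A quick sketch of how the return type follows the first argument, as far as I can tell from trying it (assuming Julia 1.8 or later; the variable names are only for illustration):

```julia
from_vector = setdiff([1, 1, 2, 3], [2])     # Vector in, Vector out (duplicates removed)
from_set    = setdiff(Set([1, 2, 3]), [2])   # Set in, Set out
from_string = setdiff("aaa", "b")            # String in, Vector{Char} out

println(typeof(from_vector))
println(typeof(from_set))
println(typeof(from_string))
```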