I thought I would ask here before opening an issue. Context: I was running a script that was using modules that I did not want to wrap up as a package, so at the beginning I had something like
push!(LOAD_PATH, expanduser("~/research/some_directory"))
During the morning, LOAD_PATH filled up nicely with about 100 entries. So I thought I would find the equivalent of
pushnew!(collection, items...) = push!(collection, setdiff(items, collection)...)
in Base, but I didn’t. However, I recalled union!, which is basically the same. But it has no method for Vectors, so I can’t do
union!(LOAD_PATH, [my_path])
at the moment.
The question is: is there a deeper design reason for not having either of these methods, or just that no one got around to writing them?
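For reference, the behaviour being asked for can be sketched with an explicit loop. This is only a minimal illustration: pushnew! here is a hypothetical helper, not a function in Base, and the paths vector with its directory names is a made-up stand-in for LOAD_PATH.

```julia
# Hypothetical helper (not part of Base): push each item only if it is absent.
function pushnew!(collection::AbstractVector, items...)
    for item in items
        # `in` is a linear scan on a Vector, so this costs O(length(collection)) per item.
        item in collection || push!(collection, item)
    end
    return collection
end

# Stand-in for LOAD_PATH; the directory names are invented for the example.
paths = ["/usr/lib", "/opt/lib"]
pushnew!(paths, "/opt/lib", "/home/me/research")   # only the second path is added
pushnew!(paths, "/home/me/research")               # repeated call adds nothing
```

With a guard like this, re-running the script during a session would leave the vector at the same length instead of growing by one entry each time.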
I think you’re looking for append!?
union! would seem to imply eliminating duplicates, which is a very different operation from push!(X, V...) or append!(X, V), and is an expensive operation for arrays; if you need set-union, you’re better off with Set or some other data structure that supports it efficiently.
Nope, I think I was specifically looking for pushnew! above (which many languages with modifiable vectors have, e.g. cl:pushnew), but was ready to substitute union! for it. I do realize it is more expensive than append!, but sometimes you want to add only the new elements.
So the question is (before I file an issue): was there a reason for making union! work for Sets, but not Vectors, even though union exists for both?
There’s likely no design decision here, just a lack of motivation for implementing this method. I guess it would make sense to add it for completeness.
Bear in mind that union!(a, b) would also have to remove duplicates from a in order to return the same result as union(a, b); it’s not quite the same as pushnew!. There’s no unique!(xs) function exported from Base either, but @nalimilan’s point applies to that too I suppose.
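The difference is easiest to see when the first vector already contains a duplicate. The following is a small sketch using only the non-mutating union from Base and a hand-written loop for the pushnew!-style behaviour; the variable names are arbitrary.

```julia
a = [1, 1, 2]      # already contains a duplicate
b = [2, 3]

# Non-mutating union removes duplicates from both arguments.
u = union(a, b)

# pushnew!-style append: existing entries of the first vector are left alone,
# and only items of b that are not yet present are added.
p = copy(a)
for x in b
    x in p || push!(p, x)
end
```

Here u holds three elements and p holds four, because the duplicated 1 in a survives the pushnew!-style append. An in-place union! on a vector would therefore have to shrink a to match union(a, b).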
This is a good point, thanks! Probably pushnew! is more useful than unique! then.
This is exactly the functionality of push! for a Set. If you need this functionality then you should use Set, as @stevengj pointed out. This will be much faster than using a Vector for Set-like functionality.
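A short illustration of that point, with invented path strings: pushing an element that is already in a Set leaves the Set unchanged, and membership tests are hash lookups rather than linear scans.

```julia
s = Set(["/opt/lib"])

push!(s, "/opt/lib")            # already present, so the Set does not grow
push!(s, "/home/me/research")   # new element is added

n = length(s)
present = "/opt/lib" in s       # hash-based lookup
```

Note that a Set does not remember insertion order, which matters for the next reply.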
Not necessarily a Set: there are collections which are ordered, but elements are supposed to be unique (or at least duplicate elements make little sense). Base.LOAD_PATH is an example.
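One way to get both properties, sketched here purely as an assumption and not something proposed in this thread, is to pair a Vector (for insertion order) with a Set (for fast membership tests). The type name OrderedUnique and its fields are hypothetical.

```julia
# Hypothetical wrapper type: keeps insertion order and rejects duplicates.
struct OrderedUnique{T}
    items::Vector{T}   # elements in insertion order
    seen::Set{T}       # same elements, for fast membership tests
end

# Convenience constructor for an empty collection.
OrderedUnique{T}() where {T} = OrderedUnique{T}(T[], Set{T}())

function Base.push!(c::OrderedUnique, x)
    if !(x in c.seen)
        push!(c.seen, x)
        push!(c.items, x)
    end
    return c
end

c = OrderedUnique{String}()
push!(c, "b")
push!(c, "a")
push!(c, "b")   # duplicate, skipped
```

After these calls c.items holds "b" then "a", in the order they were first pushed. The cost is storing every element twice, so for small vectors such as a typical LOAD_PATH a plain linear scan is probably good enough.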
Sure, you do sort(collect(S)) at the end.
Ah, you mean temporally ordered, I see. But I think it’s a sufficiently uncommon use case that you can just write your one-liner as you did, without adding anything to Base?
Thanks for all the responses. To summarize, the message I take away from this topic is:
- pushnew! could make sense for Vector and other modifiable collections. I may submit a PR, but I will think more about it. It would simply replace x ∈ c || push!(c, x) or similar, so I am not sure it is justified.
- Naturally, choosing a collection with faster in may provide better results for these operators. But they may still make sense for small vectors, etc.
- union! makes less sense for Vector, since it would need to remove duplicates from the original, as @MikeInnes pointed out.
So I got the answer to my original question. Thanks!
Yet it’s not a completely useless method IMHO. I would say methods should be provided for all types for which they are applicable, as long as they make sense.
OTOH, as you say, adding a new pushnew! function makes the API bigger without a very clear need.