I thought I would ask here before opening an issue. Context: I was running a script that was using modules that I did not want to wrap up as a package, so at the beginning I had something like
push!(LOAD_PATH, expanduser("~/research/some_directory"))
During the morning, LOAD_PATH filled up nicely with about 100 entries. So I thought I would find the equivalent of
pushnew!(collection, items...) = push!(collection, setdiff(items, collection)...)
in Base, but I didn’t. However, I recalled union!, which is basically the same. But it has no method for Vectors, so I can’t do
union!(LOAD_PATH, [my_path])
at the moment.
The question is: is there a deeper design reason for not having either of these methods, or is it just that no one got around to writing them?
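For reference, a minimal sketch of the helper being asked about. The name pushnew! is hypothetical (it is not in Base), and the paths vector is only a stand-in for LOAD_PATH. Unlike the setdiff one-liner, this version checks each item as it goes, so it also skips duplicates within the arguments themselves:

```julia
# Hypothetical helper, not part of Base: push only the items that are
# not already present, keeping the existing order of the vector.
function pushnew!(collection::AbstractVector, items...)
    for item in items
        # `in` is a linear scan on a Vector; fine for short lists such as a load path
        item in collection || push!(collection, item)
    end
    return collection
end

paths = ["/usr/lib", "/opt/lib"]            # stand-in for LOAD_PATH
pushnew!(paths, "/opt/lib", "/home/me/src") # only the second item is new
pushnew!(paths, "/home/me/src")             # repeating the call adds nothing
```

Running the same script repeatedly would then leave the vector at three entries instead of growing it each time.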
I think you’re looking for append!?
union! would seem to imply eliminating duplicates, which is a very different operation from push!(X, V...) or append!(X, V), and is an expensive operation for arrays; if you need set-union, you’re better off with Set or some other data structure that supports it efficiently.
Nope, I think I was specifically looking for pushnew! above (which many languages with modifiable vectors have, e.g. cl:pushnew), but was ready to substitute union! for it. I do realize it is more expensive than append!, but sometimes you just want to add the new elements.
So the question is (before I file an issue): was there a reason for making union! work for Sets, but not Vectors, even though union exists for both?
There’s likely no design decision here, just a lack of motivation for implementing this method. I guess it would make sense to add it for completeness.
Bear in mind that union!(a, b) would also have to remove duplicates from a in order to return the same result as union(a, b) – it’s not quite the same as pushnew!. There’s no unique!(xs) function exported from Base either, but @nalimilan’s point applies to that too I suppose.
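A small illustration of that difference, using the existing union for Vectors and an inline pushnew!-style loop (the loop is just a sketch, not a Base function):

```julia
a = [1, 1, 2]
b = [2, 3]

# Set-union: duplicates already present in `a` are dropped as well.
u = union(a, b)

# pushnew!-style append: only the new elements of `b` are added,
# and the existing contents of `a` (duplicates included) are left alone.
p = copy(a)
for x in b
    x in p || push!(p, x)
end
```

Here u is [1, 2, 3] while p is [1, 1, 2, 3], so an in-place version that matched union would have to modify the existing elements, not just append.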
This is a good point, thanks! Probably pushnew! is more useful than unique! then.
This is exactly the functionality of push! for a Set. If you need this functionality then you should use Set, as @stevengj pointed out. This will be much faster than using a Vector for Set-like functionality.
Not necessarily a Set: there are collections which are ordered, but elements are supposed to be unique (or at least duplicate elements make little sense). Base.LOAD_PATH is an example.
Sure, you do sort(collect(S)) at the end.
Ah, you mean temporally ordered, I see. But I think it’s a sufficiently uncommon use case that you can just write your one-liner as you did, without adding anything to Base?
Thanks for all the responses. To summarize, the message I take away from this topic is:
- pushnew! could make sense for Vector and other modifiable collections. I may submit a PR, but I will think more about it. It would simply replace x ∈ c || push!(c, x) or similar, so I am not sure it is justified.
- Naturally, choosing a collection with a faster in may provide better results for these operations. But they may still make sense for small vectors, etc.
- union! makes less sense for Vector, since it would need to remove duplicates from the original, as @MikeInnes pointed out. So I got the answer to my original question. Thanks!
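One possible way to combine the points about insertion order and a faster in, sketched under assumptions: the OrderedUnique type and its pushnew! method below are hypothetical names, not anything from Base. A Vector records the order in which items arrived, and a Set alongside it answers the membership test:

```julia
# Hypothetical wrapper: `items` keeps insertion order, `seen` gives a fast `in`.
struct OrderedUnique{T}
    items::Vector{T}
    seen::Set{T}
end

# Convenience constructor for an empty collection.
OrderedUnique{T}() where {T} = OrderedUnique{T}(T[], Set{T}())

# Add `x` only if it has not been seen before.
function pushnew!(c::OrderedUnique, x)
    if !(x in c.seen)
        push!(c.seen, x)
        push!(c.items, x)
    end
    return c
end

c = OrderedUnique{String}()
for p in ["b", "a", "b", "c", "a"]
    pushnew!(c, p)
end
```

After the loop, c.items holds "b", "a", "c" in first-seen order. For a handful of entries the plain x ∈ c || push!(c, x) on a Vector is probably simpler and just as good.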
Yet it’s not a completely useless method IMHO. I would say methods should be provided for all types for which they are applicable, as long as they make sense.
OTOH, as you say, adding a new pushnew! function makes the API bigger without a very clear need.