Why is there no union!(::Vector,::Vector)?

question

#1

I thought I would ask here before opening an issue. Context: I was running a script that was using modules that I did not want to wrap up as a package, so at the beginning I had something like

push!(LOAD_PATH, expanduser("~/research/some_directory"))

During the morning, LOAD_PATH filled up nicely with about 100 entries. So I thought I would find the equivalent of

pushnew!(collection, items...) = push!(collection, setdiff(items, collection)...)

in Base, but I didn’t. However, I recalled union!, which is basically the same. But it has no method for Vectors, so I can’t do

union!(LOAD_PATH, [my_path])

at the moment.

The question is: is there a deeper design reason for not having either of these methods, or is it just that no one got around to writing them?


#2

I think you’re looking for append!?

union! would seem to imply eliminating duplicates, which is a very different operation from push!(X, V...) or append!(X, V), and is an expensive operation for arrays; if you need set-union, you’re better off with Set or some other data structure that supports it efficiently.
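To make the difference concrete, here is a minimal sketch with made-up vectors a and b (they are illustrative only, not from the original script). It contrasts append!, which keeps duplicates, with the non-mutating union, which drops them:

```julia
a = [1, 2, 3]
b = [3, 4]

# append! concatenates in place and keeps the duplicate 3.
appended = append!(copy(a), b)

# union returns a new vector with each element appearing once,
# in order of first appearance.
merged = union(a, b)

println(appended)
println(merged)
```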


#3

Nope, I think I was specifically looking for pushnew! above (which many languages with modifiable vectors have, e.g. cl:pushnew), but was ready to substitute union!. I do realize it is more expensive than append!, but sometimes you just want to add the new elements.

So the question is (before I file an issue): was there a reason for making union! work for Sets, but not Vectors, even though union exists for both?


#4

There’s likely no design decision here, just a lack of motivation for implementing this method. I guess it would make sense to add it for completeness.


#5

Bear in mind that union!(a, b) would also have to remove duplicates from a in order to return the same result as union(a, b) – it’s not quite the same as pushnew!. There’s no unique!(xs) function exported from Base either, but @nalimilan’s point applies to that too I suppose.
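A small sketch of that difference, assuming a loop-based variant of the pushnew! one-liner from the first post (pushnew! is a hypothetical helper here, not something in Base), and made-up inputs where a already contains a duplicate:

```julia
# Hypothetical helper: push each item only if it is not already present.
function pushnew!(collection, items...)
    for x in items
        x in collection || push!(collection, x)
    end
    return collection
end

a = [1, 1, 2]   # already contains a duplicate
b = [2, 3]

u = union(a, b)              # duplicates inside a are removed as well
p = pushnew!(copy(a), b...)  # existing duplicates in a are left alone

println(u)
println(p)
```

So an in-place union! that agrees with union would have to rewrite a itself, while pushnew! only ever appends.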


#6

This is a good point, thanks! Probably pushnew! is more useful than unique! then.


#7

This is exactly the functionality of push! for a Set. If you need this functionality then you should use Set, as @stevengj pointed out. This will be much faster than using a Vector for Set-like functionality.
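For instance (a minimal sketch with made-up string elements), pushing an element that is already present leaves a Set unchanged, and union! is available for it in place:

```julia
s = Set(["a", "b"])

push!(s, "a")   # already present, so the set does not grow
push!(s, "c")   # new element, gets added

# In-place union with another collection; duplicates are ignored.
union!(s, ["c", "d"])

println(length(s))
```

Note that a Set does not record insertion order.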


#8

Not necessarily a Set: there are collections which are ordered, but elements are supposed to be unique (or at least duplicate elements make little sense). Base.LOAD_PATH is an example.


#9

Sure, you do sort(collect(S)) at the end.


#10

Ah, you mean temporally ordered, I see. But I think it’s a sufficiently uncommon use case that you can just write your one-liner as you did, without adding anything to Base?


#11

http://datastructuresjl.readthedocs.io/en/latest/ordered_containers.html


#12

Thanks for all the responses. To summarize, the message I take away from this topic is:

  1. pushnew! could make sense for Vector, and other modifiable collections. I may submit a PR, but I will think more about it. It would simply replace x ∈ c || push!(c, x) or similar, so I am not sure it is justified.
  2. Naturally, choosing a collection with a faster membership test (in) may perform better for these operations. But they may still make sense for small vectors, etc.
  3. union! makes less sense for Vector, since it would need to remove duplicates from the original, as @MikeInnes pointed out. So I got the answer to my original question. Thanks!
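
As a sketch of point 1 applied to the original use case, here is the idiom on a stand-in vector of paths (the variable names and paths are made up for illustration; the real LOAD_PATH is deliberately not touched):

```julia
# Stand-in for a path list such as LOAD_PATH; the entries are hypothetical.
paths = ["/usr/share/julia", "/opt/julia/lib"]
newpath = "/home/me/research/some_directory"

# Running this line repeatedly adds the path only once
# and keeps the existing order.
newpath in paths || push!(paths, newpath)
newpath in paths || push!(paths, newpath)

println(paths)
```

The membership test is a linear scan, which is unlikely to matter for a vector of this size.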

#13

Yet it’s not a completely useless method IMHO. I would say methods should be provided for all types for which they are applicable, as long as they make sense.

OTOH, as you say, adding a new pushnew! function makes the API bigger without a very clear need.