Why is there no union!(::Vector,::Vector)?

Tamas_Papp · January 9, 2017, 1:04pm

I thought I would ask here before opening an issue. Context: I was running a script that was using modules that I did not want to wrap up as a package, so at the beginning I had something like

push!(LOAD_PATH, expanduser("~/research/some_directory"))

During the morning, LOAD_PATH filled up nicely with about 100 entries. So I thought I would find the equivalent of

pushnew!(collection, items...) = push!(collection, setdiff(items, collection)...)

in Base, but I didn’t. However, I recalled union!, which is basically the same. But it has no method for Vectors, so I can’t do

union!(LOAD_PATH, [my_path])

at the moment.

The question is: is there a deeper design reason for not having either of these methods, or just that no one got around to writing them?
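For reference, here is a minimal sketch of the pushnew! idea written as an explicit loop. The name pushnew! is taken from the post above and is not a Base function; the paths are placeholders standing in for LOAD_PATH entries.

```julia
# Hypothetical helper (not part of Base): push each item only if it is absent.
function pushnew!(collection::AbstractVector, items...)
    for item in items
        # `in` on a Vector is a linear scan, so each check costs O(length).
        item in collection || push!(collection, item)
    end
    return collection
end

paths = ["/usr/lib", "/opt/lib"]   # stand-in for LOAD_PATH
pushnew!(paths, "/opt/lib", "/home/me/research")
# paths == ["/usr/lib", "/opt/lib", "/home/me/research"]
```

Running the same call repeatedly leaves the vector unchanged, which is the behaviour wanted for a script that is re-run many times in one session.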

stevengj · January 9, 2017, 1:06pm

I think you’re looking for append!?

union! would seem to imply eliminating duplicates, which is a very different operation from push!(X, V...) or append!(X, V), and is an expensive operation for arrays; if you need set-union, you’re better off with Set or some other data structure that supports it efficiently.
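The difference between the two operations can be seen in a small sketch; the values are made up for illustration.

```julia
v = ["a", "b"]
append!(v, ["b", "c"])   # concatenates; the duplicate "b" is kept
# v == ["a", "b", "b", "c"]

s = Set(["a", "b"])
union!(s, ["b", "c"])    # hashed membership; the duplicate "b" is dropped
# length(s) == 3
```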

Tamas_Papp · January 9, 2017, 1:27pm

Nope, I think I was specifically looking for pushnew! above (which many languages with modifiable vectors have, e.g. cl:pushnew), but was ready to substitute union! for it. I do realize it is more expensive than append!, but sometimes you just want to add new elements.

So the question is (before I file an issue): was there a reason for making union! work for Sets, but not Vectors, even though union exists for both?

nalimilan · January 11, 2017, 9:59am

There’s likely no design decision here, just a lack of motivation for implementing this method. I guess it would make sense to add it for completeness.

MikeInnes · January 11, 2017, 10:07am

Bear in mind that union!(a, b) would also have to remove duplicates from a in order to return the same result as union(a, b) – it’s not quite the same as pushnew!. There’s no unique!(xs) function exported from Base either, but @nalimilan’s point applies to that too I suppose.
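A short sketch of the distinction, using made-up values: union on vectors also drops duplicates already present in the first argument, while a push-if-absent loop leaves them alone.

```julia
a = [1, 1, 2]
b = [2, 3]

u = union(a, b)          # the duplicate 1 in `a` is removed
# u == [1, 2, 3]

# Push-if-absent only filters the incoming elements.
c = copy(a)
for x in b
    x in c || push!(c, x)
end
# c == [1, 1, 2, 3]
```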

Tamas_Papp · January 11, 2017, 10:55am

This is a good point, thanks! Probably pushnew! is more useful than unique! then.

dpsanders · January 11, 2017, 3:33pm

This is exactly the functionality of push! for a Set. If you need this functionality then you should use Set, as @stevengj pointed out. This will be much faster than using a Vector for Set-like functionality.

Tamas_Papp · January 11, 2017, 3:48pm

Not necessarily a Set: there are collections which are ordered, but elements are supposed to be unique (or at least duplicate elements make little sense). Base.LOAD_PATH is an example.

dpsanders · January 11, 2017, 4:47pm

Sure, you do sort(collect(S)) at the end.

dpsanders · January 11, 2017, 4:48pm

Ah, you mean temporally ordered, I see. But I think it’s a sufficiently uncommon use case that you can just write your one-liner as you did, without adding anything to Base?
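To illustrate the distinction with made-up directory names: sorting the contents of a Set yields sorted order, which is generally not the order in which the elements were added.

```julia
insertion_order = ["z_dir", "a_dir", "m_dir"]
S = Set(insertion_order)

sorted = sort(collect(S))
# sorted == ["a_dir", "m_dir", "z_dir"], not the insertion order
```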

yuyichao · January 11, 2017, 4:54pm

http://datastructuresjl.readthedocs.io/en/latest/ordered_containers.html

Tamas_Papp · January 12, 2017, 10:09am

Thanks for all the responses. To summarize, the message I take away from this topic is:

pushnew! could make sense for Vector, and other modifiable collections. I may submit a PR, but I will think more about it. It would simply replace x ∈ c || push!(c, x) or similar, so I am not sure it is justified.
Naturally, choosing a collection with a faster in (membership test) may give better results for these operations. But they may still make sense for small vectors, etc.
union! makes less sense for Vector, since it would need to remove duplicates from the original, as @MikeInnes pointed out. So I got the answer to my original question. Thanks!
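One way to combine insertion order with a fast membership test is to keep a Set alongside the Vector. The sketch below is only an illustration under that assumption; the UniqueList type and its pushnew! method are hypothetical names, not something from Base or from this thread.

```julia
# Hypothetical wrapper type, for illustration only.
struct UniqueList{T}
    items::Vector{T}   # preserves insertion order
    seen::Set{T}       # hashed membership test
end

# Convenience constructor for an empty list.
UniqueList{T}() where {T} = UniqueList{T}(T[], Set{T}())

function pushnew!(u::UniqueList, x)
    if !(x in u.seen)
        push!(u.seen, x)
        push!(u.items, x)
    end
    return u
end

u = UniqueList{String}()
for p in ["b", "a", "b", "c", "a"]
    pushnew!(u, p)
end
# u.items == ["b", "a", "c"]
```

The trade-off is extra memory for the Set; for small vectors the plain x ∈ c || push!(c, x) idiom is likely sufficient.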

nalimilan · January 12, 2017, 12:49pm

Yet it’s not a completely useless method IMHO. I would say methods should be provided for all types for which they are applicable, as long as they make sense.

OTOH, as you say, adding a new pushnew! function makes the API bigger without a very clear need.
