Receive the success status of `push!(Set, element)`

I can push! an element to a Set.

julia> S = Set();

julia> e = 1;

julia> push!(S, e)
Set{Any} with 1 element:
  1

But I am not aware of a straightforward way to understand if the element e was new to S.
Therefore I wrote a function myself

julia> function push_and_get_status!(S, e)
           l = length(S)
           push!(S, e)
           length(S) == l && return false
           return true
       end
push_and_get_status! (generic function with 1 method)

julia> push_and_get_status!(S, 2)
true

julia> S
Set{Any} with 2 elements:
  2
  1

julia> push_and_get_status!(S, true)
false

julia> S
Set{Any} with 2 elements:
  2
  true

I want to know if there is an established more straightforward method?

One additional thing I happen to discover, which I find weird, is

  • why is 1 replaced by true?

You see, 1 == true, i.e. when we have owned an equivalent element 1 in S, why do we need to spend the additional effort to replace 1 with true? This is unexpected and doesn’t seem rational.

Just like in statistical tests. If there is a primal hypothesis, my inclination is to trust it. Only after I can testify its invalidness, then I might reject it.

You can just use in! Even as a one-liner:

push_and_get_status!(S, e) = e in S ? false : (push!(S, e); true)
2 Likes

Since you’re starting your check prior to the push!, you could’ve checked e in S beforehand. It takes longer than a couple length(S) calls, but the option to skip push! altogether saves more time overall. Benchmarking your method against NonDairyNeutrino’s with BenchmarkTools.@btime, I get a savings from 12.4ns to 7.2ns.

Set and the underlying Dict compare elements/keys by isequal. isequal(true, 1) is true, so those values override each other. IdDict and IdSet use === instead, so they don’t have this issue. You’re right that a set could have opted out of replacing an equal key, but it wouldn’t save much work because the hash, search, and comparison already happened by that point, and the Dict implementation doesn’t assume or check for an equal value before replacement.

2 Likes

So, my point is, why can’t

push_and_get_status!(S, e) = e in S ? false : (push!(S, e); true)

replace the current julia’s push!(S::Set, e) method (which returns S).
I think your method push_and_get_status!(S, e) will be more useful in practice.

And I don’t think e in S is a disjoint operation to the underlying realization of push!(S::Set, e).

It wasn’t hard to write push_and_get_status!, and we still have access to the simpler push!. We lose options if we replace simpler building blocks with convenient functions, and convenient functions are only worth adding to the API if it’s difficult to implement well.

1 Like

So you mean that this function has no redundancy. If this is true, then this topic is accomplished. Thanks!

I didn’t comment on that, and that definitely does have redundancy. If e in S is true, it branches to push!(S, e) that performs the same hash, search, and comparison. It’s possible that the compiler might have inlined the two and removed the redundancy, but normally we just push! and accept overriding with the most “updated” value or use IdSet if we really need the distinction. In the case where we either must push! or we expect to hit the push! branch most of the time, your length checks would be cheaper (my earlier benchmark mention uses the same element the whole time, so it hit the false branch).

My original opinion is: the current julia’s push! is programmer-oriented, rather than user-oriented.

As a user, the desired push!(S::Set, element) should first check if the element is new to S, if it is not new, then just return false (the realistic meaning is that—given I’ve owned, I don’t want to spare an additional operation to do a swap). If the element is indeed new to S (which equates to element ∉ S), then we properly add element to S, so that the cardinality of S is increased by 1.

I don’t think this overriding is meaningful in practice. Does it?

But consider that push! is a function for arbitrary collections, and calls with the same arity share as much behavior as possible despite dispatching to different methods based on type. So, if the behavior for push!(C, e) is to return a boolean value based on whether the element is new, it has to do that for any collection for no real gain. Checking whether an element was new in a Vector is expensive, and push! always increases the length. I should’ve clarified earlier, but a separate function specifically for Set is more appropriate for what you want.

It could be; of all the equivalent values ever push!-ed into a set, the latest one is more often the preferred outcome than the earliest one. Even before considering whether it’s practical, inserting the input is the documented behavior of push!. “Insert unless it already exists” is something else that saves little work, and it’s more work for dictionaries where there’s an associated value to also compare.

2 Likes

So you mean that if I define

mypush!(S, e) = e ∈ S ? false : (push!(S, e); true);

Then doing this

julia> S = Set(); # Set

julia> for i = 1:10000
           mypush!(S, i)
       end

julia> mypush!(S, true)
false

is better than (more efficient than) the Vector counterpart (as follows)?

julia> S = []; # Vector

julia> for i = 1:10000
           mypush!(S, i)
       end

julia> mypush!(S, true)
false

(Note that both code snippet can achieve my aim, and the latter Vector arrangment appears more convenient.)

Way more, O(1) vs O(n) because of the e ∈ S part. It’s why we use Set at all, though you might only observe performance gains at larger lengths. Also note that the check and branch is necessary for this maybe-push behavior in the Vector case; the length checks with mandatory push! would not work.

2 Likes

I opened an issue A more user-friendly function enhancing the `push!(S::Set, element)` method? · Issue #59084 · JuliaLang/julia · GitHub

Can’t speak for anyone involved, but my guess is that won’t go anywhere because a new function could be in an independent package. The current movement is to reduce the system image (load fewer standard libraries at startup, instead load them upon first import like other packages), and adding more functions to the already imported Base is an irreversible motion in the opposite direction. Based on adjacent discussions, the first thing they’ll do if v2 ever starts serious development is move as much as possible from Base into standard libraries.

4 Likes

I took a look. It seems that the APIs on Set are far from cumbersome. I think such a special method for Set is useful. And I think a proper name is collect!(S, e)::Bool, not knowing whether appropriateđŸ„Č

Since Julia is a programming language, all users are by definition programmers. I’ve seen people claim this distinction before, and it doesn’t make sense to me.

I also don’t understand why a ‘user’ would want the status returned, but a ‘programmer’ would not.

2 Likes

You are looking for in! - this does exactly what you want.

The existing in can’t be changed to return a boolean, because that would be a breaking change. The existing behaviour is useful, because it allows you to do stuff like:

with_element = push!(copy(my_set), element)
8 Likes

they do. Apparently someone has proposed this idea before me (Since the in! function was added in Julia 1.11).

Yes, exactly.

julia> collect!(S, e) = !in!(e, S)
collect! (generic function with 1 method)

julia> S = Set();

julia> collect!(S, 1) # success? Yes
true

julia> collect!(S, 2) # success? Yes
true

julia> S
Set{Any} with 2 elements:
  2
  1

julia> collect!(S, true) # success? No
false

julia> S
Set{Any} with 2 elements:
  2
  1

But it might be a pity I cannot spot this in! function in the doc.

You cut my sentence in half, I said

If programmers don’t want the status returned, this wouldn’t be added.

Also, there is a very big difference between wanting status returned from push! vs from in!, for the latter, when checking in, the status is the point.

I was saying I don’t understand why there would be a difference between ‘programmers’ and ‘users’ in this case, and that the distinction is not clear or useful.

1 Like

As Benny said, programmers may focus on these low-level realizations, how they can be done efficiently.
As a user, I only care about the correctness of the logic.

function collect!(S, e)
    e ∈ S && return false # you can NOT strictly enrich `S` by adding `e`
    push!(S, e)
    return true # You can strictly enrich `S` this time
end

For my (optimization) application, the S is a set of cuts. As I definitely want to avoiding adding the same cuts to a model

julia> model = JuMP.Model();

julia> @constraint(model, x[1] + 2x[2] + 3x[3] ≀ 4)
x[1] + 2 x[2] + 3 x[3] <= 4

julia> @constraint(model, x[1] + 2x[2] + 3x[3] ≀ 4) # This is undesired
x[1] + 2 x[2] + 3 x[3] <= 4

julia> print(model)
Feasibility
Subject to
 x[1] + 2 x[2] + 3 x[3] <= 4
 x[1] + 2 x[2] + 3 x[3] <= 4

I can do something like this (my real application code)

PBus1, PBus2 = ı.(mj[:pBus1]), ı.(mj[:pBus2]) # get the column newly generated
if collect!(cols[j], [PBus1, PBus2]) # this vertex is new
    con = JuMP.@build_constraint(Ξ[j] ≀ ı(mj[:prim_obj]) + ÎČ â‹… (PBus1 + PBus2))
    Threads.lock(() -> push!(rows[j], JuMP.add_constraint(model, con)), my_lock)
    out_upd_vec[j] = true # indicating we can strictly enrich the model this time
end