julia> S = Set();
julia> e = 1;
julia> push!(S, e)
Set{Any} with 1 element:
1
But I am not aware of a straightforward way to understand if the element e was new to S.
Therefore I wrote a function myself
julia> function push_and_get_status!(S, e)
l = length(S)
push!(S, e)
length(S) == l && return false
return true
end
push_and_get_status! (generic function with 1 method)
julia> push_and_get_status!(S, 2)
true
julia> S
Set{Any} with 2 elements:
2
1
julia> push_and_get_status!(S, true)
false
julia> S
Set{Any} with 2 elements:
2
true
I want to know if there is an established more straightforward method?
One additional thing I happen to discover, which I find weird, is
why is 1 replaced by true?
You see, 1 == true, i.e. when we have owned an equivalent element 1 in S, why do we need to spend the additional effort to replace 1 with true? This is unexpected and doesnât seem rational.
Just like in statistical tests. If there is a primal hypothesis, my inclination is to trust it. Only after I can testify its invalidness, then I might reject it.
Since youâre starting your check prior to the push!, you couldâve checked e in S beforehand. It takes longer than a couple length(S) calls, but the option to skip push! altogether saves more time overall. Benchmarking your method against NonDairyNeutrinoâs with BenchmarkTools.@btime, I get a savings from 12.4ns to 7.2ns.
Set and the underlying Dict compare elements/keys by isequal. isequal(true, 1) is true, so those values override each other. IdDict and IdSet use === instead, so they donât have this issue. Youâre right that a set could have opted out of replacing an equal key, but it wouldnât save much work because the hash, search, and comparison already happened by that point, and the Dict implementation doesnât assume or check for an equal value before replacement.
push_and_get_status!(S, e) = e in S ? false : (push!(S, e); true)
replace the current juliaâs push!(S::Set, e) method (which returns S).
I think your method push_and_get_status!(S, e) will be more useful in practice.
And I donât think e in S is a disjoint operation to the underlying realization of push!(S::Set, e).
It wasnât hard to write push_and_get_status!, and we still have access to the simpler push!. We lose options if we replace simpler building blocks with convenient functions, and convenient functions are only worth adding to the API if itâs difficult to implement well.
I didnât comment on that, and that definitely does have redundancy. If e in S is true, it branches to push!(S, e) that performs the same hash, search, and comparison. Itâs possible that the compiler might have inlined the two and removed the redundancy, but normally we just push! and accept overriding with the most âupdatedâ value or use IdSet if we really need the distinction. In the case where we either must push! or we expect to hit the push! branch most of the time, your length checks would be cheaper (my earlier benchmark mention uses the same element the whole time, so it hit the false branch).
My original opinion is: the current juliaâs push! is programmer-oriented, rather than user-oriented.
As a user, the desired push!(S::Set, element) should first check if the element is new to S, if it is not new, then just return false (the realistic meaning is thatâgiven Iâve owned, I donât want to spare an additional operation to do a swap). If the element is indeed new to S (which equates to element â S), then we properly add element to S, so that the cardinality of S is increased by 1.
I donât think this overriding is meaningful in practice. Does it?
But consider that push! is a function for arbitrary collections, and calls with the same arity share as much behavior as possible despite dispatching to different methods based on type. So, if the behavior for push!(C, e) is to return a boolean value based on whether the element is new, it has to do that for any collection for no real gain. Checking whether an element was new in a Vector is expensive, and push! always increases the length. I shouldâve clarified earlier, but a separate function specifically for Set is more appropriate for what you want.
It could be; of all the equivalent values ever push!-ed into a set, the latest one is more often the preferred outcome than the earliest one. Even before considering whether itâs practical, inserting the input is the documented behavior of push!. âInsert unless it already existsâ is something else that saves little work, and itâs more work for dictionaries where thereâs an associated value to also compare.
Way more, O(1) vs O(n) because of the e â S part. Itâs why we use Set at all, though you might only observe performance gains at larger lengths. Also note that the check and branch is necessary for this maybe-push behavior in the Vector case; the length checks with mandatory push! would not work.
Canât speak for anyone involved, but my guess is that wonât go anywhere because a new function could be in an independent package. The current movement is to reduce the system image (load fewer standard libraries at startup, instead load them upon first import like other packages), and adding more functions to the already imported Base is an irreversible motion in the opposite direction. Based on adjacent discussions, the first thing theyâll do if v2 ever starts serious development is move as much as possible from Base into standard libraries.
I took a look. It seems that the APIs on Set are far from cumbersome. I think such a special method for Set is useful. And I think a proper name is collect!(S, e)::Bool, not knowing whether appropriateđ„Č
Since Julia is a programming language, all users are by definition programmers. Iâve seen people claim this distinction before, and it doesnât make sense to me.
I also donât understand why a âuserâ would want the status returned, but a âprogrammerâ would not.
You are looking for in! - this does exactly what you want.
The existing in canât be changed to return a boolean, because that would be a breaking change. The existing behaviour is useful, because it allows you to do stuff like:
If programmers donât want the status returned, this wouldnât be added.
Also, there is a very big difference between wanting status returned from push! vs from in!, for the latter, when checking in, the status is the point.
I was saying I donât understand why there would be a difference between âprogrammersâ and âusersâ in this case, and that the distinction is not clear or useful.
As Benny said, programmers may focus on these low-level realizations, how they can be done efficiently.
As a user, I only care about the correctness of the logic.
function collect!(S, e)
e â S && return false # you can NOT strictly enrich `S` by adding `e`
push!(S, e)
return true # You can strictly enrich `S` this time
end
For my (optimization) application, the S is a set of cuts. As I definitely want to avoiding adding the same cuts to a model
I can do something like this (my real application code)
PBus1, PBus2 = ı.(mj[:pBus1]), ı.(mj[:pBus2]) # get the column newly generated
if collect!(cols[j], [PBus1, PBus2]) # this vertex is new
con = JuMP.@build_constraint(Ξ[j] †ı(mj[:prim_obj]) + ÎČ â (PBus1 + PBus2))
Threads.lock(() -> push!(rows[j], JuMP.add_constraint(model, con)), my_lock)
out_upd_vec[j] = true # indicating we can strictly enrich the model this time
end