Receive the success status of `pop!(Set, element)`

Continuing the discussion from Receive the success status of `push!(Set, element)`:

The following defined method

# collect `e` into `S`, return success status
collect!(S, e) = !in!(e, S)

should have an inverse, which I’d like to name it rm! provisionally.
Since I am not aware if there is an established one presently, I write

# remove `e` from `S`, return success status
function rm!(S, e)
    try
        pop!(S, e)
    catch e
        e isa KeyError && return false
    end
    return true
end

I don’t know if it is appropriate (is there a better implementation?). The test looks like

julia> S = Set(0);

julia> collect!(S, true) # success
true

julia> S # `1` is collected
Set{Int64} with 2 elements:
  0
  1

julia> rm!(S, false) # success
true

julia> S # `0` is removed
Set{Int64} with 1 element:
  1

julia> rm!(S, false) # fail
false

julia> S # `S` is unaltered
Set{Int64} with 1 element:
  1

Edit: may skip the ERROR type check

# remove `e` from `S`, return success status
function rm!(S, e)
    try
        pop!(S, e)
    catch
        return false
    end
    return true
end

A try-catch approach is not very efficient:

julia> @btime rm!($(Set(0)), 0)
  4.500 μs (1 allocation: 16 bytes)
false

julia> @btime rm!($(Set(0)), 1)
  4.571 μs (1 allocation: 16 bytes)
false

Just using in is much better:

function rm2!(S, e)
    if e in S
        pop!(S, e)
        return true
    else
        return false
    end
end
julia> S = Set([0, true]); rm2!(S, false)
true

julia> S
Set{Int64} with 1 element:
  1

julia> rm2!(S, false)
false

julia> S
Set{Int64} with 1 element:
  1
julia> @btime rm2!($(Set(0)), 0)
  4.000 ns (0 allocations: 0 bytes)
false

julia> @btime rm2!($(Set(0)), 1)
  5.800 ns (0 allocations: 0 bytes)
false

julia> S0 = Set(rand(UInt8, 128)); es = rand(UInt8, 16); 

julia> count(e -> e in S0, es)
8

julia> length(unique(es))
16

julia> @btime foreach(e -> rm!(S, e), $es) setup=(S=copy($S0));
  37.100 μs (8 allocations: 128 bytes)

julia> @btime foreach(e -> rm2!(S, e), $es) setup=(S=copy($S0));
  82.798 ns (0 allocations: 0 bytes)
3 Likes

what about

function rm!(S, e)
       isnothing(pop!(S, e, nothing)) && return false
       return true
end

if nothing is never involved in the Set.

1 Like

Yes, that should work and be efficient. Note that you can write it as

rm3!(S, e) = !isnothing(pop!(S, e, nothing))
julia> @btime foreach(e -> rm3!(S, e), $es) setup=(S=copy($S0));
  97.789 ns (0 allocations: 0 bytes)

I would actually have expected it to be slightly faster than rm2!, as you skip one in-like search inside of pop! when e \in S, but apparently not. Perhaps because the ternary pop! returns a Union-type?


If you modify the pop! source code (making use of internal methods) to return a Bool, something which should be faster than any approach wrapping pop!

function rm4!(s::Set, x)
    index = Base.ht_keyindex(s.dict, x)
    index < 1 && return false
    Base._delete!(s.dict, index)
    return true
end

I still get the same timing as rm2!, at least for the test above (with the same S0 and es)

julia> @btime foreach(e -> rm4!(S, e), $es) setup=(S=copy($S0));
  83.884 ns (0 allocations: 0 bytes)

So rm2! seems pretty good.

2 Likes

Thank you for the tests!

I think the ternary pop! is convenient for the user, and thus somewhat rendered my this topic purposeless. (in practice, if we are sure that the returned element won’t be some certain value e.g. 0, then it can be in the place of nothing). So we don’t need to wrap it within a user-defined function like rm3!.

Another advantage of the ternary pop! is that it is usable for Dict either

julia> a = Dict(i => 10i for i = 1:2)
Dict{Int64, Int64} with 2 entries:
  2 => 20
  1 => 10

julia> pop!(a, 2, 0) # if we are sure `0` ∉ keys(a)
20

julia> pop!(a, 2, 0) # if we are sure `0` ∉ keys(a)
0

julia> a
Dict{Int64, Int64} with 1 entry:
  1 => 10

If you’re interested in making sets (and dicts) more efficient, then have a look at the proposed “token interfaces”. The idea is that you can query the set for a key, and then you get a token back. This token then contain the information about the presence of they key.

So, the interface would be something like:

s = Set([1,2,3,4]) # make a set
token = gettoken(s, 3)
if ispresent(token)
    delete!(s, token)
    true
else 
    false
end

This API could then be used for all these operations where you do two or more queries using the same key.
You could try to make a PR and see where that gets you. Some relevant issues: Define a token API for Associative types · Issue #24454 · JuliaLang/julia · GitHub, Feature request: Intermediate API for updating collections, especially `AbstractDict` · Issue #31199 · JuliaLang/julia · GitHub, Add modify! function for lookup/update/insert/delete in one go by tkf · Pull Request #33758 · JuliaLang/julia · GitHub

4 Likes

Interesting that’s the case. Cursory look at @code_llvm suggests rm2! doesn’t inline pop! (even if I contribute @inline to the call), so I wouldn’t expect it to be able to remove redundant code. In any case, I’d prefer eliminating redundancies on the semantic level, maybe with the new API above.

I’d never be comfortable with a requirement to never put an element into a set. In some cases the type system can separate the signal value’s type from the set element type, but not always e.g. Set{Any}. If you really want to go this direction, I’d be more comfortable if you made an entirely new and internal type documented to never be put into a collection. I definitely prefer the alternatives that don’t contribute another return type to the call; even nothing does that.