Taking `push!` and `pop!` seriously

TL;DR: currently, push! and pop! conflate two different meanings (adding/removing elements to unordered/sorted collections vs. adding elements to the end of Vectors, Stacks, Deques, etc). These meanings should be split into different functions in Julia 2.0.

Background

DataStructures.jl implements a number of data structures, including Stack, Queue, Deque, and CircularDeque, which are meant to be more performant implementations of stacks and queues (and deques) than just using Vectors.

Stack, Deque, and CircularDeque all use the same functions as base (push!, pop!, pushfirst!, and popfirst!). Queue has never been moved over to that interface, and still uses enqueue! and dequeue!

We’re having a discussion in DataStructures.jl around moving Queue to the standard interface.

Currently, we have the following:

Data structure push! pop! pushfirst! popfirst! enqueue! dequeue!

Base

Vector :heavy_check_mark: :heavy_check_mark: :heavy_check_mark: :heavy_check_mark:
Set :heavy_check_mark: :heavy_check_mark:
Dict :heavy_check_mark: :heavy_check_mark:

DataStructures.jl

Stack :heavy_check_mark: :heavy_check_mark:
Deque :heavy_check_mark: :heavy_check_mark: :heavy_check_mark: :heavy_check_mark:
CircularQueue :heavy_check_mark: :heavy_check_mark: :heavy_check_mark: :heavy_check_mark:
OrderedDict, OrderedSet :heavy_check_mark: :heavy_check_mark:
SortedDict, SortedSet :heavy_check_mark: :heavy_check_mark:
─────────── ───── ───── ───── ───── ───── ─────
Queue, PriorityQueue (current) :heavy_check_mark: :heavy_check_mark:
Queue, PriorityQueue (DS proposal A) :heavy_check_mark: :heavy_check_mark:
Queue, PriorityQueue (DS proposal B) :heavy_check_mark: :heavy_check_mark:

:heavy_check_mark:= implemented
◯ = not implemented, but could/should be
────────────────────────────────────────

From this, we can see that push! and pop! have two different meanings.

  1. For unordered and sorted collections, they simply mean to add or remove elements from the collection.
  2. For every other collection listed, they (necessarily) mean to add or remove elements from the end of the collection, and they carry their traditional meaning.

There are two proposals for the Queue interface

“DS Proposal A” above suggests that Queue should be treated like Set, Dict, and SortedDict/SortedSet, (and perhaps Stack), since there’s only one way to add things to the collection.

“DS Proposal B” above suggests that Queue should be treated like a Vector, Deque, CircularDeque, (and perhaps OrderedDict, OrderedSet, and Stack), since there’s a distinction as to whether things are added to the front or back of the collection.

Both are valid perspectives, and this suggests that the fact that push! and pop! are used for adding/removing elements from unordered and sorted collections should be revisited.

Proposal 1

For Julia 2.0, pop! and push! should be deprecated for Dicts, Sets (and SortedDict/SortedSets), and a separate function for adding/removing elements into unordered or sorted collections should be implemented. Possibilities include:

  • push!(s::Set, elem)
    • add!(s, elem) # this was the name in Julia 0.1!
    • insert!(s, elem)
  • pop!(s::Set, elem)remove!(s, elem)

For delete!, see item #1 under Notes.

Proposal 2

Switch to GitHub - andyferris/Dictionaries.jl: An alternative interface for dictionaries in Julia, for improved productivity and performance as the main dictionary interface. This would be a much larger discussion, but has the benefit here that push! and pop! would not be defined on unordered collections.

Notes

  1. delete!(s, elem) already exists; it returns the updated collection, as opposed to pop! (currently), which returns the removed element:
    julia> s = Set([1,2,3])
    Set([2, 3, 1])
    
    julia> pop!(s)
    2
    
    julia> pop!(s,3)
    3
    
    julia> delete!(s, 1)
    Set(Int64[])
    

References

Original discussion/PR where push! and pop! were originally defined on Dict and Set:


Kevin

16 Likes

Thanks for the detailed writeup and bringing up this issue.

I think that

  1. just adding an element to a collection is a without being concerned about the “location” (eg unordered collections) is a valid use case, for which we have push!.

    I would keep push! with the following contract:

    push!(collection, item) # assume types match or can be converted
    @assert item ∈ collection
    

    but no other promise. Vectors, queues, dequeues etc could all support this, with the understanding that they just put item where they like.

  2. Types for which it makes sense could support the following contract:

    pushfirst!(collection, item)
    collection[firstindex(collection)] == item
    

    and similarly for pushlast!/lastindex.

  3. pop! reverses push!, popfirst! reverses pushfirst!, etc. Each only needs to be defined when the corresponding method is defined. The contract is

    collection2 = copy(collection)
    push!(collection, item)
    pop!(collection) == item
    collection == collection2
    

    and similarly for each (when applicable).

    When this is not possible (eg for Set), the pop*! methods should not be defined.. (edit, thanks @mauro3).

  4. Dictionaries and other collections which use indexing by keys should be handled by a separate interface, possibly GitHub - andyferris/Dictionaries.jl: An alternative interface for dictionaries in Julia, for improved productivity and performance.

    pop! by keys always felt weird, my understanding is that it is an optimization for avoiding lookup twice.

From a practical perspective, my understanding is that 2.0 is not on the immediate horizon, so maybe packages are the best way to explore this design space.

6 Likes

@Tamas_Papp: but for un-ordered collections pop! cannot be the inverse of push! unless you keep track of the insertion order (which makes it an ordered collection). Example:

s::Set
push!(s, an_item)
pop!(s) # to return an_item the Set s would need to keep track of insertion order, which would thus be an OrderedSet
1 Like

Excellent point, I edited the suggestion above.

It may look strange at first, but may be it would make sense to have pop! for un-ordered collection, just it should have no promises of what element is returned? Something like this

function pop!(s::Set)
    el = first(s)
    delete!(s, el)
    el
end

s = Set([1, 2])
push!(s, 3) # s == Set([1, 2, 3])
pop!(s)     # 2; s == Set([1, 3])

I am trying to imagine scenario, when it can be useful. Definitely it should be something, when order of element processing is irrelevant. May be for example multi thread application? One may have set of unique objects, which should be processed in different threads, but order in which they’ll be processed does not matter. Of course one can collect Vector of some objects and at some point he could apply unique function to them, or he can have a Set which should be collect later, but it looks unnecessary if you may just pop! from the Set.

2 Likes

We have iteration for this.

I don’t think that destructively consuming an unordered collection is a very common scenario.

You could not pop same element in two different threads. How would you like to implement it with iteration?

Note that pop! for Set (which is a Dict) is not thread safe at the moment, and it unlikely to ever be. If I wanted to consume an unordered collection by threads, I would just design an appropriate data structure for this (with locks etc).

In any case, I think that this use case (mutation of unordered containers by multiple threads) is a red herring, and just distracts from the main API.

Personally, I would prefer a clean, simple API, with some clear contracts, even if it does not support all possible combinations one can think of — in fact, I would consider not supporting some methods an advantage when it cannot be done in a consistent way.

4 Likes

Does it make sense to add popany!/pushany! to make it clear that you don’t care from/to where the element is popped/added? In this scenario, we’d deprecate pop!/push! for unordered containers to enforce at-the-end semantics.

Alternatively, pop!/push! can keep the current maybe-anywhere semantics and we can add poplast!/pushlast! for strict at-the-end semantics.

5 Likes

Thanks for interesting link about thread safety in Julia! :slight_smile:

In that case what about something like

pop_r!(c::Container, s::Semafor)  # see strtok_r in C for motivation in name

?

IMO it still has place in discussion about general containers API.

I fully agree. Just have to add that missing function which could be implemented is inconsistency too!

pop!(s:Set) from my POV has same logic as is in getting first element from for i in set.

1 Like

I vote for alternative because it is backward compatible.

2 Likes

Thank you to everyone for the replies! I like Tamas’s proposal. I think there are just a few corner cases that need to be answered.

I rather like the explicit addition of pushlast! and poplast!. This should be non-breaking (it’s just adding functionality), so if accepted, it could go in to a 1.x version of Julia.

The main question being debated in the DataStructures.jl issue I referenced is what to do with Queue. If we follow Tamas’s proposal, then we should definitely implement pushfirst! and poplast!.

However, should we also implement

  • push! (as an alias for pushfirst!) and
  • pop! (as an alias for poplast!)?

This slightly conflicts with the suggestion that

since for a Queue, they don’t exactly reverse one another.

My inclination is that implementing push! and pop! for Queues is fine, and to just clarify the “reverses” wording here.

If there aren’t any more comments, I’ll move this to an issue in Julia at some point.

Cheers!
Kevin

4 Likes

For Queue, I would suggest a pushfirst! and poplast!, both without an “inverse”.

1 Like

Queue has only one way in and one way out, so push! and pop! make sense to me.
Otherwise we end up making the code less generic.

4 Likes

Channel defines push! and popfirst! but not pop! or pushfirst!. So, it is strange to define push!/pushfirst! when popfirst!/pop! does not work, if you want to make your API close to Base.

But if you add poplast!/pushlast! to the generic collection API, I think it makes sense for push! and pop! to be callable for any type that defines *first! or *last!. They become location agnostic API where the target collection can chose its favorite location.

push! is useful for defining generic “collect

for x in input
    push!(sink, f(x))
end

and pop! is useful for defining generic “iterator”

while !isempty(source)
    f(pop!(source))
end
6 Likes

I think that to maintain current semantics for ordered collections, this requires that push! inserts last, so it cannot choose an arbitrary location.

I usually think of code being generic in relation to an API. There are various trade-offs here about what that API specifies (does insertion happen at a particular location or order? does pop! deliver a particular element, or any one will do?), but I am not sure it is common to have a use case where eg FIFO and LIFO collections would otherwise be interchangeable.

Maybe traits traits that describe various assumptions would make sense? But perhaps that’s just overcomplicating it.

As far as I understand, a common use case is choosing breadth-first search or depth-first search when exploring a tree, e.g. in a branch-and-bound algorithm for global optimization.

4 Likes

That’s why I suggested to add pushany! and popany! in my first comment. I think either pushany!/popany! or pushlast!/poplast! are required if we want the “anywhere” API. Those two choices both have pros and cons and I think both of them are reasonable.

2 Likes

Catching up here… :slight_smile: Great discussion!

Just another thought - the word “pop” implies taking something off the top, and so it feels like dealing with a stack. Shall we consider “pull”?

I fully agree with properties 1, 2 and 4. Regarding 3, I think we have a “language problem” here. In many ways (e.g. for Vector), push! and pop! are used to refer to a stack, i.e. a LIFO data collection. Therefore it makes sense to have popfirst! monkeypatch FIFO access onto a stack. pushfirst! works with a similar effect, but to me starts to feel kind of redundant.

In other ways (e.g. for AbstractChannel), push! and pop! refer to some fuzzy or “natural” way to add and retrieve objects from a collection. I think it’s worth considering to use different names for this, e.g. put!/take!, and use a language closer to (some) textbooks for FIFO access:

  • push! and pop!: LIFO
  • enqueue! and dequeue!: FIFO
  • put! and take!: priority queue, heap, etc

Then, individual data structures can default to put!(stack, x) = push!(stack, x) or put!(queue, x) = enqueue!(queue, x) etc. Applications that require a certain access pattern can then use the proper language. Combinations like push!+dequeue! would (rightfully so) look odd.

This differs from the pushany!/popany! suggested by @tkf in that instead of “no contract” put!/take! would fulfil the “natural contract”, if it exists. If there is no “natural contract”, they should not be defined.

push*! and pop*! (* = first|last) would not exist. However, the fuzzy put! and take! could be extended to put*! and take*! for ordered collections, thus fulfilling a new version of property 2.
(Edit: this way, should this be considered, it would be possible to change this even in v1.x by letting pop*! and push*! keep their “legacy” meaning.)

A big downside of this is that we would have to decide what to do with Vector. As mentioned by @dpsanders, it is effectively a collection supporting both FIFO and LIFO, so I would argue that it should have no (fuzzy) put! and take!. So far, so good. However, the current pushfirst! would then be enqueue!, which loses its tight connection to push!. The benefit of this is that it forces users to think more carefully about the access pattern and less about the data layout of the data structures they use. Making the operations efficient that a user needs, should be the job of the data structure (Edit: e.g. a Vector as an efficient deque).

5 Likes