I created a variable of type BitSet and used setdiff! to remove elements from it one at a time until it was empty. This took linear time as a function of the set size and the constant was about 10 cycles per element (on an x64 machine). This makes since. If the set were implemented as a bit array, deletions could be fast. But it implies that isempty is not scanning the bit array to decide if the set is empty. It must be consulting a hidden variable that keeps track of the number of elements in the set. Because length works on BiSet types (though I can’t find this in the documentation) the existence of such a hidden variable seems natural.
But then it occurred to me that pop! must be O(n^2). If a set is mostly empty after popping an element it might be necessary to scan a large portion of the set to find the next element to pop.
But still trying to figure out simple things in Julia, I timed alternating between removing a random element using setdiff! and popping using pop! . It is linear, with a constant of 13 or 14 cycles per element. This is too fast. It violates theory for the assumed data structure.
The implication is that the underlying data structure must be significantly richer than assumed. There must be something like a hidden queue. The queue structure must point to elements in the bit array and there must be an index for finding elements in the queue when they are removed from the set, so they can also be removed from the queue.
Is this true ???
If so it is both cool and disturbing. It is fast! But I was in the process of creating the auxiliary structures required to make it O(n) or O(n ln(n)). These structures now seem redundant with existing (likely faster) hidden structures.
I am wandering:
- What was the direct way to figure this out?
- Am I doing something similarly wrong with other data structures?
- Is this a serious memory problem when I don’t need to push or pop?
- Is it possible to add the hidden queue only when it is realized that pop will be used on this “object” … this seems likely to create new issues, but …
The plot is:
Should I upload the code?