Fastest sorted container (search) challenge, and new 2021 search algorithm

Julia provides binary search, of sorted arrays. But simple sorted arrays aren’t the best data structure (always).

Before going into that, binary search is O(log n), there’s interpolation search for O(log log n), but then there’s this new algorithm I discovered, a hybrid of those two:

https://www.sciencedirect.com/science/article/pii/S221509862100046X

Likely because worst case of interpolation search is O(n). But this IBS is better than binary search and interpolation search:

The asymptotic analysis showed that the average-case and the worst-case complexity of the binary search is O(log2 n). The complexity of the binary search is independent of the distribution of keys […] When IBS runs on uniform distributed keys, the average case complexity of IBS is equal to the complexity of the interpolation search, which is O(log2 log2 n). […]
Similar to binary search, worst-case complexity of IBS is still O(log2 log2 n) within non-uniform distributions

But time complexity alone doesn’t really matter, there are also cache effects. I think those will not trump IBS despite this older method:

The fastest I knew of, so far, is the alternative array layout:

Eytzinger Binary Search - Algorithmica
https://arxiv.org/pdf/1509.05053.pdf

Eytzinger (BFS) layout normally used for heaps, an implicit B-tree layout that
generalizes the Eytzinger layout, and the van Emde Boas layout commonly used in the
cache-oblivious algorithms literature.

Possibly it or some other (see the paper) could work for this new algorithm. That layout fits binary search, and binary search fits binary trees, but something like a B-tree might be better.

Bonus points would be if the data structure could also be a persistent data structure, and also fast… What packages are already existing for any of this in Julia, or other language (that we could maybe reuse)?

One other idea I have is a partially sorted container. I’m not sure it can work for a tree, but looking at sorting, it’s often done (e.g. in databases), but then you might not use the data in the container, e.g. for a top-N query. You can sort with quickselect, and might not need to sort the full container ever…

1 Like