Package for partial sorting output iterator

Is there an implementation that partial sorts an iterator without allocating the whole output?

Eg suppose that I need the n largest elements by f from an iterator itr that has N \gg n elements.

partialsort!(collect(itr), n; by f) would allocate a length N vector, while the whole thing can be handled with a length n vector that is kept sorted.

1 Like

One quick implementation would be like:

using DataStructures
function partialsort(itr, n)
    ret = SortedSet{eltype(itr)}()
    for x in itr
        push!(ret, x)
        if length(ret) > n
            pop!(ret)
        end
    end
    ret
end
6 Likes

Perhaps SortedSet can cause problems because it folds identical items (as a mathematical set does). An alternative data structure to use is a heap:

function partialsort2(itr, n)
    ret = BinaryMinMaxHeap{eltype(itr)}()
    for x in itr
        push!(ret, x)
        if length(ret) > n
            popmin!(ret)
        end
    end
    ret
end
6 Likes