Is there an implementation that partial sorts an iterator without allocating the whole output?
Eg suppose that I need the n
largest elements by f
from an iterator itr
that has N \gg n elements.
partialsort!(collect(itr), n; by f)
would allocate a length N
vector, while the whole thing can be handled with a length n
vector that is kept sorted.
1 Like
One quick implementation would be like:
using DataStructures
function partialsort(itr, n)
ret = SortedSet{eltype(itr)}()
for x in itr
push!(ret, x)
if length(ret) > n
pop!(ret)
end
end
ret
end
6 Likes
Dan
July 2, 2025, 2:27pm
3
Perhaps SortedSet
can cause problems because it folds identical items (as a mathematical set does). An alternative data structure to use is a heap:
function partialsort2(itr, n)
ret = BinaryMinMaxHeap{eltype(itr)}()
for x in itr
push!(ret, x)
if length(ret) > n
popmin!(ret)
end
end
ret
end
6 Likes