using DataStructures
function partialsort(itr, n)
ret = SortedSet{eltype(itr)}()
for x in itr
push!(ret, x)
if length(ret) > n
pop!(ret)
end
end
ret
end
You can’t do better than O(N log(N)) for full comparison sort. This means in general you also can’t to partial sort in O(N). Otherwise you would just choose n=N and end up with a faster full comparison sort.
Now if your problem has extra structure, there can be faster algorithms. Like O(N) radix sort for a vector of integers. Or O(1) if your collection is say a range.
I think you are misinterpreting partialsort!. You don’t need the output array to be sorted, you only need to find the position of the n-th smallest element (and as a consequence you can find the position of the n smallest elements with an extra O(n) time). Setting n=N would just give you the largest element, which obviously can be found in time O(N).
I think this operation may done in O(N) time as follows. For simplicity of explanation, assume N is a multiple of n. Form a buffer with 2n slots. Initially load the buffer with 2n items from the iterator. Then find the median, which requires O(n) time, and discard the elements below the median, leaving n items in the buffer. Now load another n items from the iterator, again filling the buffer. Repeat the operation of taking the median and discarding the n smaller elements. The number of median operations is N/n, and each one requires O(n) operations, so the total running time is O(N).
The OP asked for a package rather than an algorithm; I don’t know of a package to implement this operation.