I need to sort (usually small, i. e. 5 to 20 elements, but the algorithm cannot scale badly) arrays, but only partially. That is, I need that the elements get split into elements smaller than a cutoff, and elements greater than a cutoff, and I need the number of elements smaller than the cutoff. It must be inplace, and single-threaded.

The fastest I could do for now is this:

julia> function partialsort_cutoff!(x,cutoff)
iswap = 1
for i in eachindex(x)
if x[i] <= cutoff
if iswap != i
x[iswap], x[i] = x[i], x[iswap]
end
iswap = iswap + 1
end
end
return iswap - 1
end
partialsort_cutoff! (generic function with 1 method)
julia> @benchmark partialsort_cutoff!(y,0.5) setup=(y=rand(20)) evals=1
BechmarkTools.Trial: 10000 samples with 1 evaluations.
Range (min β¦ max): 88.000 ns β¦ 444.000 ns β GC (min β¦ max): 0.00% β¦ 0.00%
Time (median): 172.000 ns β GC (median): 0.00%
Time (mean Β± Ο): 172.809 ns Β± 25.730 ns β GC (mean Β± Ο): 0.00% Β± 0.00%
β ββββββββββββββββββ β
βββββββββββββββββ βββ βββββββββββββββββββββββββββββ ββ βββ βββββββ β
88 ns Histogram: frequency by time 236 ns <
Memory estimate: 0 bytes, allocs estimate: 0.

What, on the other side, might be useful, is that I donβt need the values greater than the cutoff. I would be happy if I got the elements smaller than the cutoff, and the number of elements smaller than the cutoff. But I donβt know if something can be faster than swapping the elements directly.

This should minimize the number of swaps but the overall higher complexity doesnβt seem to pay off unless the swaps are more expensive than in the benchmark.

function partialsort_cutoff2!(x, cutoff)
i, j = extrema(eachindex(x))
while true
while i < j && x[i] <= cutoff
i += 1
end
while j > i && x[j] > cutoff
j -= 1
end
if j <= i
return i - 1
end
x[i], x[j] = x[j], x[i]
end
end

Edit: Thereβs a corner case bug if all elements are lower than the cutoff.

function filter!(f, a::AbstractVector)
j = firstindex(a)
for ai in a
@inbounds a[j] = ai
j = ifelse(f(ai), nextind(a, j), j)
end
j > lastindex(a) && return a
if a isa Vector
resize!(a, j-1)
sizehint!(a, j-1)
else
deleteat!(a, j:lastindex(a))
end
return a
end

adapted to your needs:

function yourFilter!(f, a::AbstractVector)
j = firstindex(a)
for ai in a
@inbounds a[j] = ai
j = ifelse(f(ai), nextind(a, j), j)
end
return j-1
end

You can check if @inbounds is good for your swapping too.

The adapted Base.Sort.partition! function appears to be slightly faster than what I originally had. If someone sees anything else to improve, Iβll appreciate. But I guess that if that is what is implemented by default in QuickSort, I guess it canβt get much faster.

using BenchmarkTools
function partition!(v::AbstractVector, cutoff)
i, j = 0, length(v)+1
@inbounds while true
i += 1; j -= 1
while
v[i] <= cutoff
i += 1
end
while
v[j] > cutoff
j -= 1
end
i >= j && break
v[i], v[j] = v[j], v[i]
end
return j
end
function partialsort_cutoff!(x,cutoff)
iswap = 1
@inbounds for i in eachindex(x)
if x[i] <= cutoff
if iswap != i
x[iswap], x[i] = x[i], x[iswap]
end
iswap = iswap + 1
end
end
return iswap - 1
end

Do you have guarantees that you have values both below and above the cutoff? Otherwise you will go out of bounds in one of the while loops. (The Base partition! avoids this with the selectpivot! semantics.)

As a rule of thumb, subnanosecond timings donβt make any sense. Usually this means there is constant propagation going on, so that the result is effectively computed at compile time and @btime is trying to benchmark a function which directly returns a constant value. See Manual Β· BenchmarkTools.jl

Indeed, I only tested that in the toy example, so I missed that. Iβll add checking limits to the loops, as the use of a pivot does not seem to be exactly what I need. But maybe that will narrow the diference to the initial implementation even further.

@GunnarFarneback now I notice that is exactly what you posted above. Thanks for that.

Probably what he is seen deserves a new thread, because if he ran exactly the code I posted that should not be happening.