# Fastest way to partition array, given a condition

I need to sort arrays (usually small, i.e. 5 to 20 elements, but the algorithm cannot scale badly), and only partially. That is, I need the elements split into those at most a cutoff and those greater than the cutoff, and I need the number of elements below the cutoff. It must be in place and single-threaded.

The fastest I could do for now is this:

``````julia> function partialsort_cutoff!(x, cutoff)
           iswap = 1
           for i in eachindex(x)
               if x[i] <= cutoff
                   if iswap != i
                       x[iswap], x[i] = x[i], x[iswap]
                   end
                   iswap = iswap + 1
               end
           end
           return iswap - 1
       end
partialsort_cutoff! (generic function with 1 method)

julia> @benchmark partialsort_cutoff!(y,0.5) setup=(y=rand(20)) evals=1
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):   88.000 ns … 444.000 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     172.000 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   172.809 ns ±  25.730 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

 [histogram bars omitted]
  88 ns            Histogram: frequency by time          236 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

``````

Can anyone think of a faster alternative?


Small style suggestion: you can use `x[iswap], x[i] = x[i], x[iswap]` instead of three lines.


Why do I get different timing results? On Julia Version 1.5.3:

``````julia> @btime partialsort_cutoff!(y,0.5) setup=(y=copy($x)) evals=1
0.001 ns (0 allocations: 0 bytes)
7
``````

EDIT:
The spurious timing persists until `length(x) == 134`:

``````julia> x = rand(134);

julia> @btime partialsort_cutoff!(y,0.5) setup=(y=copy($x)) evals=1
100.000 ns (0 allocations: 0 bytes)
65
``````

That is a strange result, and it is certainly not meaningful. I'm on Julia 1.6.2, if that matters.

Anyway, it is probably better to benchmark the function with:

``````julia> @benchmark partialsort_cutoff!(y,0.5) setup=(y=rand(20)) evals=1
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):   88.000 ns … 444.000 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     172.000 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   172.809 ns ±  25.730 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

 [histogram bars omitted]
  88 ns            Histogram: frequency by time          236 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

``````

(updated the original post)

1 Like

Is the initial distribution of numbers totally random?

Yes, it is.

What might be useful, on the other hand, is that I don't need the values greater than the cutoff. I would be happy if I got the elements smaller than the cutoff, and the number of elements smaller than the cutoff. But I don't know if anything can be faster than swapping the elements directly.

This should minimize the number of swaps, but the overall higher complexity doesn't seem to pay off unless the swaps are more expensive than in the benchmark.

``````function partialsort_cutoff2!(x, cutoff)
    i, j = extrema(eachindex(x))
    while true
        while i < j && x[i] <= cutoff
            i += 1
        end
        while j > i && x[j] > cutoff
            j -= 1
        end
        if j <= i
            return i - 1
        end
        x[i], x[j] = x[j], x[i]
    end
end
``````

Edit: There's a corner-case bug if all elements are lower than the cutoff.
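For reference, here is a sketch of the same two-pointer scheme with that corner case patched (the name `partialsort_cutoff2_fixed!` is mine): letting the left scan run one step past `j` means an all-below array correctly reports its full length.

``````julia
# Two-pointer partition: moves elements <= cutoff to the front and
# returns how many there are. Handles the all-below and all-above cases.
function partialsort_cutoff2_fixed!(x, cutoff)
    i, j = firstindex(x), lastindex(x)
    while true
        # scan from the left; i may step one past j when every
        # remaining element is below the cutoff
        while i <= j && x[i] <= cutoff
            i += 1
        end
        # scan from the right for an element that belongs on the left
        while j > i && x[j] > cutoff
            j -= 1
        end
        j <= i && return i - 1
        x[i], x[j] = x[j], x[i]
    end
end
``````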


This is called “partitioning” the array, and is a subroutine of the quicksort algorithm. See the `Base.partition!` function.


Thanks. But the swaps are pretty cheap, yes. They are swaps of a struct of shape:

``````struct S
    i::Int
    x::Float64
    v::SVector{3,Float64}  # SVector comes from StaticArrays.jl
end
``````

and the sorting is on the `x` values.

Probably a little bit more expensive to swap, but still very cheap.

`filter!` ?

2 Likes

I cannot deallocate the remainder of the array, because it will be reused. Could that be adapted to this situation?

The `Base.partition!` function is interesting, but I will have to understand exactly how to use it in this case.

Also, `filter!` seems to be fast. Its implementation is another thing to study, even if only to adapt it to what I need here.

If you use the `partition!` algorithm you don't need to swap; you can just do a single assignment into the lower half.
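A minimal sketch of that single-assignment idea (the name `count_below!` is mine): since the tail values do not matter, each qualifying element is copied down with one assignment instead of a swap, much like `filter!` does internally but without resizing.

``````julia
# Compact the elements <= cutoff to the front with one assignment each;
# entries past the returned count keep stale values (the array keeps its size).
function count_below!(x, cutoff)
    j = firstindex(x) - 1
    @inbounds for i in eachindex(x)
        if x[i] <= cutoff
            j += 1
            x[j] = x[i]
        end
    end
    return j  # number of elements <= cutoff
end
``````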


Thanks all! I have some clear directions for studying alternatives. I need to take care of the kid now.

I will post what I get from those suggestions as soon as possible.


The original:

``````function filter!(f, a::AbstractVector)
    j = firstindex(a)
    for ai in a
        @inbounds a[j] = ai
        j = ifelse(f(ai), nextind(a, j), j)
    end
    j > lastindex(a) && return a
    if a isa Vector
        resize!(a, j-1)
        sizehint!(a, j-1)
    else
        deleteat!(a, j:lastindex(a))
    end
    return a
end
``````

``````function yourFilter!(f, a::AbstractVector)
    j = firstindex(a)
    for ai in a
        @inbounds a[j] = ai
        j = ifelse(f(ai), nextind(a, j), j)
    end
    return j-1
end
``````

You can check if `@inbounds` is good for your swapping too.


Thanks for that. But I realized now that, the way I have things implemented, I actually need the swapping.

I will see how much faster the `partition!` function is.

The adapted `Base.Sort.partition!` function appears to be slightly faster than what I originally had. If someone sees anything else to improve, I'll appreciate it. But if that is what is implemented by default in quicksort, I guess it can't get much faster.

``````julia> @benchmark partition!(y,0.5) setup=(y=rand(20)) evals=1
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):   71.000 ns …  14.398 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     157.000 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   159.097 ns ± 145.189 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

 [histogram bars omitted]
  71 ns            Histogram: frequency by time          216 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark partialsort_cutoff!(y,0.5) setup=(y=rand(20)) evals=1
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):   76.000 ns …  31.006 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     169.000 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   172.485 ns ± 309.801 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

 [histogram bars omitted]
  76 ns            Histogram: frequency by time          229 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

``````
Code:
``````using BenchmarkTools

function partition!(v::AbstractVector, cutoff)
    i, j = 0, length(v) + 1
    @inbounds while true
        i += 1; j -= 1
        while v[i] <= cutoff
            i += 1
        end
        while v[j] > cutoff
            j -= 1
        end
        i >= j && break
        v[i], v[j] = v[j], v[i]
    end
    return j
end

function partialsort_cutoff!(x, cutoff)
    iswap = 1
    @inbounds for i in eachindex(x)
        if x[i] <= cutoff
            if iswap != i
                x[iswap], x[i] = x[i], x[iswap]
            end
            iswap = iswap + 1
        end
    end
    return iswap - 1
end

``````

Do you have guarantees that you have values both below and above the cutoff? Otherwise you will go out of bounds in one of the while loops. (The Base `partition!` avoids this with the `selectpivot!` semantics.)


As a rule of thumb, sub-nanosecond timings don't make any sense. Usually this means there is constant propagation going on, so that the result is effectively computed at compile time and `@btime` ends up benchmarking a function that directly returns a constant value. See the BenchmarkTools.jl manual.
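For example, one common workaround (a sketch; `f` here is just a placeholder function) is to interpolate a `Ref` and unwrap it inside the benchmarked expression, which hides the value from the compiler:

``````julia
using BenchmarkTools

f(x) = x + 1

@btime f(1)            # literal argument: may be constant-folded away
@btime f($(Ref(1))[])  # the Ref defeats constant propagation, so the call is timed for real
``````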


Indeed, I only tested that in the toy example, so I missed it. I'll add bounds checks to the loops, as the use of a pivot does not seem to be exactly what I need. But maybe that will narrow the difference from the initial implementation even further.

@GunnarFarneback I now notice that this is exactly what you posted above. Thanks for that.

Probably what he is seeing deserves a new thread, because if he ran exactly the code I posted, that should not be happening.