Fastest way to partition array, given a condition

lmiq · July 24, 2021, 6:55pm

I need to sort (usually small, i. e. 5 to 20 elements, but the algorithm cannot scale badly) arrays, but only partially. That is, I need that the elements get split into elements smaller than a cutoff, and elements greater than a cutoff, and I need the number of elements smaller than the cutoff. It must be inplace, and single-threaded.

The fastest I could do for now is this:

julia> function partialsort_cutoff!(x,cutoff)
         iswap = 1
         for i in eachindex(x)
           if x[i] <= cutoff
             if iswap != i
               x[iswap], x[i] = x[i], x[iswap]
             end
             iswap = iswap + 1
           end
         end
         return iswap - 1
       end
partialsort_cutoff! (generic function with 1 method)

julia> @benchmark partialsort_cutoff!(y,0.5) setup=(y=rand(20)) evals=1
BechmarkTools.Trial: 10000 samples with 1 evaluations.
 Range (min … max):   88.000 ns … 444.000 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     172.000 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   172.809 ns ±  25.730 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                         ▂ ▄▁▁▆▂▇▃█▃▃█▃█▂▆▁▂▄ ▃                  
  ▂▂▂▁▂▂▂▂▂▂▃▃▃▃▄▄▅▄▆▅▆▇▇██████████████████████▇█▆▅▇▅▆▄▅▃▃▄▃▃▃▂ ▅
  88 ns            Histogram: frequency by time          236 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

Anyone can think of a faster alternative?

Skoffer · July 24, 2021, 7:03pm

Small style suggestion, you can use x[iswap], x[i] = x[i], x[iswap] instead of three lines.

lmiq · July 24, 2021, 7:09pm

accepted

eliassno · July 24, 2021, 7:29pm

Why do I get different timing results? On Julia Version 1.5.3:

julia> @btime partialsort_cutoff!(y,0.5) setup=(y=copy($x)) evals=1
  0.001 ns (0 allocations: 0 bytes)
7

EDIT:
Timings are the same until length(x) == 134:

julia> x = rand(134);

julia> @btime partialsort_cutoff!(y,0.5) setup=(y=copy($x)) evals=1
  100.000 ns (0 allocations: 0 bytes)
65

lmiq · July 24, 2021, 7:33pm

That is a strange result, and certainly it is not meaningful. I’m in Julia 1.6.2, for if that matters.

Anyway, it is probably better to benchmark the function with:

julia> @benchmark partialsort_cutoff!(y,0.5) setup=(y=rand(20)) evals=1
BechmarkTools.Trial: 10000 samples with 1 evaluations.
 Range (min … max):   88.000 ns … 444.000 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     172.000 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   172.809 ns ±  25.730 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                         ▂ ▄▁▁▆▂▇▃█▃▃█▃█▂▆▁▂▄ ▃                  
  ▂▂▂▁▂▂▂▂▂▂▃▃▃▃▄▄▅▄▆▅▆▇▇██████████████████████▇█▆▅▇▅▆▄▅▃▃▄▃▃▃▂ ▅
  88 ns            Histogram: frequency by time          236 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

(updated the original post)

LaurentPlagne · July 24, 2021, 7:44pm

Is the initial distribution on number totally random ?

lmiq · July 24, 2021, 7:57pm

Yet, it is.

What, on the other side, might be useful, is that I don’t need the values greater than the cutoff. I would be happy if I got the elements smaller than the cutoff, and the number of elements smaller than the cutoff. But I don’t know if something can be faster than swapping the elements directly.

GunnarFarneback · July 24, 2021, 7:59pm

This should minimize the number of swaps but the overall higher complexity doesn’t seem to pay off unless the swaps are more expensive than in the benchmark.

function partialsort_cutoff2!(x, cutoff)
    i, j = extrema(eachindex(x))
    while true
        while i < j && x[i] <= cutoff
            i += 1
        end
        while j > i && x[j] > cutoff
            j -= 1
        end
        if j <= i
            return i - 1
        end
        x[i], x[j] = x[j], x[i]
    end
end

Edit: There’s a corner case bug if all elements are lower than the cutoff.

stevengj · July 24, 2021, 8:04pm

This is called “partitioning” the array, and is a subroutine of the quicksort algorithm. See the Base.partition! function.

lmiq · July 24, 2021, 8:04pm

Thanks. But the swaps are pretty cheap, yes. They are swaps of a struct of shape:

struct S
  i::Int 
  x::Float64
  v::SVector{3,Float64}
end

and the sorting is on the x values.

Probably a little bit more expensive to swap, but still very cheap.

oheil · July 24, 2021, 8:13pm

filter! ?

lmiq · July 24, 2021, 8:17pm

I cannot deallocate the remaining of the array, because it will be reused. Could that be adapted to that situation?

two comments:

the Base.partition! is interesting, but I will have to understand exactly how to use it in this case.

Also, filter! seems to be fast. Another function to study how it is implemented, even if to adapt to what I need here.

GunnarFarneback · July 24, 2021, 8:19pm

If you use the partition! algorithm you don’t need to swap; you can just do a single assignment into the lower half.

lmiq · July 24, 2021, 8:21pm

Thanks all! I have some clear directions to study alternatives. I need to take care of the kid now

I will post what I get from those suggestions asap.

oheil · July 24, 2021, 8:29pm

The original:

function filter!(f, a::AbstractVector)
    j = firstindex(a)
    for ai in a
        @inbounds a[j] = ai
        j = ifelse(f(ai), nextind(a, j), j)
    end
    j > lastindex(a) && return a
    if a isa Vector
        resize!(a, j-1)
        sizehint!(a, j-1)
    else
        deleteat!(a, j:lastindex(a))
    end
    return a
end

adapted to your needs:

function yourFilter!(f, a::AbstractVector)
    j = firstindex(a)
    for ai in a
        @inbounds a[j] = ai
        j = ifelse(f(ai), nextind(a, j), j)
    end
    return j-1
end

You can check if @inbounds is good for your swapping too.

lmiq · July 24, 2021, 11:29pm

Thanks for that. But I realized now that the way I have things implemented now, I actually need the swapping.

I will see how faster is the partition! function.

lmiq · July 24, 2021, 11:46pm

The adapted Base.Sort.partition! function appears to be slightly faster than what I originally had. If someone sees anything else to improve, I’ll appreciate. But I guess that if that is what is implemented by default in QuickSort, I guess it can’t get much faster.

julia> @benchmark partition!(y,0.5) setup=(y=rand(20)) evals=1
BechmarkTools.Trial: 10000 samples with 1 evaluations.
 Range (min … max):   71.000 ns …  14.398 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     157.000 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   159.097 ns ± 145.189 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                            ▂ ▁▅▂▆▂▂█▄▂▇▃▆▂▁▅  ▃
  ▂▁▁▂▁▂▂▂▂▂▃▂▂▃▃▃▃▄▅▄▄▆▅█▇█████████████████████▇█▆▅▇▄▄▆▄▄▃▃▃▃▃ ▅
  71 ns            Histogram: frequency by time          216 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark partialsort_cutoff!(y,0.5) setup=(y=rand(20)) evals=1
BechmarkTools.Trial: 10000 samples with 1 evaluations.
 Range (min … max):   76.000 ns …  31.006 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     169.000 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   172.485 ns ± 309.801 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                           ▁ ▃ ▄▁▆▂▇▃▇▃▇▃█▃▆▁▆▁▂ ▁
  ▂▁▁▂▂▁▂▂▂▂▂▂▂▃▃▃▃▄▃▅▄▆▅▇▆███████████████████████▆█▅▇▅▅▄▄▃▄▃▃▃ ▅
  76 ns            Histogram: frequency by time          229 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

Code

using BenchmarkTools

function partition!(v::AbstractVector, cutoff)
    i, j = 0, length(v)+1
    @inbounds while true
        i += 1; j -= 1
        while 
          v[i] <= cutoff
          i += 1  
        end
        while 
          v[j] > cutoff
          j -= 1 
        end
        i >= j && break
        v[i], v[j] = v[j], v[i]
    end
    return j
end

function partialsort_cutoff!(x,cutoff)
  iswap = 1
  @inbounds for i in eachindex(x)
    if x[i] <= cutoff
      if iswap != i
        x[iswap], x[i] = x[i], x[iswap]
      end
      iswap = iswap + 1
    end
  end
  return iswap - 1
end

GunnarFarneback · July 25, 2021, 7:14am

Do you have guarantees that you have values both below and above the cutoff? Otherwise you will go out of bounds in one of the while loops. (The Base partition! avoids this with the selectpivot! semantics.)

giordano · July 25, 2021, 10:02am

As a rule of thumb, subnanosecond timings don’t make any sense. Usually this means there is constant propagation going on, so that the result is effectively computed at compile time and @btime is trying to benchmark a function which directly returns a constant value. See Manual · BenchmarkTools.jl

lmiq · July 25, 2021, 10:09am

Indeed, I only tested that in the toy example, so I missed that. I’ll add checking limits to the loops, as the use of a pivot does not seem to be exactly what I need. But maybe that will narrow the diference to the initial implementation even further.

@GunnarFarneback now I notice that is exactly what you posted above. Thanks for that.

Probably what he is seen deserves a new thread, because if he ran exactly the code I posted that should not be happening.

Topic		Replies	Views
Best `separate` performance? Performance	18	1184	November 1, 2022
In place sorting for views Performance question , sort , views	12	2743	March 1, 2021
Sorting algorithms: partial quicksort usage General Usage question , sort	4	1745	March 16, 2019
10x faster sortperm() Performance sortperm	10	2702	October 29, 2024
Sort huge array get smallest 1000 Performance	5	831	December 31, 2017

Fastest way to partition array, given a condition

Related topics