How to sort a vector by an expensive function in place efficiently?

The normal sort!(...;by=f) sorts in place by a function, but it results in calling that function many more times than the length of the vector. I tried all the sorting algorithms included in Base and all of them resulted in calling the by function multiple times the length of the vector. So, this doesn’t work well if the by function is expensive.

I have seen the recommendation to use sortperm for this: Sorting and Related Functions · The Julia Language
However, it appears that it is creating 2 new vectors as large as the original. Is there a way to do this more efficiently when one wants to sort the original vector in place?

Sorting in place is described by the REPL help with ?sort!

by

  sort!(v; alg::Algorithm=defalg(v), lt=isless, by=identity, rev::Bool=false, order::Ordering=Forward)

  Sort the vector v in place. QuickSort is used by default for numeric arrays while MergeSort is used for other arrays. You can specify an algorithm to use via the alg keyword (see Sorting Algorithms
  for available algorithms). The by keyword lets you provide a function that will be applied to each element before comparison; the lt keyword allows providing a custom "less than" function (note      
  that for every x and y, only one of lt(x,y) and lt(y,x) can return true); use rev=true to reverse the sorting order. These options are independent and can be used together in all possible
  combinations: if both by and lt are specified, the lt function is applied to the result of the by function; rev=true reverses whatever ordering specified via the by and lt keywords.
)

So I’m not really understanding what you are after?

I added more details to the question explaining why sort! is inappropriate for the case of expensive functions. It calls the function many times more than once per item.

1 Like

If that is the case then you could easily get an improvement by just calling the function itself inplace and then sorting in place.

1 Like

Another possibility is to memoize the function.

You need to store the result of calling f on the data somewhere. For example, instead of sort(x, by=f), you could use:

x = x[sortperm(f.(x))]

which has the downside that it allocates additional arrays for f.(x), the permutation, and for re-ordering x.

To operate more in-place, you could use the StructArrays.jl package to implicitly create an array of tuples of (f(x[i]), x[i]) values:

fx = StructArray((f.(x), x))
sort!(fx, by=first) # also sorts x

For example:

x = rand(-9:9, 10)
sort!(StructArray((abs.(x), x)), by=first)

sorts x by absolute value, but only calls abs once on each element of x, at the expense of allocating only one additional array abs.(x).

PS. You could also use something like Memoize.jl to memoize f, but this builds a dictionary structure that is probably not as efficient as the code above if x contains mostly unique values. On the other hand, memoization will be more efficient than the f.(x) above if x contains only a few distinct values repeated many times, since then you will only call f once per value in x.

6 Likes

Thanks for the pointer to StructArray. I had been wondering if there was a way to do a “view” sort of thing like that. I had thought about memoization, but every value is different so it wouldn’t work well for my case.

The timing is rather curious on the three approaches:

using StructArray
function texp(x)
    for _ in 1:1000
        x = log(x^2)
    end
    x
end

testit1(f, v) = sort!(v, by=f)
testit2(f, v) = v[sortperm(f.(v))]
testit3(f, v) = sort!(StructArray((f.(v), v)), by=first)

v = rand(10000)

julia> @btime testit1($texp, $v);
  4.464 s (0 allocations: 0 bytes)

julia> @btime testit2($texp, $v);
  220.449 ms (9 allocations: 234.56 KiB)

julia> @btime testit3($texp, $v);
  220.919 ms (29 allocations: 157.62 KiB)

v = rand(100000)
julia> @btime testit2($texp, $v);
  2.226 s (9 allocations: 2.29 MiB)

julia> @btime testit3($texp, $v);
  2.223 s (29 allocations: 1.53 MiB)

The naive sort! is really bad. The timing of the sortperm and StructArray approaches is about the same, but one allocates more bytes, but one allocates more times. And that pattern stays the same when increasing the size of the vector.

I tried using RadixSort for the same code and it resulted in about the same timing but more allocations (both count and size).

1 Like

a further variant, with the same structure as the solution proposed by @stevengj (but without dependencies) , and with the same performance as testt2

testit4(f, v) = sort!(tuple.(f.(v), v), by=first)

This doesn’t sort the original array v, but only sorts the new array of tuples. In contrast, sort!(StructArray((f.(v), v)), by=first) also sorts v.

2 Likes

Other than the StructArray trick, you could also try permute! to reduce allocations (or even Base.permute!!, as you don’t need to preserve the result of sortperm).

For example

sort_by!(f, v) = Base.permute!!(v, sortperm(f.(v)))

should reduce allocations. I’m not sure if it is slower or faster than simply v[sortperm(f.(v))] though.

3 Likes

I was looking for something like permute!. Thanks! So, here are the timings after adding two more variants for 100000 entries:

testit4(f, v) = Base.permute!!(v, sortperm(f.(v)))
testit5(f, v) = Base.permute!(v, sortperm(f.(v)))

julia> @btime testit2($texp, $v);
  1.157 s (9 allocations: 2.29 MiB)

julia> @btime testit3($texp, $v);
  1.156 s (29 allocations: 1.53 MiB)

julia> @btime testit4($texp, $v);
  1.149 s (7 allocations: 1.53 MiB)

julia> @btime testit5($texp, $v);
  1.150 s (9 allocations: 2.29 MiB)

Fun… The permute!! version has similar allocation count to the basic sortperm approach, but has similar allocation bytes to the StructArray approach. It has the lower of both, which would seem to suggest it is the best approach so far.

Be very careful using @btime to benchmark mutating sorting functions like this. The benchmarking code will run the same function multiple times on your inputs, which can significantly bias the results for sorting algorithms because every call after the first will be attempting to sort an already sorted array, which is very cheap. This is mentioned in the manual here: Manual · BenchmarkTools.jl

To benchmark something like sort!, you need to do:

julia> @btime sort!(x) setup=(x = rand(100)) evals=1
  1.668 μs (0 allocations: 0 bytes)

Where the setup phase generates new (unsorted) data for each run, and evals=1 tells the benchmark to only run sort! once on that data.

4 Likes

And maybe seed the RNG

@btime sort!(x) setup=(Random.seed!(42); x = rand(100)) evals=1
1 Like

Yes, but the problem was that sort calls the by function multiple times per value, and memoization can help you with that.

using Memoize

@memoize function texpm(x)
    for _ in 1:1000
        x = log(x^2)
    end
    return x
end

1.7.2> v = rand(10000);

1.7.2> @time testit2(texp, v);
  0.206719 seconds (9 allocations: 234.562 KiB)

1.7.2> @time sort!(v, by=texpm);
  0.219627 seconds (315.41 k allocations: 4.813 MiB)

Some care must be taken when benchmarking with memoization. Next time you work on the same data, you get

1.7.2> @time sort!(v; by=texpm);
  0.009580 seconds (198.99 k allocations: 3.036 MiB)

because the values are memoized, and the vector is sorted.

It’s unlikely to ever be faster than the other methods though, unless you are repeatedly working on shuffled versions of the same data.

1 Like

That’s good to know about @btime, thanks!

I had thought about it being presorted. However, I’m not trying to benchmark the algorithm itself, but rather the process around it. Also, I thought there might be the possibility of skewing the results if one case got a particular “easy” set of data. But, I suppose it runs enough samples that that is probably not a concern.

Using your example:

julia> @btime testit2($texp, x) setup=(x=rand(100000)) evals=1;
  1.166 s (9 allocations: 2.29 MiB)
julia> @btime testit4($texp, x) setup=(x=rand(100000)) evals=1;
  1.165 s (7 allocations: 1.53 MiB)
julia> @btime testit5($texp, x) setup=(x=rand(100000)) evals=1;
  1.164 s (9 allocations: 2.29 MiB)

The results look about the same.