How to sort a vector by an expensive function in place efficiently?

You need to store the result of calling f on the data somewhere. For example, instead of sort(x, by=f), you could use:

x = x[sortperm(f.(x))]

which has the downside that it allocates additional arrays for f.(x), the permutation, and for re-ordering x.

To operate more in-place, you could use the StructArrays.jl package to implicitly create an array of tuples of (f(x[i]), x[i]) values:

fx = StructArray((f.(x), x))
sort!(fx, by=first) # also sorts x

For example:

x = rand(-9:9, 10)
sort!(StructArray((abs.(x), x)), by=first)

sorts x by absolute value, but only calls abs once on each element of x, at the expense of allocating only one additional array abs.(x).

PS. You could also use something like Memoize.jl to memoize f, but this builds a dictionary structure that is probably not as efficient as the code above if x contains mostly unique values. On the other hand, memoization will be more efficient than the f.(x) above if x contains only a few distinct values repeated many times, since then you will only call f once per value in x.

6 Likes