Fastest way to permute Array, given some permutation

DavidBerghaus · November 6, 2020, 3:42pm

I wonder what the fastest way is to apply a permutation to an array. Three approaches that I checked are:

julia> A = rand(10000);

julia> permutation = sortperm(A);

julia> @btime $A[$permutation];
  11.700 μs (3 allocations: 78.22 KiB)

julia> @btime $A .= $A[$permutation];
  13.200 μs (3 allocations: 78.22 KiB)

julia> @btime permute!($A,$permutation);
  42.000 μs (2 allocations: 78.20 KiB)

None of these methods seem to be inplace (for the first one it is not surprising), I did however not expect the first approach to be the fastest. Any ideas what is causing this?

stevengj · November 6, 2020, 3:54pm

The permute! function acts in-place in A, but requires the allocation of an auxiliary index array. You can do it fully in-place with the undocumented Base.permute!! function, which also overwrites the permutation array (so that you can only call it once).

@btime Base.permute!!($A,p) setup=(p = copy($permutation)) evals=1;

However, it only saves about 15% on my machine.

The basic issue here, I think, is that permuting an array in-place is generally a tricky problem, involving chasing the cycles of the permutation, that takes a bunch of bookkeeping and has poor memory-access patterns for cache-line utilization. Simply making an out-of-place copy via A[permutation] is much more straightforward (and involves more consecutive memory access) and hence faster.

DavidBerghaus · November 6, 2020, 3:58pm

Okay thank you, then I will not feel too bad for allocating this extra memory

Jean_Michel · November 8, 2020, 1:33pm

Note that you gain another 20% by doing

julia> @btime @inbounds $A[$permutation];
  9.356 μs (2 allocations: 78.20 KiB)

DavidBerghaus · November 8, 2020, 9:05pm

Good point, that’s true!

Henrique_Becker · November 8, 2020, 9:59pm

Note that over-optimizing permutations may cause damage to your sanity, I know because I had to do it for maximum efficiency to replicate a literature heuristic.

The optimized heuristic code is here. Basically, for each Vector that needs to be permuted frequently, I allocate a copy a single time, and then in the literal million iterations following, I copy-permute the original vector to the copy, and then swap who is the copy and who is the original in the loop scope.

I also had to gut the sort internals, it was not pretty. But I got rid of any allocations and my final code was six times faster. I just had to keep the old code to be able to check if the results were the same, or I introduced any bugs.

DavidBerghaus · November 9, 2020, 2:08pm

This indeed looks a bit tedious

But for a 6x speed-up it might be worth it ^^

Henrique_Becker · November 9, 2020, 3:38pm

It become so much faster only because the code stopped allocating anything this way, and it is a code that literally runs at least one million iterations before stopping, so even one allocation inside the loop caused considerable slowdown, as the rest of the loop is just permutation of small vectors.

DavidBerghaus · November 9, 2020, 4:50pm

Yes, fortunately for me, this permutation line is only called a few times, so I don’t have to optimize too much. The reason for creating this thread was more out of curiosity and because I was suprised that Base.permute! is slower

stevengj · November 9, 2020, 5:04pm

Note that you can do this in-place on preallocated output using B .= getindex.(Ref(A), permutation). For example:

julia> @btime $A[$permutation];
  18.972 μs (2 allocations: 78.20 KiB)

julia> B = similar(A);

julia> @btime $B .= getindex.(Ref($A), $permutation);
  7.601 μs (0 allocations: 0 bytes)

It’s a good thought exercise to make sure you understand why A .= getindex.(Ref(A), permutation) would not work (it executes, but produces the wrong A in general).

DavidBerghaus · November 9, 2020, 5:58pm

Yes indeed, I originally thought that this is what

would be doing (similar to x, y = y, x) until I realized that this line just creates a permuted copy of A and then replaces the array element-by element (which in hindsight seems obvious).

Also interesting that

does not seem to benefit from @inbounds

orenbenkiki · September 12, 2021, 8:46am

Would have been nice if permute! took an optional additional array to use instead of always allocating it (or overwriting the original in the case of permute!!). I’ve a case where I repeatedly permute not-too-long arrays and this de/allocation each time can be a performance issue. Is this something that might be considered as a feature for the permute! function?

stevengj · September 12, 2021, 12:54pm

You could pass copyto!(pcopy, p) to permute!! if you want to preserve the original permutation array while using a preallocated buffer.

orenbenkiki · September 12, 2021, 1:06pm

@stevengj - right, I should have though of that. This way I’d be able to apply the permutation as many times as I want w/o any allocations. Thanks!

Topic		Replies	Views
Speeding up sort along an array dimension Performance sort , arrays	11	618	August 18, 2021
Fast permutation vector: code needed Performance sorting	38	579	March 21, 2023
In-place inverse permutation Performance sort , sortperm	1	865	February 9, 2023
More efficient way of getting permutations New to Julia	3	354	August 13, 2021
Find permutation of an array such that it is equal to another array General Usage question , sortperm	10	959	May 31, 2023

Fastest way to permute Array, given some permutation

Related topics