Performance optimization：Frequently use permutedims function

quantumdreamer · April 28, 2025, 3:15am

Here is my example：

G = rand(8,256,14,288,6)+rand(8,256,14,288,6)*1im;
@elapsed begin
    G = permutedims(G,[1,2,4,3,5]);
    k = sqrt.(sum(abs2.(reshape(G,589824,1,1,14,6)),dims = 1));
    G .*= k;
    G = permutedims(G,[1,2,4,3,5]);
end
```.

quantumdreamer · April 28, 2025, 3:16am

In my case,codes block like this in a for loop is frequently used,The G array is a matrix used to simulate the input and is not counted in the running time.I don’t know how to further optimize it,i would appreciate it if anyone can help me？

baggepinnen · April 28, 2025, 6:01am

permutedims allocates a new array each time you call it. If you call it often, you can make use of permutedims! instead, which accepts a preallocated array such that you do not have to allocate a new one each time.

DNF · April 28, 2025, 7:56am

You are not providing the H array. Do you need it? It’s hard to suggest improvements without a self-contained example.

DNF · April 28, 2025, 8:06am

It would be easier to help if you create a fully self-contained function that shows exactly what inputs are given and what outputs are required. Right now, it’s unclear which of the variables are actually needed for output, and which are only temporary helpers that could be removed for performance optimization.

For example, why do you perform permutation twice on G, is that intentional? Do you need k, and will you output both G, G1, H and k? And where does H come from?

The only definitive advice I have right now is for a part where you didn’t ask for help:

This should instead be

G = rand(ComplexF64, 8, 256, 14, 288, 6)

DNF · April 28, 2025, 8:24am

That said, here are some remarks:

If you use permutedims, specify the dimensions as a tuple, to avoid allocations: G = permutedims(G,(1,2,4,3,5));
Never write sum(abs2.(X)), instead write sum(abs2, X), the latter avoids the temporary intermediate array.
Don’t use permutedims, instead specify the dimensions in the sum function:

k2 = sum(abs2, G; dims=(1, 2, 4))

Don’t calculate the sqrt before multiplying with H, since that creates a temporary array, instead fuse this calculation into the update: H .*= sqrt.(k2).

That’s a start.

quantumdreamer · April 28, 2025, 10:59am

I correct my codes.Here, “H” should be changed to “G”

quantumdreamer · April 28, 2025, 11:16am

The reason why two permutedims are needed is that when calculating k, the G array is obtained after the first permutedims. I wonder if there is any way to obtain k through the G before permutedims.

quantumdreamer · April 28, 2025, 11:18am

I have to think about the third point you mentioned. Does it mean that this doesn’t require any permutedims

photor · April 28, 2025, 11:35am

You should try Strided.jl .

DNF · April 28, 2025, 11:58am

So you want to permute it back? You should probably use invperm for this. In this particular case, the permutation is its own inverse, but that’s not generally the case.

If you just need G, then you could do this:

function bar!(G)
    k2 = sum(abs2, G; dims=(1,2,4))
    G .*= sqrt.(k2);
    return G
end

which is much cleaner and faster, and allocates much less. Also it doesn’t hardcode dimension sizes, which you really should avoid.

Note that this will directly modify the array G.

There’s even more stuff you can do to speed this up and completely avoid any temporary arrays, but this is a simple solution.

DNF · April 28, 2025, 12:05pm

This version completely avoids allocations, but is roughly the same speed:

function baz!(G)
    for slice in eachslice(G; dims=(3, 5))
        slice .*= sqrt(sum(abs2, slice))
    end
    return G
end

quantumdreamer · April 29, 2025, 1:24am

Yes,It’s a good. I already try “@Strided”,but sometimes its performance is not good enough.

quantumdreamer · April 29, 2025, 1:27am

This is exactly what I want. Thank you for your help. This can indeed avoid unnecessary performance overhead

quantumdreamer · April 29, 2025, 3:09am

function foo(G)
    H = reshape(permutedims(G,(1,2,4,3,5)),8,256,288,14,6,1)
    H = permutedims(H,(2,1,6,5,3,4))
    return H
end
G = rand(ComplexF64,8,256,14,288,6);
H = foo(G);

quantumdreamer · April 29, 2025, 3:11am

Could you help me take a look at this by the way? I have no clue at all.@DNF

quantumdreamer · April 29, 2025, 3:23am

function bar2(G)
    out = similar(G, 256, 8, 1, 6, 288, 14)
    @inbounds for i5=1:6, i4=1:288, i3=1:14, i2=1:256, i1=1:8
        out[i2,i1,1,i5,i4,i3] = G[i1,i2,i3,i4,i5]  # 注意索引对应关系
    end
    return out
end

This is how I do it at present.

DNF · April 29, 2025, 5:30am

I don’t know how to improve on the performance of this code, but I would not use hardcoded ranges like that, it’s dangerous when combined with @inbounds. Use size or axes to get the ranges.

But the takeaway from my previous answer was that you didn’t need permutedims or reshape at all! Perhaps you don’t need it now either, but since I don’t know what you what this for, I cannot suggest any alternative.

quantumdreamer · April 30, 2025, 1:06am

OK,thank you for your advice.

roflmaostc · April 30, 2025, 11:37am

Also this works but the winner is baz!

using BenchmarkTools

function a(G)
    G = permutedims(G,[1,2,4,3,5]);
    k = sqrt.(sum(abs2.(reshape(G,589824,1,1,14,6)),dims = 1));
    G .*= k;
    G = permutedims(G,[1,2,4,3,5]);
end

function b(G)
    G = PermutedDimsArray(G,[1,2,4,3,5]);
    G .*= sqrt.(sum(abs2, reshape(G,589824,1,1,14,6),dims = 1));
    G = PermutedDimsArray(G,[1,2,4,3,5]);
end

function baz!(G)
    for slice in eachslice(G; dims=(3, 5))
        slice .*= sqrt(sum(abs2, slice))
    end
    return G
end

G = rand(8,256,14,288,6)+rand(8,256,14,288,6)*1im;

@btime a($G)
@btime b($G)
@btime baz!($G)

  1.105 s (30 allocations: 1.85 GiB)
  572.395 ms (27 allocations: 2.25 KiB)
  290.433 ms (0 allocations: 0 bytes)

Topic		Replies	Views
Performance difference in permuting Arrays Performance sort , sortperm	4	525	October 11, 2021
Product of permutations without allocation General Usage performance	11	471	December 12, 2023
Permute Multidimensional Array without allocation General Usage	9	2195	September 25, 2019
Trouble Pre-allocating Outputs Performance	9	760	September 16, 2020
Memory allocation when using permutedims New to Julia performance , memory-allocation , tensors	1	781	November 5, 2023

Performance optimization：Frequently use permutedims function

Related topics