Performance optimization:Frequently use permutedims function

Here is my example:

G = rand(8,256,14,288,6)+rand(8,256,14,288,6)*1im;
@elapsed begin
    G = permutedims(G,[1,2,4,3,5]);
    k = sqrt.(sum(abs2.(reshape(G,589824,1,1,14,6)),dims = 1));
    G .*= k;
    G = permutedims(G,[1,2,4,3,5]);
end
```.

In my case,codes block like this in a for loop is frequently used,The G array is a matrix used to simulate the input and is not counted in the running time.I don’t know how to further optimize it,i would appreciate it if anyone can help me?

permutedims allocates a new array each time you call it. If you call it often, you can make use of permutedims! instead, which accepts a preallocated array such that you do not have to allocate a new one each time.

2 Likes

You are not providing the H array. Do you need it? It’s hard to suggest improvements without a self-contained example.

It would be easier to help if you create a fully self-contained function that shows exactly what inputs are given and what outputs are required. Right now, it’s unclear which of the variables are actually needed for output, and which are only temporary helpers that could be removed for performance optimization.

For example, why do you perform permutation twice on G, is that intentional? Do you need k, and will you output both G, G1, H and k? And where does H come from?

The only definitive advice I have right now is for a part where you didn’t ask for help:

This should instead be

G = rand(ComplexF64, 8, 256, 14, 288, 6)
1 Like

That said, here are some remarks:

  1. If you use permutedims, specify the dimensions as a tuple, to avoid allocations: G = permutedims(G,(1,2,4,3,5));
  2. Never write sum(abs2.(X)), instead write sum(abs2, X), the latter avoids the temporary intermediate array.
  3. Don’t use permutedims, instead specify the dimensions in the sum function:
k2 = sum(abs2, G; dims=(1, 2, 4))
  1. Don’t calculate the sqrt before multiplying with H, since that creates a temporary array, instead fuse this calculation into the update: H .*= sqrt.(k2).

That’s a start.

7 Likes

I correct my codes.Here, “H” should be changed to “G”

The reason why two permutedims are needed is that when calculating k, the G array is obtained after the first permutedims. I wonder if there is any way to obtain k through the G before permutedims.

I have to think about the third point you mentioned. Does it mean that this doesn’t require any permutedims

You should try Strided.jl .

So you want to permute it back? You should probably use invperm for this. In this particular case, the permutation is its own inverse, but that’s not generally the case.

If you just need G, then you could do this:

function bar!(G)
    k2 = sum(abs2, G; dims=(1,2,4))
    G .*= sqrt.(k2);
    return G
end

which is much cleaner and faster, and allocates much less. Also it doesn’t hardcode dimension sizes, which you really should avoid.

Note that this will directly modify the array G.

There’s even more stuff you can do to speed this up and completely avoid any temporary arrays, but this is a simple solution.

This version completely avoids allocations, but is roughly the same speed:

function baz!(G)
    for slice in eachslice(G; dims=(3, 5))
        slice .*= sqrt(sum(abs2, slice))
    end
    return G
end
5 Likes

Yes,It’s a good. I already try “@Strided”,but sometimes its performance is not good enough.

This is exactly what I want. Thank you for your help. This can indeed avoid unnecessary performance overhead

function foo(G)
    H = reshape(permutedims(G,(1,2,4,3,5)),8,256,288,14,6,1)
    H = permutedims(H,(2,1,6,5,3,4))
    return H
end
G = rand(ComplexF64,8,256,14,288,6);
H = foo(G);

Could you help me take a look at this by the way? I have no clue at all.@DNF

function bar2(G)
    out = similar(G, 256, 8, 1, 6, 288, 14)
    @inbounds for i5=1:6, i4=1:288, i3=1:14, i2=1:256, i1=1:8
        out[i2,i1,1,i5,i4,i3] = G[i1,i2,i3,i4,i5]  # 注意索引对应关系
    end
    return out
end

This is how I do it at present.

I don’t know how to improve on the performance of this code, but I would not use hardcoded ranges like that, it’s dangerous when combined with @inbounds. Use size or axes to get the ranges.

But the takeaway from my previous answer was that you didn’t need permutedims or reshape at all! Perhaps you don’t need it now either, but since I don’t know what you what this for, I cannot suggest any alternative.

2 Likes

OK,thank you for your advice.

Also this works but the winner is baz!

using BenchmarkTools

function a(G)
    G = permutedims(G,[1,2,4,3,5]);
    k = sqrt.(sum(abs2.(reshape(G,589824,1,1,14,6)),dims = 1));
    G .*= k;
    G = permutedims(G,[1,2,4,3,5]);
end

function b(G)
    G = PermutedDimsArray(G,[1,2,4,3,5]);
    G .*= sqrt.(sum(abs2, reshape(G,589824,1,1,14,6),dims = 1));
    G = PermutedDimsArray(G,[1,2,4,3,5]);
end

function baz!(G)
    for slice in eachslice(G; dims=(3, 5))
        slice .*= sqrt(sum(abs2, slice))
    end
    return G
end

G = rand(8,256,14,288,6)+rand(8,256,14,288,6)*1im;

@btime a($G)
@btime b($G)
@btime baz!($G)

  1.105 s (30 allocations: 1.85 GiB)
  572.395 ms (27 allocations: 2.25 KiB)
  290.433 ms (0 allocations: 0 bytes)
2 Likes