Here is my example:
G = rand(8,256,14,288,6)+rand(8,256,14,288,6)*1im;
@elapsed begin
G = permutedims(G,[1,2,4,3,5]);
k = sqrt.(sum(abs2.(reshape(G,589824,1,1,14,6)),dims = 1));
G .*= k;
G = permutedims(G,[1,2,4,3,5]);
end
```.
Here is my example:
G = rand(8,256,14,288,6)+rand(8,256,14,288,6)*1im;
@elapsed begin
G = permutedims(G,[1,2,4,3,5]);
k = sqrt.(sum(abs2.(reshape(G,589824,1,1,14,6)),dims = 1));
G .*= k;
G = permutedims(G,[1,2,4,3,5]);
end
```.
In my case,codes block like this in a for loop is frequently used,The G array is a matrix used to simulate the input and is not counted in the running time.I don’t know how to further optimize it,i would appreciate it if anyone can help me?
permutedims
allocates a new array each time you call it. If you call it often, you can make use of permutedims!
instead, which accepts a preallocated array such that you do not have to allocate a new one each time.
You are not providing the H
array. Do you need it? It’s hard to suggest improvements without a self-contained example.
It would be easier to help if you create a fully self-contained function that shows exactly what inputs are given and what outputs are required. Right now, it’s unclear which of the variables are actually needed for output, and which are only temporary helpers that could be removed for performance optimization.
For example, why do you perform permutation twice on G
, is that intentional? Do you need k
, and will you output both G
, G1
, H
and k
? And where does H
come from?
The only definitive advice I have right now is for a part where you didn’t ask for help:
This should instead be
G = rand(ComplexF64, 8, 256, 14, 288, 6)
That said, here are some remarks:
permutedims
, specify the dimensions as a tuple, to avoid allocations: G = permutedims(G,(1,2,4,3,5));
sum(abs2.(X))
, instead write sum(abs2, X)
, the latter avoids the temporary intermediate array.permutedims
, instead specify the dimensions in the sum
function:k2 = sum(abs2, G; dims=(1, 2, 4))
sqrt
before multiplying with H
, since that creates a temporary array, instead fuse this calculation into the update: H .*= sqrt.(k2)
.That’s a start.
I correct my codes.Here, “H” should be changed to “G”
The reason why two permutedims are needed is that when calculating k, the G array is obtained after the first permutedims. I wonder if there is any way to obtain k through the G before permutedims.
I have to think about the third point you mentioned. Does it mean that this doesn’t require any permutedims
You should try Strided.jl .
So you want to permute it back? You should probably use invperm
for this. In this particular case, the permutation is its own inverse, but that’s not generally the case.
If you just need G
, then you could do this:
function bar!(G)
k2 = sum(abs2, G; dims=(1,2,4))
G .*= sqrt.(k2);
return G
end
which is much cleaner and faster, and allocates much less. Also it doesn’t hardcode dimension sizes, which you really should avoid.
Note that this will directly modify the array G
.
There’s even more stuff you can do to speed this up and completely avoid any temporary arrays, but this is a simple solution.
This version completely avoids allocations, but is roughly the same speed:
function baz!(G)
for slice in eachslice(G; dims=(3, 5))
slice .*= sqrt(sum(abs2, slice))
end
return G
end
Yes,It’s a good. I already try “@Strided”,but sometimes its performance is not good enough.
This is exactly what I want. Thank you for your help. This can indeed avoid unnecessary performance overhead
function foo(G)
H = reshape(permutedims(G,(1,2,4,3,5)),8,256,288,14,6,1)
H = permutedims(H,(2,1,6,5,3,4))
return H
end
G = rand(ComplexF64,8,256,14,288,6);
H = foo(G);
Could you help me take a look at this by the way? I have no clue at all.@DNF
function bar2(G)
out = similar(G, 256, 8, 1, 6, 288, 14)
@inbounds for i5=1:6, i4=1:288, i3=1:14, i2=1:256, i1=1:8
out[i2,i1,1,i5,i4,i3] = G[i1,i2,i3,i4,i5] # 注意索引对应关系
end
return out
end
This is how I do it at present.
I don’t know how to improve on the performance of this code, but I would not use hardcoded ranges like that, it’s dangerous when combined with @inbounds
. Use size
or axes
to get the ranges.
But the takeaway from my previous answer was that you didn’t need permutedims or reshape at all! Perhaps you don’t need it now either, but since I don’t know what you what this for, I cannot suggest any alternative.
OK,thank you for your advice.
Also this works but the winner is baz!
using BenchmarkTools
function a(G)
G = permutedims(G,[1,2,4,3,5]);
k = sqrt.(sum(abs2.(reshape(G,589824,1,1,14,6)),dims = 1));
G .*= k;
G = permutedims(G,[1,2,4,3,5]);
end
function b(G)
G = PermutedDimsArray(G,[1,2,4,3,5]);
G .*= sqrt.(sum(abs2, reshape(G,589824,1,1,14,6),dims = 1));
G = PermutedDimsArray(G,[1,2,4,3,5]);
end
function baz!(G)
for slice in eachslice(G; dims=(3, 5))
slice .*= sqrt(sum(abs2, slice))
end
return G
end
G = rand(8,256,14,288,6)+rand(8,256,14,288,6)*1im;
@btime a($G)
@btime b($G)
@btime baz!($G)
1.105 s (30 allocations: 1.85 GiB)
572.395 ms (27 allocations: 2.25 KiB)
290.433 ms (0 allocations: 0 bytes)