How to speed up permutedims for high-dimensional tensors

Ah, the allocations suggest it might be related to tuple splatting:

julia> using BenchmarkTools, Random

julia> a = randn(fill(2, 14)...);

julia> @benchmark permutedims($a, $(randperm(14)))
BenchmarkTools.Trial: 
  memory estimate:  128.77 KiB
  allocs estimate:  8
  --------------
  minimum time:     77.574 μs (0.00% GC)
  median time:      83.264 μs (0.00% GC)
  mean time:        84.643 μs (1.13% GC)
  maximum time:     506.565 μs (82.24% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> @benchmark reshape(a, 1<<7, 1<<7) * reshape(a, 1<<7, 1<<7)
BenchmarkTools.Trial: 
  memory estimate:  128.27 KiB
  allocs estimate:  6
  --------------
  minimum time:     136.216 μs (0.00% GC)
  median time:      210.382 μs (0.00% GC)
  mean time:        240.261 μs (1.67% GC)
  maximum time:     25.140 ms (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

I thought this had been fixed by https://github.com/JuliaLang/julia/pull/40468, but I forgot that the fix won't be available until Julia 1.7.
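Most of the 128.77 KiB reported for `permutedims` is just the output array: 2^14 `Float64` entries × 8 bytes = 128 KiB. One rough way to isolate the remaining overhead (where the splatting would show up) is to compare the allocating call against `permutedims!` writing into a preallocated buffer. This is a minimal sketch, assuming every dimension has size 2 (so `similar(a)` already has the permuted shape); `measure_allocs` is a hypothetical helper, and the exact byte counts will depend on the Julia version.

```julia
using Random  # stdlib: provides randperm

# Hypothetical helper: bytes allocated by the allocating and the in-place
# variants, measured after one warm-up call each so compilation is excluded.
# The same `perm` is reused, since a new permutation may trigger new compilation.
function measure_allocs(a, perm)
    dest = similar(a)            # valid here only because every dimension has size 2
    permutedims(a, perm)         # warm-up
    permutedims!(dest, a, perm)  # warm-up
    out = @allocated permutedims(a, perm)
    inplace = @allocated permutedims!(dest, a, perm)
    return out, inplace
end

a = randn(fill(2, 14)...)        # 2^14 Float64 entries = 128 KiB of data
perm = randperm(14)
out, inplace = measure_allocs(a, perm)
println("permutedims:  ", out, " bytes allocated")
println("permutedims!: ", inplace, " bytes allocated")
println("output array: ", sizeof(a), " bytes")
```

Whatever `permutedims!` still allocates with a preallocated destination is overhead unrelated to the result itself.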