Ah, the allocations suggest it might be related to tuple splatting:
julia> a = randn(fill(2, 14)...);
julia> @benchmark permutedims($a, $(randperm(14)))
BenchmarkTools.Trial:
memory estimate: 128.77 KiB
allocs estimate: 8
--------------
minimum time: 77.574 μs (0.00% GC)
median time: 83.264 μs (0.00% GC)
mean time: 84.643 μs (1.13% GC)
maximum time: 506.565 μs (82.24% GC)
--------------
samples: 10000
evals/sample: 1
julia> @benchmark reshape(a, 1<<7, 1<<7) * reshape(a, 1<<7, 1<<7)
BenchmarkTools.Trial:
memory estimate: 128.27 KiB
allocs estimate: 6
--------------
minimum time: 136.216 μs (0.00% GC)
median time: 210.382 μs (0.00% GC)
mean time: 240.261 μs (1.67% GC)
maximum time: 25.140 ms (0.00% GC)
--------------
samples: 10000
evals/sample: 1
I thought it was fixed by https://github.com/JuliaLang/julia/pull/40468,
but I forgot that I have to wait for Julia 1.7.
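
For anyone who wants to check this on their own Julia version, here is a rough sketch of how the allocation could be split into "result array" and "everything else". The helper name `permutedims_allocs` is made up for illustration and is not part of any package; the idea is simply that the result array of 2^14 `Float64` values accounts for 128 KiB on its own, so only the remainder can come from overhead such as splatting.

```julia
using Random  # for randperm

# Hypothetical helper (not from any package): bytes allocated by a single
# permutedims call, measured after a warm-up call so compilation is excluded.
function permutedims_allocs(a, perm)
    permutedims(a, perm)                    # warm-up
    return @allocated permutedims(a, perm)
end

a = randn(fill(2, 14)...)                   # same 14-dimensional array as above
perm = randperm(14)

result_bytes = sizeof(a)                    # size of the data in the result array
total_bytes = permutedims_allocs(a, perm)

println("result array:    ", result_bytes, " bytes")
println("total allocated: ", total_bytes, " bytes")
println("overhead:        ", total_bytes - result_bytes, " bytes")
```

If the overhead shrinks on a newer Julia release, that would be consistent with the fix having landed, though I have not verified which part of the overhead is actually due to splatting.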