I am seeing poor performance (maybe a type instability?) when a function takes a type as a keyword argument and calls permutedims inside. I'm having difficulty understanding why the first function is much slower than the second:
function test1(; FloatT=Float32)
    tmp = Array{FloatT}(undef, 100, 100, 100)
    new = permutedims(tmp, (3, 2, 1))
    for i in eachindex(new)
        new[i] = new[i] / 10
    end
    return new
end
function test2()
    tmp = Array{Float32}(undef, 100, 100, 100)
    new = permutedims(tmp, (3, 2, 1))
    for i in eachindex(new)
        new[i] = new[i] / 10
    end
    return new
end
And benchmarks (Julia 1.10):
julia> @benchmark test1()
BenchmarkTools.Trial: 75 samples with 1 evaluation.
 Range (min … max):  60.314 ms … 127.058 ms  ┊ GC (min … max): 0.49% … 49.45%
 Time  (median):     60.891 ms               ┊ GC (median):    0.91%
 Time  (mean ± σ):   66.794 ms ±  17.127 ms  ┊ GC (mean ± σ):  9.11% ± 13.86%
  60.3 ms       Histogram: log(frequency) by time       126 ms <
 Memory estimate: 68.65 MiB, allocs estimate: 3998989.
julia> @benchmark test2()
BenchmarkTools.Trial: 2822 samples with 1 evaluation.
 Range (min … max):  1.246 ms …   9.494 ms  ┊ GC (min … max):  0.00% … 80.71%
 Time  (median):     1.567 ms               ┊ GC (median):     0.00%
 Time  (mean ± σ):   1.770 ms ± 527.255 μs  ┊ GC (mean ± σ):  11.85% ± 16.25%
  1.25 ms         Histogram: frequency by time        3.33 ms <
 Memory estimate: 7.63 MiB, allocs estimate: 4.
I've tried with @code_warntype and Cthulhu but could not find a type instability. If I remove the call to permutedims, the functions are equally fast.
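For reference, here is a sketch of a variant I would expect to sidestep the problem, assuming the slowdown comes from the compiler not specializing on the type passed through the keyword argument (I have not confirmed that this is the cause). The names test3 and scale! are hypothetical and only for illustration. It combines two standard techniques: annotating the keyword as `::Type{T}` so the element type becomes a static parameter, and moving the loop behind a function barrier so it is compiled for the concrete array type.

```julia
# Hypothetical helper: the loop lives in its own function (a "function barrier"),
# so it is compiled for the concrete type of A even if the caller is not well inferred.
function scale!(A::AbstractArray)
    for i in eachindex(A)
        A[i] = A[i] / 10
    end
    return A
end

# Hypothetical variant of test1: `FloatT::Type{T}` with `where {T}` makes the
# element type a static parameter, which asks Julia to specialize on it.
function test3(; FloatT::Type{T}=Float32) where {T}
    tmp = Array{T}(undef, 100, 100, 100)
    new = permutedims(tmp, (3, 2, 1))
    return scale!(new)
end
```

Running `@benchmark test3()` next to test1 and test2 should show whether the allocation count drops back to a handful. If it does, that would point at missing specialization rather than at permutedims itself.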