Type stability with type as argument and permutedims

I am seeing poor performance (possibly a type instability?) when a function takes a type as an optional keyword argument and calls permutedims inside. I'm having difficulty understanding why the first function is much slower than the second:

function test1(; FloatT=Float32)
    tmp = Array{FloatT}(undef, 100, 100, 100)
    new = permutedims(tmp, (3, 2, 1))
    for i in eachindex(new)
        new[i] = new[i] / 10
    end
    return new
end

function test2()
    tmp = Array{Float32}(undef, 100, 100, 100)
    new = permutedims(tmp, (3, 2, 1))
    for i in eachindex(new)
        new[i] = new[i] / 10
    end
    return new
end

And benchmarks (Julia 1.10):

julia> @benchmark test1()
BenchmarkTools.Trial: 75 samples with 1 evaluation.
 Range (min … max):  60.314 ms … 127.058 ms  ┊ GC (min … max): 0.49% … 49.45%
 Time  (median):     60.891 ms               ┊ GC (median):    0.91%
 Time  (mean ± σ):   66.794 ms ±  17.127 ms  ┊ GC (mean ± σ):  9.11% ± 13.86%

  █▂
  ██▆▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▆▆▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▄▄ ▁
  60.3 ms       Histogram: log(frequency) by time       126 ms <

 Memory estimate: 68.65 MiB, allocs estimate: 3998989.

julia> @benchmark test2()
BenchmarkTools.Trial: 2822 samples with 1 evaluation.
 Range (min … max):  1.246 ms …   9.494 ms  ┊ GC (min … max):  0.00% … 80.71%
 Time  (median):     1.567 ms               ┊ GC (median):     0.00%
 Time  (mean ± σ):   1.770 ms ± 527.255 μs  ┊ GC (mean ± σ):  11.85% ± 16.25%

       ▁▃▇█▄▃▁
  ▃▂▂▃▅███████▅▄▃▂▂▂▁▂▁▁▂▁▁▁▁▂▁▁▂▂▁▂▂▂▃▃▃▃▃▃▃▃▃▃▃▃▂▂▂▂▃▂▂▂▂▂▂ ▃
  1.25 ms         Histogram: frequency by time        3.33 ms <

 Memory estimate: 7.63 MiB, allocs estimate: 4.

I've tried @code_warntype and Cthulhu but could not find a type instability. If I remove the call to permutedims, the functions are equally fast.

This is one of those cases where Julia does not specialize; see "Be aware of when Julia avoids specializing" in Performance Tips · The Julia Language.
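
To make the rule from that section concrete, here is a minimal sketch with positional arguments. The names `scale_untyped` and `scale_typed` are hypothetical, and the arrays are filled with ones (rather than left `undef`) so the results can be compared. According to the Performance Tips, the first form may not be specialized on the type argument, while the `::Type{T}` form makes `T` a static parameter of the method:

```julia
using Test

# Hypothetical example: the element type arrives as a plain positional argument
# and is only handed on to `ones`. Per the Performance Tips, Julia may avoid
# compiling a separate specialization for each type passed here.
function scale_untyped(t, n)
    new = permutedims(ones(t, n, n, n), (3, 2, 1))
    for i in eachindex(new)
        new[i] = new[i] / 10
    end
    return new
end

# Same body, but T is a static parameter, so a separate specialization
# is compiled for each element type.
function scale_typed(::Type{T}, n) where {T}
    new = permutedims(ones(T, n, n, n), (3, 2, 1))
    for i in eachindex(new)
        new[i] = new[i] / 10
    end
    return new
end

# Both return the same values; only the compiled code may differ.
@assert scale_untyped(Float32, 4) == scale_typed(Float32, 4)

# @inferred throws if the return type is not inferred as the concrete result type.
@inferred scale_typed(Float32, 4)
```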

As a workaround, you may create a function barrier to enforce the specialization:

julia> function test2(; FloatT=Float32)
           tmp = Array{FloatT}(undef, 100, 100, 100)
           (tmp -> begin
               new = permutedims(tmp, (3, 2, 1))
               for i in eachindex(new)
                   new[i] = new[i] / 10
               end
               return new
           end)(tmp)
       end
test2 (generic function with 1 method)

julia> @btime test2();
  1.889 ms (4 allocations: 7.63 MiB)
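
The same barrier can also be written with a named inner function instead of an immediately called anonymous one. A minimal sketch, where `scale_permuted`, `test_barrier`, and the extra `n` keyword are hypothetical names introduced for illustration, and `ones` replaces `undef` so the output is deterministic:

```julia
# Hypothetical kernel: Julia compiles it separately for each concrete
# array type it receives, so the loop runs on a known element type.
function scale_permuted(tmp::AbstractArray)
    new = permutedims(tmp, (3, 2, 1))
    for i in eachindex(new)
        new[i] = new[i] / 10
    end
    return new
end

# The keyword argument is still an unannotated type, but permutedims and
# the hot loop now live behind the call to scale_permuted.
function test_barrier(; FloatT=Float32, n=100)
    tmp = ones(FloatT, n, n, n)
    return scale_permuted(tmp)
end

result = test_barrier(FloatT=Float64, n=4)
println(eltype(result), " ", size(result))
```

Only one dynamic dispatch remains (the call into `scale_permuted`), which is paid once per call rather than once per element.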

Thank you for the replies, they were very helpful. It seems the easiest solution is to make the type a static parameter of the method, something like:

function test3(; t::Type{T}=Float32) where T
    tmp = Array{T}(undef, 100, 100, 100)
    new = permutedims(tmp, (3, 2, 1))
    for i in eachindex(new)
        new[i] = new[i] / 10
    end
    return new
end

julia> @benchmark test3(t=Float32)
BenchmarkTools.Trial: 3033 samples with 1 evaluation.
 Range (min … max):  1.068 ms …   3.514 ms  ┊ GC (min … max):  0.00% … 55.37%
 Time  (median):     1.488 ms               ┊ GC (median):     0.00%
 Time  (mean ± σ):   1.647 ms ± 420.597 μs  ┊ GC (mean ± σ):  10.16% ± 14.73%

             ▄▆▆█▃
  ▂▂▅▂▂▁▂▃▃▂▃██████▆▄▃▂▂▂▂▂▁▁▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▃▃▂▃▃▂▃▃▃▃▃▃▃▃▃▃▂ ▃
  1.07 ms         Histogram: frequency by time        2.79 ms <

 Memory estimate: 7.63 MiB, allocs estimate: 4.
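
For a quick sanity check without BenchmarkTools, `@allocated` after a warm-up call gives a rough view of how much memory a call allocates. A sketch using a smaller, initialized array; `scaled` and its `n` keyword are hypothetical names, not part of the code above, and the exact byte count will vary between Julia versions:

```julia
# Hypothetical variant of the typed-keyword approach with an adjustable size.
function scaled(; t::Type{T}=Float32, n::Int=20) where {T}
    new = permutedims(ones(T, n, n, n), (3, 2, 1))
    for i in eachindex(new)
        new[i] = new[i] / 10
    end
    return new
end

scaled(t=Float32)                     # warm-up call so compilation is not measured
bytes = @allocated scaled(t=Float32)

# Two n^3 Float32 arrays are created (ones + permutedims); a figure far
# above their combined size would hint at per-element allocations in the loop.
array_bytes = 2 * 20^3 * sizeof(Float32)
println("allocated: ", bytes, " bytes; the two arrays alone need ", array_bytes)
```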