Generic Functions Are Slow Here?

I wrote two functions that do the same thing, but one of them allocates. All I change is the argument annotation from an abstract type to a concrete type.

julia> function baz((;h, t_w, E, F_y)::T) where T <: AISCSteel.Shapes.IShapes.AbstractRolledIShapes
           λ = h / t_w
           λ_p = 3.76 * sqrt(E / F_y)
           λ_r = 5.7 * sqrt(E / F_y)

           if λ <= λ_p
               class = :compact
           elseif λ_p < λ <= λ_r
               class = :noncompact
           else
               class = :slender
           end

           return λ, λ_p, λ_r, class
       end
baz (generic function with 1 method)

julia> @benchmark baz(w)
BenchmarkTools.Trial: 10000 samples with 991 evaluations per sample.
 Range (min … max):  42.129 ns … 669.190 ns  ┊ GC (min … max): 0.00% … 89.71%
 Time  (median):     45.114 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   45.697 ns ±  19.402 ns  ┊ GC (mean ± σ):  1.30% ±  2.85%

   ▁▁       ▆▄▄▂        ▄█▆▄▂▁▁   ▁   ▁▄▂      ▁               ▂
  ███▆▁▁▁▃▃▆█████▇▅▃▃▄▄▅███████▇▆▇██▆▆████▇▇▇▅▇█▇▆▅▄▄▃▄▅▅▅▅▇█▆ █
  42.1 ns       Histogram: log(frequency) by time      49.7 ns <

 Memory estimate: 48 bytes, allocs estimate: 1.

vs

julia> function test((;h, t_w, E, F_y)::AISCSteel.Shapes.IShapes.RolledIShapes.WShape)
                  λ = h / t_w
                  λ_p = 3.76 * sqrt(E / F_y)
                  λ_r = 5.7 * sqrt(E / F_y)

                  if λ <= λ_p
                      class = :compact
                  elseif λ_p < λ <= λ_r
                      class = :noncompact
                  else
                      class = :slender
                  end

                  return λ, λ_p, λ_r, class
              end
test (generic function with 1 method)

julia> @benchmark test(w)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations per sample.
 Range (min … max):  5.667 ns … 11.625 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     5.750 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   5.777 ns ±  0.109 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

           ▃         █        █         ▃         ▂        ▂ ▂
  ▃▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁█ █
  5.67 ns      Histogram: log(frequency) by time     5.92 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

Is this expected?

Interpolate $w in your benchmark real quick
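
Concretely, the only change is the $ on the argument (a minimal sketch, assuming w is the same WShape bound in your session):

using BenchmarkTools

# Uninterpolated: w is read as a (likely non-const) global inside the benchmark loop.
@benchmark baz(w)

# Interpolated: $w splices the value in as a constant, so the benchmarked call
# sees a concretely typed argument.
@benchmark baz($w)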


Thanks!! I spent some time today tracking type instabilities for the first time. I used Cthulhu.jl and @code_warntype, and I was pretty surprised how straightforward it was. The allocation was the last thing giving me trouble, and it looks like I just wasn’t interpolating like you said.
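
In case it helps anyone else, that checking workflow looks roughly like this (a sketch, not my exact session; it assumes baz and w from the first post are in scope):

using InteractiveUtils  # provides @code_warntype outside the REPL
using Cthulhu

# Flags Any- or Union-typed variables and return values in the inferred code:
@code_warntype baz(w)

# Interactively descend through the call tree, inspecting inferred types at each level:
@descend baz(w)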

For future reference, the difference came down to a brittle compiler optimization. BenchmarkTools inserts the benchmarked expression into a function that treats interpolated names as local arguments and uninterpolated names as accesses of global variables. baz(w) reads a global w that is likely non-const, hence Any-typed as far as the compiler is concerned, and that is what causes the extra runtime dispatch you observed (one 48-byte allocation is typical).

test(w) reads the same global, but the compiler recognized that the only method of test (really, fewer than 4 methods) accepts a concrete WShape input, so the dispatch is instead handled as a runtime type check of w that branches to either a MethodError or a statically dispatched, possibly inlined call to test. CPU branch prediction further reduces the overhead. This optimization rarely applies, because requiring a generic function to have fewer than 4 methods with narrow type annotations is not nice to work with.
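
A hand-written analogue of that setup (not BenchmarkTools' actual internals; run_uninterp and run_interp are made-up names, and baz and w from the first post are assumed to be in scope):

# Uninterpolated: the benchmark body reads the global w, so its type must be
# looked up at run time and the dynamically dispatched call can allocate.
run_uninterp() = baz(w)

# Interpolated: the value arrives as a typed argument, so inside the function
# the call to baz is statically dispatched and can be inlined.
run_interp(x) = baz(x)

run_uninterp()
run_interp(w)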
