Bad performance of anonymous functions even when they are type-stable

MWE:

julia> f1 = function (x)
                x/2
            end
#3 (generic function with 1 method)

julia> function f1_2(x)
           x/2
       end
f1_2 (generic function with 1 method)

julia> f2 = function (x)
                x
            end
#5 (generic function with 1 method)

julia> function f2_2(x)
           x
       end
f2_2 (generic function with 1 method)

julia> using BenchmarkTools

julia> @benchmark f1(0.1)
BenchmarkTools.Trial: 10000 samples with 999 evaluations.
 Range (min … max):  16.517 ns … 360.561 ns  ┊ GC (min … max): 0.00% … 90.23%
 Time  (median):     19.019 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   20.266 ns ±   9.995 ns  ┊ GC (mean ± σ):  0.44% ±  1.54%

     ▁    β–ˆ
  β–‚β–„β–†β–ˆβ–ƒβ–…β–†β–†β–ˆβ–ƒβ–‚β–ƒβ–„β–„β–„β–ƒβ–ƒβ–ƒβ–ƒβ–ƒβ–ƒβ–ƒβ–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–β–‚β–‚β–β–‚β–‚β–‚ β–ƒ
  16.5 ns         Histogram: frequency by time         34.9 ns <

 Memory estimate: 16 bytes, allocs estimate: 1.

julia> @benchmark f1_2(0.1)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  0.800 ns … 237.800 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     0.800 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   0.904 ns ±   3.317 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █    ▇    ▂                                                 ▁
  █▁▁▁▁█▁▁▁▁█▁▁▁▁▁▄▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▇ █
  0.8 ns       Histogram: log(frequency) by time       1.9 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark f2(0.1)
BenchmarkTools.Trial: 10000 samples with 999 evaluations.
 Range (min … max):  11.712 ns …  1.559 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     13.614 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   15.054 ns ± 19.157 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

      ██  ▇▂
  ▂▃▄▆██▆▅██▅▄▄▂▂▂▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▄▂▃▂▂▂▂▂▁▂▂▁▂▂▂▂▂▂▂▂▂▂▂ ▃
  11.7 ns         Histogram: frequency by time        27.8 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark f2_2(0.1)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  0.800 ns … 277.500 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     0.800 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   0.917 ns ±   3.086 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █                            ▄
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃ ▂
  0.8 ns          Histogram: frequency by time           1 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

Even though all functions are type-stable, the anonymous versions are drastically slower than the non-anonymous ones (medians of roughly 14 to 19 ns against 0.8 ns, and f1 even allocates). Aren't they supposed to be as generic as the latter?

BTW, the same issue occurs with both Julia 1.9.4 and 1.10.0 on my local machine.

Hence, if I have to generate functions on the fly (which is why I want to use anonymous functions), how can I achieve the same performance as the non-anonymous function versions, assuming I can ensure type stability?

Thanks!

You are introducing type instability by not making the global anonymous functions const.

julia> f1 = function (x)
           x/2
       end
#3 (generic function with 1 method)

julia> function f1_2(x)
           x/2
       end
f1_2 (generic function with 1 method)

julia> const f1_const = function (x)
           x/2
       end
#5 (generic function with 1 method)

julia> using BenchmarkTools

julia> @benchmark f1(0.1)
BenchmarkTools.Trial: 10000 samples with 998 evaluations.
 Range (min … max):  15.499 ns … 816.685 ns  ┊ GC (min … max): 0.00% … 94.69%
 Time  (median):     15.917 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   16.896 ns ±  11.634 ns  ┊ GC (mean ± σ):  1.11% ±  1.62%

   ▃▅█▇█▅▄▁▁              ▁   ▃▄▄▅▃▄▂▂ ▁▁▃▂▃▂▃▁▁              ▂
  ▇█████████▇▇▃▅▄▅▁▃▁▄▄▇████▇███████████████████▆█▇▇▆█▆▇▆▆▄▅▆ █
  15.5 ns       Histogram: log(frequency) by time      20.2 ns <

 Memory estimate: 16 bytes, allocs estimate: 1.

julia> @benchmark f1_2(0.1)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  1.770 ns … 21.875 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.771 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.803 ns ±  0.264 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

   β–ˆ                                                       β–…
  β–…β–ˆβ–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–ƒβ–ˆ β–‚
  1.77 ns        Histogram: frequency by time        1.82 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark f1_const(0.1)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  1.770 ns … 32.813 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.823 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.830 ns ±  0.443 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █             █
  █▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▁▁▁▁▁▁▁▁▁▁▁▁▁▅▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂ ▂
  1.77 ns        Histogram: frequency by time        1.98 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> isconst(Main, :f1)
false

julia> isconst(Main, :f1_2)
true

julia> isconst(Main, :f1_const)
true
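
The same difference can be seen without a benchmark by wrapping each call in a small function and asking the compiler which return type it infers. This is only a minimal sketch: call_global and call_const are hypothetical helper names, and the globals are renamed (f1_global, f1_fixed) so the snippet runs on its own rather than depending on the session above.

```julia
# Non-const global: the binding may later be rebound to a value of any type.
f1_global = function (x)
    x / 2
end

# Const global: the binding, and therefore its type, is fixed.
const f1_fixed = function (x)
    x / 2
end

# Hypothetical wrappers that call through each global.
call_global(x) = f1_global(x)
call_const(x) = f1_fixed(x)

# Ask inference for the return type when the argument is a Float64.
rt_global = only(Base.return_types(call_global, (Float64,)))
rt_const = only(Base.return_types(call_const, (Float64,)))

println("via non-const global: ", rt_global)
println("via const global:     ", rt_const)
```

On the Julia versions discussed here I would expect the first line to report Any and the second Float64, which matches the isconst results above.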

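Essentially, non-const global variables are type-unstable because they can be rebound to a value of any type at any time, so the compiler cannot assume what f1 is when it compiles the call. Another solution is to introduce some kind of local scope.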

julia> let f1_let = function (x)
               x/2
           end
           @benchmark $f1_let(1.2)
       end
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  1.406 ns … 64.479 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.458 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.470 ns ±  0.666 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                              █▁
  ▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁██▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂ ▂
  1.41 ns        Histogram: frequency by time        1.51 ns <

julia> g() = begin
           f1_local = x->x/2
           @benchmark $f1_local(8.2)
       end
g (generic function with 1 method)

julia> g()
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  1.406 ns … 10.000 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.458 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.451 ns ±  0.139 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▁                           █▁
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁██▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂ ▂
  1.41 ns        Histogram: frequency by time        1.51 ns <

Thanks for the reply! This makes sense to me now. Even though the anonymous function has a specific type, when assigned to a global variable f1 or f2, these variables are not type-stable because, unlike the function name for non-anonymous functions, they can be reassigned with other types of values.

Based on this understanding, I interpolated the values of f1 and f2 into the benchmark expression with $, and the anonymous functions became as performant as their non-anonymous counterparts:

julia> @benchmark ($f1)(0.1)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  0.800 ns … 126.300 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     0.900 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.113 ns ±   1.667 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █   █         ▁                                  ▁    ▆   ▂ ▂
  █▁▁▁█▁▁▁▁█▁▁▁▁█▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁█▁▁▁█ █
  0.8 ns       Histogram: log(frequency) by time         2 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark ($f2)(0.1)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  0.800 ns … 196.200 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     0.800 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   0.926 ns ±   2.960 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █             ▇              ▃              ▁               ▁
  █▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁█ █
  0.8 ns       Histogram: log(frequency) by time       1.2 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
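
For functions generated on the fly outside of @benchmark, the same idea seems to apply if the function is passed as an argument (a function barrier): a method that calls its function argument is compiled for the concrete type of that function, so only the outer call pays for the untyped global. A minimal sketch, where make_divider and sum_apply are hypothetical names that do not come from the posts above:

```julia
# Hypothetical factory that generates an anonymous function on the fly.
make_divider(c) = x -> x / c

# Function barrier: `f` is an argument that gets called here, so this
# method is compiled for the concrete type of the function passed in.
function sum_apply(f, xs)
    total = zero(eltype(xs))
    for x in xs
        total += f(x)
    end
    return total
end

halve = make_divider(2)        # non-const global holding a closure
xs = [1.0, 2.0, 3.0]

# The lookup of `halve` is dynamic, but the loop inside sum_apply is not.
result = sum_apply(halve, xs)
println(result)

# Inference sees a concrete return type once the function type is known.
inferred = only(Base.return_types(sum_apply, (typeof(halve), Vector{Float64})))
println(inferred)
```

The hot loop lives behind the barrier, so the cost of the non-const global is paid once per call to sum_apply rather than once per element.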