Implicit function specialization vs. explicit specialization vs. generated functions

question

#1

Is there any difference between these three ways of specializing a function that could affect performance? For instance, suppose we have functions f,g,h defined and called as follows:

f(a) = a+a

@generated g(a) = :(a+a)

h(a::Float64) = a+a
h(a::Int64) = a+a

f(1)
f(1.0)

g(1)
g(1.0)

h(1)
h(1.0)

My understanding is that after all this there will be two compiled versions of each function, a Float64 version and an Int64 version. And I presume the compiled code is the same for each of the 3 functions. But in the limit that the functions have many different specializations, is the compilation and dispatch overhead (either in time or memory) similar in all three cases?
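One way to check this directly (a sketch; the exact IR shown by `code_typed` varies across Julia versions) is to compare the typed, optimized code of two of the specializations:

```julia
f(a) = a + a
h(a::Float64) = a + a

# Both should lower to a single floating-point add for Float64 arguments,
# confirming that the compiled specializations are equivalent.
code_typed(f, (Float64,))
code_typed(h, (Float64,))
```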


#2

With type stable arguments, f, g, and h are identical, as you surmise.

Here’s a benchmark for dynamic dispatch:

julia> f(a) = a+a
f (generic function with 1 method)

julia> @generated g(a) = :(a+a)
g (generic function with 1 method)

julia> h(a::Float64) = a+a
h (generic function with 1 method)

julia> h(a::Int64) = a+a
h (generic function with 2 methods)

julia> using BenchmarkTools

julia> const dynamic = Number[0, 0.0]

julia> for fn in [f, g, h]
           @btime $fn(dynamic[1]) + $fn(dynamic[2])
       end
  47.345 ns (2 allocations: 32 bytes)
  41.269 ns (2 allocations: 32 bytes)
  40.773 ns (2 allocations: 32 bytes)

Here we see that h is the fastest, followed by g, while f is noticeably slower. In fact f is slower because of a compiler optimization gone wrong: f is simple enough to be inlined, so the compiler inlines it, and that turns the call site into a dynamic dispatch to +, whose method table is far larger than those of g or h, making the lookup more expensive.

In general this situation is rather rare; inlining doesn’t often hurt performance. So the result of this particular artificial benchmark should be taken with the understanding that, in practice, slowdown due to inlining is not very common. If our functions were more expensive, perhaps to the extent that they are no longer feasible to inline, then there would be little difference between f and h (f would be a little faster because of its simpler method table, with one method instead of two).
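One way to probe the inlining explanation (a hypothetical follow-up, not part of the original benchmark) is to define a variant of f with inlining suppressed and time it the same way:

```julia
using BenchmarkTools

# @noinline keeps the call as a dispatch on f_noinline's small method
# table, instead of inlining the body and dispatching on +'s large one.
@noinline f_noinline(a) = a + a

const dynamic2 = Number[0, 0.0]

@btime f_noinline($dynamic2[1]) + f_noinline($dynamic2[2])
```

If the explanation above is right, this variant should land close to the timings of g and h rather than that of f.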


#3

@fengyang.wang Thanks for the helpful response. It is good to know that the speed is essentially the same (apart from degenerate cases like the one above). The memory allocation shown in the benchmark appears to be just what is needed to hold the function results. But what about the memory of the method definitions and dispatch table? I have seen a few posts (e.g. https://github.com/JuliaLang/julia/issues/7357#issuecomment-277056261, Is mem of compiled/eval'ed functions garbage collected?, and https://github.com/JuliaLang/julia/issues/18446) which indicate that defining lots of methods stresses the Julia runtime, with symptoms including high memory use and slightly degraded performance. I am curious whether generated functions or compiler specializations would have less overhead, since there is (evidently) only one function definition specialized many times, instead of many definitions each specialized once.
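For what it’s worth, one can at least count methods and specializations from the REPL (a sketch; `Base.specializations` is an internal API and may differ between Julia versions):

```julia
f(a) = a + a
h(a::Float64) = a + a
h(a::Int64) = a + a

f(1); f(1.0); h(1); h(1.0)

length(methods(f))  # one method, specialized twice
length(methods(h))  # two methods, each specialized once

# Internal API: list the compiled specializations of each method of f.
for m in methods(f)
    println(collect(Base.specializations(m)))
end
```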


#4

As a rule of thumb, using generated functions for anything that can be accomplished with regular dispatch is a bad idea and the compiler will make you pay for your transgression (in compile time, memory usage, etc.).
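As an illustration (a hypothetical example, not from this thread): here is a generated function that could just as well be an ordinary function, since the compiler constant-folds type-level information on its own:

```julia
# Generated version: inside the body, `x` is bound to the argument's *type*,
# so the field count is spliced in as a literal.
@generated nfields_gen(x) = :( $(fieldcount(x)) )

# Plain version: same result, and for each concrete argument type the
# compiler typically folds this to a constant anyway.
nfields_plain(x) = fieldcount(typeof(x))

nfields_gen((1, 2.0))    # 2
nfields_plain((1, 2.0))  # 2
```

The plain version is preferable: it compiles faster, uses less memory, and imposes none of the restrictions that generated function bodies must obey.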