I created this MWE based on some package code, and I don't understand why the two functions don't have similar performance:
using BenchmarkTools

mutable struct A
    q_0::Int
    q_1::Int
    q_2::Int
end

@noinline function g_1()
    for n in 1:1000
        a = A(rand(1:2), 1, 1)
        f(a)
    end
end

@noinline function g_2()
    for n in 1:1000
        f(A, 1, 1)
    end
end

@noinline function f(a::A)
    return a
end

@noinline function f(a::Type{A}, properties...)
    return a(rand(1:2), properties...)
end

@benchmark g_1()
@benchmark g_2()
But I did modify it to get it to not inline, yet it still performed much better, because ...
... I don't know why you're getting 3 allocations/call for f(::Type{A}, args...). I'd think there should be only one.
Still, marking that one @inline instead will of course result in them matching performance.
g_1 runs with no allocations despite constructing mutable instances because the compiler recognizes that those instances have a fixed size and never make it out of the for-loop, so it moves them to the stack. If you made a local a outside the loop and returned it at the end, g_1 would do 1 allocation for that. g_2 can't leverage this optimization because the construction happens in a separate @noinline function from which the instance escapes, so the compiler must allocate it on the heap for the two functions to share.
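To look at the escape effect in isolation, here is a minimal sketch. The function names local_only and returns_last are made up for illustration, and the numbers reported will depend on your Julia version, so I'm not quoting any:

```julia
mutable struct A
    q_0::Int
    q_1::Int
    q_2::Int
end

# Hypothetical helper: every instance stays inside the loop body,
# so the compiler is free to keep it off the heap.
function local_only()
    s = 0
    for n in 1:1000
        a = A(rand(1:2), 1, 1)
        s += a.q_0
    end
    return s
end

# Hypothetical helper: the last instance is returned to the caller,
# so it escapes the function and has to live on the heap.
function returns_last()
    a = A(0, 1, 1)
    for n in 1:1000
        a = A(rand(1:2), 1, 1)
    end
    return a
end

# Warm up first so that compilation is not included in the measurement.
local_only()
returns_last()

@show @allocated local_only()
@show @allocated returns_last()
```

Comparing the two reported byte counts should show whether the non-escaping version avoids heap allocation on your setup.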
The other thing is that Julia doesn't automatically specialize on Function, Type, or Vararg arguments that are only passed through to other function calls. Replace , properties... with , i1, i2 and you go from 3000 allocations down to the expected 1000. As you said, this is only a problem with @noinline; replacing it with @inline makes all the allocations go away.
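A rough way to compare the two signatures side by side, assuming the same struct as in the MWE. The names f_vararg, f_fixed, g_vararg and g_fixed are hypothetical, chosen only so both variants can coexist in one session:

```julia
mutable struct A
    q_0::Int
    q_1::Int
    q_2::Int
end

# Vararg version, as in the MWE: the trailing arguments are only passed through.
@noinline function f_vararg(a::Type{A}, properties...)
    return a(rand(1:2), properties...)
end

# Fixed-arity version: the two trailing arguments are ordinary positional arguments.
@noinline function f_fixed(a::Type{A}, i1, i2)
    return a(rand(1:2), i1, i2)
end

function g_vararg()
    for n in 1:1000
        f_vararg(A, 1, 1)
    end
end

function g_fixed()
    for n in 1:1000
        f_fixed(A, 1, 1)
    end
end

# Warm up so that compilation is not measured.
g_vararg()
g_fixed()

@show @allocated g_vararg()
@show @allocated g_fixed()
```

If the vararg method is not being specialized, the first number should come out noticeably larger than the second.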
Thanks @Elrod, but I actually used the @noinline trick to show the difference I see in the actual code, where there is neither @inline nor @noinline (though I did notice that using @inline in the actual code makes a big difference in performance).
Thanks @benny. I think I actually need properties..., since the number of properties is unknown in the real code.
Given all of this, do you think using @inline in the actual code should fix the problem anyway, even with a different number (and different types) of properties? And why does using @inline make such a big difference? If you are interested, you can see the "actual code" here: Agents.jl/issues/820
When a function call isn't inlined, you jump from one function to another. That second function is compiled in isolation, and the compiled code is reused when the function is called anywhere else it's not inlined.
When a function call is inlined, its code is pasted into the caller function's code, so it's compiled along with the caller function. That compiled inlined code cannot be reused anywhere else, so it is customized for the caller function. For example, when inlining f(A, 1, 1), the compiler likely took the body a(rand(1:2), properties...) and put in the arguments, making A(rand(1:2), 1, 1). This is identical to g_1 and can use the same optimizations.
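If you want to see the difference for yourself, one option is to inspect the typed code of a caller. This is only a sketch: add_one, add_one_noinline and the two callers are hypothetical names, and the exact printed IR differs between Julia versions:

```julia
using InteractiveUtils  # provides @code_typed when running outside the REPL

# Two hypothetical helpers with identical bodies but opposite inlining hints.
@inline add_one(x) = x + 1
@noinline add_one_noinline(x) = x + 1

caller_inlined(x) = add_one(x) * 2
caller_call(x) = add_one_noinline(x) * 2

# In the first listing the addition is expected to appear directly in the caller.
println(@code_typed caller_inlined(1))

# In the second listing a separate call to add_one_noinline is expected to remain.
println(@code_typed caller_call(1))
```

Both callers return the same value; only the generated code is organized differently.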
Inlining doesn't always improve things. Compile times and code size often increase, and inlining functions that are too large can cause instruction cache misses. You can leave most of the inlining decisions to Julia's compiler, and use @inline or @noinline to make suggestions (not guarantees) on a case-by-case basis.
One thing that could help is adding at least 1 type parameter to your method to force specialization; there is an example in the Performance Tips. That is, if you want the method to be compiled separately for f(A, 1, 1) vs f(A, 1, 1, 1).
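Applied to the MWE, forcing specialization with a method type parameter could look roughly like this. The names f_spec and g_spec are hypothetical, and the Vararg{Any,N} ... where {N} pattern is the one shown in the Performance Tips:

```julia
mutable struct A
    q_0::Int
    q_1::Int
    q_2::Int
end

# The type parameter N makes the number of trailing arguments part of the
# signature, which asks the compiler to specialize on it.
@noinline function f_spec(a::Type{A}, properties::Vararg{Any,N}) where {N}
    return a(rand(1:2), properties...)
end

function g_spec()
    for n in 1:1000
        f_spec(A, 1, 1)
    end
end

g_spec()  # warm up so that compilation is not measured

@show @allocated g_spec()
```

The method still accepts any number of properties; each distinct call signature just gets its own compiled version.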
The reason the Julia compiler doesn't automatically specialize in this case is that when you have arbitrarily many combinations of Vararg, compiling each of them separately may not be worth it; it's only worth it with fewer unique call signatures and more repeated uses of each of them. Carefully consider how the actual code is intended to be used: a benchmark repeating 1 call does not take into account the performance lost to compiling too many call signatures with little reuse.
This reminds me, so far we've only been talking about changing properties of the method, whether it's annotating @inline/@noinline or adding a method type parameter to force specialization. But @inline/@noinline can also be put at a specific function call (since Julia 1.8), and it will override the annotation at the method itself. As we've pointed out, successful inlining into g_2 specializes the call and eliminates the allocations just like g_1, with no need to add a method parameter:
@noinline function g_2()
    for n in 1:1000
        @inline f(A, 1, 1) # overrides @noinline of f
    end
end

@noinline function f(a::Type{A}, properties...)
    return a(rand(1:2), properties...)
end

@time g_2() # no allocations
I see equivalent performance with the inlined parametric method, so I can't explain why it's not the same on your machine. Benchmark timings can vary when your machine is multitasking with other processes.
You are right, they have the same performance (I mistakenly used two different Julia versions, that's why it was different). Much appreciated, both for the more thorough explanations of Vararg and of @inline!