I am seeing a big performance difference between using FunctionWrappers
and a native function call. Is this to be expected?
Please see the MWE below:
using BenchmarkTools
using FunctionWrappers: FunctionWrapper
struct A{T}
x::T
end
get_power(a::A{T}, y::T, z::T) where {T} = a.x + y + z
const Fwarpper = FunctionWrapper{Float64,Tuple{A{Float64},Float64,Float64}}
const dict = Vector{Fwarpper}(1)
dict[1] = Fwarpper(get_power)
a = A(1.5)
@btime @inbounds get_power($a,5.6,7.87);
0.488 ns (0 allocations: 0 bytes)
@btime @inbounds dict[1]($a,5.6,7.87);
12.863 ns (0 allocations: 0 bytes)
This suggests that there is ~25x overhead in this case?
Is it normal to expect this overhead between explicitly calling a function and calling a function via FunctionWrappers?
In this post (Performance of functions returned from container - #3 by kristoffer.carlsson), @yuyichao seemed to suggest that there is no overhead.
My Julia versioninfo():
Commit 68e911b (2017-05-18 02:31 UTC)
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
LAPACK: libopenblas64_
LIBM: libopenlibm
LLVM: libLLVM-3.9.1 (ORCJIT, ivybridge)
I might be doing this all wrong, as I am using FunctionWrappers for the first time.
Thanks!
Firstly, remove the @inbounds in front of the get_power. It makes the expression not return anything, so everything is optimized away. Also, why are you even using it here at all?
Secondly, why are you benchmarking the time it takes to do the dictionary lookup for the FunctionWrapper benchmark but not for the standard function?
Thanks for the reply @kristoffer.carlsson, and for the @inbounds tip. Still have lots to learn. My bad, dict is a bad variable name; it's actually just a 1d Array.
I removed the @inbounds and tried removing the array overhead completely:
using BenchmarkTools
using FunctionWrappers: FunctionWrapper
struct A{T}
x::T
end
get_power(a::A{T}, y::T, z::T) where {T} = a.x + y + z
const Fwarpper = FunctionWrapper{Float64,Tuple{A{Float64},Float64,Float64}}
f = Fwarpper(get_power)
a = A(1.5)
@btime get_power($a,5.6,7.87);
2.794 ns (0 allocations: 0 bytes)
@btime $f($a,5.6,7.87);
10.557 ns (0 allocations: 0 bytes)
Now the overhead is ~3.7x, which looks better, but I don't understand why there is any.
Thanks.
The type of a.x is also Float64 as far as I can see.
A few more things you can do: put @noinline on get_power to prevent it from inlining. Also, FunctionWrappers does a null check for precompilation support.
Not sure what you wanted to tell me. I am assuming the Julia compiler can infer that.
Just tried your suggestions
@noinline get_power(a::A{T}, y::T, z::T) where {T} = a.x + y + z
@btime get_power($a,5.6,7.87);
4.749 ns (0 allocations: 0 bytes)
So now the overhead is ~2x, probably due to the null check, which I have no idea how to disable. But this means that there is a performance penalty with FunctionWrappers from not having inlining. Am I correct?
Thanks
Yes, and that is the whole point of the package.
Sorry, I was confused. Ignore my comment.
This is correct. The main idea of the package is in the Julia issue linked in the README. For this particular property:
In these cases, it is acceptable that the callback function is not/cannot be inlined into the caller, but it is also desired to avoid unnecessary boxing or runtime type check/dispatch.
In other words, the whole point is that you'll have a single call site that can be calling different functions with the same signature that you don't want to specialize on.
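To illustrate that last point, here is a small sketch (my own example, not from the thread) of a single call site invoking different functions through one concretely typed wrapper:

```julia
using FunctionWrappers: FunctionWrapper

# One wrapper type for every Float64 -> Float64 callback.
const F64Callback = FunctionWrapper{Float64,Tuple{Float64}}

double(x) = 2x
halve(x) = x / 2

# The vector has a concrete element type, so nothing is boxed
# even though it stores two different functions.
callbacks = F64Callback[F64Callback(double), F64Callback(halve)]

# A single call site; the compiler does not need to specialize on
# which function happens to be stored at each index.
total = sum(f(3.0) for f in callbacks)  # 6.0 + 1.5 = 7.5
```

The same loop over a plain `Vector{Function}` would be type-unstable, which is exactly the situation the wrapper avoids at the cost of the (non-inlined) indirect call measured above.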