FunctionWrappers showing ~25x overhead over a native function call

#1

I am seeing a big performance difference between using FunctionWrappers and a native function call. Is this to be expected?
Please see the MWE below:

using BenchmarkTools  
using FunctionWrappers: FunctionWrapper 

struct A{T}
   x::T
end

get_power(a::A{T}, y::T,z::T) where {T} = a.x + y + z

const Fwarpper = FunctionWrapper{Float64,Tuple{A{Float64},Float64,Float64}}
const dict = Vector{Fwarpper}(1)

dict[1] = Fwarpper(get_power)
a = A(1.5)

@btime @inbounds get_power($a,5.6,7.87);
  0.488 ns (0 allocations: 0 bytes)

@btime @inbounds dict[1]($a,5.6,7.87);
  12.863 ns (0 allocations: 0 bytes)

This suggests that there is a ~25x overhead in this case?
Is it normal to expect this overhead between calling a function explicitly and calling it via FunctionWrappers?
This [post](Performance of functions returned from container) by @yuyichao seemed to suggest that there is no overhead.

My Julia VersionInfo

Commit 68e911b (2017-05-18 02:31 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, ivybridge)

I might be doing this all wrong as I am using FunctionWrappers for the first time.

thanks!


#2

Firstly, remove the @inbounds in front of get_power. It makes the expression not return anything, so everything is optimized away. Also, why are you even using it here at all?

Secondly, why are you benchmarking the time it takes to do the dictionary lookup in the FunctionWrapper benchmark but not for the standard function?
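To make the comparison fair, both calls can go through the same array lookup. A minimal sketch (reusing the definitions from the MWE above):

```julia
using FunctionWrappers: FunctionWrapper

struct A{T}
    x::T
end

get_power(a::A{T}, y::T, z::T) where {T} = a.x + y + z

const Fwarpper = FunctionWrapper{Float64,Tuple{A{Float64},Float64,Float64}}

# Put both the wrapped and the plain function behind the same
# 1-element array, so both calls pay the same lookup cost:
wrapped = Fwarpper[Fwarpper(get_power)]
plain   = [get_power]

a = A(1.5)
r1 = wrapped[1](a, 5.6, 7.87)   # via FunctionWrapper, with lookup
r2 = plain[1](a, 5.6, 7.87)     # native call, with the same lookup
# then compare e.g. @btime $wrapped[1]($a, 5.6, 7.87) vs @btime $plain[1]($a, 5.6, 7.87)
```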


#3

Thanks for the reply @kristoffer.carlsson, and for the @inbounds tip. I still have lots to learn. My bad, dict is a bad variable name; it's actually just a 1D Array.

I removed the @inbounds and tried removing array overhead completely:

using BenchmarkTools  
using FunctionWrappers: FunctionWrapper 

struct A{T}
   x::T
end

get_power(a::A{T}, y::T,z::T) where {T} = a.x + y + z

const Fwarpper = FunctionWrapper{Float64,Tuple{A{Float64},Float64,Float64}}

f =  Fwarpper(get_power)
a = A(1.5)

@btime get_power($a,5.6,7.87);
  2.794 ns (0 allocations: 0 bytes)

@btime $f($a,5.6,7.87);
  10.557 ns (0 allocations: 0 bytes)

Now the overhead is ~3.7x, which looks better, but I don't understand why there is any.

thanks.


#4

The type of a.x is also Float64 as far as I can see.


#5

A few more things you can do: put @noinline on get_power to prevent it from inlining. Also, FunctionWrappers does a null check for precompilation support.


#6

Not sure what you wanted to tell me. I am assuming the Julia compiler can infer that.


#7

Just tried your suggestions:

@noinline get_power(a::A{T}, y::T,z::T) where {T} = a.x + y + z

@btime get_power($a,5.6,7.87);
  4.749 ns (0 allocations: 0 bytes)

So now the overhead is ~2x, probably due to the null check, which I have no idea how to disable. But this means that there is a performance penalty to FunctionWrappers from not having inlining. Am I correct?

Thanks


#8

Yes and that is the whole point of the package.


#9

Understood. Thanks


#10

Sorry, I was confused. Ignore my comment.


#11

This is correct. The main idea of the package is in the Julia issue linked in the README. For this particular property:

> In these cases, it is acceptable that the callback function is not/cannot be inlined into the caller, but it is also desired to avoid unnecessary boxing or runtime type check/dispatch.

In other words, the whole point is that you'll have a single call site that can call different functions with the same signature that you don't want to specialize on.
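A minimal sketch of that pattern (`add` and `mul` are made-up callbacks, not from the thread above):

```julia
using FunctionWrappers: FunctionWrapper

# Hypothetical callbacks sharing one (Float64, Float64) -> Float64 signature:
add(x, y) = x + y
mul(x, y) = x * y

const F = FunctionWrapper{Float64,Tuple{Float64,Float64}}

# A concretely typed container holding different functions:
callbacks = F[F(add), F(mul)]

# One call site calls them all; the return type is always known to be
# Float64, so there is no boxing or runtime dispatch, only the
# (non-inlined) indirect call.
results = [cb(2.0, 3.0) for cb in callbacks]
```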