Why is functor slower than eval'ed function?

I have a situation where I’d like to build some functions based on different inputs. Using a callable struct (functor) seems really clear to me and has the functionality I want; I’d rather not use eval and other metaprogramming because it seems way more complicated. But it looks like functors carry a huge performance penalty.

It’s also possible I’m missing something critical and obvious, but this is my first foray into doing anything like this. Here’s a super simplified example:

julia> function makef(p1, p2, p3)
           eval(Meta.parse("f(x) = $p1*x^2 + $p2*x + $p3"))
       end
makef (generic function with 1 method)

julia> makef(3, 2, 1)
f (generic function with 1 method)

julia> struct Polynomial
           p1
           p2
           p3
       end

julia> function (p::Polynomial)(x)
           p.p1*x^2 + p.p2*x + p.p3
       end

julia> g = Polynomial(3,2,1)
Polynomial(3, 2, 1)

julia> using BenchmarkTools
[ Info: Precompiling BenchmarkTools [6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf]

julia> @btime f(2)
  0.030 ns (0 allocations: 0 bytes)
17

julia> @btime g(2)
  55.050 ns (0 allocations: 0 bytes)
17

Based on other discussions around stuff like this, I’m guessing something is being compiled away in f but not in g, but I’m not clear on why that would be, or if there’s a way to help g along a bit.

I think it’s the BenchmarkTools non-constant-global issue: you need to interpolate globals with $.

julia> @btime g(2)
  85.875 ns (0 allocations: 0 bytes)
17

julia> @btime $g(2)
  0.027 ns (0 allocations: 0 bytes)
17

f doesn’t have the same problem:

julia> @btime f(2)
  0.027 ns (0 allocations: 0 bytes)
11

julia> @btime $f(2)
  0.028 ns (0 allocations: 0 bytes)
11

That’s because functions in Julia are const. In contrast, your variable g is not a constant.
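For example (a sketch reusing the Polynomial struct from above; gc is just a new name to avoid redefining g, and no timing is shown because it will vary by machine):

julia> const gc = Polynomial(3, 2, 1);  # const binding: the compiler knows its type

julia> @btime gc(2)  # no $-interpolation needed now, since gc is a constant global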


The 0.03 ns timing should be a big red flag that the function is being optimized away entirely. I’m pretty sure that’s substantially faster than the clock cycle of any computer I’m familiar with.


This code has a very important problem: the fields of your struct aren’t typed. This means that the compiler will often have to do dynamic dispatch to figure out how to multiply p1*x. The solution is to make Polynomial a parametric type, i.e.

struct Polynomial{T}
    p1::T
    p2::T
    p3::T
end
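With the parameter, each instance carries its element type, so the compiler can specialize the functor on it (a quick check, assuming the parametric definition above in a fresh session):

julia> Polynomial(3, 2, 1)
Polynomial{Int64}(3, 2, 1)

julia> fieldtypes(Polynomial{Int})
(Int64, Int64, Int64)

julia> isconcretetype(Polynomial{Int})
true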

Yes, all of the above seems to make a difference, and my benchmarks were indeed wrong.
These are the corrected benchmarks:

julia> makef(3, 2, 1)
f (generic function with 1 method)

julia> g = Polynomial(3, 2, 1)
Polynomial{Int64}(3, 2, 1)

julia> h = Poly(3, 2, 1)  # Poly is the original struct with untyped fields
Poly(3, 2, 1)

julia> @btime f(Ref(2)[]);
  1.644 ns (0 allocations: 0 bytes)

julia> @btime g(Ref(2)[]);
  22.700 ns (0 allocations: 0 bytes)

julia> @btime $g(Ref(2)[]);
  1.643 ns (0 allocations: 0 bytes)

julia> @btime h(Ref(2)[]);
  79.174 ns (0 allocations: 0 bytes)

julia> @btime $h(Ref(2)[]);
  61.887 ns (0 allocations: 0 bytes)

Thanks everyone! This is very clear and helpful info. I had one other question that I thought I would include here for the sake of completeness. For reference, on my system, the eval’ed version is 0.03 ns, and the naive, non-constant functor was 55 ns. Confirmed that if I use the Polynomial{T} version suggested by @Oscar_Smith and make it const as suggested by @tomerarnon and @stevengj, I get speeds similar to f(x).

I wasn’t sure if restricting the types of the fields without parameterizing the whole type would work, so I tried

julia> struct PolyReal
           p1::Real
           p2::Real
           p3::Real
       end

and it appears to be just as fast as f(x).

Benchmark:
julia> function (p::PolyReal)(x)
           p.p1*x^2 + p.p2*x + p.p3
       end

julia> const polyreal = PolyReal(3,2,1)
PolyReal(3, 2, 1)

julia> @btime polyreal(2)
  0.030 ns (0 allocations: 0 bytes)
17

julia> const polyrealmixed = PolyReal(3.,2,1) # in case mixed types cause problems
PolyReal(3.0, 2, 1)

julia> @btime polyrealmixed(2) 
  0.032 ns (0 allocations: 0 bytes)
17.0

At least in this case. So it appears that in at least some cases, the compiler can peer into the struct to get the types. But this brings up another question - is there a good heuristic for when using {T} would be better, vs restricting types of individual fields (assuming I can do the latter in all the situations I care about)?

Or more specifically, aside from keeping all parameters the same type (which I understand would have benefits in some cases), is there a difference from the compiler’s perspective between:

struct Foo{T<:Real}
    x::T
    y::T
end

# and

struct Bar
    x::Real
    y::Real
end

??

Note that in my corrected benchmarks, I use Ref(2)[] as the argument to the function. If you pass in a literal 2, the compiler optimizes the function away, since it is being evaluated for a compile time constant. Try using the “Ref trick” to be sure you’re getting accurate results.
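The pattern looks like this (a sketch: the Ref keeps the argument opaque to constant folding, and the $-interpolation avoids the non-constant-global overhead):

julia> x = Ref(2);

julia> @btime f($x[])  # or inline: @btime f($(Ref(2))[])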

To answer your main question though:

So it appears that in at least some cases, the compiler can peer into the struct to get the types.

Yes, whenever types are given, the compiler has access to them. In your example for Foo, a Foo{Int} object declares to the compiler that x and y are Ints. The restriction on T tells the type Foo that it may only store things that are subtypes of Real. In your Bar example, the object tells the compiler that x and y are subtypes of Real, but it doesn’t say which concrete type they are (it has no way of knowing). So they can be Int8s, Float64s, etc., meaning many optimizations are impossible with these values. Moreover, x and y may be different types in Bar, whereas they cannot be in Foo{T}.
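A quick way to see this from the REPL (assuming the two definitions from the question, with Foo written as struct Foo{T<:Real}):

julia> fieldtype(Foo{Int}, :x)
Int64

julia> fieldtype(Bar, :x)
Real

julia> isconcretetype(Real)
false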

But this brings up another question - is there a good heuristic for when using {T} would be better, vs restricting types of individual fields (assuming I can do the latter in all the situations I care about)?

Concrete types are critical for performance. If performance doesn’t matter in some part of code, then restricting to abstract types (like Bar) or not restricting at all is fine. If the type must be flexible in what it can store (Ints, Floats, etc) but still needs to be performant, that’s when parameters like T are needed.
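For instance, one parametric definition covers both element types, and each instantiation gets its own specialized code (a sketch using the Polynomial{T} struct from earlier in the thread):

julia> const gint = Polynomial(3, 2, 1)          # Polynomial{Int64}, fully concrete
julia> const gfloat = Polynomial(3.0, 2.0, 1.0)  # Polynomial{Float64}, also concrete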

I was wondering about that! Should have asked what that was about. Interesting…

julia> @btime polyrealmixed(Ref(2)[])
  52.802 ns (2 allocations: 32 bytes)
17.0

julia> @btime polyreal(Ref(2)[])
  46.814 ns (0 allocations: 0 bytes)
17

julia> @btime polytc(Ref(2)[]) # typed constant
  1.371 ns (0 allocations: 0 bytes)
17

So I guess this answers my earlier question pretty definitively :)
