Tuples of closures

Hi all—I have a use case where I have a collection of functions that I want to include in a struct. I know the number of functions at compile time, but they will be made into closures with run-time data. It has been suggested to me that I could gain performance by making the relevant struct field a heterogeneous tuple instead of a Vector{Function}, which makes sense considering that Function is an abstract type. But I’m having trouble producing a minimal piece of code that demonstrates this benefit.

In the MWE-type snippet below, I have tried to make something relatively complicated and allocation-heavy, to keep the compiler from optimizing the work away and to expose the benefit of the smarter typing in the heterogeneous tuple:

using LinearAlgebra, BenchmarkTools

struct MaybeExpensive <: Function
  x :: Vector{Float64}
  y :: Vector{Float64}
end

function (D::MaybeExpensive)(z::Vector{Float64})
  return D.y.*dot(D.x, z) .+ det(D.x*z')
end

struct ShouldBeFaster{N,T<:Tuple{Vararg{Function, N}}}
  funs :: T
end

struct ShouldBeSlower
  funs :: Vector{Function}
end
sample = MaybeExpensive(randn(500), randn(500))
fast   = ShouldBeFaster((x->x+1, y->sqrt(y), sample))
slow   = ShouldBeSlower(collect(fast.funs))

rvec   = randn(500)

println("Benchmark for non-abstract types struct:")
@btime fast.funs[3]($rvec)

println("Benchmark for abstract types struct:")
@btime slow.funs[3]($rvec)

But these times and allocations are effectively the same. In fact, it isn’t unusual for the struct with the Vector{Function} field to do very slightly better.

My specific use case is that I’m going to use the elements of these structs/tuples to fill matrices. They are relatively fast functions to run, but I’m going to call them hundreds of thousands of times. Is there something I’m doing wrong in this code or in setting up the relevant benchmark?

Thank you for reading.

If the function is expensive, you won't see much difference from the dynamic dispatch. The difference becomes slightly more pronounced with a smaller vector:

samp   = MaybeExpensive(randn(5), randn(5))
fast   = ShouldBeFaster((x->x+1, sqrt, samp))
slow   = ShouldBeSlower(collect(fast.funs))

rvec   = randn(5)

testfun(x,rvec) = x.funs[3](rvec)

println("Benchmark for non-abstract types struct:")
@btime testfun($fast, $rvec)

println("Benchmark for abstract types struct:")
@btime testfun($slow, $rvec)

Benchmark for non-abstract types struct:
  588.066 ns (6 allocations: 880 bytes)

Benchmark for abstract types struct:
  617.665 ns (6 allocations: 880 bytes)

One thing that stands out is that you’re not interpolating $slow and $fast into the benchmarks, so those are still being treated as non-constant global variables. How do the results change if you do @btime ($fast).funs[3]($rvec) and @btime ($slow).funs[3]($rvec) ?

Hello @baggepinnen and @rdeits, thank you very much for the replies, and apologies for the incorrect benchmarking. I have an updated MWE-type sample below that more closely mimics the kind of misdirection my real code uses, with the interpolation mistake fixed. Interestingly, the Vector{Function} struct now actually has noticeably better speed.

using LinearAlgebra, BenchmarkTools

struct ShouldBeFast{N,T<:Tuple{Vararg{Function, N}}}
  funs :: T
end

struct ShouldBeSlow
  funs :: Vector{Function}
end

params = (2.0, 3.0)
fast   = ShouldBeFast((x->x+1, y->sqrt(y), (x,p)->p[1]/(p[2] + x)))
slow   = ShouldBeSlow(collect(fast.funs))
points = [(rand(), rand(), rand()) for _ in 1:500]

fillmat(F, idx, pt, pr) = [F.funs[idx](sum(abs2, xj.-yk), pr) for xj in pt, yk in pt]

println("Timing for non-abstract type struct:")
@btime fillmat($fast, $3, $points, $params)

println("Timing for abstract type struct:")
@btime fillmat($slow, $3, $points, $params)

The output is now 13.165 ms for fast and 10.012 ms for slow, where slow makes one more (out of 750007) allocation than fast.

Does it make sense to conclude from this that for my use case I don’t stand to benefit very much from the heterogeneous typing?

Thank you for looking at this with me.

I think there’s something more basic going on, which is that it’s only type-stable to access a heterogeneous tuple if the compiler can actually see which index you’re using. For example:

julia> f(x, i) = x[i]
f (generic function with 1 method)

julia> @code_warntype f((1, "hello"), 2)
Body::Union{Int64, String}
1 ─ %1 = (Base.getfield)(x, i, true)::Union{Int64, String}
└──      return %1

From the types of the arguments to f, there’s no way to know what type will be returned, since that depends on the value of i.

That’s sort of what’s going on in your case, since fillmat takes the index as an input argument.

However, the compiler is pretty smart, and it can propagate literal constants through and resolve the issue. For example, if we write another function that calls f with some literal index, like this:

julia> g(x) = f(x, 1)
g (generic function with 1 method)

julia> @code_warntype g((1, "hello"))
Body::Int64
1 ─ %1 = (Base.getfield)(x, 1, true)::Int64
└──      return %1

Then that literal 1 gets propagated through to f(x, 1) and in turn to x[1] which the compiler can figure out.

So, in your case, do you see a different result if you change your benchmark to:

g(F, pt, pr) = fillmat(F, 3, pt, pr)

@btime g(...)



Ah, that is a good point! I see why that is a problem. Interestingly, my results don’t really change when I use the g function you describe.

Is there a way for me to make a struct with a tuple of closures that avoids the type instability? Even when I make something very static, I get the same instability:

struct MyCollection{A, B, C}
  vals :: Tuple{A, B, C}
end
fun(M::MyCollection, idx::Int64) = M.vals[idx]

@code_warntype fun(MyCollection((1, "is this", :Stable)), 1)
# output: ... ::Union{Int64, String, Symbol}...

I see why this makes sense: even if the compiler knows A, B, and C, the idx argument that determines the output type only arrives at run time. But considering that it doesn't work even in this simpler case, is there any way to do this in my case (a variable number of closures) that is type-stable? I would now guess the answer is no.

I’m very new to thinking about really exploiting how much information I can share at compile time, so maybe this is really not the right thing to be doing. But if I rewrite fun as

fun(M::MyCollection, ::Val{N}) where{N} = M.vals[N]

then I do get type stability. Is something like this advisable?
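For reference, here is a self-contained check (using `Base.return_types`; the definitions just repeat the ones above) showing that only the `Val` method infers a concrete return type:

```julia
# Same definitions as above, repeated so the snippet runs on its own.
struct MyCollection{A, B, C}
  vals :: Tuple{A, B, C}
end

fun(M::MyCollection, idx::Int64)         = M.vals[idx]  # index is a run-time value
fun(M::MyCollection, ::Val{N}) where {N} = M.vals[N]    # index is part of the type

M = MyCollection((1, "is this", :Stable))

# Only the Val method has a concrete inferred return type:
Base.return_types(fun, (typeof(M), Int64))   # [Union{Int64, String, Symbol}]
Base.return_types(fun, (typeof(M), Val{2}))  # [String]
```

One caveat worth noting: the `Val` trick only pays off when the index is a compile-time constant at the call site, e.g. a literal `fun(M, Val(2))`; building `Val(i)` from a run-time `i` just moves the dynamic dispatch into the `Val` constructor.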

I just tried the above function (passing a ::Val{N} where {N} instead of an Int64 at run time) in the original MWE, and now the speedup is huge. The Vector{Function} version made 750008 allocations and took 10 ms, and the tuple version made 6 allocations and took 1.5 ms!
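In case it helps anyone else, here is roughly what that change looks like, as a self-contained sketch (struct names shortened, and using Base's `@allocated` rather than `@btime` so it needs no packages; exact numbers will vary by machine):

```julia
# Self-contained sketch of the Val-based version of the earlier MWE.
struct Fast{N, T<:Tuple{Vararg{Function, N}}}
  funs :: T
end

struct Slow
  funs :: Vector{Function}
end

params = (2.0, 3.0)
fast   = Fast((x -> x + 1, sqrt, (x, p) -> p[1]/(p[2] + x)))
slow   = Slow(collect(fast.funs))
points = [(rand(), rand(), rand()) for _ in 1:500]

# Passing the index as Val{N} lets the compiler pick the tuple element
# (and hence the concrete closure type) at compile time:
fillmat(F, ::Val{N}, pt, pr) where {N} =
  [F.funs[N](sum(abs2, xj .- yk), pr) for xj in pt, yk in pt]

fillmat(fast, Val(3), points, params)  # warm-up, so compilation isn't measured
fillmat(slow, Val(3), points, params)

a_fast = @allocated fillmat(fast, Val(3), points, params)
a_slow = @allocated fillmat(slow, Val(3), points, params)
println("tuple: $a_fast bytes, Vector{Function}: $a_slow bytes")
```

The tuple version only allocates the result matrix, while the `Vector{Function}` version also boxes every element it computes, which matches the 750008-vs-6 allocation counts reported above.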