Hi all, I have a use case where I have a collection of functions that I want to include in a struct. I know the number of functions at compile time, but they will be made into closures with run-time data. It has been suggested to me that I could gain performance by making the relevant struct field a heterogeneous tuple instead of a Vector{Function}, which makes sense considering that Function is an abstract type. But I’m having trouble producing a minimal piece of code that demonstrates this benefit.
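To make the typing difference concrete, here is a minimal sketch. The helpers `make_adder` and `make_scaler` are hypothetical stand-ins for closures built from run-time data; they are not part of my real code.

```julia
# Hypothetical closure factories: each returned closure captures run-time data.
make_adder(a::Float64)  = x -> x + a
make_scaler(a::Float64) = x -> a * x

f = make_adder(2.0)
g = make_scaler(3.0)

tup  = (f, g)            # heterogeneous tuple: the type records each closure's own type
fvec = Function[f, g]    # vector whose element type is the abstract type Function

println(typeof(tup))                    # a Tuple of two distinct closure types
println(isconcretetype(typeof(tup)))    # true
println(eltype(fvec))                   # Function
println(isconcretetype(eltype(fvec)))   # false
```

The tuple's type is concrete, so a struct parameterized on it has a concretely typed field, whereas the vector only promises that each element is some subtype of Function.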
In the MWE below, I have tried to make the called function relatively complicated and allocation-heavy, so that the compiler cannot optimize the Vector{Function} version the way it can the heterogeneous tuple, which should expose the benefit of the more specific typing:
using LinearAlgebra, BenchmarkTools
struct MaybeExpensive <: Function
    y::Vector{Float64}
    x::Vector{Float64}
end
function (D::MaybeExpensive)(z::Vector{Float64})
    return D.y.*dot(D.x, z) .+ det(D.x*z')
end
struct ShouldBeFaster{N,T<:Tuple{Vararg{<:Function, N}}}
    funs::T
end
struct ShouldBeSlower
    funs::Vector{Function}
end
sample = MaybeExpensive(randn(500), randn(500))
fast = ShouldBeFaster((x->x+1, y->sqrt(y), sample))
slow = ShouldBeSlower(collect(fast.funs))
rvec = randn(500)
println("Benchmark for non-abstract types struct:")
@btime fast.funs[3]($rvec)
println()
println("Benchmark for abstract types struct:")
@btime slow.funs[3]($rvec)
println()
But these times and allocations are effectively the same. In fact, it isn’t unusual for the struct with the Vector{Function} field to do very slightly better.
My specific use case is that I’m going to use the elements of these structs/tuples to fill matrices. They are relatively fast functions to run, but I’m going to call them hundreds of thousands of times. Is there something I’m doing wrong in this code or in setting up the relevant benchmark?
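For concreteness, the matrix-filling pattern I have in mind looks roughly like the sketch below. Everything here is illustrative: `fill_from!`, `scale`, `shift`, and the sample inputs are hypothetical names and values, not my actual code.

```julia
# Hypothetical closure factories standing in for functions built from run-time data.
scale(a::Float64) = x -> a * x
shift(b::Float64) = x -> x + b

# Hypothetical helper: entry (i, j) of M is the j-th function evaluated at xs[i].
# `funs` may be either a tuple or a Vector{Function}.
function fill_from!(M::Matrix{Float64}, funs, xs::Vector{Float64})
    for j in 1:length(funs), i in 1:length(xs)
        M[i, j] = funs[j](xs[i])
    end
    return M
end

xs = [1.0, 4.0, 9.0]
funs_tuple  = (scale(2.0), shift(1.0), x -> sqrt(abs(x)))
funs_vector = Function[funs_tuple...]   # same functions, abstractly typed container

M_tuple  = fill_from!(zeros(3, 3), funs_tuple, xs)
M_vector = fill_from!(zeros(3, 3), funs_vector, xs)

println(M_tuple == M_vector)   # both containers produce the same matrix
```

Both containers go through the same function, so the only difference between the two calls is the type of `funs`. The individual calls are cheap here, which is closer to my real workload than the deliberately expensive function in the snippet above.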
Thank you for reading.