I know from the docs that I should avoid containers with abstract element types; however, I am running into a couple of use cases where I have tag-calculation objects holding arbitrary functions.
Base.@kwdef struct TagCalculation{T<:Function}
    name :: String
    calc :: T
    tags :: Vector{String}
end
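For example (instance values made up), each distinct calc produces a distinct concrete type:

a = TagCalculation(name = "amp",    calc = sin,     tags = ["trig"])
b = TagCalculation(name = "double", calc = x -> 2x, tags = ["linear"])
typeof(a)   # TagCalculation{typeof(sin)}
typeof(b)   # TagCalculation{var"#..."}, the anonymous function's closure type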
However, I will have a vector of these and they will have different calculation types. I've noticed that the element type of a vector of functions gets widened. With definitions along these lines (assumed here; vsin and vf are reused in the benchmarks below):
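vsin = fill(sin, 100000)                 # Vector{typeof(sin)}: concrete eltype
vf   = convert(Vector{Function}, vsin)   # Vector{Function}: abstract eltype
# Mixing different functions, e.g. [sin, cos], gives a Vector{Function} automatically.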
So it automatically promotes to the type Function. Next, if I try evaluating this on a vector of random numbers:
x = randn(100000)

@time for ii in eachindex(vsin)
    vsin[ii](x[ii])
end
  0.012697 seconds (498.98 k allocations: 9.140 MiB)

@time for ii in eachindex(vf)
    vf[ii](x[ii])
end
  0.012592 seconds (498.98 k allocations: 9.140 MiB)
So performance is nearly identical in these cases. Is Function a special kind of abstract type that doesn't suffer a performance penalty relative to its concrete subtypes, or is the penalty of the abstraction hidden in the cost of something else (like computing sin)? If I could break my common calculations down into, say, 20 categories, would it be worth the effort of grouping them into a NamedTuple of vectors (one vector per calculation category), or would I be okay just lumping them all into a single vector and collecting their outputs (which are all Float64)?
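The grouping I have in mind is something like this sketch (all names hypothetical):

# One concretely typed vector per calculation category, so each
# loop body can dispatch statically instead of dynamically.
trig  = [TagCalculation(name = "t$i", calc = sin,     tags = String[]) for i in 1:10]
scale = [TagCalculation(name = "s$i", calc = x -> 2x, tags = String[]) for i in 1:10]
grouped = (trig = trig, scale = scale)

run_group(calcs, xs) = [c.calc(x) for (c, x) in zip(calcs, xs)]   # type-stable within a group
results = map(g -> run_group(g, randn(length(g))), grouped)       # one specialized loop per field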
julia> function test(v, x)
           y = similar(x)
           for ii in eachindex(v)
               y[ii] = v[ii](x[ii])
           end
           y
       end;
julia> using BenchmarkTools
julia> @btime test($vsin, $x);
  1.561 ms (2 allocations: 781.30 KiB)

julia> @btime test($vf, $x);
  6.122 ms (299491 allocations: 5.33 MiB)
Interestingly, if it's known to be a small union of functions, one may use a small Union element type, which the compiler can union-split.
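For instance (a sketch; vu is made up, test and x as above):

vu = Union{typeof(sin), typeof(cos)}[isodd(i) ? sin : cos for i in 1:100000]
@btime test($vu, $x);   # union splitting recovers most of the concrete-case speed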
Right, thanks for the benchmarking tips! Unfortunately, the union of functions is unknown and would be pretty big in most cases. In one application there might be about 20-50 calculation types; in another there might be more than a thousand.
That being said, the penalty only seems to be about a factor of 4 for a basic calculation. The calculations I plan on running are likely to be more involved and include dictionary lookups, so the real-world impact of putting different functions together like this is probably going to be smaller, with a huge benefit in simplification.
Oh neat! I didn't know about FunctionWrappers! I do know all the input and output type signatures (the input is always a single DataFrame and the output is always a vector of floats). I don't see much documentation for it, though. How would I actually use it to wrap a function?
Here's a simple example. I see you already know that a plain fun::Function field is abstractly typed and bad for performance, but just in case somebody else stumbles on this some day down the road:
using FunctionWrappers
import FunctionWrappers: FunctionWrapper

struct BadStruct
    fun::Function
    second_arg::Float64
end

struct GoodStruct
    fun::FunctionWrapper{Float64, Tuple{Float64, Float64}}
    second_arg::Float64
end

evaluate_strfun(str, arg) = str.fun(arg, str.second_arg)

bad_example = BadStruct(hypot, 1.0)
good_example = GoodStruct(hypot, 1.0)
# If you run these in a REPL that has colors enabled, you'll see more clearly.
@code_warntype evaluate_strfun(bad_example, 1.5)
#=
MethodInstance for evaluate_strfun(::BadStruct, ::Float64)
  from evaluate_strfun(str, arg) in Main at /home/cg/fwexample.jl:15
Arguments
  #self#::Core.Const(evaluate_strfun)
  str::BadStruct
  arg::Float64
Body::Any
1 ─ %1 = Base.getproperty(str, :fun)::Function
│   %2 = Base.getproperty(str, :second_arg)::Float64
│   %3 = (%1)(arg, %2)::Any
└──      return %3
=#
@code_warntype evaluate_strfun(good_example, 1.5)
#=
MethodInstance for evaluate_strfun(::GoodStruct, ::Float64)
  from evaluate_strfun(str, arg) in Main at /home/cg/fwexample.jl:15
Arguments
  #self#::Core.Const(evaluate_strfun)
  str::GoodStruct
  arg::Float64
Body::Float64
1 ─ %1 = Base.getproperty(str, :fun)::FunctionWrapper{Float64, Tuple{Float64, Float64}}
│   %2 = Base.getproperty(str, :second_arg)::Float64
│   %3 = (%1)(arg, %2)::Float64
└──      return %3
=#
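For your use case above (a DataFrame in, a Vector{Float64} out), something along these lines should work. This is an untested sketch assuming DataFrames.jl, and all the names here are made up:

using DataFrames
import FunctionWrappers: FunctionWrapper

# One concrete wrapper type shared by every calculation:
const CalcWrapper = FunctionWrapper{Vector{Float64}, Tuple{DataFrame}}

Base.@kwdef struct WrappedTagCalculation
    name :: String
    calc :: CalcWrapper   # concrete field; any conforming function converts on construction
    tags :: Vector{String}
end

tc = WrappedTagCalculation(name = "colsum",
                           calc = df -> Vector{Float64}(df.a .+ df.b),
                           tags = ["sum"])
tc.calc(DataFrame(a = [1.0, 2.0], b = [3.0, 4.0]))   # returns [4.0, 6.0]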