Eliminating allocations with functions in struct


#1

Hello all,

I thought that the following would not result in dynamic dispatch, but I am still seeing allocation:

module DispTests

struct ParamFun{N,F}
    fₙ::F
    pₙ::NTuple{N,Vector{Float64}}
end

ParamFun{N}(f,p) where N = ParamFun{N,typeof(f)}(f,p)

function run_that_fun(f::ParamFun{N,F},x) where {N,F}
    r = 0.0
    @inbounds for i = 1:N
        r += (f.fₙ[i])(x,f.pₙ[i])
    end
    r
end

function example()
    f1(x,p) = abs(x)^p[1] + p[2]
    f2(x,p) = cos(p[1]*x) + sin(p[2]*x)
    f3(x,p) = x*(p[1]+p[2]+p[3])
    return ParamFun{3}((f1,f2,f3),([1.0,0.3],[0.2,1],[1,2,3.0]))
end
end
using BenchmarkTools
f = DispTests.example()
@benchmark DispTests.run_that_fun($f,2.0)
BenchmarkTools.Trial: 
  memory estimate:  144 bytes
  allocs estimate:  9
  --------------
  minimum time:     242.228 ns (0.00% GC)
  median time:      258.244 ns (0.00% GC)
  mean time:        269.538 ns (1.58% GC)
  maximum time:     2.345 μs (79.14% GC)
  --------------
  samples:          10000
  evals/sample:     438

Can I do this without using FunctionWrappers? If not, how would one go about using function wrappers for this, I can’t figure it out…

Thanks,
Jeremy


#2

The issue is in

    @inbounds for i = 1:N
        r += (f.fₙ[i])(x,f.pₙ[i])
    end

f.fₙ[i] and f.pₙ[i] will have different types in each iteration, making the code type-unstable, as witnessed by

@code_warntype DispTests.run_that_fun(f, 2.0)

If you’re trying to avoid all allocations, the first thing to check for should always be type stability.

One way to avoid the allocations is by using tuple recursion:

import Base: tail

paramfunsum(f::Tuple{}, x, p::Tuple{}) = 0 # base case
paramfunsum(f::Tuple, x, p::Tuple) = f[1](x, p[1]) + paramfunsum(tail(f), x, tail(p))

function run_that_fun(f::ParamFun{N,F},x) where {N,F}
    paramfunsum(f.fₙ, x, f.pₙ)
end

which results in

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     42.578 ns (0.00% GC)
  median time:      44.061 ns (0.00% GC)
  mean time:        46.218 ns (0.00% GC)
  maximum time:     163.997 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     990

Note that this relies on paramsumfun being inlined; in some cases you may have to force that using @inline. There is also a limit on how long the tuples can be before the compiler will give up and start allocating. If you’re running into that limit, you could use @generated functions instead, but the type of recursion above is typically preferred if possible.