# Store a sum of parameterized scalar functions

Hello everyone,

I need to compute the same sum of scalar functions (constant, linear, Gaussian kernel…) for a large number of different values. I would like to create a structure that holds the sum of these functions, to speed up the computations.

I have created an abstract type `ParamFcn` for these scalar functions. Each function is then a subtype of `ParamFcn`. For instance `rbf` is the subtype for a Gaussian kernel parameterized by a mean `μ` and std `σ`.

I have created a subtype of `ParamFcn` called `AddedParamFcn` that holds a tuple of `ParamFcn` to compute the sum of the scalar functions.

However, even for simple computations, this approach is actually slower than summing the different functions directly; see the example below. I would expect some speed-up from this approach, since the functions can be precomputed.

Should I use a `Tuple` to store the different `ParamFcn` functions, or can I create an `Array{ParamFcn,1}` (I don’t know how to implement this second option), since I know the number of scalar functions in advance?

What is a better way to implement this?

``````import Base: show, zero

using SpecialFunctions: erf   # erf is needed by the ψ₀ functor below

# Type to hold 1D functions
abstract type ParamFcn end

# zero(::T) where {T <: ParamFcn} = T()

struct AddedParamFcn{T <: Tuple} <: ParamFcn
    list::T
end

function show(io::IO, Σg::AddedParamFcn)
    println(io, "Sum of the parameterized functions:")
    for g in Σg.list
        println(io, "  $g")
    end
end

"""
    g₁::ParamFcn + g₂::ParamFcn

Add the parameterized functions so that `(g₁ + g₂)(z) = g₁(z) + g₂(z)`.
"""
Base.:+(g₁::ParamFcn, g₂::ParamFcn) = AddedParamFcn((g₁, g₂))
Base.:+(Σg::AddedParamFcn, g::ParamFcn) = AddedParamFcn((Σg.list..., g))
Base.:+(g::ParamFcn, Σg::AddedParamFcn) = AddedParamFcn((g, Σg.list...))
Base.:+(g::ParamFcn) = g   # unary +, used below as `+Lin`

# Evaluate the sum: (Σg)(z) = g₁(z) + g₂(z) + …
(Σg::AddedParamFcn)(z::T) where {T<:Real} = mapreduce(g -> g(z), +, Σg.list)

struct constant <: ParamFcn end

const Cst = constant()

(C::constant)(z::T) where {T<:Real} = 1.0

struct linear <: ParamFcn end

const Lin = linear()

(L::linear)(z::T) where {T<:Real} = z

# Gaussian Kernel
struct rbf <: ParamFcn
    μ::Float64
    σ::Float64
end

function show(io::IO, N::rbf)
    println(io, "Gaussian kernel with mean $(N.μ) and std $(N.σ)")
end

(N::rbf)(z::T) where {T<:Real} = 1/(N.σ*√(2*π))*exp(-(0.5/N.σ^2)*(z - N.μ)^2)

struct ψ₀ <: ParamFcn
    ξ₀::Float64
    σ₀::Float64
end

function show(io::IO, F::ψ₀)
    println(io, "ψ₀ function with mean $(F.ξ₀) and std $(F.σ₀)")
end

function (F::ψ₀)(z::T) where {T<:Real}
    @assert F.σ₀ >= 0.0 "std must be >= 0"
    Δ₀ = (1/(F.σ₀*√2))*(z - F.ξ₀)
    return 0.5*((z - F.ξ₀)*(1 - erf(Δ₀)) - F.σ₀*√(2/π)*exp(-Δ₀^2))
end

``````

Speed benchmark

``````using BenchmarkTools   # provides @btime

function time_estim()
    @btime begin
        z = 0.0
        for i = 1:1000
            z = randn()
            1.0 + z + rbf(1.0, 1.0)(z) + ψ₀(1.0, 2.0)(z)
        end
    end

    @btime U = Cst + Lin + rbf(1.0, 1.0) + ψ₀(1.0, 2.0)

    @btime begin
        U = Cst + Lin + rbf(1.0, 1.0) + ψ₀(1.0, 2.0)
        z = 0.0
        for i = 1:1000
            z = randn()
            U(z)
        end
    end
end

time_estim()
``````

32.604 μs (0 allocations: 0 bytes)
0.015 ns (0 allocations: 0 bytes)
33.814 μs (0 allocations: 0 bytes)

For me, the difference between the benchmarks is below the noise level, and I don't see any obvious further performance improvements. That said, AFAIK:

• A specialized method is compiled for each set of parameter *types*, not values. So putting your runtime parameters `ξ₀, σ₀` into a struct `ψ₀` will not create an optimized version of the function `(F::ψ₀)(z::T)` for each set of runtime parameter values. (This can be done with `Val` types or metaprogramming, if your set of parameters is not too large.)
• Small thing: put the `@assert` in the type constructor rather than in the call.
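As an illustration of the first point (a sketch with a hypothetical `gauss` function, not code from the thread): a runtime value can be lifted into the type domain with `Val`, so a method is compiled per *value*. This only pays off for a small, fixed set of parameter values, since every new value triggers a fresh compilation.

```julia
# Sketch: the parameter σ is carried in the type Val{σ}, so each call
# with a distinct σ dispatches to (and compiles) a value-specialized method.
gauss(z, ::Val{σ}) where {σ} = exp(-z^2 / (2σ^2)) / (σ * √(2π))

gauss(0.0, Val(1.0))   # specialized for σ == 1.0
gauss(0.0, Val(2.0))   # a separate specialization for σ == 2.0
```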

Thank you for your answer. Sorry, I have not fully understood how to improve the code for part 1. Can you give me an example?

If you want `(g₁ + g₂)(z) = g₁(z) + g₂(z)`, why not simply use `z -> g₁(z) + g₂(z)`?


How can I create this new function by looping over my list of scalar functions?

Try `foldl`
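A minimal sketch of that suggestion, with plain anonymous functions standing in for the `ParamFcn` types from the thread:

```julia
# Fold a collection of scalar functions into one callable made of
# nested closures: h(z) = f₁(z) + f₂(z) + f₃(z).
fns = (z -> 1.0, z -> z, z -> z^2)   # stand-ins for Cst, Lin, rbf(…)
h = foldl((f, g) -> (z -> f(z) + g(z)), fns)
h(2.0)   # 1.0 + 2.0 + 4.0 = 7.0
```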


Thanks, I have included this function into the code by calling `mapreduce`.

As mentioned by @AndiMD, it would be great if each scalar function, or the sum of the scalar functions, could have a specialized method based on the values of the parameters and not only on their types. I don’t know how to do this, though.

It is not clear to me why you expect that doing this will speed up your computations. You can do this to organize your code nicely (like you do with `AddedParamFcn`), but there will be no speed benefit compared to an unrolled version.

The best you can hope for is probably getting the same speed, if the compiler specializes everything, which should happen if you store the functions in a `Tuple` (definitely not an `Array`, don’t do that).

This will happen automatically for the `(Σg::AddedParamFcn)(z)` method. This is a key feature of Julia.
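To illustrate the `Tuple`-versus-`Array` point (a sketch with Base functions standing in for `ParamFcn` subtypes):

```julia
# A Tuple records each element's concrete type, so a method taking it can
# be fully specialized; a Vector{Function} has an abstract element type
# and forces dynamic dispatch (and allocations) on every element access.
sum_fns(fns, z) = mapreduce(g -> g(z), +, fns)

tup = (sin, cos, exp)           # Tuple{typeof(sin), typeof(cos), typeof(exp)}
vec = Function[sin, cos, exp]   # Vector{Function}: abstract eltype

sum_fns(tup, 0.0)   # specialized for the concrete Tuple type
sum_fns(vec, 0.0)   # same result, but dispatched dynamically per element
```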

Incidentally,

``````mapreduce(g->g(z),+, Σg.list)
``````

can be written as

``````mapreduce(g, +, Σg.list)
``````

In this case I might have misunderstood something, so please correct me:
the method `(Σg::AddedParamFcn)(z)` should specialize for all types of `Σg`, which includes all type combinations stored in the tuple `Σg.list`. But there will not be an optimized method for each set of values (like `Σg.list[1].σ == 2.0`).

As long as types are concrete or parametrized to make this possible, you get a fully specialized method automatically (apart from some corner cases I won’t go into here).

So that method will be specialized for all concrete types, including tuples with your `rbf` type.

You can check this with `@code_warntype`. I would recommend reading

https://docs.julialang.org/en/v1/manual/performance-tips/#man-performance-tips-1

it is very useful and nicely written.
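For instance, assuming a small self-contained stand-in for `AddedParamFcn` (hypothetical `SumOfFns`, not the thread's code), the check could look like:

```julia
using InteractiveUtils   # @code_warntype (loaded automatically in the REPL)

# Stand-in for AddedParamFcn: a wrapper parameterized by a concrete Tuple type
struct SumOfFns{T<:Tuple}
    list::T
end
(s::SumOfFns)(z) = mapreduce(g -> g(z), +, s.list)

U = SumOfFns((sin, cos))
@code_warntype U(0.5)   # all types should be inferred concretely (no red `Any`)
```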

@Tamas_Papp, your simplification doesn’t seem to work:

(Σg::AddedParamFcn)(z::T) where {T<:Real} = mapreduce(g, +, Σg.list)

fi = rbf(1.0, 1.0)
U = +Lin
for i = 1:10
    U += fi
end

U(1.0)
``````

UndefVarError: g not defined

Stacktrace:
[2] top-level scope at In[16]:1

You are right, I overlooked the fact that you are mapping over the `g`s.

I am getting a speed-up of roughly 4× (see code below) by using this storage of the additive map:

``````function time_estim()
    @btime begin
        z = 0.0
        for i = 1:10000
            z = randn()
            z + mapreduce(i -> rbf(1, i)(z), +, 1:10)
        end
    end

    @btime begin
        U = +Lin
        for i = 1:10
            U += rbf(1.0, i)
        end
    end

    @btime begin
        U = +Lin
        for i = 1:10
            U += rbf(1.0, i)
        end
        z = 0.0
        for i = 1:10000
            z = randn()
            U(z)
        end
    end
end

time_estim()
``````

4.128 ms (320001 allocations: 6.41 MiB)
245.894 ns (20 allocations: 1.33 KiB)
1.187 ms (20020 allocations: 313.83 KiB)

Are you sure your benchmark codes do the same thing? Maybe use a summation variable to make sure nothing is optimized away.
Here are some variants, with comparison to a simple function `rbffct`. Providing the `init` argument to `mapreduce` helps a lot.

``````function rbffct(z::T, μ, σ) where {T<:Real}
    1/(σ*√(2*π))*exp(-(0.5/σ^2)*(z - μ)^2)
end

function time_estim()
    @btime begin
        s = 0.0
        for i = 1:10000
            z = randn()
            s += mapreduce(i -> rbf(1, i)(z), +, 1:10; init = 0.0)
        end
        s
    end

    @btime begin
        s = 0.0
        U = mapreduce(i -> rbf(1, i), +, 1:10)
        for i = 1:10000
            z = randn()
            s += U(z)
        end
        s
    end

    @btime begin
        s = 0.0
        U = +Lin
        for i = 1:10
            U += rbf(1.0, i)
        end
        for i = 1:10000
            z = randn()
            s += U(z)
        end
        s
    end

    @btime begin
        s = 0.0
        for i = 1:10000
            z = randn()
            s += mapreduce(i -> rbffct(z, 1, i), +, 1:10)
        end
        s
    end

    @btime begin
        s = 0.0
        U(z) = mapreduce(i -> rbffct(z, 1, i), +, 1:10)
        for i = 1:10000
            z = randn()
            s += U(z)
        end
        s
    end
end

time_estim()
``````

944.986 μs (0 allocations: 0 bytes)
1.566 ms (30017 allocations: 469.98 KiB)
1.630 ms (30020 allocations: 470.08 KiB)
1.205 ms (0 allocations: 0 bytes)
1.192 ms (0 allocations: 0 bytes)