Store a sum of parameterized scalar functions

Hello everyone,

I need to compute the same sum of scalar functions (constant, linear, Gaussian kernel…)
for a large number of different values. I would like to create a structure that holds the sum of these functions to speed up the computations.

I have created an abstract type ParamFcn for these scalar functions. Each function is then a subtype of ParamFcn. For instance, rbf is the subtype for a Gaussian kernel parameterized by a mean μ and standard deviation σ.

I have created a subtype of ParamFcn called AddedParamFcn that holds a tuple of ParamFcn to compute the sum of the scalar functions.

However, even for simple computations, this approach is actually slower than summing the individual functions directly (see the example below). I would expect some speed-up from this approach, since the functions can be precomputed.

Should I use a Tuple to store the different ParamFcn functions, or can I create an `Array{ParamFcn,1}` (I don’t know how to implement this second option), since I know the number of scalar functions in advance?

What is a better way to implement this?

import Base: show, zero, +

using SpecialFunctions: erf  # erf is used in the ψ₀ evaluation below

# Type to hold 1D functions
abstract type ParamFcn end

# zero(::T) where {T <: ParamFcn} = T()

struct AddedParamFcn{T <: Tuple} <: ParamFcn
    list::T
end
function show(io::IO, Σg::AddedParamFcn)
    println(io, "AddedParamFcn:")
    for g in Σg.list
        println(io, "  $g")
    end
end

"""
    g₁::ParamFcn + g₂::ParamFcn

Add the parameterized functions so that `(g₁ + g₂)(z) = g₁(z) + g₂(z)`.
"""
+(g::ParamFcn, Σg::AddedParamFcn) = AddedParamFcn((g, Σg.list...))
+(Σg::AddedParamFcn, g::ParamFcn) = AddedParamFcn((Σg.list..., g))
function +(Σg₁::AddedParamFcn, Σg₂::AddedParamFcn)
    AddedParamFcn((Σg₁.list..., Σg₂.list...))
end

+(g::ParamFcn...) = AddedParamFcn(g)

(Σg::AddedParamFcn)(z) = mapreduce(g->g(z),+, Σg.list)



struct constant <:ParamFcn
end

const Cst = constant()

(C::constant)(z::T) where {T<:Real} = 1.0


struct linear <:ParamFcn
end

const Lin = linear()

(L::linear)(z::T) where {T<:Real} = z

# Gaussian Kernel
struct rbf <: ParamFcn
    μ::Float64
    σ::Float64
end


function show(io::IO, N::rbf)
    println(io, "Gaussian kernel with mean $(N.μ) and std $(N.σ)")
end

(N::rbf)(z::T)  where {T<:Real} = 1/(N.σ*√(2*π))*exp(-(0.5/N.σ^2)*(z-N.μ)^2)


struct ψ₀ <: ParamFcn
    ξ₀::Float64
    σ₀::Float64
end


function show(io::IO, F::ψ₀)
    println(io, "ψ₀ function with mean $(F.ξ₀) and std $(F.σ₀)")
end


function (F::ψ₀)(z::T) where {T<:Real}
    @assert F.σ₀ >= 0.0 "std must be >= 0"
    Δ₀ = (1/(F.σ₀*√2))*(z-F.ξ₀)
    return 0.5*((z-F.ξ₀)*(1-erf(Δ₀))-F.σ₀*√(2/π)*exp(-Δ₀^2))
end

Speed benchmark

using BenchmarkTools  # for @btime

function time_estim()
    @btime begin
        z = 0.0
        for i = 1:1000
            z = randn()
            1.0 + z + rbf(1.0, 1.0)(z) + ψ₀(1.0, 2.0)(z)
        end
    end

    @btime U = Cst + Lin + rbf(1.0, 1.0) + ψ₀(1.0, 2.0)

    @btime begin
        U = Cst + Lin + rbf(1.0, 1.0) + ψ₀(1.0, 2.0)
        z = 0
        for i = 1:1000
            z = randn()
            U(z)
        end
    end
end

time_estim()

32.604 μs (0 allocations: 0 bytes)
0.015 ns (0 allocations: 0 bytes)
33.814 μs (0 allocations: 0 bytes)

For me, the difference between the benchmarks is below the noise level, and I don’t see any obvious larger performance improvements. That said, AFAIK:

  • A specialized method is compiled for each set of parameter types, not for values. So putting your runtime parameters ξ₀, σ₀ into a struct ψ₀ will not create an optimized version of the function (F::ψ₀)(z::T) for each set of runtime parameter values. (This can be done with the Val type or metaprogramming, if your set of parameters is not too large.)
  • Small thing: put the @assert in the type constructor instead of the call.
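For instance, here is a minimal sketch of the Val idea (the gaussian function and the value σ = 2.0 are made up for illustration, and this only pays off for a small, fixed set of parameter values):

```julia
# Sketch: lift a runtime parameter into the type domain with Val, so a
# specialized method is compiled per parameter *value*, not just per type.
gaussian(z, ::Val{σ}) where {σ} = 1 / (σ * √(2π)) * exp(-(0.5 / σ^2) * z^2)

g2(z) = gaussian(z, Val(2.0))  # σ = 2.0 is now baked into the method's type
g2(0.0)                        # ≈ 0.1995
```

Since σ is a type parameter here, the compiler can constant-fold the 1/(σ√(2π)) prefactor into the specialized method body.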

Thank you for your answer. Sorry, I have not fully understood how to improve the code for part 1. Can you give me an example?

If you want (g₁ + g₂)(z) = g₁(z) + g₂(z), why not simply use z -> g₁(z) + g₂(z)?


How can I create this new function by looping over my list of scalar functions?

Try foldl
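One way to read the foldl suggestion, as a sketch with plain anonymous functions (not the ParamFcn types from the original post):

```julia
# Hypothetical list of scalar functions to sum
fns = (z -> 1.0, z -> z, z -> exp(-z^2))

# Fold the tuple into a single closure that sums the functions pointwise
fsum = foldl((f, g) -> (z -> f(z) + g(z)), fns)

fsum(0.0)  # 1.0 + 0.0 + 1.0 = 2.0
```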


Thanks, I have included this function into the code by calling mapreduce.

As mentioned by @AndiMD, it would be great if each scalar function, or the sum of the scalar functions, could have a specialized method based on the values of the parameters and not simply on their types.
I don’t know how to do this, though.

It is not clear to me why you expect that doing this will speed up your computations. You can do this to organize your code nicely (like you do with AddedParamFcn), but there will be no speed benefit compared to an unrolled version.

The best you can hope for is probably getting the same speed, if the compiler specializes everything, which should happen if you store the functions in a Tuple (definitely not an Array, don’t do that).

This will happen automatically for the (Σg::AddedParamFcn)(z) method. This is a key feature of Julia.
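To see why the Tuple matters: a Tuple's type records the concrete type of every element, while a Vector{ParamFcn} only records the abstract element type. A sketch using the types from the original post:

```julia
tup = (Cst, Lin, rbf(1.0, 1.0))          # ::Tuple{constant, linear, rbf} — fully concrete
vec = ParamFcn[Cst, Lin, rbf(1.0, 1.0)]  # ::Vector{ParamFcn} — abstract eltype

# Summing over `tup` lets the compiler unroll and inline each call;
# summing over `vec` dispatches dynamically on every element.
sum(g -> g(0.5), tup)
sum(g -> g(0.5), vec)  # same result, but slower and allocating
```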

Incidentally,

mapreduce(g->g(z),+, Σg.list)

can be written as

mapreduce(g, +, Σg.list)

In this case, I might have misunderstood something? Please correct me:
The method (Σg::AddedParamFcn)(z) should specialize for all types of Σg, which includes all type combinations stored in the Tuple Σg.list. But there will not be an optimized method for each set of values (like Σg.list[1].σ==2.0)

As long as types are concrete or parametrized to make this possible, you get a fully specialized method automatically (apart from some corner cases I won’t go into here).

So that method will be specialized for all concrete types, including tuples with your rbf type.

You can check this with @code_warntype. I would recommend reading

https://docs.julialang.org/en/v1/manual/performance-tips/#man-performance-tips-1

it is very useful and nicely written.
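For instance, with the types from the original post, a quick check might look like:

```julia
U = Cst + Lin + rbf(1.0, 1.0)
@code_warntype U(0.5)  # the inferred types should all be concrete (no Any in red)
```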

Thank you for your explanations

@Tamas_Papp, your simplification doesn’t seem to work:

(Σg::AddedParamFcn)(z::T) where {T<:Real} = mapreduce(g,+, Σg.list)

fi = rbf(1.0, 1.0)
U = +Lin
for i=1:10
    U += fi
end

U(1.0)

UndefVarError: g not defined

Stacktrace:
[1] (::AddedParamFcn{Tuple{linear,rbf,rbf,rbf,rbf,rbf,rbf,rbf,rbf,rbf,rbf}})(::Float64) at /media/mat/HDD/TransportMap/src/function.jl:40
[2] top-level scope at In[16]:1

You are right, I overlooked the fact that you are mapping over the gs.

I am getting a speed-up of roughly 4 (see code below) by using this storage of the additive map.

function time_estim()
    @btime begin
        z = 0.0
        for i = 1:10000
            z = randn()
            z + mapreduce(i -> rbf(1, i)(z), +, 1:10)
        end
    end

    @btime begin
        U = +Lin
        for i = 1:10
            U += rbf(1.0, i)
        end
    end

    @btime begin
        U = +Lin
        for i = 1:10
            U += rbf(1.0, i)
        end
        z = 0
        for i = 1:10000
            z = randn()
            U(z)
        end
    end
end

time_estim()

4.128 ms (320001 allocations: 6.41 MiB)
245.894 ns (20 allocations: 1.33 KiB)
1.187 ms (20020 allocations: 313.83 KiB)

Are you sure your benchmarks do the same thing? Maybe use a summation variable to make sure nothing is optimized away.
Here are some variants, with a comparison to a simple function rbffct. Providing the init argument to mapreduce helps a lot.

function rbffct(z::T,μ,σ)  where {T<:Real}
    1/(σ*√(2*π))*exp(-(0.5/σ^2)*(z-μ)^2)
end


function time_estim()
    
    @btime begin
        s=0.0
        for i=1:10000
            z = randn()
            s += mapreduce(i->rbf(1, i)(z),+,1:10; init=0.0)
        end
        s
    end

    @btime begin
        s=0.0
        U = mapreduce(i->rbf(1, i),+,1:10)
        for i=1:10000
            z = randn()
            s += U(z)
        end
        s
    end
    
    @btime begin
        s=0.0
        U = +Lin
        for i=1:10
            U += rbf(1.0, i)
        end
        for i=1:10000
            z = randn()
            s += U(z)
        end
        s
    end

    @btime begin
        s=0.0
        for i=1:10000
            z = randn()
            s += mapreduce(i->rbffct(z, 1, i),+,1:10)
        end
        s
    end

    @btime begin
        s=0.0
        U(z) = mapreduce(i->rbffct(z, 1, i),+,1:10)
        for i=1:10000
            z = randn()
            s += U(z)
        end
        s
    end
    
end

time_estim()

944.986 μs (0 allocations: 0 bytes)
1.566 ms (30017 allocations: 469.98 KiB)
1.630 ms (30020 allocations: 470.08 KiB)
1.205 ms (0 allocations: 0 bytes)
1.192 ms (0 allocations: 0 bytes)