Specifying a set of algorithms

Suppose I have a function f and I need to specify which algorithms to use inside f (e.g. different optimization methods to combine).
What is the best (i.e. most performant) way to do that?

MWE:

A(x) = 2x
B(x) = 3x
C(x) = 4x

function f(algorithms, x)
    for alg in algorithms
        x = alg(x)
    end

    return x
end

f((A, B, C, A, B), 1.5)
2 Likes

I’ve been thinking about the same thing recently.

If you know that each alg has an in-place implementation, you might want to preallocate your result before the loop.

using BenchmarkTools
using LinearAlgebra

# Original
A(x) = 2x
B(x) = 3x
C(x) = 4x

function f(algorithms, x)
    for alg in algorithms
        x = alg(x)
    end

    return x
end

println("Original:")
@btime f($(A, B, C, A, B), $[1.5, 3.0])


# In-place
A!(x) = lmul!(2,x)
B!(x) = lmul!(3,x)
C!(x) = lmul!(4,x)

function finplace(algorithms, x)
    # Allocates once
    xwork = copy(x)
    
    for alg in algorithms
        alg(xwork)
    end

    return xwork
end

println("In-place:")
@btime finplace($(A!, B!, C!, A!, B!), $[1.5, 3.0])

This prints out

Original:
373.892 ns (5 allocations: 480 bytes)
In-place:
193.564 ns (1 allocation: 96 bytes)

Perhaps you have a MWE where this assumption fails?

1 Like

If you provide the exact number of functions, that is much, much faster and might
not allocate anything if the outputs of the functions are immutable:

julia> function f2(f1,f2,f3,f4,f5,x)
         return f5(f4(f3(f2(f1(x)))))
       end

julia> @btime f2(A,B,C,A,B,1.5)
  0.024 ns (0 allocations: 0 bytes)
216.0

# original here:

julia> @btime f((A,B,C,A,B),1.5)
  135.034 ns (6 allocations: 96 bytes)
216.0

I am sure there might be a smart way to write a macro for that, but I am not smart :neutral_face:
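One macro-free alternative is to recurse over the tuple of functions: since the tuple's length and element types are known at compile time, the compiler unrolls the chain for you. A minimal sketch (the helper name `applyall` is my own):

```julia
# Recursively apply each function in turn; the base case returns the
# accumulated value. The compiler unrolls this because the number of
# remaining functions is encoded in the tuple type.
applyall(x) = x
applyall(x, f, fs...) = applyall(f(x), fs...)

A(x) = 2x
B(x) = 3x
C(x) = 4x

applyall(1.5, A, B, C, A, B)  # == 216.0, same as the hand-written chain
```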

3 Likes

Preallocation also benefits greatly from @lmiq’s suggestion of passing the exact number of functions:

# Known number of functions

function f2(f1,f2,f3,f4,f5,x)
    return f5(f4(f3(f2(f1(x)))))
end

@btime f2($A,$B,$C,$A,$B,$[1.5, 3.0])

function f2inplace(f1,f2,f3,f4,f5,x)
    xwork = copy(x)
    f5(f4(f3(f2(f1(xwork)))))
    return xwork
end

@btime f2inplace($A!, $B!, $C!, $A!, $B!, $[1.5, 3.0])

which prints:

182.691 ns (5 allocations: 480 bytes)
42.684 ns (1 allocation: 96 bytes)

Isn’t this just function composition though?

Maybe something like this would be better:

julia> function f3(algs, x)
           g = reduce(∘, reverse(algs))
           return g(x)
       end
f3 (generic function with 1 method)

julia> @benchmark f((A,B,C,A,B), 1.5)
BenchmarkTools.Trial: 
  memory estimate:  96 bytes
  allocs estimate:  6
  --------------
  minimum time:     162.050 ns (0.00% GC)
  median time:      169.211 ns (0.00% GC)
  mean time:        180.256 ns (0.62% GC)
  maximum time:     1.107 μs (74.05% GC)
  --------------
  samples:          10000
  evals/sample:     755

julia> @benchmark f3((A,B,C,A,B), 1.5)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.490 ns (0.00% GC)
  median time:      1.497 ns (0.00% GC)
  mean time:        1.534 ns (0.00% GC)
  maximum time:     22.320 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

not quite as fast as @lmiq’s suggestion, but pretty fast if you don’t want to write a macro to compose your chain of functions

Though admittedly, Julia might be optimizing the compositions into a single multiplication, given how simple the input functions are.
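One way to check that suspicion (a sketch; I haven’t inspected the output myself):

```julia
using InteractiveUtils  # provides @code_llvm outside the REPL

A(x) = 2x; B(x) = 3x; C(x) = 4x
g = reduce(∘, reverse((A, B, C, A, B)))

# If the compositions have been folded into a single operation, the
# emitted IR should contain one fmul against a constant rather than five.
@code_llvm g(1.5)
```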

4 Likes

In this case it’s just function composition, but the question was a general one about passing in an unknown number of functions.

2 Likes

For example, calculate the minimum of f(x) for each f in the list of functions.

1 Like

Good point. Effectively, with slightly more complicated functions, they perform the same:

julia> A(x) = sin(x) ; B(x) = x^3 ; C(x) = log(x)
C (generic function with 1 method)

julia> @btime f2(A,B,C,A,B,1.5)
  9.885 ns (0 allocations: 0 bytes)
-4.260054977425002e-7

julia> @btime f3((A,B,C,A,B),1.5)
  9.874 ns (0 allocations: 0 bytes)
-4.260054977425002e-7


3 Likes

Why would composition not work for this? I am interested in the problem, and I may not be understanding the case exactly.

If evaluating each f is very costly, like solving an optimization problem, I guess that any overhead of the calling sequence will be irrelevant, no?

2 Likes

Something like

julia> A(x) = sin(x) ; B(x) = x^3 ; C(x) = log(x)
C (generic function with 1 method)

julia> function my_calculation(algorithms, x)
           return minimum(f(x) for f in algorithms)
       end
my_calculation (generic function with 1 method)

julia> my_calculation( (A, B, C), 10)
-0.5440211108893698

In my application the functions are relatively simple, but will be called many times.
You may be right that the overhead is probably not relevant. I guess I was mainly worried about type stability.

Yes, I see that. The composition option appears to be type stable.

julia> @code_warntype f3((A,B,C,A,B),1.5)
Variables
  #self#::Core.Compiler.Const(f3, false)
  algs::Core.Compiler.Const((A, B, C, A, B), false)
  x::Float64
  g::Base.var"#62#63"{Base.var"#62#63"{Base.var"#62#63"{Base.var"#62#63"{typeof(B),typeof(A)},typeof(C)},typeof(B)},typeof(A)}

Body::Float64
1 ─ %1 = Main.reverse(algs)::Core.Compiler.Const((B, A, C, B, A), false)
│        (g = Main.reduce(Main.:∘, %1))
│   %3 = (g)(x)::Float64
└──      return %3


If you’re passing in an N-tuple of functions, you could likely write a generated function that unrolls the loop over N, though I personally don’t understand generated functions well enough to be able to help write something like that.
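For what it’s worth, a sketch of what such a generated function could look like (the name `apply_chain` is my own, and this is just one way to write it):

```julia
# A @generated function that builds the unrolled call chain from the
# tuple length N, which is available in the type domain, so the loop
# below runs at compile time and the expanded body has no loop at all.
@generated function apply_chain(fs::Tuple{Vararg{Any,N}}, x) where {N}
    expr = :x
    for i in 1:N
        expr = :(fs[$i]($expr))
    end
    return expr  # e.g. for N = 5: fs[5](fs[4](fs[3](fs[2](fs[1](x)))))
end

A(x) = 2x; B(x) = 3x; C(x) = 4x
apply_chain((A, B, C, A, B), 1.5)  # == 216.0, same as f2(A, B, C, A, B, 1.5)
```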

2 Likes

Is x always a Number in your application?

Somewhat off topic, but if you are optimizing a function which assumes the minimum value of a set of functions at each point of the domain, you might be interested in https://link.springer.com/article/10.1007/s10898-008-9280-3

1 Like

If you use tuples and the approach above, type stability will depend on a
cutoff determined by the length of the tuple. For me, this cutoff is at
four elements in algorithms.

julia> using BenchmarkTools

julia> A(x) = sin(x) ; B(x) = x^3 ; C(x) = log(x); D(x) = 2x; E(x) = exp(x);

julia> function my_calculation(algorithms, x)
           return minimum(f(x) for f in algorithms)
       end

julia> @benchmark my_calculation( $(Ref((A, B, C)))[], $(Ref(10))[] )
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     22.688 ns (0.00% GC)
  median time:      22.724 ns (0.00% GC)
  mean time:        22.989 ns (0.00% GC)
  maximum time:     65.425 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     996

julia> @benchmark my_calculation( $(Ref((A, B, C, D, E)))[], $(Ref(10))[] )
BenchmarkTools.Trial:
  memory estimate:  288 bytes
  allocs estimate:  18
  --------------
  minimum time:     571.109 ns (0.00% GC)
  median time:      580.388 ns (0.00% GC)
  mean time:        609.541 ns (1.02% GC)
  maximum time:     7.012 μs (89.64% GC)
  --------------
  samples:          10000
  evals/sample:     183

To get type stability for more elements, you can use “lispy tuple programming”
similar to https://stackoverflow.com/a/55849398.

julia> function my_calculation_lispy(algorithms::NTuple{N,Any}, x) where {N}
           alg = first(algorithms)
           remaining_algorithms = Base.tail(algorithms)
           return min(alg(x), my_calculation_lispy(remaining_algorithms, x))
       end

julia> function my_calculation_lispy(algorithms::Tuple{Any}, x)
           return first(algorithms)(x)
       end

julia> my_calculation_lispy( (A, B, C, D, E), 10) ≈ my_calculation( (A, B, C, D, E), 10)
true

julia> @benchmark my_calculation_lispy( $(Ref((A, B, C)))[], $(Ref(10))[] )
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     22.466 ns (0.00% GC)
  median time:      22.520 ns (0.00% GC)
  mean time:        22.991 ns (0.00% GC)
  maximum time:     56.554 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     996

julia> @benchmark my_calculation_lispy( $(Ref((A, B, C, D, E)))[], $(Ref(10))[] )
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     42.918 ns (0.00% GC)
  median time:      43.007 ns (0.00% GC)
  mean time:        44.077 ns (0.00% GC)
  maximum time:     123.421 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     990
4 Likes