A thought on function chaining with constant Arguments

Hello folks, and thanks for your oppinion in advance.
First, I’m pretty new to Julia so you may forgive me the style if you will :slight_smile:

While learning the language and creating the first functionalities, I came up with this one:

Suppose I have a chain of functions that all need the same arguments (KnowledgeBase, Parameters, etc) and additionally they need the input from the previous function (some sort of extended chaining if you will).
I basically figured out two ways to do this (without writing the call manually):
Custom composition with expressions or wrapping in lambdas and normal composition.:

EDIT: added leightweight variant from @Tamas_Papp for comparison. Recursive evaluation is slightly slower than composition but much easier to read.

using BenchmarkTools

struct MyChainer1{T, F<:Function} # construct as expression
    constantArgument::T
    fun::F
    function MyChainer1(constArg::Any, funs::AbstractVector{<:Function})
        @assert length(funs)>1 #else it would be pointless

        ex = :(X)
        for f in funs
            ex = :($f($ex, $constArg))
        end
        ex = :(X -> $ex)
        fun = eval(ex)
        
        new{typeof(constArg), typeof(fun)}(constArg, fun)
    end
end
(m1::MyChainer1)(data) = m1.fun(data)

struct MyChainer2{T, F<:Function} # construct from lambda-wrapped functions
    constantArgument::T
    fun::F
    function MyChainer2(constArg::Any, funs::AbstractVector{<:Function})
        @assert length(funs)>1 #else it would be pointless

        funs = map( g -> x -> g(x, constArg), funs)
        fun = reduce(∘, reverse(funs))

        new{typeof(constArg), typeof(fun)}(constArg, fun)
    end
end
(m2::MyChainer2)(data) = m2.fun(data)

struct MyChainer3{T, F<:Function} # construct by hand
    constantArgument::T
    fun::F
    function MyChainer3(constArg::T, fun::F) where {T, F<:Function}
        new{T, F}(constArg, fun)
    end
end
(m3::MyChainer3)(data) = m3.fun(data)

# as suggested by @pbayer and @Christopher_Fisher
function chain_quote(x::S, ca::A, fs::AbstractVector{<:Function}) where {S,A}
    if length(fs) == 1
        :( $(fs[1])($x, $ca) )
    else
        chain_quote( :( $(fs[1])($x, $ca) ), ca, fs[2:end])
    end
end

struct MyChainer4{T, F<:Function}
    constantArgument::T
    fun::F
    function MyChainer4(constArg::Any, funs::AbstractVector{<:Function})
        @assert length(funs)>1 #else it would be pointless

        sym = gensym()
        fun= eval(:($sym -> $(chain_quote(sym, constArg, funs)) ))

        new{typeof(constArg), typeof(fun)}(constArg, fun)
    end
end
(m4::MyChainer4)(data) = m4.fun(data)

# suggested by @Tamas_Papp (and @pbayer similarly)
chain_them(b, A) = b # base case
chain_them(b, A, f1, fs...) = chain_them(f1(b, A), A ,fs...)
setup_chain(A, fs...) = x -> chain_them(x, A, fs...) # addition to allow partial function application

A = [1 2 -4 1 1; 2 -5 4 2 3; 3 0 -3 -1 4;2 3 7 -2 -8 ; 0 3 0 -9 2]
chainer1 = MyChainer1(A, [/,*,+,*,+,/,+,*])
chainer2 = MyChainer2(A, [/,*,+,*,+,/,+,*])
chainer3 = MyChainer3(A, x -> ((((((((x / A) * A) + A) * A) + A) / A) + A) * A))
chainer4 = MyChainer4(A, [/,*,+,*,+,/,+,*])
chainer5 = setup_chain(A,/,*,+,*,+,/,+,*)

# all should yield the same results
@assert chainer1(A') == chainer2(A') && chainer2(A') == chainer3(A') &&  chainer3(A') == chainer4(A') && chainer3(A') == chain_them(A', A ,/,*,+,*,+,/,+,*) && chainer3(A') == chainer5(A')

show(stdout, "text/plain", @benchmark chainer1($A'))
println()
show(stdout, "text/plain", @benchmark chainer2($A'))
println()
show(stdout, "text/plain", @benchmark chainer3($A'))
println()
show(stdout, "text/plain", @benchmark chainer4($A'))
println()
show(stdout, "text/plain", @benchmark chainer5($A'))
println()
show(stdout, "text/plain", @benchmark chain_them($A', $A ,/,*,+,*,+,/,+,*))


To my surprise, wrapping everything in lambdas did not incur a performance penalty (here?).
EDIT: figured out that I was using benchmarktools wrong and the lambdas do seem to come with a penalty. Too bad since I liked that composition operator most.

But I’m still thinking that there must be a more elegant way to do this. Does anyone know of a better way? Am I doing something stupid here or did I miss something in the docs?

Thanks in advance!

Wouldn’t that be the classical case for recursion?

function chain(x::Int, ca::Int, fs::Tuple)
    if length(fs) == 1
        fs[1](x, ca)
    else
        chain(fs[1](x,ca), ca, fs[2:end])
    end
end

julia> @btime chain(42, 13, (+,*,+,*,+))
  1.488 μs (8 allocations: 480 bytes)
9477

not so fast as your preallocated chains, but simpler. If your chains vary, this would be certainly preferable.

1 Like

As a secondary point, the fields in your struct are abstract and will incur a performance penalty. You might try something like this in future applications:

struct MyChainer3{T,F<:Function} # construct by hand just for comparison
    constantArgument::T
    fun::F
end
1 Like

Good point, thank you. I will definitely try to incorporate this version to test it.

The reason I wanted to compose the beforehand is, that the Chain is supposed to be constructed only once or twice and will then be called repeatedly to process varying data. So the cost of construction will be negligible if the calls are fast.

But for the sake of elegance, this is far superior to my approach.

right, that again makes everything way faster, I’ll edit in a second.

Still, lambdas (second case seem slowest)

I don’t know enough about the specifics of Julia to explain why the lambas are slower. However, if you plan on reusing the functions, there might be a way to modify pbayers recursive code to return a chained function. That might make the code slightly more readable and the benchmarks more comparable. I am also interested in learning the best way to chain arbitrary functions.

Done! Unfortunately making it return a function makes it harder to read again. On the pro side: performance seems good with this, too.

I don’t understand why you would store the constant argument in a struct. If you want to compose beforehand, a closure is enough to store it. In this case I like your second solution:

function rChain(ca, fs::Tuple)
    fm = map(f -> x -> f(x, ca), fs)
    return reduce(∘, reverse(fm))
end

julia> f = rChain(13, (+,*,+,*,+))
#56 (generic function with 1 method)

julia> @btime f(42)
  25.327 ns (1 allocation: 16 bytes)
9477

I probably should have been more verbosel in my description. Storing constant values like 13 is obviously unnecessary.
The usage it is intended for is the following:

Suppose you have a KnowledgeBase which stores, well, knowledge about some data you want to process, relations between data, some classifiers or regressors. Arbitrary kinds of domain knowledge if you want.
Then you have a set of transformations which transform new data while applying you stored knowledge, so they all need your KnowledgeBase as input, as well as the input data after you applied the previous transformation.

So, you may have:
Your knowledge domain KB
and functions f1 to f5 which may for example do:

  • assess the validity of new input given previous knowledge
  • some filtering
  • applying classifiers which may or may not correspond to those in your KB
  • check if any of your stored knowledge applies to the new data, optionally shortcut
  • try to integrate the new data
  • check KB for consistency

I could of course put these things together in a big function in fixed order, but some of these operations may not be necessary, some may be missing, maybe someone has a good idea for additional functionality and wants to integrate it without breaking my functionality. So, breaking it up into small functions which can be composed is probably the best idea.
Then, I wanted to make composition more elegant, so as to not force users to write:

f5(f4(f3(f2(f1(data1, kb), kb), kb,) kb,), kb)
f5(f4(f3(f2(f1(data2, kb), kb), kb,) kb,), kb)

Instead:

pipeline = Compose(kb, [f1, f2, f3, f4, f5])
pipeline(data1)
pipeline(data2)

is much more readable, but current Chaining is not exactly sufficient to do this (I believe. There was a long discussion about the functionality of piping and Chaining in julia somewhere, it’s an open topic ).

The knowledge base should be stored, because every transformation may depend on the state of it and the processing of data2 may depend on the processing of data1. Also it enables querying the kb without keeping an external reference somewhere.

I hope this helps understanding.

1 Like

I may have missed something about the spec, but I would consider a simple recursive solution like

chain_them(A, b) = b # base case
chain_them(A, b, f1, fs...) = chain_them(A, f1(A, b), fs...)
chain_them(A, b, /, *, +, *, +, /, +, *) # usage example

especially for short chains.

2 Likes

Yeah I kinda liked the idea of composing once and then applying it everywhere. I guess a simple recursive solution is much easier to read and potentially debug. It’s not exactly as fast as a real composition but that won’t matter too much for my application (hopefully).

Anyway thanks for your contribution! I marked the topic as solved. If I can finish the important part of the code and make a proper benchmark, I’ll let you guys know about the performance implications.

I think it should be faster than anything else. Did you benchmark?

Why not take a slightly different approach: each function f1, f2, f3 … must take data and the knowledge base kb as two arguments and returns a tuple of processed data and kb: (data, kb). This is fed into the next function in the pipeline.

Then you could:

compose(f) = x -> f(x...)
compose(f1, fs...) = x -> x |> f1 |> compose(fs...)

pipeline = compose(f1, f2, f3, f4, f5)  # application ...
pipeline(data1, kb)
pipeline(data2, kb)

You need only for the last function in the pipe an inspect function, returning only the results, the processed data.

Then you can compose beforehand and call the pipelines as you want.

This is pretty indeed, thank you. I think I’ll stick with that for now.
Pretty cool how many different ways there are to do this and they all work pretty well so far in Julia.

@Tamas_Papp shortly, it seemed to make n-1 more allocations for n functions in the chain. But this might also be an issue with my benchmarking or missing parametrization, since I’m not really good with Julia yet.
If I find the time, I’ll cook up a proper benchmark eventually. But the answers helped me a lot for the time being.
Thanks to all of you.