Function Composition Strategies

mrufsvold · February 13, 2023, 3:25pm

In my project ExpandNestedData.jl, I need to compose a series of Iterator functions that are unique for any given input. I was using Iterators.repeat, Iterators.flatten, etc. but since each of these returns a parametrized type, things were very type instable.

I solve this with a custom iterator type:

mutable struct NestedIterator{T} <: AbstractArray{T, 1}
    get_index::Function
    column_length::Int64
end
Base.getindex(ni::NestedIterator, i) = ni.get_index(i)

"""repeat_each!(c, N) will return an array where each source element appears N times in a row"""
function repeat_each!(c::NestedIterator, n)
    c.get_index = c.get_index ∘ ((i) -> unrepeat_each(i, n))
    c.column_length *= n
end
unrepeat_each(i, n) = ceil(Int64, i/n)


"""cycle!(c, n) cycles through an array N times"""
function cycle!(c::NestedIterator, n)
    l = length(c)
    c.get_index = c.get_index ∘ ((i::Int64) -> uncycle(i, l))
    c.column_length *= n
end
uncycle(i,n) = mod((i-1),n) + 1

"""stack(c1::NestedIterator, c2::NestedIterator)
Return a single NestedIterator which is the result of vcat(c1,c2)
"""
function stack(c1::NestedIterator, c2::NestedIterator)
    type = Union{eltype(c1), eltype(c2)}
    len = (c1,c2) .|> length |> sum

    f = ((i::Int64) -> unstack(i, length(c1), c1.get_index, c2.get_index))
    
    return NestedIterator{type}(f, len)
end
unstack(i::Int64, c1_len::Int64, f1::Function, f2::Function) = i > c1_len ? f2(i-c1_len) : f1(i)

* code simplified for readability. See full code here.

Basically, I’m using some tricky math to invert the transformation of indices caused by the iterator functions. I pay the price of running the composed function for each index lookup, but the savings on type stability are worth it in my testing.

Now, I’m trying to take optimization one step further and figure out how to reduce the overhead of all of these compositions.

As you can see in the code, right now I’m just composing whatever the previous look up was with the new step. My gut says that this probably isn’t optimal. I know the best approach here is to measure, try something, measure again, but since this is a side project, I don’t really have time to go down every rabbit hole, so I’m curious if anyone has some intution/wisdom on avenues to explore that have the highest chance of payoff.

Potential paths I’ve thought of are to keep track of the order of the steps (repeateach => 1, cycle => 2, stack => 3) and then either
a. foldl compose across the whole thing (maybe the compiler can do some optomization?)
b. build an Expr of what the composed function would be if everything got inlined and then just eval that once

@uniment tagging you since it seems like function composition and anonymous functions are your jam

Edit: @fatteneder pointed out that my current implementation also suffers from type instability. Since unpacking a totally undefined data structure into table of undefined size and element types is fundamentally type instable, I guess I should really rephrase the question into something like: “What is the best way to handle unknown types and get back to type stable code as quickly as possible?” I think my custom type is a step in the right direction, my code is significantly faster now, but I’m wondering where to go from here.

fatteneder · February 13, 2023, 3:40pm

NestedIterator{T} is type-unstable, because Function is an abstract type.
So you pay some cost for dynamic dispatch when calling getindex.
But not sure how much impact that has, unless one benchmarks this …

mrufsvold · February 13, 2023, 3:46pm

Sorry, I should have been more specific –

The instability I’m referring to was the fact that the type signature of Iterators is something like

Iterator.Flatten{Iterator.Repeat{...}}. It was slowing every function call down to compile for a unique type. Where as NestedIterator{T} causes slow dynamic dispatch, but doesn’t trigger recompilation.

Edit: I know :get_index is type instable. That’s kinda why I think pivoting to storing an array if integers that refer to the functions I need to compose and then composing them all in one step would be an improvement.

uniment · February 14, 2023, 12:47am

Can you offer an MWE showing the problem this code solves? I need something to sink my teeth into.

mrufsvold · February 14, 2023, 12:51am

Yup! I’ll post something tomorrow.

fatteneder · February 14, 2023, 11:02am

Hide the type instability with a function barrier.
In your case, would just be the Base.getindex implementation.
But I think you need to add a type annotation too here, e.g.

Base.getindex(ni::NestedIterator, i) = ni.get_index(i)::Int

Topic		Replies	Views
Type stability on Arrays of Arrays in nested for loops? Performance question , arrays , type-stability	16	831	February 20, 2021
Efficiently constructing a composition of functions New to Julia	4	314	June 15, 2023
Anonymous functions, type stability New to Julia	11	979	September 12, 2023
Type instability possibly due to variable playing role of input and output General Usage question , tuple , code_warntype , type-stability	11	450	February 15, 2024
When and How to use Types General Usage question , performance , parametric-types	7	1243	March 11, 2021

Function Composition Strategies

Related topics