Function Composition Strategies

In my project ExpandNestedData.jl, I need to compose a series of Iterator functions that are unique for any given input. I was using Iterators.repeat, Iterators.flatten, etc. but since each of these returns a parametrized type, things were very type instable.

I solve this with a custom iterator type:

mutable struct NestedIterator{T} <: AbstractArray{T, 1}
    get_index::Function
    column_length::Int64
end
Base.getindex(ni::NestedIterator, i) = ni.get_index(i)

"""repeat_each!(c, N) will return an array where each source element appears N times in a row"""
function repeat_each!(c::NestedIterator, n)
    c.get_index = c.get_index ∘ ((i) -> unrepeat_each(i, n))
    c.column_length *= n
end
unrepeat_each(i, n) = ceil(Int64, i/n)


"""cycle!(c, n) cycles through an array N times"""
function cycle!(c::NestedIterator, n)
    l = length(c)
    c.get_index = c.get_index ∘ ((i::Int64) -> uncycle(i, l))
    c.column_length *= n
end
uncycle(i,n) = mod((i-1),n) + 1

"""stack(c1::NestedIterator, c2::NestedIterator)
Return a single NestedIterator which is the result of vcat(c1,c2)
"""
function stack(c1::NestedIterator, c2::NestedIterator)
    type = Union{eltype(c1), eltype(c2)}
    len = (c1,c2) .|> length |> sum

    f = ((i::Int64) -> unstack(i, length(c1), c1.get_index, c2.get_index))
    
    return NestedIterator{type}(f, len)
end
unstack(i::Int64, c1_len::Int64, f1::Function, f2::Function) = i > c1_len ? f2(i-c1_len) : f1(i)

* code simplified for readability. See full code here.

Basically, I’m using some tricky math to invert the transformation of indices caused by the iterator functions. I pay the price of running the composed function for each index lookup, but the savings on type stability are worth it in my testing.

Now, I’m trying to take optimization one step further and figure out how to reduce the overhead of all of these compositions.

As you can see in the code, right now I’m just composing whatever the previous look up was with the new step. My gut says that this probably isn’t optimal. I know the best approach here is to measure, try something, measure again, but since this is a side project, I don’t really have time to go down every rabbit hole, so I’m curious if anyone has some intution/wisdom on avenues to explore that have the highest chance of payoff.

Potential paths I’ve thought of are to keep track of the order of the steps (repeateach => 1, cycle => 2, stack => 3) and then either
a. foldl compose across the whole thing (maybe the compiler can do some optomization?)
b. build an Expr of what the composed function would be if everything got inlined and then just eval that once

@uniment tagging you since it seems like function composition and anonymous functions are your jam :slight_smile:

Edit: @fatteneder pointed out that my current implementation also suffers from type instability. Since unpacking a totally undefined data structure into table of undefined size and element types is fundamentally type instable, I guess I should really rephrase the question into something like: “What is the best way to handle unknown types and get back to type stable code as quickly as possible?” I think my custom type is a step in the right direction, my code is significantly faster now, but I’m wondering where to go from here.

NestedIterator{T} is type-unstable, because Function is an abstract type.
So you pay some cost for dynamic dispatch when calling getindex.
But not sure how much impact that has, unless one benchmarks this …

1 Like

Sorry, I should have been more specific –

The instability I’m referring to was the fact that the type signature of Iterators is something like

Iterator.Flatten{Iterator.Repeat{...}}. It was slowing every function call down to compile for a unique type. Where as NestedIterator{T} causes slow dynamic dispatch, but doesn’t trigger recompilation.

Edit: I know :get_index is type instable. That’s kinda why I think pivoting to storing an array if integers that refer to the functions I need to compose and then composing them all in one step would be an improvement.

Can you offer an MWE showing the problem this code solves? I need something to sink my teeth into.

Yup! I’ll post something tomorrow.

1 Like

Hide the type instability with a function barrier.
In your case, would just be the Base.getindex implementation.
But I think you need to add a type annotation too here, e.g.

Base.getindex(ni::NestedIterator, i) = ni.get_index(i)::Int
3 Likes