In ExpandNestedData, I have a module for composing many nested iterators. It works by boiling down `repeat` and `cycle` into callable structs that apply division and modulo (respectively) to a provided index to route it back to the correct index of a seed iterator. I then compose them together as I nest them. It works very well for my application, yay!
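A minimal sketch of that index routing, assuming 1-based indices. The struct names `RepeatIndex` and `CycleIndex` are hypothetical and are not the package's actual types:

```julia
# Hypothetical callable structs, for illustration only.

# Maps index i of a sequence where every seed element is repeated n times
# back to the index of the seed element.
struct RepeatIndex
    n::Int
end
(r::RepeatIndex)(i) = div(i - 1, r.n) + 1

# Maps index i of an endlessly cycled sequence of length len
# back to an index in 1:len.
struct CycleIndex
    len::Int
end
(c::CycleIndex)(i) = mod1(i, c.len)

seed = ["a", "b", "c"]

# Repeat each element twice (length 6), then cycle the result.
# Composition applies CycleIndex first, then RepeatIndex.
route = RepeatIndex(2) ∘ CycleIndex(6)

routed = [seed[route(i)] for i in 1:8]
```

Here `routed` walks `a a b b c c` and then wraps around to `a a`, without ever materializing the repeated or cycled collection.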
But I also need a similar solution for a lazy `vcat`. This one needs to take N child iterators and a requested index, and then route the index call to the correct child's `get_index` function.
I can do something like
struct Unvcat{F,G}
    f_len::Int64
    f::F
    g::G
end
(u::Unvcat)(i) = i <= u.f_len ? u.f(i) : u.g(i - u.f_len)
Then I can just compose the functions together and we’re good to go.
However, I can have hundreds or thousands of `NestedIterator`s to stack, so composing this takes AGES.
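To make the nesting concrete, here is a small sketch that folds three children into nested `Unvcat`s. The helper `stack_unvcat` is a hypothetical name introduced for illustration, and the closures over plain vectors stand in for real `get_index` functions:

```julia
struct Unvcat{F,G}
    f_len::Int64
    f::F
    g::G
end
(u::Unvcat)(i) = i <= u.f_len ? u.f(i) : u.g(i - u.f_len)

# Hypothetical helper: fold the children from the right, so the result is
# Unvcat(len1, f1, Unvcat(len2, f2, f3)) for three children.
function stack_unvcat(fs, lengths)
    acc = fs[end]
    for k in (length(fs) - 1):-1:1
        acc = Unvcat(lengths[k], fs[k], acc)
    end
    return acc
end

a, b, c = [10, 20], [30], [40, 50, 60]
fs = (i -> a[i], i -> b[i], i -> c[i])
stacked = stack_unvcat(fs, (2, 1, 3))

values = [stacked(i) for i in 1:6]
```

Each extra child adds another level to the type parameters of the resulting `Unvcat`, so with thousands of children the compiler has to work through a very deeply nested type. That is likely where the time goes, though I have not profiled it.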
So I thought I’d have a go at just building the switch function with metaprogramming and `eval` the result:
"""make_switch(fs, lengths)
Create a switching function that takes an integer `i` and compares it against
each length provided in order. Once it accumulates the sum of lengths greater than or equal to
`i`, it subtracts the previous total length and runs the corresponding function.
"""
function make_switch(fs, lengths)
    func_def = compose_switch_body(fs, lengths)
    @eval $func_def
end
function compose_switch_body(fs, lengths)
    total_len = sum(lengths)
    _fs = Iterators.Stateful(fs)
    _lengths = Iterators.Stateful(lengths)
    # the first branch needs no offset, so build it separately
    l = popfirst!(_lengths)
    if_stmt = :(
        if i <= $(l)
            $(popfirst!(_fs))(i)
        end
    )
    curr_stmt = if_stmt
    prev_l = l
    for (f, l) in zip(_fs, _lengths)
        # insert an `elseif` for every subsequent function
        curr_l = l + prev_l
        ex = Expr(
            :elseif,
            :(i <= $(curr_l)),
            :($f(i - $prev_l))
        )
        # each `elseif` nests inside the previous branch's expression
        push!(curr_stmt.args, ex)
        prev_l = curr_l
        curr_stmt = ex
    end
    name = gensym("unvcat_switch")
    error_str = "Attempted to access $total_len-length vector at index "
    func_def = :(
        function $(name)(i)
            i > $(total_len) && error($error_str * "$i")
            $if_stmt
        end
    )
    return func_def
end
It works like a charm. Super fast to make, super fast to run. Feeling really great!
Aaaah, but World Age! Running `iter.get_index(i)` now calls an `eval`ed function whose method is newer than the world age of the code calling it.
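A small standalone reproduction of the problem, independent of `make_switch`. The function name `build_and_call` is hypothetical; the point is only that a method created with `@eval` cannot be called directly from the function that created it:

```julia
function build_and_call()
    name = gensym("demo")
    # Define a brand new function at runtime, as make_switch does.
    f = @eval $name(i) = 2i

    # A direct call fails: the method is newer than the world age
    # this function is running in.
    direct = try
        f(3)
    catch err
        err
    end

    # invokelatest looks the method up in the newest world instead.
    latest = Base.invokelatest(f, 3)
    return direct, latest
end

direct, latest = build_and_call()
```

On the Julia versions I am aware of, `direct` ends up holding a `MethodError` (the "method too new" error), while `latest` is the expected result.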
My solution for now is to define special `iterate` and `collect` versions that provide a function barrier that calls `Base.invokelatest`, so iterating over all the values can be fast. But I’d love to provide the user with a way to use the fast function with the lazy iterator so they can avoid allocations if need be. For now, I have `getindex(n::NestedIterator, i) = Base.invokelatest(n.get_index, i)`, which is fine but obviously not ideal.
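Roughly, the function barrier looks like the sketch below. `LazyIter` is a hypothetical stand-in for `NestedIterator`, and its fields, `collect_latest`, and `_collect` are assumed names, not the package's real API:

```julia
# Hypothetical stand-in for NestedIterator; field names are assumptions.
struct LazyIter{F}
    get_index::F
    len::Int
end

# Single lookup: pays for one invokelatest per call.
Base.getindex(n::LazyIter, i) = Base.invokelatest(n.get_index, i)

# Function barrier: one invokelatest enters the newest world, and the
# loop inside _collect then calls get_index with ordinary dispatch.
collect_latest(n::LazyIter) = Base.invokelatest(_collect, n)
_collect(n::LazyIter) = [n.get_index(i) for i in 1:n.len]

function demo()
    name = gensym("get")
    f = @eval $name(i) = 10i   # stands in for the generated switch function
    it = LazyIter(f, 3)
    return collect_latest(it), it[2]
end

collected, single = demo()
```

The barrier pays the `invokelatest` cost once per `collect`, whereas `getindex` pays it on every lookup, which is the part I would like to avoid.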
Are there better solutions I’m missing? Maybe a solution with generated functions? Maybe something to make composing functions faster?
PS, I also considered having a callable struct that holds a vector of functions and a vector of lengths, and then make it callable with a function that iterates over both vectors, but that is terribly type unstable since I can’t know the types of the functions. But maybe I’m overthinking that?
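For reference, the vector-based idea would look something like this sketch. `VecSwitch` is a hypothetical name, and the closures over plain vectors again stand in for real `get_index` functions:

```julia
# Hypothetical type for the "vector of functions" idea.
struct VecSwitch
    fs::Vector{Function}   # abstract element type, so every call is dynamic
    lengths::Vector{Int}
end

function (v::VecSwitch)(i)
    offset = 0
    for (f, l) in zip(v.fs, v.lengths)
        # The first child whose cumulative length reaches i handles the call.
        i <= offset + l && return f(i - offset)
        offset += l
    end
    error("Attempted to access $(offset)-length vector at index $i")
end

a, b = [1, 2], [3, 4, 5]
v = VecSwitch(Function[i -> a[i], i -> b[i]], [2, 3])

stacked_values = [v(i) for i in 1:5]
```

This is quick to construct and has no world age problem, but because `fs` has an abstract element type, the compiler cannot infer what `f(i - offset)` returns, and every call goes through dynamic dispatch.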