New Iterator from Nested Iterators

bmit · August 30, 2023, 4:44pm

Is there any shortcut to create a new iterator from existing nested iterators? I know I can manually define a new type, length, iterate, and eltype, but I’m wondering if there’s a more direct way.

Below is a MWE of a generator that I’d like to turn into a new type to iterate on. Notice that the second iterator is dependent on the value of the first so I don’t think I can simply use product. I’m wondering if there’s a use for flatten here?

testitr(N::Integer) = ((i,j) for i in 1:N for j in i:N)

collect(testitr(5))
15-element Vector{Tuple{Int64, Int64}}:
(1, 1)
(1, 2)
(1, 3)
(1, 4)
(1, 5)
(2, 2)
(2, 3)
(2, 4)
(2, 5)
(3, 3)
(3, 4)
(3, 5)
(4, 4)
(4, 5)
(5, 5)

stephancb · August 30, 2023, 8:53pm

This can be coded with flatten

julia> inner(i, N) = ((i,j) for j in i:N)
inner (generic function with 1 method)

julia> outer(N) = Iterators.flatten(inner(i, N) for i in 1:N)
outer (generic function with 1 method)

julia> collect(outer(5))
15-element Vector{Tuple{Int64, Int64}}:
 (1, 1)
 (1, 2)
 (1, 3)
 (1, 4)
 (1, 5)
 (2, 2)
 (2, 3)
 (2, 4)
 (2, 5)
 (3, 3)
 (3, 4)
 (3, 5)
 (4, 4)
 (4, 5)
 (5, 5)

but in this case I would not call it a shortcut.

bmit · August 30, 2023, 9:36pm

Cool - I see how flatten works, but you’re right that it’s not super helpful. I was hoping it would give me length and eltype so that iteration was more efficient, but that doesn’t look to be the case.

I’ll play with this a bit more to see if I can leverage the flatten methods to save myself having to write a custom iterate.

mrufsvold · August 30, 2023, 11:32pm

Are you sure it doesn’t have eltype? I’m not at my computer to check, but I’d be surprised to learn that flatten can’t get an inferred eltype here!

bmit · August 31, 2023, 12:37am

julia> eltype(outer(5))
Any

DanielVandH · August 31, 2023, 12:59am

Probably related: `eltype` could be more accurate for `Iterators.flatten` · Issue #48249 · JuliaLang/julia · GitHub

bmit · August 31, 2023, 4:52pm

The problem is actually with the inner generator itself I think. Is there some way to get this to return the correct type?

julia> eltype(inner(3,4))
Any

bmit · August 31, 2023, 5:30pm

After a bit more reading it seems that Generators are simply are not type stable in Julia. Given that, it seems very difficult to make flatten work for something like this. I think we’re stuck writing custom Iterators for anything performance critical.

nsajko · August 31, 2023, 6:07pm

Yeah, I think iterators are one of the areas of Base where the design is lacking. On the other hand, it seems like it would be pretty easy to write a “Base.Generator but with eltype” package.

mrufsvold · September 1, 2023, 4:28am

function Base.eltype(g::Base.Generator{I,F}) where {I,F}
    iter_eltype = eltype(I)
    f = g.f::F
    return_types = Base.return_types(f, (iter_eltype,))
    return Union{return_types...}
end

eltype(inner(3,4))
# Tuple{Int64, Int64}

@code_typewarn says that accessing the :f property is not type stable (which seems weird since the type is in the parameters of the Generator type.

I went back and forth about whether to Union all return types or just return Any if more than one is returned.

Is there a reason a function like this couldn’t be in Base?

edit: f is even parametricly typed in Generator definition. What’s up with the instability?

nsajko · September 1, 2023, 5:32am

Probably some kind of bug. Doesn’t happen on nightly, at least.

This sadly returns a Vector{Any}, right? Meaning that neither it’s length nor element type is known at compile time. So the destructuring in the next line causes run time dispatch, and eltype itself can’t be type stable.

Also, @allocated reports that your eltype allocates a staggering ~~5632 bytes~~ EDIT: after everything is compiled as much as possible, it’s merely 704 bytes. That’s still a lot though, considering that eltype is usually assumed to be cost-free.

In conclusion, I don’t think using Base.return_types is a viable approach here.

mrufsvold · September 1, 2023, 12:54pm

So what I didn’t realize is that iterating over Generators is still type stable even though you get eltype; I thought that defining eltype would unlock iterate’s ability to specialize, but I checked and I was wrong.

For what it’s worth though, I went into some internals and got the allocations down another 100 bytes (to 608). I’m sure someone who knows what they’re doing could shave off a lot more by bypassing some of the higher-level functions I’m calling:

# Basically just stolen from Base's version
function return_types(@nospecialize(f), @nospecialize(types);
                        world::UInt=Base.get_world_counter(),
                        interp::Core.Compiler.AbstractInterpreter=Core.Compiler.NativeInterpreter(world))
    
    (ccall(:jl_is_in_pure_context, Bool, ()) || world == typemax(UInt)) &&
    error("code reflection cannot be used from generated functions")
    
    if isa(f, Core.OpaqueClosure)
        _, rt = only(Base.code_typed_opaque_closure(f))
        return rt::Type
    end

    if isa(f, Core.Builtin)
        argtypes = Any[Base.to_tuple_type(types).parameters...]
        rt = Core.Compiler.builtin_tfunction(interp, f, argtypes, nothing)
        return Core.Compiler.widenconst(rt)::Type
    end
    
    tt = Base.signature_type(f, types)
    matches = Base._methods_by_ftype(tt, #=lim=#-1, world)::Vector
    if length(matches) == 1
        match = only(matches)::Core.MethodMatch
        meth = Base.func_for_method_checked(match.method, types, match.sparams)
        ty = Core.Compiler.typeinf_type(interp, meth, match.spec_types, match.sparams)
        return something(ty, Any)::Type
    end
    return Any
end


function Base.eltype(g::Base.Generator{I,<:Any}) where I
    iter_eltype = eltype(I)
    return_type = return_types(g.f, (iter_eltype,))
    return return_type
end

I’m not saying this should get added to Base because you make a good point about the allocations vs the promise of eltype being zero cost. But it was a fun experiment!

bmit · September 1, 2023, 2:46pm

If we can’t fix Generator eltype in Base, is it possible to tell the compiler what the eltype and length are for a specific Generator that you create?

Let me post a longer MWE to see if we can eliminate the allocations here. The short version is that I’m creating a performance critical iterator that iterates over index combinations of various lengths. If it seems like a strange way to do this, it’s because it’s still a MWE. I’m trying to nest iterators a few layers deep this way, but this shows the issue I’m seeing with a single nesting. Here’s the inner Iterator I’m using for reference.

# Based on: https://github.com/Quantum-Many-Body/QuantumLattices.jl/blob/master/src/Toolkit.jl
# improved so there is no allocation on object creation (state is now a tuple)
struct Combinations{M,C}
    contents::C
    N::Int
    Combinations{M}(contents::C) where {M,C} = new{M,C}(contents, length(contents))
end

@inline Base.eltype(::Type{Combinations{M, C}}) where {M, C} = NTuple{M, eltype(C)}
@inline Base.length(c::Combinations{M}) where M = binomial(c.N, M)
Base.iterate(c::Combinations{M}) where M = (M > c.N) ? nothing : (ntuple(i->c.contents[i], Val(M)), nextmstate(ntuple(identity, M), c.N))
Base.iterate(c::Combinations{M}, state) where M = (isempty(state) || (state[1] > c.N-M+1)) ? nothing : (ntuple(i->c.contents[state[i]], Val(M)), nextmstate(state, c.N))

function nextmstate(state::NTuple{M}, N::Int) where {M}
    for i in M:-1:1
        state = Base.setindex(state, state[i] + 1, i)
        (state[i] > N - (M - i)) && continue
        for j in (i + 1):M
            state = Base.setindex(state, state[j - 1] + 1, j)
        end
        break
    end
    return state
end

Then I iterate using the following

# create a tuple of 0s and 1s of length N with 1s at `nzindices`
function sptuple(::Val{N}, nzindices::NTuple{M,Int})::NTuple{N,Int} where {N,M}
    res = ntuple(i -> zero(Int), N)
    for i in eachindex(nzindices)
        res = Base.setindex(res, one(Int), nzindices[i])
    end
    return res
end

function sptuple(::Val{N}, ::Tuple{})::NTuple{N,Int} where {N}
    return ntuple(i->zero(Int), N)
end

# create an inner and outer Iterator using flatten.
inner(::Val{N}, ::Val{M}) where {N,M} = (sptuple(Val(N), j) for j in Combinations{M}(ntuple(identity, N)))
outer(::Val{N}) where {N} = Iterators.flatten(inner(Val(N), Val(i)) for i in 0:N)

Benchmarking, I see the following:

using BenchmarkTools
# single allocation on collect as expected on inner iterator
@btime collect(inner(Val(5), Val(3)))  # 110.381 ns (1 allocation: 496 bytes)
# I need this to also a be a single allocation. expect around ~300ns based on inner benchmark
@btime collect(outer(Val(5)))  # 5.826 μs (166 allocations: 13.27 KiB)

My hunch is that if I could somehow define eltype and length methods for inner and outer that this would work. Something like:

Base.eltype(inner(...)) where {N,M} = NTuple{N,Int}
Base.eltype(outer(...)) where {N} = NTuple{N,Int}
Base.length(inner(...)) where {N,M} = length(Combinations{M}(ntuple(identity, N)))
Base.length(outer(...)) where {N} = prod(i->length(inner(Val(N), Val(i))), 0:N)

nsajko · September 1, 2023, 3:01pm

Without looking at your code in depth yet:

I think it would be simple to create a replacement for Base.Generator that would have the element type as a type parameter, and use it to implement eltype. Maybe I do that later. We could even call Base.return_types when constructing such a generator (presumably the construction itself is not performance-sensitive).
However, IMO Julia’s standard iterator protocol is itself not very good. It’s fine for simple cases, but I don’t think it enables good performance for complicated compositions of iterators. One of the problems is that the iteration protocol features the iterator state, but a user of the iterator type has no way of getting the type of the state (unlike eltype, there’s no iterstatetype). However none of this is necessarily a problem if you’re the author of all iterator types in question.

bmit · September 1, 2023, 3:11pm

my concern with calling Base.return_types is that I think it will get called every time inner is called - once per outer iteration. Won’t this create similar performance issues to what I’m seeing now? In other words construction of the inner iterators are performance sensitive which is why I had to write my own Combinations.
You’re absolutely right. I own the Iterators here and could absolutely write my custom type and iterate function. I’m trying to avoid this for two reasons. First, it seems like a lot of effort if all I want to do it create a new iterator by nesting two existing ones. Second, it makes the code really hard to follow. The fact that I’m creating an iterator to essentially do

for i in 1:N
    for j in F{i}()
        for k in G{i}(j)
            # do stuff
        end
    end
end

is quickly lost.

mrufsvold · September 1, 2023, 3:16pm

In ExpandNestedData.jl, I have a lazy implementation of repeat, cycle, and vcat using callable structs to route an index back to a source iterator using some division and modulo arithmetic. I found that nesting dozens of types from Iterators really hurt both type stability and execution time. I never got into the details to figure out why though.

PS if you look at the code, I’m using invoke latest right now which I know sucks. I have a solution in the works but haven’t had time to clean it up.

nsajko · September 2, 2023, 8:47am

Regarding your specific problem, the collection represented by outer(Val(5)) is (I now realize) merely the 2^5 integers from 0 to 2^5 - 1, represented as strings of bits. You could have the iterator state literally be just a single UInt and iterate would just increment the state and extract the bits.

nsajko · September 2, 2023, 9:07am

So this should be a faster implementation of your outer iterator (EDIT: the order is actually not the same, not sure if that matters):

to_bitstring(m::Val, n::Integer) = ntuple(
  let n = n
    i -> (n >>> (i - 1)) % Bool
  end,
  m,
)

to_bitstring(::Val{m}) where {m} =
  (n::Integer) -> to_bitstring(Val(m), n)

# Use a static range (I think the packages `Static` and
# `StaticNumbers` support those) instead of `0:(2^m - 1)` for
# more performance?
int_to_bitstring_iterator(::Val{m}) where {m} =
  Iterators.map(to_bitstring(Val(m)), 0:(2^m - 1))

EDIT: yes, it’s quite a bit faster:

julia> using BenchmarkTools

julia> c(it::F, n::Val) where {F} = collect(it(n))
c (generic function with 1 method)

julia> @btime c(int_to_bitstring_iterator, Val(10));
  2.767 μs (1 allocation: 10.12 KiB)

julia> @btime c(outer, Val(10));
  134.662 μs (4172 allocations: 725.50 KiB)

nsajko · September 2, 2023, 2:02pm

Repo on Gitlab.com, PR to the General registry.

bmit · September 14, 2023, 12:51am

Thank you for the help! I am aware that they are simpler implementations for my MWE, but I’m trying to represent a more complicated problem that cannot be solved so easily.

I’ve been playing with your implementation of Generator in LazyMapWithElType a bit, trying to get it to work for my example above. I haven’t been able to figure out how to use it to nest two Generators either with Iterators.flatten or otherwise. Is there a specific way you were imaging this getting used to solve that issue?