New Iterator from Nested Iterators

If we can’t fix Generator eltype in Base, is it possible to tell the compiler what the eltype and length are for a specific Generator that you create?

Let me post a longer MWE to see if we can eliminate the allocations here. The short version is that I’m creating a performance critical iterator that iterates over index combinations of various lengths. If it seems like a strange way to do this, it’s because it’s still a MWE. I’m trying to nest iterators a few layers deep this way, but this shows the issue I’m seeing with a single nesting. Here’s the inner Iterator I’m using for reference.

# Based on: https://github.com/Quantum-Many-Body/QuantumLattices.jl/blob/master/src/Toolkit.jl
# improved so there is no allocation on object creation (state is now a tuple)
struct Combinations{M,C}
    contents::C
    N::Int
    Combinations{M}(contents::C) where {M,C} = new{M,C}(contents, length(contents))
end

@inline Base.eltype(::Type{Combinations{M, C}}) where {M, C} = NTuple{M, eltype(C)}
@inline Base.length(c::Combinations{M}) where M = binomial(c.N, M)
Base.iterate(c::Combinations{M}) where M = (M > c.N) ? nothing : (ntuple(i->c.contents[i], Val(M)), nextmstate(ntuple(identity, M), c.N))
Base.iterate(c::Combinations{M}, state) where M = (isempty(state) || (state[1] > c.N-M+1)) ? nothing : (ntuple(i->c.contents[state[i]], Val(M)), nextmstate(state, c.N))

function nextmstate(state::NTuple{M}, N::Int) where {M}
    for i in M:-1:1
        state = Base.setindex(state, state[i] + 1, i)
        (state[i] > N - (M - i)) && continue
        for j in (i + 1):M
            state = Base.setindex(state, state[j - 1] + 1, j)
        end
        break
    end
    return state
end

Then I iterate using the following

# create a tuple of 0s and 1s of length N with 1s at `nzindices`
function sptuple(::Val{N}, nzindices::NTuple{M,Int})::NTuple{N,Int} where {N,M}
    res = ntuple(i -> zero(Int), N)
    for i in eachindex(nzindices)
        res = Base.setindex(res, one(Int), nzindices[i])
    end
    return res
end

function sptuple(::Val{N}, ::Tuple{})::NTuple{N,Int} where {N}
    return ntuple(i->zero(Int), N)
end

# create an inner and outer Iterator using flatten.
inner(::Val{N}, ::Val{M}) where {N,M} = (sptuple(Val(N), j) for j in Combinations{M}(ntuple(identity, N)))
outer(::Val{N}) where {N} = Iterators.flatten(inner(Val(N), Val(i)) for i in 0:N)

Benchmarking, I see the following:

using BenchmarkTools
# single allocation on collect as expected on inner iterator
@btime collect(inner(Val(5), Val(3)))  # 110.381 ns (1 allocation: 496 bytes)
# I need this to also a be a single allocation. expect around ~300ns based on inner benchmark
@btime collect(outer(Val(5)))  # 5.826 μs (166 allocations: 13.27 KiB)

My hunch is that if I could somehow define eltype and length methods for inner and outer that this would work. Something like:

Base.eltype(inner(...)) where {N,M} = NTuple{N,Int}
Base.eltype(outer(...)) where {N} = NTuple{N,Int}
Base.length(inner(...)) where {N,M} = length(Combinations{M}(ntuple(identity, N)))
Base.length(outer(...)) where {N} = prod(i->length(inner(Val(N), Val(i))), 0:N)