If we can’t fix Generator
eltype
in Base
, is it possible to tell the compiler what the eltype
and length
are for a specific Generator
that you create?
Let me post a longer MWE to see if we can eliminate the allocations here. The short version is that I’m creating a performance critical iterator that iterates over index combinations of various lengths. If it seems like a strange way to do this, it’s because it’s still a MWE. I’m trying to nest iterators a few layers deep this way, but this shows the issue I’m seeing with a single nesting. Here’s the inner Iterator
I’m using for reference.
# Based on: https://github.com/Quantum-Many-Body/QuantumLattices.jl/blob/master/src/Toolkit.jl
# improved so there is no allocation on object creation (state is now a tuple)
struct Combinations{M,C}
contents::C
N::Int
Combinations{M}(contents::C) where {M,C} = new{M,C}(contents, length(contents))
end
@inline Base.eltype(::Type{Combinations{M, C}}) where {M, C} = NTuple{M, eltype(C)}
@inline Base.length(c::Combinations{M}) where M = binomial(c.N, M)
Base.iterate(c::Combinations{M}) where M = (M > c.N) ? nothing : (ntuple(i->c.contents[i], Val(M)), nextmstate(ntuple(identity, M), c.N))
Base.iterate(c::Combinations{M}, state) where M = (isempty(state) || (state[1] > c.N-M+1)) ? nothing : (ntuple(i->c.contents[state[i]], Val(M)), nextmstate(state, c.N))
function nextmstate(state::NTuple{M}, N::Int) where {M}
for i in M:-1:1
state = Base.setindex(state, state[i] + 1, i)
(state[i] > N - (M - i)) && continue
for j in (i + 1):M
state = Base.setindex(state, state[j - 1] + 1, j)
end
break
end
return state
end
Then I iterate using the following
# create a tuple of 0s and 1s of length N with 1s at `nzindices`
function sptuple(::Val{N}, nzindices::NTuple{M,Int})::NTuple{N,Int} where {N,M}
res = ntuple(i -> zero(Int), N)
for i in eachindex(nzindices)
res = Base.setindex(res, one(Int), nzindices[i])
end
return res
end
function sptuple(::Val{N}, ::Tuple{})::NTuple{N,Int} where {N}
return ntuple(i->zero(Int), N)
end
# create an inner and outer Iterator using flatten.
inner(::Val{N}, ::Val{M}) where {N,M} = (sptuple(Val(N), j) for j in Combinations{M}(ntuple(identity, N)))
outer(::Val{N}) where {N} = Iterators.flatten(inner(Val(N), Val(i)) for i in 0:N)
Benchmarking, I see the following:
using BenchmarkTools
# single allocation on collect as expected on inner iterator
@btime collect(inner(Val(5), Val(3))) # 110.381 ns (1 allocation: 496 bytes)
# I need this to also a be a single allocation. expect around ~300ns based on inner benchmark
@btime collect(outer(Val(5))) # 5.826 μs (166 allocations: 13.27 KiB)
My hunch is that if I could somehow define eltype
and length
methods for inner
and outer
that this would work. Something like:
Base.eltype(inner(...)) where {N,M} = NTuple{N,Int}
Base.eltype(outer(...)) where {N} = NTuple{N,Int}
Base.length(inner(...)) where {N,M} = length(Combinations{M}(ntuple(identity, N)))
Base.length(outer(...)) where {N} = prod(i->length(inner(Val(N), Val(i))), 0:N)