Delegating efficiently iteration from one object to another

Yes, the situation is vexing. I would really like some developers to chime in. It seems to me there should be built in Julia a way to delegate efficiently iteration!

Hopefully someone will chime in with a solution :slight_smile:

Otherwise you can file an issue, maybe also linking to Mike Innesā€™ blog post. By the way Iā€™m not sure a new iteration protocol would have to wait for Julia 2.0ā€¦ For example it could dispatch with iterate(::Iterator{T}) instead of iterate(::T) in a backward-compatible manner I think?

For me this works quite well:

mutable struct Group{N}
    s::NTuple{N,String}
    I::Iterators.ProductIterator{NTuple{N,String}}
    Group(s::NTuple{N,String}) where N = new{N}(s)
end

Base.length(G::Group) = prod(length.(G.s))
Base.eltype(::Type{Group}) = String

@inline function Base.iterate(G::Group)
    G.I = Iterators.product(G.s...) # memoize the iterator
    next = iterate(G.I)
    if next !== nothing
        s, state = next
        prod(s), state
    end
end

@inline function Base.iterate(G::Group, state)
    next = iterate(G.I, state)
    if next !== nothing
        s, state = next
        prod(s), state
    end
end


using BenchmarkTools

G = Group(("one", "happy", "man", "on", "the", "mountain"))
I = Iterators.product(G.s...)
P = (prod(s) for s in I)

for g in G end # warmup
for p in P end # warmup

@btime count(_ -> true, $G)
@btime count(_ -> true, $P)

Returns

julia> @btime count(_ -> true, $G)
  63.608 Ī¼s (2160 allocations: 67.50 KiB)
2160

julia> @btime count(_ -> true, $P)
  63.039 Ī¼s (2160 allocations: 67.50 KiB)
2160

Of course with collect there is a slight penalty because one needs a specialization that preallocates

julia> @btime collect(P);
  72.024 Ī¼s (2163 allocations: 84.58 KiB)

julia> @btime collect(G);
  75.176 Ī¼s (2162 allocations: 84.45 KiB)
  • It has been discussed before why it is not the correct implementation to make I a field of G.
  • indeed count is faster than collect and shows also the problem with the delegated iterator:
julia> @btime count(_->true,$G)
  3.823 Ī¼s (188 allocations: 10.22 KiB)
45

julia> I=(prod(x) for x in Iterators.product(G.s...));

julia> @btime count(_->true,$I)
  1.093 Ī¼s (45 allocations: 1.41 KiB)
45

You said

The iterator is not part of Group construction since constructing the list of transversals is expensive
so is only done on demand when iterating is demanded.

In my implementation I is not part of the construction. It is initialized only when the iteration is first started, and the iterator is then memoized inside G. You can replace the field I with whatever alternative iterator you come up. The memoization technique is still applicable. You donā€™t pay the price of constructing the iterator if you never iterate over G.


Your timings are off. Have you run them in a clean session?

  • You made the struct mutable which is not desirable

  • As mentioned before this implementation prevents two concurrent iterations.

    I ran the timings (of my previous implementation, not yours) in a clean session.

If you donā€™t like the memoization, you can do

struct Group{N}
    s::NTuple{N,String}
end

Base.length(G::Group) = prod(length.(G.s))
Base.eltype(::Type{Group}) = String

@inline function Base.iterate(G::Group)
    I = Iterators.product(G.s...)
    next = iterate(I)
    if next !== nothing
        s, state = next
        prod(s), (I, state)
    end
end

@inline function Base.iterate(G::Group, (I, state))
    next = iterate(I, state)
    if next !== nothing
        s, state = next
        prod(s), (I, state)
    end
end


using BenchmarkTools

G = Group(("one", "happy", "man", "on", "the", "mountain"))
I = Iterators.product(G.s...)
P = (prod(s) for s in I)

for g in G end # warmup
for p in P end # warmup

@btime count(_ -> true, $G)
@btime count(_ -> true, $P)

and still get the same performance

julia> @btime count(_ -> true, $G)
  63.409 Ī¼s (2160 allocations: 67.50 KiB)
2160

julia> @btime count(_ -> true, $P)
  63.603 Ī¼s (2160 allocations: 67.50 KiB)
2160

The key takeaways are the typing of Group and the @inline. With these you get the same performance. Now Group is immutable and there is no memoization. Where is the problem?

False. The iterator G.I does not change during iteration so it is possible to iterate over it concurrently. The progression of the iteration is stored in state, not by mutating G.I.

  • Your last implementation is the same as mine, apart from the @inline which does not change timings.
  • I see you changed the example which explains the different timings.
  • I do not know if you are right about concurrency. Not my domain of expertise. I repeated the point of Sudete.

Well, if you disregard the typing of Group and the @inline, which make the performance equal to that of Iterators.productā€¦

What is still missing from my last example? Group is immutable, there is no memoization, the performance is the optimal one. If there is something I overlooked please tell me. But it seems to me that this implementations satisfies your request. Right?


Even with the memoized version you could iterate over G twice simultaneously. But I agree that the non-memoizing version is cleaner, leaving the struct immutable.

Your typing of Group is ā€œcheatingā€. The number of transversals is not known at group construction time as has been explained: in the real-life case of permutation groups the computation of transversals is expensive and is not done at group creation time, only when iteration is demanded.

What do you mean? You pass a vector of Strings. You know how many they are. Then just convert it to a tuple to make the length known to the compiler.

When you use Iterators.product you are fixing the number of strings anyway.

Perhaps my MWE is not correct: I should hide from Iterators.product the number of strings.

But you canā€™t. When you call Iterators.product you ā€œfreezeā€ the number of subiterators.

So you cannot compare the performance of a dynamic version (your Group with Vector) with a static version (Iterators.product). The latter will always be faster. The only workaround is to make Group static as well.

You confuse me. In both cases the Vector is made into a tuple by the call Iterators.product(G.s...)

In your first version Base.iterate is not able to infer the return type of I (it infers Any), because it gets produced from a Vector.

In my last version, Base.iterate correctly infers the type to be Base.Iterators.ProductIterator{NTuple{6,String}}.

To see this check @code_typed iterate(G) for both versions.

This makes a huge difference.

  • It is a problem that the type of I is not inferable. Maybe it is the problem.
  • From what you say Vector is definitely correct since 6 is not known at compile time in the real life example.

Yes, it is the problem. But the type of I cannot be inferable if you construct it from a Vector:

julia> foo(v::Vector) = Iterators.product(v...)
julia> @code_warntype foo(["asd", "qwe"])

It doenā€™t need to be known at compile time, but rather at construction time of G = Group(...). In your example, G is fixed once created, you donā€™t require the flexibility to append to G.s. Hence, to take advantage of this fact, you must type it so that the compiler can infer the type of Iterators.product(G.s...). If G.s is a Vector, it canā€™t do that because the length is not in the type. If you use a Tuple it can and you get faster code.

It is not known at construction time of the group. Just at construction time of the iterator.

How can it be? Is the group mutable? Can you give a more precise example? Because in the one you gave it was known at construction time.