# Delegating efficiently iteration from one object to another

My problem occured while trying to write an efficient iterator for permutations groups: they can be efficient generated by running throught the products of some lists of permutations (some transversals).

I give here a simple MWE which mimics that situation, using concatenation of strings instead of multiplication of permutations to mimic a non-commutative product.

So imagine that a group is:

``````struct Group
s::Vector{String}
end

G=Group(["one","happy","man"])
``````

and the “elements” of `G` are all the words made by taking one letter from the first string, one from the second, etc…

I can make an efficient iterator for this

``````I=(prod(x) for x in Iterators.product(G.s...));
``````

and the timing is not bad.

``````julia> @btime collect(I)
902.024 ns (46 allocations: 1.84 KiB)
3×5×3 Array{String,3}:
[:, :, 1] =
"ohm"  "oam"  "opm"  "opm"  "oym"
"nhm"  "nam"  "npm"  "npm"  "nym"
"ehm"  "eam"  "epm"  "epm"  "eym"

[:, :, 2] =
"oha"  "oaa"  "opa"  "opa"  "oya"
"nha"  "naa"  "npa"  "npa"  "nya"
"eha"  "eaa"  "epa"  "epa"  "eya"

[:, :, 3] =
"ohn"  "oan"  "opn"  "opn"  "oyn"
"nhn"  "nan"  "npn"  "npn"  "nyn"
"ehn"  "ean"  "epn"  "epn"  "eyn"
``````

But now I would like `G` itself to be able to iterate. I do not want to write `for s in I` or `for s in iterator(G)` but `for s in G`. So I want `G` itself to iterate through `I`.

The only way I found is:

``````Base.length(G::Group)=prod(length.(G.s))

Base.eltype(::Type{Group})=String

function Base.iterate(G::Group)
I=(prod(x) for x in Iterators.product(G.s...))
p=iterate(I)
if isnothing(p) return end
str,state=p
str,(I,state)
end

function Base.iterate(G::Group,(I,state))
p=iterate(I,state)
if isnothing(p) return end
str,state=p
str,(I,state)
end
``````

This iterates OK but the performance is terrible:

``````julia> @btime collect(G);
5.672 μs (241 allocations: 13.92 KiB)
``````

So my question is: is there a better way to do that, that is to delegate efficiently iteration from `G` to a dynamically computed `I`?

Note, your delegating implementation also loses the shape of the resulting collection. Is that something you want to maintain?

1 Like

I do not care about the shape.

I didn’t try this but note Performance Tips · The Julia Language (change `isnothing(p)` to `p === nothing`).

2 Likes

I think your iteration is probably much closer to fine than the benchmark shows, it’s just `collect` that’s slow. Your `I` in the first implementation is of type `Generator`, which has a specialized version of `collect` since it knows it is `AbstractArray` based, at least for the shape coming from `product`. Your benchmark is mostly fixed by delegating `collect` to the iterator too. The other benchmarking issue is that your second test includes the time to create `I`, while it’s not included in the first test. For me, constructing `I` takes about 1/6th the time of `collecting(I)`

``````struct Group
s::Vector{String}
end
Base.length(g::Group)=Base.length((prod(x) for x in Iterators.product(G.s...)))
Base.eltype(::Group)=String
Base.iterate(g::Group,i...) = Base.iterate((prod(x) for x in Iterators.product(G.s...)),i...)
Base.collect(g::Group) = Base.collect(prod(x) for x in Iterators.product(G.s...))

G=Group(["one","happy","man"])

using BenchmarkTools

@btime collect(G)

@btime I=(prod(x) for x in Iterators.product(G.s...))
I=(prod(x) for x in Iterators.product(G.s...))
@btime collect(I)
``````
1 Like

``````julia> @btime collect(G);
5.123 μs (190 allocations: 10.77 KiB)
``````

Why is not `p===nothing` the implementation of `isnothing`?

I get

``````julia> @btime I=(prod(x) for x in Iterators.product(G.s...))
253.435 ns (4 allocations: 112 bytes)
``````

so constructing `I` is rather negligible.

It is funny that

``````Base.collect(G::Group)=collect(prod(x) for x in Iterators.product(G.s...))
``````

does not make collecting `G` as fast as collecting `I`:

``````julia> @btime collect(G)
1.993 μs (49 allocations: 1.94 KiB)
``````

But I consider this sidestepping, not solving the problem of a slow iterator!

It is. It doesn’t help though. The reason why is a bit technical.

3 Likes

With `collect` delegated, I get

``````@btime collect(G) # 1.949 μs (50 allocations: 1.95 KiB)
@btime I=(prod(x) for x in Iterators.product(G.s...)) #  268.156 ns (4 allocations: 112 bytes)
@btime collect(I) # 1.656 μs (46 allocations: 1.84 KiB)
``````

1.656 μs + 268.156 ns = 1.924 μs, so the difference between delegated `collect(G)` and `collect(I)` is almost exactly explained by the not-negligible in the context time to construct `I`.

You also haven’t exhibited that the delegated iterator for `G` is slow, since you’re not comparing iterators. The code you’re timing

``````@btime collect(I)
``````

doesn’t use an iterator for `I`. The implementation `function collect(itr::Generator)` from `array.jl` line 680, bypasses the iteration interface, and ends up using `_array_for` internally. If you want a fair comparison, you need

``````function testiterator(itr)
for i in itr
dosomethingnontrivialwith(i)
end
end
@btime testiterator(I)
``````

but again, making that `fair` with your `G` iterator is difficult, since `G` has to generate it’s version of the `I` object.

If iteration is really a significant aspect of your group, then constructing the iterator could be part of its constructor, or deferred to first use, and made part of the object so it can be reused.

``````struct Group
s::Vector{String}
itr
Group(s::Vector{String})=new(s,(prod(x) for x in Iterators.product(s...)))
end
Base.length(g::Group)=Base.length(g.itr)
Base.eltype(::Group)=String
Base.iterate(g::Group,i...) = Base.iterate(g.itr,i...)
Base.collect(g::Group) = Base.collect(g.itr)

G=Group(["one","happy","man"])
``````

The iterator is not part of `Group` construction since constructing the list of transversals is expensive
so is only done on demand when iterating is demanded. Further, computations with `G` may lead to
finding a better list of transversals, so I really want to construct `I` dynamically.

By the way, when exploring this problem, I found a few minutes ago a proposal for improving the iteration protocol [https://mikeinnes.github.io/2020/06/04/iterate.html] which would clearly solve the problem. But this is for julia 2.0, or perhaps never…

@malacroi the iterator implementation in Jean Michel’s original post seems like the right approach to me: the generator is constructed once, then passed to the next iterations as part of the state.

By contrast, the implementation `Base.iterate(g::Group,i...) = Base.iterate((prod(x) for x in Iterators.product(G.s...)),i...)` is inefficient as it constructs a new generator on each iteration (though the issue is bypassed in the test due to the `collect` overload that does its own thing).

The approach of storing the iterator in the Group object is less robust as it will fail if several iterations are started concurrently (though that might be unlikely).

@Jean_Michel there are several places in your `iterate` implementation that are not inferred (according to `@code_warntype iterate(G)`). This is not the case for `@code_warntype iterate(prod(x) for x in Iterators.product(G.s...))` so maybe that’s the cause for the lower performance…

I found one type annotation that speeds up by a factor 2:

``````struct Group
s::Vector{String}
end
Base.length(g::Group) = prod(length.(g.s))
Base.eltype(::Group) = String

function Base.iterate(g::Group)
generatorType = Base.Generator{Base.Iterators.ProductIterator{Tuple{String,String,String}},typeof(prod)}
I = (prod(x) for x in Iterators.product(g.s...))::generatorType
p = iterate(I)
if p === nothing return end
p, (I, p)
end

function Base.iterate(g::Group, (I, state))
p = iterate(I, state)
if p === nothing return end
p, (I, p)
end

G=Group(["one","happy","man"])

using BenchmarkTools

@btime collect(G)
@btime collect(prod(x) for x in Iterators.product(G.s...))
``````

With the `::generatorType` annotation I get:

``````  2.728 μs (95 allocations: 6.27 KiB)
1.944 μs (50 allocations: 1.95 KiB)
``````

Without the annotation:

``````  6.511 μs (190 allocations: 10.77 KiB)
1.931 μs (50 allocations: 1.95 KiB)
``````

There are remaining issues according to `@code_warntype` but I don’t think they explain the remaining difference.

If the iterator construction is expensive, you probably want a way to save or discard the inner iterator, depending on whether you need it multiple times, instead of always discarding it. Since you can’t store the iterator in the group itself, how about using `Group` as a functor. So if `G::Group`, then `G()` is an iterable object for traversing the group.

``````using BenchmarkTools
struct Group
s::Vector{String}
end

struct GroupIterator itr end
Base.length(g::GroupIterator)=Base.length(g.itr)
Base.eltype(g::GroupIterator)=String
Base.iterate(g::GroupIterator)=Base.iterate(g.itr)
Base.iterate(g::GroupIterator,@nospecialize(i)) = Base.iterate(g.itr,i)

Base.collect(g::GroupIterator)=Base.collect(g.itr)
(g::Group)() = GroupIterator((prod(x) for x in Iterators.product(g.s...)))
``````

this gets me

``````@btime collect(G()) # 1.893 μs (50 allocations: 1.95 KiB)
@btime collect(prod(x) for x in Iterators.product(g.s...)) # 1.903 μs (50 allocations: 1.95 KiB)
``````

though again, this isn’t really a test of `iterate()`. To do that I tried

``````function timeiterate(itr)
h=zero(UInt64)
for i::String in itr
h=hash(i)⊻h
end
return h
end
GI=G()
@btime timeiterate(GI)
@btime timeiterate(prod(x) for x in Iterators.product(G.s...))
``````

which is now only off by a factor of 2. A big difference happened when I added `@nospecialize` to `Base.iterate(g::GroupIterator,@nospecialize(i)) = Base.iterate(g.itr,i)`, which eats up a big part of the type instability.

Well, I said I did not want `for x in I` or `for x in iterator(G)`. Saying `for x in G()` is just the same. Further, I already gave a meaning to `G()`; it is the identity element of `G`.

By the way if you had just done

``````(G::Group)() =(prod(x) for x in Iterators.product(G.s...))
``````

you would have gotten the same result…

Thanks! But in my real-life situation (where the iterator is a bit more complicated because it tries to do less multiplications by storing some partial products) I could not find an annotation which would improve the situation

Also I realized my annotation is cheating since it hardcodes the number of elements in the group with `Tuple{String,String,String}`. You could get the same speed benefit without type annotation by defining the group as

``````struct Group
s::Tuple{String,String,String}
end
``````

which you probably don’t want.