Delegating efficiently iteration from one object to another

My problem occured while trying to write an efficient iterator for permutations groups: they can be efficient generated by running throught the products of some lists of permutations (some transversals).

I give here a simple MWE which mimics that situation, using concatenation of strings instead of multiplication of permutations to mimic a non-commutative product.

So imagine that a group is:

struct Group
  s::Vector{String}
end

G=Group(["one","happy","man"])

and the “elements” of G are all the words made by taking one letter from the first string, one from the second, etc…

I can make an efficient iterator for this

I=(prod(x) for x in Iterators.product(G.s...));

and the timing is not bad.

julia> @btime collect(I)
  902.024 ns (46 allocations: 1.84 KiB)
3×5×3 Array{String,3}:
[:, :, 1] =
 "ohm"  "oam"  "opm"  "opm"  "oym"
 "nhm"  "nam"  "npm"  "npm"  "nym"
 "ehm"  "eam"  "epm"  "epm"  "eym"

[:, :, 2] =
 "oha"  "oaa"  "opa"  "opa"  "oya"
 "nha"  "naa"  "npa"  "npa"  "nya"
 "eha"  "eaa"  "epa"  "epa"  "eya"

[:, :, 3] =
 "ohn"  "oan"  "opn"  "opn"  "oyn"
 "nhn"  "nan"  "npn"  "npn"  "nyn"
 "ehn"  "ean"  "epn"  "epn"  "eyn"

But now I would like G itself to be able to iterate. I do not want to write for s in I or for s in iterator(G) but for s in G. So I want G itself to iterate through I.

The only way I found is:

Base.length(G::Group)=prod(length.(G.s))

Base.eltype(::Type{Group})=String

function Base.iterate(G::Group)
  I=(prod(x) for x in Iterators.product(G.s...))
  p=iterate(I)
  if isnothing(p) return end
  str,state=p
  str,(I,state)
end

function Base.iterate(G::Group,(I,state))
  p=iterate(I,state)
  if isnothing(p) return end
  str,state=p
  str,(I,state)
end

This iterates OK but the performance is terrible:

julia> @btime collect(G);
  5.672 μs (241 allocations: 13.92 KiB)

So my question is: is there a better way to do that, that is to delegate efficiently iteration from G to a dynamically computed I?

Note, your delegating implementation also loses the shape of the resulting collection. Is that something you want to maintain?

1 Like

I do not care about the shape.

I didn’t try this but note Performance Tips · The Julia Language (change isnothing(p) to p === nothing).

2 Likes

I think your iteration is probably much closer to fine than the benchmark shows, it’s just collect that’s slow. Your I in the first implementation is of type Generator, which has a specialized version of collect since it knows it is AbstractArray based, at least for the shape coming from product. Your benchmark is mostly fixed by delegating collect to the iterator too. The other benchmarking issue is that your second test includes the time to create I, while it’s not included in the first test. For me, constructing I takes about 1/6th the time of collecting(I)

struct Group
    s::Vector{String}
end
Base.length(g::Group)=Base.length((prod(x) for x in Iterators.product(G.s...)))
Base.eltype(::Group)=String
Base.iterate(g::Group,i...) = Base.iterate((prod(x) for x in Iterators.product(G.s...)),i...)
Base.collect(g::Group) = Base.collect(prod(x) for x in Iterators.product(G.s...))

G=Group(["one","happy","man"])

using BenchmarkTools

@btime collect(G)

@btime I=(prod(x) for x in Iterators.product(G.s...))
I=(prod(x) for x in Iterators.product(G.s...))
@btime collect(I)
1 Like

Your suggestion improves things:

julia> @btime collect(G);
  5.123 μs (190 allocations: 10.77 KiB)

Why is not p===nothing the implementation of isnothing?

I get

julia> @btime I=(prod(x) for x in Iterators.product(G.s...))
  253.435 ns (4 allocations: 112 bytes)

so constructing I is rather negligible.

It is funny that

Base.collect(G::Group)=collect(prod(x) for x in Iterators.product(G.s...))

does not make collecting G as fast as collecting I:

julia> @btime collect(G)
  1.993 μs (49 allocations: 1.94 KiB)

But I consider this sidestepping, not solving the problem of a slow iterator!

It is. It doesn’t help though. The reason why is a bit technical.

3 Likes

With collect delegated, I get

@btime collect(G) # 1.949 μs (50 allocations: 1.95 KiB)
@btime I=(prod(x) for x in Iterators.product(G.s...)) #  268.156 ns (4 allocations: 112 bytes)
@btime collect(I) # 1.656 μs (46 allocations: 1.84 KiB)

1.656 μs + 268.156 ns = 1.924 μs, so the difference between delegated collect(G) and collect(I) is almost exactly explained by the not-negligible in the context time to construct I.

You also haven’t exhibited that the delegated iterator for G is slow, since you’re not comparing iterators. The code you’re timing

@btime collect(I)

doesn’t use an iterator for I. The implementation function collect(itr::Generator) from array.jl line 680, bypasses the iteration interface, and ends up using _array_for internally. If you want a fair comparison, you need

function testiterator(itr)
    for i in itr
         dosomethingnontrivialwith(i)
    end
end
@btime testiterator(I)

but again, making that fair with your G iterator is difficult, since G has to generate it’s version of the I object.

If iteration is really a significant aspect of your group, then constructing the iterator could be part of its constructor, or deferred to first use, and made part of the object so it can be reused.

struct Group
    s::Vector{String}
    itr
    Group(s::Vector{String})=new(s,(prod(x) for x in Iterators.product(s...)))
end
Base.length(g::Group)=Base.length(g.itr)
Base.eltype(::Group)=String
Base.iterate(g::Group,i...) = Base.iterate(g.itr,i...)
Base.collect(g::Group) = Base.collect(g.itr)

G=Group(["one","happy","man"])

The iterator is not part of Group construction since constructing the list of transversals is expensive
so is only done on demand when iterating is demanded. Further, computations with G may lead to
finding a better list of transversals, so I really want to construct I dynamically.

By the way, when exploring this problem, I found a few minutes ago a proposal for improving the iteration protocol [https://mikeinnes.github.io/2020/06/04/iterate.html] which would clearly solve the problem. But this is for julia 2.0, or perhaps never…

@malacroi the iterator implementation in Jean Michel’s original post seems like the right approach to me: the generator is constructed once, then passed to the next iterations as part of the state.

By contrast, the implementation Base.iterate(g::Group,i...) = Base.iterate((prod(x) for x in Iterators.product(G.s...)),i...) is inefficient as it constructs a new generator on each iteration (though the issue is bypassed in the test due to the collect overload that does its own thing).

The approach of storing the iterator in the Group object is less robust as it will fail if several iterations are started concurrently (though that might be unlikely).

@Jean_Michel there are several places in your iterate implementation that are not inferred (according to @code_warntype iterate(G)). This is not the case for @code_warntype iterate(prod(x) for x in Iterators.product(G.s...)) so maybe that’s the cause for the lower performance…

Well, too bad, is there something one could do about it?

I found one type annotation that speeds up by a factor 2:

struct Group
    s::Vector{String}
end
Base.length(g::Group) = prod(length.(g.s))
Base.eltype(::Group) = String

function Base.iterate(g::Group)
  generatorType = Base.Generator{Base.Iterators.ProductIterator{Tuple{String,String,String}},typeof(prod)}
  I = (prod(x) for x in Iterators.product(g.s...))::generatorType
  p = iterate(I)
  if p === nothing return end
  p[1], (I, p[2])
end

function Base.iterate(g::Group, (I, state))
  p = iterate(I, state)
  if p === nothing return end
  p[1], (I, p[2])
end

G=Group(["one","happy","man"])

using BenchmarkTools

@btime collect(G)
@btime collect(prod(x) for x in Iterators.product(G.s...))

With the ::generatorType annotation I get:

  2.728 μs (95 allocations: 6.27 KiB)
  1.944 μs (50 allocations: 1.95 KiB)

Without the annotation:

  6.511 μs (190 allocations: 10.77 KiB)
  1.931 μs (50 allocations: 1.95 KiB)

There are remaining issues according to @code_warntype but I don’t think they explain the remaining difference.

If the iterator construction is expensive, you probably want a way to save or discard the inner iterator, depending on whether you need it multiple times, instead of always discarding it. Since you can’t store the iterator in the group itself, how about using Group as a functor. So if G::Group, then G() is an iterable object for traversing the group.

using BenchmarkTools
struct Group
  s::Vector{String}
end

struct GroupIterator itr end
Base.length(g::GroupIterator)=Base.length(g.itr)
Base.eltype(g::GroupIterator)=String
Base.iterate(g::GroupIterator)=Base.iterate(g.itr)
Base.iterate(g::GroupIterator,@nospecialize(i)) = Base.iterate(g.itr,i)

Base.collect(g::GroupIterator)=Base.collect(g.itr)
(g::Group)() = GroupIterator((prod(x) for x in Iterators.product(g.s...)))

this gets me

@btime collect(G()) # 1.893 μs (50 allocations: 1.95 KiB)
@btime collect(prod(x) for x in Iterators.product(g.s...)) # 1.903 μs (50 allocations: 1.95 KiB)

though again, this isn’t really a test of iterate(). To do that I tried

function timeiterate(itr)
    h=zero(UInt64)
    for i::String in itr
        h=hash(i)⊻h
    end
    return h
end
GI=G()
@btime timeiterate(GI)
@btime timeiterate(prod(x) for x in Iterators.product(G.s...))

which is now only off by a factor of 2. A big difference happened when I added @nospecialize to Base.iterate(g::GroupIterator,@nospecialize(i)) = Base.iterate(g.itr,i), which eats up a big part of the type instability.

Well, I said I did not want for x in I or for x in iterator(G). Saying for x in G() is just the same. Further, I already gave a meaning to G(); it is the identity element of G.

By the way if you had just done

(G::Group)() =(prod(x) for x in Iterators.product(G.s...))

you would have gotten the same result…

Thanks! But in my real-life situation (where the iterator is a bit more complicated because it tries to do less multiplications by storing some partial products) I could not find an annotation which would improve the situation

Also I realized my annotation is cheating since it hardcodes the number of elements in the group with Tuple{String,String,String}. You could get the same speed benefit without type annotation by defining the group as

struct Group
  s::Tuple{String,String,String}
end

which you probably don’t want.