Yes, the situation is vexing. I would really like some developers to chime in. It seems to me there should be built in Julia a way to delegate efficiently iteration!
Hopefully someone will chime in with a solution
Otherwise you can file an issue, maybe also linking to Mike Innesā blog post. By the way Iām not sure a new iteration protocol would have to wait for Julia 2.0ā¦ For example it could dispatch with iterate(::Iterator{T})
instead of iterate(::T)
in a backward-compatible manner I think?
For me this works quite well:
mutable struct Group{N}
s::NTuple{N,String}
I::Iterators.ProductIterator{NTuple{N,String}}
Group(s::NTuple{N,String}) where N = new{N}(s)
end
Base.length(G::Group) = prod(length.(G.s))
Base.eltype(::Type{Group}) = String
@inline function Base.iterate(G::Group)
G.I = Iterators.product(G.s...) # memoize the iterator
next = iterate(G.I)
if next !== nothing
s, state = next
prod(s), state
end
end
@inline function Base.iterate(G::Group, state)
next = iterate(G.I, state)
if next !== nothing
s, state = next
prod(s), state
end
end
using BenchmarkTools
G = Group(("one", "happy", "man", "on", "the", "mountain"))
I = Iterators.product(G.s...)
P = (prod(s) for s in I)
for g in G end # warmup
for p in P end # warmup
@btime count(_ -> true, $G)
@btime count(_ -> true, $P)
Returns
julia> @btime count(_ -> true, $G)
63.608 Ī¼s (2160 allocations: 67.50 KiB)
2160
julia> @btime count(_ -> true, $P)
63.039 Ī¼s (2160 allocations: 67.50 KiB)
2160
Of course with collect
there is a slight penalty because one needs a specialization that preallocates
julia> @btime collect(P);
72.024 Ī¼s (2163 allocations: 84.58 KiB)
julia> @btime collect(G);
75.176 Ī¼s (2162 allocations: 84.45 KiB)
- It has been discussed before why it is not the correct implementation to make I a field of G.
- indeed
count
is faster thancollect
and shows also the problem with the delegated iterator:
julia> @btime count(_->true,$G)
3.823 Ī¼s (188 allocations: 10.22 KiB)
45
julia> I=(prod(x) for x in Iterators.product(G.s...));
julia> @btime count(_->true,$I)
1.093 Ī¼s (45 allocations: 1.41 KiB)
45
You said
The iterator is not part of
Group
construction since constructing the list of transversals is expensive
so is only done on demand when iterating is demanded.
In my implementation I
is not part of the construction. It is initialized only when the iteration is first started, and the iterator is then memoized inside G
. You can replace the field I
with whatever alternative iterator you come up. The memoization technique is still applicable. You donāt pay the price of constructing the iterator if you never iterate over G
.
Your timings are off. Have you run them in a clean session?
-
You made the struct mutable which is not desirable
-
As mentioned before this implementation prevents two concurrent iterations.
I ran the timings (of my previous implementation, not yours) in a clean session.
If you donāt like the memoization, you can do
struct Group{N}
s::NTuple{N,String}
end
Base.length(G::Group) = prod(length.(G.s))
Base.eltype(::Type{Group}) = String
@inline function Base.iterate(G::Group)
I = Iterators.product(G.s...)
next = iterate(I)
if next !== nothing
s, state = next
prod(s), (I, state)
end
end
@inline function Base.iterate(G::Group, (I, state))
next = iterate(I, state)
if next !== nothing
s, state = next
prod(s), (I, state)
end
end
using BenchmarkTools
G = Group(("one", "happy", "man", "on", "the", "mountain"))
I = Iterators.product(G.s...)
P = (prod(s) for s in I)
for g in G end # warmup
for p in P end # warmup
@btime count(_ -> true, $G)
@btime count(_ -> true, $P)
and still get the same performance
julia> @btime count(_ -> true, $G)
63.409 Ī¼s (2160 allocations: 67.50 KiB)
2160
julia> @btime count(_ -> true, $P)
63.603 Ī¼s (2160 allocations: 67.50 KiB)
2160
The key takeaways are the typing of Group
and the @inline
. With these you get the same performance. Now Group
is immutable and there is no memoization. Where is the problem?
False. The iterator G.I
does not change during iteration so it is possible to iterate over it concurrently. The progression of the iteration is stored in state
, not by mutating G.I
.
- Your last implementation is the same as mine, apart from the
@inline
which does not change timings. - I see you changed the example which explains the different timings.
- I do not know if you are right about concurrency. Not my domain of expertise. I repeated the point of Sudete.
Well, if you disregard the typing of Group
and the @inline
, which make the performance equal to that of Iterators.product
ā¦
What is still missing from my last example? Group
is immutable, there is no memoization, the performance is the optimal one. If there is something I overlooked please tell me. But it seems to me that this implementations satisfies your request. Right?
Even with the memoized version you could iterate over G
twice simultaneously. But I agree that the non-memoizing version is cleaner, leaving the struct
immutable.
Your typing of Group
is ācheatingā. The number of transversals is not known at group construction time as has been explained: in the real-life case of permutation groups the computation of transversals is expensive and is not done at group creation time, only when iteration is demanded.
What do you mean? You pass a vector of Strings
. You know how many they are. Then just convert it to a tuple to make the length known to the compiler.
When you use Iterators.product
you are fixing the number of strings anyway.
Perhaps my MWE is not correct: I should hide from Iterators.product
the number of strings.
But you canāt. When you call Iterators.product
you āfreezeā the number of subiterators.
So you cannot compare the performance of a dynamic version (your Group
with Vector
) with a static version (Iterators.product
). The latter will always be faster. The only workaround is to make Group
static as well.
You confuse me. In both cases the Vector
is made into a tuple by the call Iterators.product(G.s...)
In your first version Base.iterate
is not able to infer the return type of I
(it infers Any
), because it gets produced from a Vector
.
In my last version, Base.iterate
correctly infers the type to be Base.Iterators.ProductIterator{NTuple{6,String}}
.
To see this check @code_typed iterate(G)
for both versions.
This makes a huge difference.
- It is a problem that the type of
I
is not inferable. Maybe it is the problem. - From what you say
Vector
is definitely correct since 6 is not known at compile time in the real life example.
Yes, it is the problem. But the type of I
cannot be inferable if you construct it from a Vector
:
julia> foo(v::Vector) = Iterators.product(v...)
julia> @code_warntype foo(["asd", "qwe"])
It doenāt need to be known at compile time, but rather at construction time of G = Group(...)
. In your example, G
is fixed once created, you donāt require the flexibility to append to G.s
. Hence, to take advantage of this fact, you must type it so that the compiler can infer the type of Iterators.product(G.s...)
. If G.s
is a Vector
, it canāt do that because the length is not in the type. If you use a Tuple
it can and you get faster code.
It is not known at construction time of the group. Just at construction time of the iterator.
How can it be? Is the group mutable? Can you give a more precise example? Because in the one you gave it was known at construction time.