On the behavior of reversed iterators

From the docs:

│ Note

│ zip orders the calls to its subiterators in such a way that stateful iterators will not advance when another iterator finishes in the current iteration.

I assume this means that zip can reorder which arguments are iterated first - to better handle stateful iteration, of all things.

So that issue is kind of ignoring the contract in the docs.

Not necessarily, the equality can hold for impure functions depending on state that happens to not affect the iteration, which can be useful (functors). Since we can define Iterators.reverse and Base.reverse methods ourselves, we could make an iterable with different elements forwards versus backwards. It’s not something anyone would want, but it breaks the equality even with a pure function applied to the elements (map defaults to identity).

So that’s why earlier I said these methods seem intentional. Even if the eager versions are not equal, the nested lazy versions are made to be equal even if it wouldn’t match the eager versions. Now with more eyes on it, some people have the opinion it’s incorrect, but I suppose someone who thought it was a feature got to write it.

OP seems right there, stateful iterators’ lengths decrease with iteration so the type shouldn’t have the HasLength trait as documented.

julia> [length(itr) for i in (itr = Iterators.Stateful(1:5))] |> print
[4, 3, 2, 1, 0]
julia> Base.IteratorSize(typeof(Iterators.Stateful(1:5)))
Base.HasLength()

help?> Base.IteratorSize
  IteratorSize(itertype::Type) -> IteratorSize

  Given the type of an iterator, return one of the following values:

    •  SizeUnknown() if the length (number of elements) cannot be determined in advance.

    •  HasLength() if there is a fixed, finite length.

    •  HasShape{N}() if there is a known length plus a notion of multidimensional shape (as for
       an array). In this case N should give the number of dimensions, and the axes function is
       valid for the iterator.

    •  IsInfinite() if the iterator yields values forever.

  The default value (for iterators that do not define this function) is HasLength(). This means that
  most iterators are assumed to implement length.
3 Likes

The length of Iterators.Stateful does change as it is iterated though, so its useful to have for a single iterator. It works fine in zip on its own or with other iterators.

julia> x = Iterators.Stateful([1, 2, 3])
Base.Iterators.Stateful{Vector{Int64}, Union{Nothing, Tuple{Int64, Int64}}, Int64}([1, 2, 3], (1, 2), 3)

julia> length(x)
3

julia> iterate(x)
(1, nothing)

julia> length(x)
2

The problem happens when the same stateful iterator being used in multiple places at once so it iterates too fast. Its not that different to map! being broken when the dest array is the same as the source array.

Should we disallow a lot of length related behaviors because this is possible to do?

You’re right. Should have been more precise in that, e.g., in Haskell, such laws are stated (and documented) to hold for these functions and given the implementation obeys them cannot be broken by passing in a pure function. If impure functions were allowed as well, they could potentially be broken though. I did not mean to imply that they have to fail in this case.

Same here, I was considering the implementation – as also used in Haskell – where zip cuts its arguments to the length of the shorter one. In this case, the law will not hold on all possible inputs.
In general, methods for reverse could be defined arbitrarily, but the function should probably not be called reverse then, i.e., one might reasonably expect that reverse . reverse == id. The underlying question is “which properties can be expected to hold for some function from a generic interface”? It seems that some implementations in Julia make stronger assumptions than most programmers would reasonably expect. Ideally, such assumptions should be documented and better yet, also tested.

1 Like

I wasn’t saying length is useless, it’s that the trait HasLength is documented to be a FIXED finite length, which seems to be the wrong assumption in that issue (“…cannot rely on predicting the length in advance.”). It seems rooted in this iterators.jl line

IteratorSize(::Type{<:Stateful{T}}) where {T} = IteratorSize(T) isa HasShape ? HasLength() : IteratorSize(T)

It seems like it should be SizeUnknown, like rest iterators. The docs even says a stateful iterator mutates to become its own rest iterator:

julia> Base.IteratorSize(Iterators.rest(1:5, 2))
Base.SizeUnknown()

julia> Base.IteratorSize(Iterators.Stateful(1:5))
Base.HasLength()

help?> Iterators.Stateful
  Stateful(itr)

  There are several different ways to think about this iterator wrapper:

    1. It provides a mutable wrapper around an iterator and its iteration state.

    2. It turns an iterator-like abstraction into a Channel-like abstraction.

    3. It's an iterator that mutates to become its own rest iterator whenever an item is
       produced.
3 Likes

Ok contradicting the fixed length in the docs does seem to be an explicit bug that should be fixed.

Maybe performance considerations influence prefering HasLength, as vectors can be preallocated in collect, otherwise push! has to be used.