Is it always safe to “re-use” an iterator/generator?

The way iterators work in Python, there is the following (potentially surprising) behavior:

>>> l = [1, 2, 3]
>>> g = (x for x in l)
>>> list(g)
[1, 2, 3]
>>> list(g)
[]

That is, iterators are “used up” during iteration and will not yield again any items that have already been consumed.

Unfortunately, I haven’t been able to find much detailed documentation about the behavior of iterators in Julia. The main parts are in

and focus more on how to define new iterators than on which guarantees their behavior should provide. Since, contrary to Python, Julia keeps the iteration state in a separate variable outside the iterator object itself, it should presumably always be possible to iterate an iterator repeatedly (it should return the same value given the same state parameter):

julia> l = [1, 2, 3]
3-element Vector{Int64}:
 1
 2
 3

julia> g = (x for x in l)
Base.Generator{Vector{Int64}, typeof(identity)}(identity, [1, 2, 3])

julia> collect(g)
3-element Vector{Int64}:
 1
 2
 3

julia> collect(g)
3-element Vector{Int64}:
 1
 2
 3
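
To make the "state lives outside the iterator" idea concrete, here is a minimal sketch that drives the iteration protocol by hand with `iterate`. The helper name `manual_collect` is made up for illustration; it is not part of Base.

```julia
# Hypothetical helper: collect items by calling `iterate` explicitly,
# roughly what a `for` loop does under the hood.
function manual_collect(itr)
    out = Any[]
    next = iterate(itr)              # first call: no state argument
    while next !== nothing
        (item, state) = next
        push!(out, item)
        next = iterate(itr, state)   # the state is passed back in by the caller
    end
    return out
end

l = [1, 2, 3]
g = (x for x in l)

first_pass = manual_collect(g)
second_pass = manual_collect(g)      # starts again from a fresh state
```

Because the generator itself is never mutated here, both passes see the same items.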

But is this really always the case?

I think there are a few exceptions for cases that involve iterating over streaming data. For example, the eachline iterator is documented as closing the file at the end of iteration, so it is empty if you try to iterate a second time.

Also iterating over a Channel will take from it, but that is probably expected.
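
A small sketch of that behavior, assuming a buffered channel that is filled and closed before it is iterated (the variable names are only for illustration):

```julia
# Fill a buffered channel with three values, then close it so that
# iteration terminates once the buffer is drained.
ch = Channel{Int}(3)
for x in 1:3
    put!(ch, x)
end
close(ch)

first_pass = collect(ch)    # takes every buffered item out of the channel
second_pass = collect(ch)   # nothing is left to take
```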

Also see Iterators.Stateful, which turns any iterator into a “stateful” iterator (i.e. an iterator which behaves more like a Channel or a Python iterator).

For example:

julia> l = [1, 2, 3]
3-element Vector{Int64}:
 1
 2
 3

a generator is stateless (as is the vector l itself)

julia> g = (x for x in l)
Base.Generator{Vector{Int64}, typeof(identity)}(identity, [1, 2, 3])

julia> collect(g)
3-element Vector{Int64}:
 1
 2
 3

julia> collect(g)
3-element Vector{Int64}:
 1
 2
 3

but we can build a stateful iterator out of it:

julia> h = Iterators.Stateful(g)
Base.Iterators.Stateful{Base.Generator{Vector{Int64}, typeof(identity)}, Union{Nothing, Tuple{Int64, Int64}}}(Base.Generator{Vector{Int64}, typeof(identity)}(identity, [1, 2, 3]), (1, 2), 0)

julia> collect(h)
3-element Vector{Int64}:
 1
 2
 3

julia> collect(h)
Int64[]
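
A stateful wrapper also remembers partial consumption. As a rough sketch (variable names chosen only for illustration), taking one item with `popfirst!` leaves just the remainder for a later `collect`:

```julia
h2 = Iterators.Stateful([1, 2, 3])

head = popfirst!(h2)    # consumes the first item
rest = collect(h2)      # only the items not yet consumed
done = isempty(h2)      # the wrapper is now exhausted
```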

Indeed, “proof by counter-example” :smiley:

julia> s = "a
       b
       c
       d"
"a\nb\nc\nd"

julia> el = eachline(IOBuffer(s))
Base.EachLine{IOBuffer}(IOBuffer(data=UInt8[...], readable=true, writable=false, seekable=true, append=false, size=7, maxsize=Inf, ptr=1, mark=-1), Base.var"#408#411"(), false)

julia> collect(el)
4-element Vector{String}:
 "a"
 "b"
 "c"
 "d"

julia> collect(el)
String[]

Thanks!

There are no guarantees, because iterators can be infinite and/or stateful, and the only way to find out is to read the docs/source or experiment.

The good news is that (as far as I’m aware) iterators of a given type all behave the same. For example, Base.Generator always behaves as in your example.

Not necessarily; for a stateful iterator, it may make more sense to keep the state internally. See an example here.
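
As a hypothetical illustration (this is not the linked example), an iterator that keeps its state internally could be sketched as follows. The type name `Countdown` and its field are made up; the state argument of `iterate` is simply ignored.

```julia
# Hypothetical iterator whose progress is stored in the object itself.
mutable struct Countdown
    remaining::Int
end

# The second argument exists only to satisfy the protocol; it is unused.
function Base.iterate(c::Countdown, _state=nothing)
    c.remaining <= 0 && return nothing
    value = c.remaining
    c.remaining -= 1          # mutate the iterator instead of returning new state
    return (value, nothing)
end

# The length is not known up front, since it depends on mutable state.
Base.IteratorSize(::Type{Countdown}) = Base.SizeUnknown()
Base.eltype(::Type{Countdown}) = Int

c = Countdown(3)
first_pass = collect(c)
second_pass = collect(c)      # the object was exhausted by the first pass
```

Whether such a type should be considered a well-behaved iterator is exactly the question raised in this thread.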

Well, it’s definitely possible to document whatever guarantees there should be as part of the interface specification. The way the documentation is structured, it just didn’t seem clear to me one way or the other.

Infinite iterators don’t cause problems here in principle – since their values obviously can’t be stored in any kind of memory, they’re always computed “on the fly” anyway, which can be kept track of using the state variable. But when it comes to what I’ve learned now are “stateful iterators”, it’s a matter of definition whether that behavior violates the iterator interface or not. Given a constant value of state, I’d certainly find it reasonable to expect iterate(iter, state) to always return the same value – but as you and others show, this doesn’t seem to be a requirement.
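
To illustrate the difference, here is a minimal sketch comparing a plain vector with an `Iterators.Stateful` wrapper around it (variable names are only for illustration). Repeating an `iterate` call with the same arguments gives the same result for the vector, but not for the wrapper:

```julia
l = [10, 20, 30]

# Stateless: the same state always yields the same (item, next state) pair.
a = iterate(l, 2)
b = iterate(l, 2)

# Stateful: identical calls return successive items, because the
# wrapper advances internally on every call.
h = Iterators.Stateful(l)
c = iterate(h)
d = iterate(h)
```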

There are indeed proposals to formalize the guarantees that iterators provide via traits. See for example this issue and the links therein:

A relevant previous discussion: