Difference between calling Iterators.rest on an array vs a range

phantom · March 23, 2023, 8:10pm

I’m just curious why Iterators.rest yields different results depending on whether a range or an array is used, e.g.:

using DataFrames
df1 = DataFrame(:x => 1:5)

To start a loop on the second iteration of a range, I would use:


julia> collect(Iterators.rest(eachindex(df1.x), 1)) # state = 1
4-element Vector{Int64}:
 2
 3
 4
 5

The example in the docs uses an array, and to start on the second iteration of an array I would use:

julia> collect(Iterators.rest(collect(eachindex(df1.x)), 2)) # state = 2
4-element Vector{Int64}:
 2
 3
 4
 5

I was just wondering what accounts for the difference?

Benny · March 23, 2023, 11:46pm

Minor but important point: you’re not iterating over the range itself but over its indices, eachindex(df1.x), which in this case is a Base.OneTo(5). The discrepancy remains, though, if you drop that call and use 1:5 directly.

The main point here is that the state is not, in general, an iteration count or the corresponding item. Notice that the first two items below (1 and 2) are the same for the range and the array, but the states are different. Although states track iteration progress, there is no general way to work out which state produces the n-th item. The state can actually be anything, even strings*.

julia> iterate(1:5), iterate(1:5, 1)
((1, 1), (2, 2))

julia> iterate(collect(1:5)), iterate(collect(1:5), 2)
((1, 2), (2, 3))
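To see every state rather than just the first two, it helps to drive the iteration protocol by hand. This is a minimal sketch: iteration_trace is a hypothetical helper name, and the states it reports are Base implementation details that could change between Julia versions.

```julia
# Hypothetical helper: drive the iteration protocol by hand and record
# every (item, state) pair that `iterate` returns.
function iteration_trace(itr)
    trace = []                        # Vector{Any} of (item, state) tuples
    cur = iterate(itr)                # first call: no state argument
    while cur !== nothing
        item, state = cur
        push!(trace, (item, state))
        cur = iterate(itr, state)     # later calls: pass the previous state back
    end
    return trace
end

iteration_trace(1:5)           # (item, state): (1, 1), (2, 2), (3, 3), (4, 4), (5, 5)
iteration_trace(collect(1:5))  # (item, state): (1, 2), (2, 3), (3, 4), (4, 5), (5, 6)
```

The range hands back the item itself as its state, while the vector hands back the index of the next element to read, which is why state = 1 on the range and state = 2 on the vector both resume at the value 2.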

*An example of unintuitive strings as state:

struct HelloWorld end

function Base.iterate(::HelloWorld, state=nothing)
  if state == "ZA"
    ("WORLD!", "STOP TIME")
  elseif state == "STOP TIME" # no more items
    nothing
  else # no state, or an unexpected one: start at "Hello "
    ("Hello ", "ZA")
  end
end

for i in HelloWorld(); print(i); end; println() # Hello WORLD!
println(iterate(HelloWorld(), "whoops"))    # ("Hello ", "ZA")
println(iterate(HelloWorld(), "ZA"))        # ("WORLD!", "STOP TIME")
println(iterate(HelloWorld(), "STOP TIME")) # nothing

Liozou · March 24, 2023, 2:58pm

As stated above, Iterators.rest takes a state as its second argument, which is usually an implementation detail of the iterator. It so happens that state=1 does not refer to the same iteration point for a range and a vector. Iterators.rest should mostly be used on the iterators that you define, since the iteration state is otherwise not usually part of any API.
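As an illustration of using Iterators.rest on an iterator you define yourself: there, the meaning of the state is part of your own design, so resuming from a given state is predictable. Countdown is a hypothetical example type, not something from Base.

```julia
# Hypothetical iterator that counts down from `from` to 1.
# By our own design, the state is the value that will be produced next.
struct Countdown
    from::Int
end

Base.iterate(c::Countdown, n = c.from) = n < 1 ? nothing : (n, n - 1)
Base.eltype(::Type{Countdown}) = Int
Base.length(c::Countdown) = max(c.from, 0)

collect(Countdown(5))                    # [5, 4, 3, 2, 1]
collect(Iterators.rest(Countdown(5), 2)) # [2, 1]: state 2 means "next value is 2"
```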
If you are curious and want to look at the definition of iterate for ranges and for vectors, you can type @edit iterate(1:5) and @edit iterate([3, 5, 6]) in your REPL and look at the code. The full iteration protocol is detailed in the Julia documentation (the iteration section of the Interfaces chapter in the manual, and the docstring of iterate).
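For a quick picture without opening the Base source, here is a simplified sketch of the two behaviours. MyUnitRange and MyVector are hypothetical wrapper types written for illustration; they mimic the state conventions visible in the REPL output earlier in this thread, but they are not Base's actual definitions, which handle more cases (other element types, steps, bounds checks).

```julia
# Hypothetical wrapper types that mimic, in simplified form, the state
# conventions Base uses for unit ranges and vectors. Not Base's actual code.
struct MyUnitRange
    start::Int
    stop::Int
end

struct MyVector
    data::Vector{Int}
end

Base.eltype(::Type{MyUnitRange}) = Int
Base.eltype(::Type{MyVector}) = Int

# Range-like: the state is the value that was just returned.
Base.iterate(r::MyUnitRange) = r.start > r.stop ? nothing : (r.start, r.start)
Base.iterate(r::MyUnitRange, x) = x == r.stop ? nothing : (x + 1, x + 1)

# Vector-like: the state is the index of the next element to read.
Base.iterate(v::MyVector, i = 1) = i > length(v.data) ? nothing : (v.data[i], i + 1)

# Iterators.rest resumes as if `state` had come from the previous iterate call,
# so different states are needed to resume both at the value 2.
collect(Iterators.rest(MyUnitRange(1, 5), 1))      # [2, 3, 4, 5]
collect(Iterators.rest(MyVector(collect(1:5)), 2)) # [2, 3, 4, 5]
```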

In contrast, the user-facing Iterators.drop does not rely on the internals of the iteration protocol and can be used to skip the first n elements, as you would expect. With your examples, dropping 1 element (i.e. starting at the second element of the iterator) gives:

julia> collect(Iterators.drop(1:5, 1))
4-element Vector{Int64}:
 2
 3
 4
 5

julia> collect(Iterators.drop(collect(1:5), 1))
4-element Vector{Int64}:
 2
 3
 4
 5
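If you do want Iterators.rest on a built-in iterator, one safe option is to obtain the state from iterate itself instead of guessing it. The sketch below does that; rest_after is a hypothetical helper name, and it only illustrates the idea of walking the real states (it is not how Base implements Iterators.drop). For the original loop, Iterators.drop on its own is the simpler choice.

```julia
# Hypothetical helper (not part of Base): skip the first `n` elements of any
# iterator by asking `iterate` for the real states, then resume with Iterators.rest.
function rest_after(itr, n::Integer)
    n >= 1 || throw(ArgumentError("n must be at least 1"))
    cur = iterate(itr)
    for _ in 2:n
        cur === nothing && break
        cur = iterate(itr, cur[2])
    end
    cur === nothing && throw(ArgumentError("iterator has fewer than $n elements"))
    return Iterators.rest(itr, cur[2])    # cur[2] is a genuine state
end

# Same result as Iterators.drop, whatever the container type:
for itr in (1:5, collect(1:5), Base.OneTo(5))
    @assert collect(rest_after(itr, 1)) == collect(Iterators.drop(itr, 1)) == [2, 3, 4, 5]
end

# For the original loop, Iterators.drop alone is enough
# (`x` stands in for df1.x so this runs without DataFrames):
x = collect(1:5)
for i in Iterators.drop(eachindex(x), 1)
    println(i)    # prints 2, 3, 4, 5 on separate lines
end
```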

phantom · March 24, 2023, 5:33pm

Thank you both so much for clarifying that distinction in such a clear and understandable way. I don’t think I would have picked up on it otherwise. I marked the first answer as the solution, since there is currently no way to mark multiple solutions.
