Difference between calling Iterators.rest on an array vs a range

I’m just curious why Iterators.rest yields different results depending on whether a range or an array is used, e.g.:

using DataFrames
df1 = DataFrame(:x => 1:5)

To start a loop on the second iteration of a range, I would use:


collect(Iterators.rest(eachindex(df1.x),1)) # state = 1
4-element Vector{Int64}:
 2
 3
 4
 5

The example in the docs uses an array, and starting on the second iteration of an array would be:

collect(Iterators.rest(collect(eachindex(df1.x)),2)) # state = 2
4-element Vector{Int64}:
 2
 3
 4
 5

I was just wondering what accounts for the difference?

A minor but important point: you’re not looking at the range itself but at its indices, eachindex(df1.x), which in this case is a Base.OneTo(5). Still, if you remove that method call, the discrepancy remains.

The main point here is that the state is not necessarily an iteration count or the corresponding item. Notice below that the first two items (1, 2) are the same for the range and the array, but the states are different. Although the state tracks iteration progress, there is no general way to work out which state produces the n-th element without actually iterating. The state can be anything, even strings*.

julia> iterate(1:5), iterate(1:5, 1)
((1, 1), (2, 2))

julia> iterate(collect(1:5)), iterate(collect(1:5), 2)
((1, 2), (2, 3))

*unintuitive strings as state

struct HelloWorld end

function Base.iterate(::HelloWorld, state=nothing)
  if state == "ZA"
    ("WORLD!", "STOP TIME")
  elseif state == "STOP TIME" # no more items
    nothing
  else # no state or an unexpected state: start at "Hello "
    ("Hello ", "ZA")
  end
end

for i in HelloWorld() print(i) end; println() # Hello WORLD!
println(iterate(HelloWorld(), "whoops"))    # ("Hello ", "ZA")
println(iterate(HelloWorld(), "ZA"))        # ("WORLD!", "STOP TIME")
println(iterate(HelloWorld(), "STOP TIME")) # nothing
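Since the state generally can’t be guessed, one way to get a valid state for Iterators.rest is to step through the iteration protocol yourself and keep the state it hands back. This is a minimal sketch: `state_after` is a hypothetical helper (not part of Base), and it assumes that iterating does not consume the iterator, so it would not suit one-shot iterators such as `eachline` or `Iterators.Stateful`.

```julia
# Hypothetical helper (not in Base): advance `itr` by `n` elements using the
# iteration protocol and return the state reached, for use with Iterators.rest.
# Assumes iterating `itr` does not mutate or consume it.
function state_after(itr, n::Integer)
    n >= 1 || throw(ArgumentError("n must be at least 1"))
    y = iterate(itr)                  # consumes element 1
    for _ in 2:n
        y === nothing && throw(ArgumentError("iterator has fewer than $n elements"))
        y = iterate(itr, y[2])        # consumes the next element
    end
    y === nothing && throw(ArgumentError("iterator has fewer than $n elements"))
    return y[2]                       # state after consuming n elements
end

# The states differ, but skipping one element works the same for both:
r, v = 1:5, collect(1:5)
println(state_after(r, 1), " ", state_after(v, 1))       # 1 2
println(collect(Iterators.rest(r, state_after(r, 1))))   # [2, 3, 4, 5]
println(collect(Iterators.rest(v, state_after(v, 1))))   # [2, 3, 4, 5]
```

Walking the iterator costs n calls to iterate, which is the same work Iterators.drop does internally.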

As stated above, Iterators.rest takes a state as its second argument, which is usually an implementation detail of the iterator. It so happens that state=1 does not refer to the same iteration point for a range and a vector: for a range, the state is the element that was just returned, while for a Vector it is the index of the next element to read. Iterators.rest should mostly be used on iterators that you define yourself, since the iteration state is otherwise not usually part of any API.
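As an illustration of an iterator you define yourself, here the state is deliberately part of the type’s contract (documented as “the next value to yield”), so passing a state to Iterators.rest has a predictable meaning. `Countdown` is a made-up example type, not something from Base or DataFrames.

```julia
# Hypothetical example type: counts down from `n` to 1.
# Documented contract: the iteration state is the next value to be yielded.
struct Countdown
    n::Int
end

# Yield `state`; the following state is one lower. Stop once state reaches 0.
Base.iterate(c::Countdown, state::Int = c.n) =
    state <= 0 ? nothing : (state, state - 1)

Base.length(c::Countdown) = max(c.n, 0)
Base.eltype(::Type{Countdown}) = Int

println(collect(Countdown(5)))                      # [5, 4, 3, 2, 1]
println(collect(Iterators.rest(Countdown(5), 2)))   # [2, 1]: state 2 means "yield 2 next"
```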
If you are curious and want to look at the definition of iterate for ranges and for vectors, you can type @edit iterate(1:5) and @edit iterate([3, 5, 6]) in your REPL and look at the code. The full iteration protocol is detailed in the docs here and there.
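For a rough idea of the difference you will see with @edit, here are simplified stand-ins for the two methods. `MyRange` and `MyVec` are made-up types, and the method bodies paraphrase Base’s behaviour (state = element just returned for unit ranges, state = index of the next element for arrays) rather than copying its actual code.

```julia
# Simplified stand-ins (hypothetical types, not Base's actual code).

struct MyRange          # behaves like start:stop with step 1
    start::Int
    stop::Int
end
# State = the element just returned (as for Base's ranges).
Base.iterate(r::MyRange) = r.start > r.stop ? nothing : (r.start, r.start)
Base.iterate(r::MyRange, s::Int) = s == r.stop ? nothing : (s + 1, s + 1)

struct MyVec            # thin wrapper around a Vector
    data::Vector{Int}
end
# State = index of the next element to read (as for Base's Array).
Base.iterate(v::MyVec, i::Int = 1) = i > length(v.data) ? nothing : (v.data[i], i + 1)

println(iterate(MyRange(1, 5)), " ", iterate(MyRange(1, 5), 1))               # (1, 1) (2, 2)
println(iterate(MyVec(collect(1:5))), " ", iterate(MyVec(collect(1:5)), 2))   # (1, 2) (2, 3)
```

These reproduce the REPL output shown earlier, which is why the same numeric state skips to different places.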

In contrast, the user-facing Iterators.drop does not rely on the internals of the iteration protocol and can be used to skip the first n elements, as you would expect. With your examples, dropping 1 element (i.e. starting at the second element of the iterator) gives:

julia> collect(Iterators.drop(1:5, 1))
4-element Vector{Int64}:
 2
 3
 4
 5

julia> collect(Iterators.drop(collect(1:5), 1))
4-element Vector{Int64}:
 2
 3
 4
 5
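Strings make the contrast especially visible, because their iteration state is a byte index and non-ASCII characters span several bytes. Passing state 1 to Iterators.rest skips nothing there, whereas Iterators.drop always skips by element count. A small check, using an arbitrary example string:

```julia
s = "héllo"   # 'é' occupies 2 bytes in UTF-8

# The state for a String is the byte index of the next character.
println(iterate(s))        # ('h', 2)
println(iterate(s, 2))     # ('é', 4)

# Iterators.rest with state 1 starts at byte 1, i.e. the very first character.
println(collect(Iterators.rest(s, 1)))   # ['h', 'é', 'l', 'l', 'o']

# Iterators.drop counts elements, so it skips exactly one character.
println(collect(Iterators.drop(s, 1)))   # ['é', 'l', 'l', 'o']
```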

Thank you both so much for clarifying that distinction in such a clear and understandable way. I don’t think I would have picked up on it otherwise. I marked the first answer as the solution, since there is currently no way to mark multiple solutions.
