Zip with length checking

Consider a loop

for (i, elt) in zip(indexes, itr)
    do_something(i, elt)
end

where Base.IteratorSize(itr) can potentially be Base.HasLength() or Base.SizeUnknown(). I want the code to work for both.

indexes is an AbstractVector so I know its length.

I want to check that the iteration does not terminate “early” because itr is shorter than indexes. What’s the idiomatic way to do so?

I thought of the following:

  1. manually iterate itr using,
  2. use a counter and check that.

But neither seems elegant.
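
For reference, the counter approach can be sketched as below. This is only a minimal illustration: the helper name foreach_indexed and the choice of ArgumentError are assumptions, not an existing API. It relies on zip stopping at the shorter input, so it works whether itr is HasLength or SizeUnknown.

```julia
# Hypothetical helper (not part of Base): call f(i, elt) for each pair and
# raise an error if itr runs out before indexes does.
function foreach_indexed(f, indexes::AbstractVector, itr)
    n = 0                                  # number of pairs actually processed
    for (i, elt) in zip(indexes, itr)
        f(i, elt)
        n += 1
    end
    # zip stops silently at the shorter input, so compare against length(indexes)
    n == length(indexes) ||
        throw(ArgumentError("itr yielded only $n elements, expected $(length(indexes))"))
    return nothing
end
```

Note that this only detects itr being too short; a longer itr is silently truncated, which matches the requirement stated above.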

zip(indices, takestrict(itr, length(indices))) from IterTools.jl would be a choice. Actually, I might add zipstrict to IterTools.jl.

Sorry to resurrect this thread, but has this actually been implemented in IterTools.jl or Base.Iterators? zipstrict seems like it would be a very helpful function to do “map” with multiple arguments in a way that’s similar to broadcast.

there is Iterators._zip_lengths_finite_equal which is undocumented & internal, but I think in this case it’s pretty fine to just use it anyway

it might be reasonable to define

zipstrict(a...) = _zip_lengths_finite_equal(a) ? Zip(a) : throw(ArgumentError(...))

or use a strict kwarg. but there isn’t really a precedent for this kind of API in any existing iterators in Base.Iterators
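
An eager check of this kind could also be written against the public size traits instead of the internal function. The following is a rough sketch under that assumption; has_known_length, is_infinite and zipstrict_eager are hypothetical names, and the policy (accept infinite iterators, reject anything whose length cannot be queried) is one possible choice rather than established behaviour.

```julia
# Hypothetical helpers built only on the public Base.IteratorSize trait.
has_known_length(it) = Base.IteratorSize(it) isa Union{Base.HasLength, Base.HasShape}
is_infinite(it) = Base.IteratorSize(it) isa Base.IsInfinite

function zipstrict_eager(its...)
    # Reject inputs whose length cannot be checked up front (e.g. SizeUnknown).
    all(it -> has_known_length(it) || is_infinite(it), its) ||
        throw(ArgumentError("cannot verify iterator lengths ahead of time"))
    # Compare the lengths of all finite inputs; infinite ones are ignored.
    lens = map(length, filter(has_known_length, its))
    isempty(lens) || all(==(first(lens)), lens) ||
        throw(ArgumentError("iterators have different lengths: $lens"))
    return zip(its...)
end
```

With this sketch, zipstrict_eager(1:3, [4, 5, 6]) behaves like zip, while zipstrict_eager(1:3, 1:2) throws before any element is consumed.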

That won’t work, unfortunately, with SizeUnknowns. For that you do need to reach into the zip implementation to demand that running out of one iterator asserts that all iterators are complete.
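
To illustrate the limitation, here is a small example using Iterators.filter, whose size trait is SizeUnknown. The variable names are only for illustration.

```julia
# A filtered range cannot report its length without being consumed.
itr = Iterators.filter(isodd, 1:10)

# The size trait says the length is not available up front.
size_trait = Base.IteratorSize(itr)

# Calling length(itr) raises a MethodError, so an up-front comparison
# against length(indexes) is not possible for this kind of iterator.
length_fails = try
    length(itr)
    false
catch err
    err isa MethodError
end
```

Any check that depends on length therefore has to either reject such inputs or defer the check until iteration actually ends.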

AFAIK, no. I think it would be a fine addition to IterTools.jl.

The implementation would have to consider all the size trait combinations mentioned in this topic, also for more than two arguments, but in principle this should not be difficult.

Which one would be preferable?

  • lazy: as soon as one of the iterators runs out, raise an exception if there were any iterators before it in the ziplist that didn’t run out, or any iterators after it that don’t run out
  • eager: raise an exception if we can’t validate ahead of time that all iterators are either infinite or the same length, and then just let a normal zip happen

The lazy version seems pretty easy to implement in a way that covers all cases, right? And the eager version is impossible to get to work with SizeUnknowns, as @mbauman points out?
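
As a rough sketch of the lazy variant, assuming nothing beyond the standard iteration interface: the names ZipStrict, zipstrict and _strict_step are hypothetical, ArgumentError is an arbitrary choice, and this version treats an infinite iterator paired with a finite one as a mismatch. It expects at least one iterator.

```julia
# Hypothetical lazy strict zip: advance all iterators together and raise an
# error if some, but not all, of them are exhausted at the same step.
struct ZipStrict{T<:Tuple}
    its::T
end
zipstrict(its...) = ZipStrict(its)

# The length is not known up front in general, so report SizeUnknown.
Base.IteratorSize(::Type{<:ZipStrict}) = Base.SizeUnknown()

function _strict_step(steps)
    done = map(isnothing, steps)
    if any(done)
        all(done) || throw(ArgumentError("iterators have different lengths"))
        return nothing
    end
    # Each step is a (value, state) pair: values form the item, states carry on.
    return map(first, steps), map(last, steps)
end

Base.iterate(z::ZipStrict) = _strict_step(map(iterate, z.its))
Base.iterate(z::ZipStrict, states) = _strict_step(map(iterate, z.its, states))
```

It can then be used as for (i, elt) in zipstrict(indexes, itr). One caveat with this design is that every iterator is advanced at each step before the check runs, so with stateful iterators an element may already have been consumed when the error is raised.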

I would go with lazy first, and also as a fallback, and then specialize to eager for the cases where this leads to a significant improvement.