Providing length info to filters

Iterator filters don’t know the length of their output by default. However, there are cases where the program logic makes the length of the output known. How can we provide this info to improve the performance of subsequent uses of the filter, like collect?

Example

Let cube be an iterator that returns 9 CartesianIndexes (known length).

not33= Iterators.filter( c-> c.I != (3,3), cube )

We know from the program logic that exactly 1 item in cube will be filtered out, so length(not33) == 8 should be true.
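To make the problem concrete, here is a minimal sketch. It assumes cube can be modeled as CartesianIndices((3, 3)); the names size_trait, n and v are only for illustration.

```julia
# Assumption: `cube` is modeled as a 3x3 grid of CartesianIndex values.
cube = CartesianIndices((3, 3))
not33 = Iterators.filter(c -> c.I != (3, 3), cube)

# The size trait of a filter is "unknown", whatever the predicate does.
size_trait = Base.IteratorSize(not33)

# Counting works, but it walks the whole iterator and calls the predicate on every item.
n = count(_ -> true, not33)

# With an unknown size, collect cannot size its result up front; it grows it as items arrive.
v = collect(not33)
```

So the information "8 items" exists in the program logic but is not visible to anything that consumes not33.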

I thought that wrapping the filter in Iterators.take would give it a known size, but that isn’t the case either: take of a filter still reports an unknown size. The only solution I can think of is to wrap my iterators in a custom type like take that will know its length at creation:

struct LengthfulIterator
  iter
  n::Int
end
Base.length(li::LengthfulIterator)= li.n
Base.iterate(li::LengthfulIterator, s...)= iterate(li.iter, s...)  # forwards both iterate(li) and iterate(li, s)

> not33= LengthfulIterator(
           Iterators.filter( c-> c.I != (3,3), cube ),
           8)
> length(not33) == 8
true

This does sound a bit contrived & verbose, and I wonder if there isn’t a more standard solution.
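For reference, a slightly fuller sketch of the wrapper idea. The type parameter and the eltype forwarding are additions for illustration, not part of the snippet above, and cube is again assumed to be CartesianIndices((3, 3)).

```julia
# Hypothetical wrapper: name and fields follow the post; the type
# parameter and the eltype forwarding are added for illustration.
struct LengthfulIterator{I}
    iter::I
    n::Int   # length promised by the caller; it is not checked
end

Base.length(li::LengthfulIterator) = li.n

# Forward the element type so collect can build a concretely typed Vector.
Base.IteratorEltype(::Type{LengthfulIterator{I}}) where {I} = Base.IteratorEltype(I)
Base.eltype(::Type{LengthfulIterator{I}}) where {I} = eltype(I)

# Forward iteration; the splat covers iterate(li) and iterate(li, state).
Base.iterate(li::LengthfulIterator, state...) = iterate(li.iter, state...)

cube = CartesianIndices((3, 3))   # assumption: stands in for `cube`
not33 = LengthfulIterator(Iterators.filter(c -> c.I != (3, 3), cube), 8)

v = collect(not33)   # can allocate all 8 slots before iterating
```

The wrapper trusts the n it is given. If the promised length does not match what the wrapped iterator actually yields, consumers such as collect can fail or return a partly filled result, so this only makes sense when the program logic really guarantees the count.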

Have you seen https://docs.julialang.org/en/v1/base/collections/#Base.IteratorSize and https://docs.julialang.org/en/v1/manual/interfaces/index.html#man-interface-iteration-1? I am no iterator expert, but I think those will help.

Thanks for your reply. What you point to is for defining length in custom iterator types, like the LengthfulIterator in my example above, so I don’t think it helps with standard ones like filter, which define this trait as IteratorSize(::Type{<:Iterators.Filter})= SizeUnknown() anyway.
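A quick way to check what each iterator reports is to query the trait directly. The sketch below uses a small made-up filter (evens) and a made-up empty type (Wrapped) purely for illustration.

```julia
evens = Iterators.filter(iseven, 1:10)

# A filter never knows its size.
t_filter = Base.IteratorSize(typeof(evens))

# take inherits the "unknown" from the filter it wraps, since the
# filter might run out before the requested number of items.
t_take_filter = Base.IteratorSize(typeof(Iterators.take(evens, 3)))

# take over something with a known size does know its length.
t_take_range = Base.IteratorSize(typeof(Iterators.take(1:10, 3)))

# A type that defines no trait at all falls back to the default, HasLength(),
# which is why a custom wrapper only needs a length method.
struct Wrapped end   # hypothetical, for illustration only
t_default = Base.IteratorSize(Wrapped)
```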

Ah sorry, I should have read your post more carefully first. To me, what you have seems like a reasonable way to do it, but maybe someone else will have a better idea.

Edit: IterTools.jl does a lot of wrapping like this (e.g. https://github.com/JuliaCollections/IterTools.jl/blob/d39cba8014709175686670b9089d9ac724a19ec7/src/IterTools.jl#L177), which makes me think it’s an idiomatic way to do it.