Length from iterator?

I’m searching for a way to get the length of an iterator. The iterator can be consumed; that is not a problem. I have found Query.jl, IterTools.jl and the iterator methods in Base, but I can’t find anything that calculates the length of an iterator.

What am I missing?

Not all iterators have lengths. Some do, and if they do you can find it using length(itr).
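A small sketch of the difference, assuming a range as the iterator that knows its length and a lazily filtered range as the one that does not (the variable names are only for illustration):

```julia
r = 1:10
evens = Iterators.filter(iseven, r)   # lazy filter; its size is not known up front

n = length(r)                         # ranges know their length

# length is not defined for the lazy filter, so the call throws a MethodError
threw = try
    length(evens)
    false
catch err
    err isa MethodError
end
```

Here `n` is 10 and `threw` is `true`, because the filter would have to be consumed before its number of elements is known.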


Is there no equivalent to C#’s Enumerable.Count?

In that method the type of the source is not important; the enumerator (iterator) is simply consumed and the elements are counted.

And is there a package, or are there functions in Base, to compute aggregates over an iterator (consuming it), such as:

  • sum
  • count
  • mean
  • etc.?

sum and other functions accept iterators, but since iterators are lazy, some of them, such as x = countfrom(1), are infinite, and a reduction over them never terminates.
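As a rough sketch of aggregating over lazy iterators: `sum` and `count` come from Base, while `streaming_mean` is a hypothetical helper written here for illustration (it is not a Base function). The infinite `countfrom` is cut off with `take` so that every reduction terminates.

```julia
using Base.Iterators: countfrom, take

# Lazy generator over the first five squares: 1, 4, 9, 16, 25
squares = (x^2 for x in take(countfrom(1), 5))
total = sum(squares)

# Count the even numbers among the first ten integers
n_even = count(iseven, take(countfrom(1), 10))

# Hypothetical one-pass mean: carries (running sum, element count) through the fold
function streaming_mean(itr)
    s, n = foldl(((s, n), x) -> (s + x, n + 1), itr; init=(0.0, 0))
    return s / n
end

m = streaming_mean(take(countfrom(1), 4))
```

With these inputs `total` is 55, `n_even` is 5 and `m` is 2.5. Without the `take`, each of these calls would loop forever.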

Thanks,

And for length in particular: is it possible to use it with any iterator, so that the function counts the elements by consuming the iterator?

This response confuses me:

https://github.com/JuliaLang/julia/issues/13772#issuecomment-151126890

It is not a fast operation, but you can define your own counter:

function mylength(iter)
    n = 0
    for i in iter
        n += 1
    end
    return n
end

The arguments in the GitHub issue are valid in my opinion: what happens if an iterator has a side effect? What if it is non-terminating? If a custom iterator type has a length, the type should define length itself, in my opinion.
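To make the side-effect concern concrete, here is a minimal sketch using `Iterators.Stateful` as a stand-in for any iterator that is used up by iteration; `consume_count` is a hypothetical name for a counting loop.

```julia
# Count elements by iterating; this consumes stateful iterators
function consume_count(itr)
    n = 0
    for _ in itr
        n += 1
    end
    return n
end

s = Iterators.Stateful(1:5)       # remembers its position between loops
first_pass = consume_count(s)     # walks through all five elements
second_pass = consume_count(s)    # nothing is left to iterate
```

`first_pass` is 5 but `second_pass` is 0: after counting, the elements are gone. A `length` that silently did this would be surprising, which is one reading of the argument in the issue.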


Yeah, but in my experience counting the elements of an iterator (consuming it) is a very common operation, and if that has side effects, that is normal. I suppose it is a habit.


And in my opinion length is an aggregate like any other. So if sum (an aggregate) works with iterators (consuming them), why not length?

You could use

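using Test
# init=0 makes the fold count every element instead of starting from the first one
ilength(itr) = foldl((n, _) -> n + 1, itr; init=0)
@test ilength(1:10) == 10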

The function is named ilength, not length, since dispatching on itr is not well type-defined.
Maybe have a look at count, and at the IterTools.jl package too.
O(n) complexity is expected.
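A sketch of the `count` route mentioned above: counting with a predicate that is always true gives a consuming length in O(n). The name `consuming_length` is made up for this example.

```julia
# Every element satisfies the predicate, so count returns the number of elements
consuming_length(itr) = count(_ -> true, itr)

n_evens = consuming_length(Iterators.filter(iseven, 1:10))
n_empty = consuming_length(x for x in 1:0)   # empty generator
```

`n_evens` is 5 and `n_empty` is 0. Unlike a `foldl` without an `init` value, this also handles empty iterators.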

I am a proponent of moving Iterators into the stdlib and adding some missing functions like takewhile, dropwhile, compress, imap, ilength from IterTools.


Just wondering, when is it useful to get the length of an iterator, but not any of the elements?

I imagine something equivalent to this in C#:

var list = new[] {1, 2, 3, 4, 5};
return list.Where(it => it % 2 == 0).Count();

I imagine that in Julia it’s easy to do this type of calculation in a streaming fashion, without creating DataFrames, when I’m exploring the data.

PS: It’s only an example; the real thing is much more complicated and involves very large CSVs, so DataFrames is not an option. I can cache a small set of data in a DataFrame to keep the development loop fast, but in production I want to stream the data.

I would set up that example in Julia as follows:

list=1:5
return count(iseven,list)

And in case of a quite large file I’d go that way:

#file.txt is a large textfile consisting of string lines
count(x->length(x)<40, eachline("file.txt"))
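The same pattern can be tried without a file on disk. This sketch assumes an in-memory `IOBuffer` as a stand-in for file.txt, since `eachline` also accepts an IO stream:

```julia
# Three lines: two short ones and one of 50 characters
io = IOBuffer("short line\n" * "x"^50 * "\nanother short one\n")

# Lines are read one at a time and counted; nothing is collected into an array
n_short = count(line -> length(line) < 40, eachline(io))
```

Here `n_short` is 2, since only the 50-character line fails the predicate.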

eachline returns a lazy iterator over the lines of the file.


This solution is very good, thank you.

You’re welcome :)

Btw do you still need a function to calculate the length of an iterator by consuming it (if necessary)? And if yes, what would you like it to return if it is known to have infinite length?

As in C#, the simple answer: the function doesn’t return and stays in an infinite loop. That is the behavior I would expect.

But for my specific use case, your solution is good.

Sometimes I have found it useful to get both the length and the last iterate, e.g. if the iterator does some numerical work until a convergence criterion holds.

So a generally useful function is:

    lastiterate(itr) = foldl((x, y) -> y, itr)

And then

julia> lastiterate('A':'Z')
'Z'

and

julia> lastiterate(enumerate('A':'Z'))
(26, 'Z')
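As a sketch of the convergence use case, here is a hypothetical `NewtonSqrt` iterator (invented for this example) that yields Newton iterates for the square root of a positive number and stops once two successive iterates differ by less than `tol`. Combined with `enumerate`, `lastiterate` returns the number of steps together with the final value.

```julia
# Hypothetical iterator: Newton's method for sqrt(a), assuming a > 0
struct NewtonSqrt
    a::Float64
    tol::Float64
end

# The number of steps is not known in advance
Base.IteratorSize(::Type{NewtonSqrt}) = Base.SizeUnknown()
Base.eltype(::Type{NewtonSqrt}) = Float64

function Base.iterate(it::NewtonSqrt, x = it.a)
    xnew = (x + it.a / x) / 2
    # Stop once the update is smaller than the tolerance
    abs(xnew - x) < it.tol && return nothing
    return (xnew, xnew)
end

lastiterate(itr) = foldl((x, y) -> y, itr)

steps, root = lastiterate(enumerate(NewtonSqrt(2.0, 1e-12)))
```

`root` approximates sqrt(2) and `steps` is the number of iterates produced. Note that `foldl` without an `init` value throws on an empty iterator, so this would error if the iteration stopped before yielding anything.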

According to the Interfaces section of the Julia manual, you can query whether the iterator itr has a length with Base.IteratorSize(itr). If this returns Base.HasLength() or a Base.HasShape, then itr should provide a length method for querying the number of items.

Though I have to admit that this may not be reliable:

  1. Base.IteratorSize is marked as optional. Moreover, there’s a fallback method that returns Base.HasLength() even though there may not be a length method. Example: struct A; end; a=A(); Base.IteratorSize(a) returns Base.HasLength() but length(a) errors (of course).
  2. These methods need to be qualified with Base. as they are not exported, hinting that they may not be considered official interfaces, although they are documented in the interfaces section of the manual.
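One way to combine the trait with a counting fallback, as a sketch only: `countitems` is a hypothetical name, and the choice to throw on infinite iterators (rather than loop forever) is an assumption of this example, not something Base prescribes.

```julia
# Dispatch on the size trait of the iterator
countitems(itr) = countitems(Base.IteratorSize(itr), itr)

# Known size: ask length directly, nothing is consumed
countitems(::Union{Base.HasLength, Base.HasShape}, itr) = length(itr)

# Known to be infinite: refuse instead of looping forever (a design choice)
countitems(::Base.IsInfinite, itr) = throw(ArgumentError("iterator is infinite"))

# Unknown size: consume the iterator and count, O(n)
countitems(::Base.SizeUnknown, itr) = count(_ -> true, itr)

n_range = countitems(1:10)
n_filtered = countitems(Iterators.filter(iseven, 1:10))
```

`n_range` is 10 and `n_filtered` is 5. The caveat from point 1 still applies: a type that relies on the fallback Base.HasLength() without defining length will hit a MethodError here.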