```julia
using DataFrames
df = DataFrame(:a=>[1,2,3])
Base.IteratorSize(DataFrame) # Base.HasLength()
length(df) # method length(::DataFrame) doesn't exist
```
I’m OK with DataFrames not having `length`, but in that case, why is `Base.IteratorSize(DataFrame) == Base.HasLength()`?
To my understanding, `DataFrame` does not implement the iterator interface, as this shows:
```julia
julia> for x in df
           println(x)
       end
ERROR: AbstractDataFrame is not iterable. Use eachrow(df) to get a row iterator or eachcol(df) to get a column iterator

julia> Base.IteratorSize(typeof(eachrow(df)))
Base.HasShape{1}()
```
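To illustrate the same split without depending on DataFrames, here is a minimal sketch using a hypothetical `Grid` wrapper and a made-up `rows` accessor: the wrapper itself is not iterable and only carries the generic `HasLength()` default, while the row iterator it returns (Base's `eachrow` on a matrix) has a size trait that is actually backed by `length`.

```julia
# Hypothetical table-like wrapper. It deliberately defines no `iterate`
# and no `length`; it only hands out a row iterator, in the spirit of
# DataFrames' `eachrow`.
struct Grid
    data::Matrix{Int}
end

# `rows` is a made-up accessor for this sketch; it returns Base's
# `eachrow` over the underlying matrix.
rows(g::Grid) = eachrow(g.data)

g = Grid([1 2 3; 4 5 6])

@show Base.IteratorSize(Grid)          # HasLength(): just the generic default
@show hasmethod(length, Tuple{Grid})   # false: the trait above promises nothing

r = rows(g)
@show Base.IteratorSize(typeof(r))     # HasShape{1}(): backed by a real length
@show length(r)                        # 2 rows

for row in r
    println(sum(row))                  # row sums: 6, then 15
end
```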
What you are seeing is just the default implementation of `IteratorSize`:
```julia
julia> @which Base.IteratorSize(DataFrame)
Base.IteratorSize(::Type) in Base at generator.jl:93
```
which is implemented as `IteratorSize(::Type) = HasLength() # HasLength is the default` and simply returns `HasLength()` for any type.
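For context on why that default matters: generic code branches on this trait and calls `length` whenever it sees `HasLength` or `HasShape`. Below is a simplified sketch of that pattern, roughly the idea behind `collect` in Base but heavily reduced; `collect_with_trait` and `_collect_with_trait` are made-up names for illustration.

```julia
# Simplified sketch of trait-based collection (not Base's actual code).
collect_with_trait(itr) = _collect_with_trait(Base.IteratorSize(itr), itr)

# HasLength and HasShape both promise that `length(itr)` works,
# so the output vector can be allocated up front.
function _collect_with_trait(::Union{Base.HasLength,Base.HasShape}, itr)
    out = Vector{eltype(itr)}(undef, length(itr))
    for (i, x) in enumerate(itr)
        out[i] = x
    end
    return out
end

# SizeUnknown: no length is available, so grow the vector as we go.
function _collect_with_trait(::Base.SizeUnknown, itr)
    out = Vector{eltype(itr)}()
    for x in itr
        push!(out, x)
    end
    return out
end

# IsInfinite is intentionally left unhandled and raises a MethodError.
```

A type that iterates but neither defines `length` nor overloads `IteratorSize` inherits `HasLength()`, so it is sent down the preallocating branch and fails with a `MethodError` on `length`.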
But since `length` is not defined, maybe it would be better to overload this?
Maybe, but then this would need to be done for every type:
```julia
julia> struct MyType end

julia> length(MyType())
ERROR: MethodError: no method matching length(::MyType)

julia> Base.IteratorSize(MyType)
Base.HasLength()
```
Thus, `Base.IteratorSize` currently cannot actually be used to check that `length` will work, as it is only meaningful for types that implement the iterator interface. In particular, types that do not implement that interface will not bother overloading this method (why should they?).
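If what you actually need to know is whether `length` is callable, a sketch that asks the method table directly sidesteps the trait; the helper name `has_length` is hypothetical. Note that `hasmethod` only checks that a matching method exists, not that calling it succeeds.

```julia
# `has_length` is a hypothetical helper: it queries the method table
# directly instead of relying on the IteratorSize default.
has_length(x) = hasmethod(length, Tuple{typeof(x)})

struct MyType end

@show has_length([1, 2, 3])       # true
@show has_length(MyType())        # false
@show Base.IteratorSize(MyType)   # still HasLength(), the generic default
```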
Arguably, the best fix would be to remove the default implementation of `IteratorSize`. At this stage, though, that would probably be breaking… On the other hand, it would not matter much for code outside of iterators anyway (why would you check `IteratorSize` on a type that is not iterable?).
In 1.10 (IIRC) we will get Tricks.jl-style compile-time `hasmethod`. We might be able to use that to give an implementation of `IteratorSize` that actually checks which methods are defined. Then you would only need to overload it if you were doing something odd.
We could also have it return an error status if the type did not define `iterate`, which IIRC DataFrames do not. Right now there is no status for this, since we assume people only use it on iterators, as was stated above.
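A rough sketch of what such a method-aware trait could look like, under stated assumptions: `checked_iteratorsize` and the `NotIterable` status are hypothetical (nothing like them exists in Base), and the sketch uses the ordinary runtime `hasmethod`, making no claim about compile-time evaluation. Because it only looks for method existence, a type whose `iterate` method exists but throws would still be reported as iterable.

```julia
# Hypothetical "error status" for types with no one-argument `iterate`.
struct NotIterable end

# Hypothetical method-aware variant of IteratorSize (not part of Base).
function checked_iteratorsize(::Type{T}) where {T}
    # No `iterate(::T)` method at all: report that instead of guessing.
    hasmethod(iterate, Tuple{T}) || return NotIterable()
    declared = Base.IteratorSize(T)
    # HasLength() is the generic default; only trust it if `length` exists.
    if declared isa Base.HasLength && !hasmethod(length, Tuple{T})
        return Base.SizeUnknown()
    end
    return declared
end

struct MyType end                    # no iterate, no length

struct UpToThree end                 # iterate, but no length
Base.iterate(::UpToThree, n = 1) = n > 3 ? nothing : (n, n + 1)

@show checked_iteratorsize(MyType)       # NotIterable()
@show checked_iteratorsize(UpToThree)    # SizeUnknown()
@show checked_iteratorsize(Vector{Int})  # HasShape{1}()
```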
I do agree: calling `IteratorSize` on a completely generic object (i.e., one about which you do not even know whether it is iterable) is a code smell. Your code should either assume it is (or is not) iterable, or pass this information explicitly along with the generic object.
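A minimal sketch of passing that information explicitly, with a hypothetical `describe` function: the caller chooses the behavior via `Val(true)` or `Val(false)`, so nothing is probed, and the same value (here a `String`, which happens to be iterable) can be treated either way.

```julia
# Hypothetical helper: the caller states whether `x` is to be treated as
# an iterable; no trait or method-table probing happens here.
describe(x, ::Val{true}) = "iterable with $(count(_ -> true, x)) elements"
describe(x, ::Val{false}) = "single value of type $(typeof(x))"

println(describe([1, 2, 3], Val(true)))   # iterable with 3 elements
println(describe("abc", Val(true)))       # iterable with 3 elements
println(describe("abc", Val(false)))      # single value of type String
```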
They do; it’s the `iterate` method that throws:
```julia
julia> iterate(DataFrame())
ERROR: AbstractDataFrame is not iterable. Use eachrow(df) to get a row iterator or eachcol(df) to get a column iterator
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:35
 [2] iterate(#unused#::DataFrame)
   @ DataFrames ~/.julia/packages/DataFrames/LteEl/src/abstractdataframe/iteration.jl:23
 [3] top-level scope
   @ REPL[4]:1
```
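This also limits any `hasmethod`-based approach: a type can define `iterate` purely to throw, so the method exists even though iteration is impossible. A small self-contained sketch with a hypothetical `Sheet` type that mimics this pattern:

```julia
# Hypothetical table-like type that, like DataFrame, defines `iterate`
# only to throw an informative error.
struct Sheet
    cols::Vector{Vector{Int}}
end

Base.iterate(::Sheet) = error("Sheet is not iterable. Iterate over sheet.cols instead.")

s = Sheet([[1, 2], [3, 4]])

@show hasmethod(iterate, Tuple{Sheet})   # true: the method exists...

threw = try
    iterate(s)
    false
catch err
    err isa ErrorException               # ...but calling it throws
end
@show threw                              # true
```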
This comment seems irrelevant to the question…