Can `eltype()` deduce the element type of a generator?

Alexander_Belopolsky · October 13, 2017, 8:01pm

As of Julia Version 0.7.0-DEV.2036, calling eltype() on even the simplest generator returns Any:

julia> eltype(x for x in 1:2)
Any

Unless the generator function is very complicated, It does not look like a hard problem to deduce the actual type. As a POC, I tried

julia> Base.eltype(g::Base.Generator) = Base.return_types(g.f, (eltype(g.iter),))[1]
julia> eltype(x for x in 1:2)
Int64
julia> eltype(x*1.0 for x in 1:2)
Float64

I don’t think this would be hard to generalize for the case when return_types() returns multiple values.

yuyichao · October 13, 2017, 8:36pm

No you code should not rely on inference.

Alexander_Belopolsky · October 13, 2017, 9:01pm

Would you mind elaborating on this some more? Not being able to get the element type of a generator seems to significantly limit generators’ usefulness. Is there some way to decorate a generator with the element type information? For example, I can probably write a macro @ET such that eltype(@ET Int (x for x in 1:2)) would return Int.

yuyichao · October 13, 2017, 9:11pm

No that’s not supposed to matter at all.

No

You can certainly have your own wrapper to do that.

mbauman · October 13, 2017, 9:45pm

You could totally create a wrapper struct for generators that asserted the type of all elements must be the type of the first (or is Union{} if empty). Our iteration protocol makes that a bit tricky at the moment, but it’s totally doable. Grab and store the first element and the iteration state, and then use a “deferred” iteration strategy wherein the subsequent element (or Nullable()) is in the state.

Alexander_Belopolsky · October 13, 2017, 9:46pm

Is that me being dumb or you being terse?

I’ve just looked at how collect(::Generator) is implemented. The first thing it does is inferring the element type:

github.com

JuliaLang/julia/blob/84ca8eea42a93aa31f8434d7003058099588bd07/base/array.jl#L652


      
              macro default_eltype(itrt)
                  return :(Any)
              end
          end
          
          _array_for(::Type{T}, itr, ::HasLength) where {T} = Array{T,1}(Int(length(itr)::Integer))
          _array_for(::Type{T}, itr, ::HasShape) where {T} = similar(Array{T}, indices(itr))
          
          function collect(itr::Generator)
              isz = iteratorsize(itr.iter)
              et = @default_eltype(typeof(itr))
              if isa(isz, SizeUnknown)
                  return grow_to!(Array{et,1}(0), itr)
              else
                  st = start(itr)
                  if done(itr,st)
                      return _array_for(et, itr.iter, isz)
                  end
                  v1, st = next(itr, st)
                  collect_to_with_first!(_array_for(typeof(v1), itr.iter, isz), v1, itr, st)
              end

Where the macro @default_eltype expands to a call to Core.Inference.return_type.

So why is collect() allowed to use inference while eltype() is not?

mbauman · October 13, 2017, 9:57pm

The key is how collect uses that information. It’s only used in the empty case. Otherwise, collect just uses the types of the values that it extracts from the array, gradually widening the array type as needed.

Alexander_Belopolsky · October 13, 2017, 10:01pm

Is that true? It looks like in the SizeUnknown branch it is used unconditionally.

github.com

JuliaLang/julia/blob/84ca8eea42a93aa31f8434d7003058099588bd07/base/array.jl#L654


      
              end
          end
          
          _array_for(::Type{T}, itr, ::HasLength) where {T} = Array{T,1}(Int(length(itr)::Integer))
          _array_for(::Type{T}, itr, ::HasShape) where {T} = similar(Array{T}, indices(itr))
          
          function collect(itr::Generator)
              isz = iteratorsize(itr.iter)
              et = @default_eltype(typeof(itr))
              if isa(isz, SizeUnknown)
                  return grow_to!(Array{et,1}(0), itr)
              else
                  st = start(itr)
                  if done(itr,st)
                      return _array_for(et, itr.iter, isz)
                  end
                  v1, st = next(itr, st)
                  collect_to_with_first!(_array_for(typeof(v1), itr.iter, isz), v1, itr, st)
              end
          end

nalimilan · October 13, 2017, 10:02pm

Have a look at what grow_to! does.

yuyichao · October 13, 2017, 10:04pm

Because people don’t like predictable behavior and forced the base to use the inference to give non-error in some cases that should really be error.

Yes, but it doesn’t/shouldn’t affect the final value. If it does, you should report it as a bug.

Alexander_Belopolsky · October 13, 2017, 10:20pm

I am new to this community, but you guys seem to like to communicate in riddles.

Why does it matter what exactly grow_to! does when it receives the inferred element type encoded in the type of its first argument? Looking closer at the two-argument grow_to! , I do realize that it discards the dest type that it receives replacing it with Union{}:

github.com

JuliaLang/julia/blob/84ca8eea42a93aa31f8434d7003058099588bd07/base/array.jl#L710


      
                      new = similar(dest, R)
                      copy!(new,1, dest,1, i-1)
                      @inbounds new[i] = el
                      return collect_to!(new, itr, i+1, st)
                  end
              end
              return dest
          end
          
          function grow_to!(dest, itr)
              out = grow_to!(similar(dest,Union{}), itr, start(itr))
              return isempty(out) ? dest : out
          end
          
          function grow_to!(dest, itr, st)
              T = eltype(dest)
              while !done(itr, st)
                  el, st = next(itr, st)
                  S = typeof(el)
                  if S === T || S <: T
                      push!(dest, el::T)

What I don’t understand is why this is necessary and if it is why it does not matter. From what I see grow_to! has to do a lot of extra work because it does not know the element type of the result.

Alexander_Belopolsky · October 13, 2017, 10:25pm

Is that sarcasm or a serious explanation? Are you suggesting that collecting from an empty generator should give an error?

mbauman · October 13, 2017, 10:35pm

Haha, sorry about that. This one decision has elicited a lot of discussion over the past few years… and you’re in the development channel, which is where you’ll tend to get much shorter and curt replies that assume quite a bit of knowledge.

In short, type inference is entirely an optimization. It should be unobservable. We should be able to turn off inference or improve its results nilly wily without any results changing. Performance will change, of course.

The return type of comprehensions used to depend on inference in all cases, and it could cause lots of surprising effects. If you changed something to be a global that wrecked type inference, you’d suddenly get an Array{Any} that can’t work with BLAS or other external (non-Julia) libraries.

Yes, there are three options for what to do in the [x for x in empty] case — use inference, return an Array{Union{}} or be an error. All three have been proposed, and I think we did the Union{} thing for a brief while.

yuyichao · October 13, 2017, 10:37pm

That is a serious explanation and yes I am saying that an error is better than a random type. A Vector{Union{}}(0) would work too of course though it’ll cause type instability.

Alexander_Belopolsky · October 13, 2017, 11:24pm

Thanks, @mbauman. This makes perfect sense. And thank you for being patient with me. I should have reformulated my question as a user question and posted it under “Usage”. This said, what is the proper etiquette here - should I open a new topic or change the category here?

My specific question relates to implementing an IterableTables “sink” functionality for my own table type. The approach taken by IterableTables is to wrap generators in an element type aware “GeneratorIterator”:

julia> eltype(TableTraits.getiterator(i for i in 1:0))
Int64
julia> eltype(TableTraits.getiterator(@NT(a=i, b=0.1i) for i in 1:0))
NamedTuples._NT_a_b{Int64,Float64}

The discussion in this topic so far suggests that this is not a good idea. What would be the right approach? I don’t think throwing an error on an empty query result is an option in my case.

yuyichao · October 13, 2017, 11:56pm

That’s exactly why I said,

If the user of a iterator wants a type when it’s not empty, that type should be supplied explicitly. Otherwise no answer can be given theoretically. In general though the user should not care about the type at all and just passing on the iterator does not require any knowledge of the return type of the iterator.

Alexander_Belopolsky · October 14, 2017, 12:31am

That’s true, but the problem that I am trying to solve does seem to require such knowledge. Moreover, all solutions I’ve seen so far (Base.collect() and TableTraits.getiterator()) do use inference in one way or another. I understand that inference may at times be fragile, but it is not in the case of f(x)::T for x in iter. A macro can easily be written to convert a generator expression with a type assert to an iterator type with iteratoreltype(iter) == HasEltype(). Why couldn’t that be the default?

The current behavior –

julia> eltype(x::Int for x in 1:0)
Any

– looks very counterintuitive.

yuyichao · October 14, 2017, 1:34am

It is, T is not necessarily known and that it is still meaningless in the empty case as well as being able to be Union{} instead.
It’s also wrong when T is not a leaftype.

Because it couldn’t be done as explained above.

And as I said, if you know the type, supply it explicitly to the user of the iterator. Having a eltype is not what a generator is for.

Alexander_Belopolsky · October 14, 2017, 2:32am

It was stated that it could not be done, but not explained. Having eltype() of a generator return the actual type can be done as I have shown in my first post. My solution may violate some theoretical principles, but in practice, the same solution is implemented in various places. @mbauman explained that type inference may be fragile and should be avoided, but what can be wrong with the following POC:

struct TypedGenerator{T}
    g::Base.Generator
end

macro typed(ex)
    if ex.args[1].head == :(::)
        :(TypedGenerator{$(ex.args[1].args[2])}($ex))
    else
        ex
    end
end

Base.iteratoreltype(::TypedGenerator{T}) where T = Base.HasEltype()
Base.eltype(::TypedGenerator{T}) where T = T
# Define iterator protocol methods delegating to `.g`.

julia> eltype(@typed (f(x) for x in 1:2))
Any

julia> eltype(@typed (f(x)::Int for x in 1:2))
Int64

If such code is acceptable in a user program, why would it be a bad idea for (f(x)::Int for x in 1:2) to return a variant of TypedGenerator{T} without the @typed macro?

How would the user that knows the type of f(x) and wants to provide that type in a generator expression (f(x) for x in iter) actually do it? I suggest that (f(x)::T for x in iter) is the first thing that comes to mind and if you subscribe to the principle of least surprise, the eltype() of this expression should be T.

yuyichao · October 14, 2017, 2:47am

I did!

So to repeat,

Or to break it down,

T is not necessarily known (and it might not be a constant at all)
It is still meaningless in the empty case
Return value could be Union{}
T may not be a leaftype.

Again, because it’s wrong.

Don’t use generator and use some other type. Any wrapper around this iterator would do. For example, just give the type to your TableTraits.getiterator function.

Adding a ::T in an expression to change the behavior is about as much surprise as you can get. This kind of behavioral change does not exist anywhere else in the language.

Topic		Replies	Views
Should Generators finally be given eltype Internals & Design	25	2298	February 7, 2019
Generator - iterators should deliver accurate eltype Internals & Design	3	1160	August 3, 2017
`eltype` could be smarter Internals & Design	8	630	May 4, 2022
Inference bug when using nested generators? General Usage	21	873	June 23, 2022
New Iterator from Nested Iterators General Usage question , iterators	19	657	September 14, 2023

Can `eltype()` deduce the element type of a generator?

Related topics