`eltype` could be smarter

There have been plenty of eltype discussions before, for example Generator - iterators should deliver accurate eltype.

I understand that this can sometimes be a hard problem. But then there are cases like this:

julia> x = rand(3)
3-element Vector{Float64}:
 0.565481
 0.965843
 0.461929

julia> gen =  (sqrt(a) for a in x)
Generator{Vector{Float64}, var"#7#8"}(#7, [0.565481, 0.965843, 0.461929])

julia> eltype(gen)
Any

julia> Core.Compiler.return_type(gen.f, Tuple{eltype(gen.iter)})
Float64

Of course Any is always a valid answer. But then, we could just change eltype to always return Any. That would be valid, just not useful.

In this case, making it more useful seems trivial. Why are we not doing this by default?

3 Likes

As I understand it, the problem is that then the return value of eltype depends on internal details of the compiler (the type inference) and canā€™t be easily predicted or relied on from one version of Julia to the next.

4 Likes

Thanks @stevengj . Iā€™ve heard that argument too, but I donā€™t think it really answers the question. It seems to say we donā€™t want to give a good answer now, because we might later give a slightly less good answer. So letā€™s just always give a really bad answer.

Does anyone really think Any is a good result for practical work? When I get this, I immediately turn to Core.Compiler.return_type, since that seems to be the best way to get good performance in cases like this. Iā€™d much rather know the compiler is doing the best it can, and a result of Any means calling return_type myself wonā€™t help things. That approach would make users like me less likely to use it, which many people would seem to prefer.

I mostly use eltype to get a result I can use for later dispatch in order to get better performance. Of course there might be inference regressions, but that would just mean performance regressions down the line. Iā€™d just rather be able to trust that eltype is doing its best, so I can get the best performance and avoid digging into the compiler.

Sort of sidestepping the problem here, but in cases like the OP where you are interested in the eltype of a pretty quick operation wouldnā€™t it always work to do something like typeof(first(gen))? What is lost here?

Often, but not in general:

julia> gen = (1 + x for x in (1, 1.0, 3+4im))
Generator{Tuple{Int64, Float64, Complex{Int64}}, var"#11#12"}(#11, (1, 1.0, 3+4im))

julia> typeof(first(gen))
Int64
2 Likes

If the iterator is stateful, you lose the first element.


You can help inference along by bypassing it via a type parameter of the iterator that contains the eltype - you then of course have the problem of your users having to specify the type ahead of time. For an example on how this can be done, check out this inferrable Flatten iterator (just noticed this has a bug - I forgot a convert in the early return). In that package, I sidestep the problem by never exposing that iterator in the first place and having it ā€œfall into placeā€ through the API. Thatā€™s of course not possible when itā€™s supposed to be front-facing user API.

3 Likes

So then the result of your code (and not just the performance) depends on compiler internals. This might sometimes be okay, but itā€™s a fragile strategy for general use.

There are ways to get both performance and better stability. e.g. look at the way collect works (or map). It is structured so that for any non-empty iterator, it eagerly allocates the array based on the type of the first element. In the unlikely event of a type-unstable iterator where they encounter a different type later on, it re-allocates the array with a wider type. This way it is fast in the common case of a type-inferred type-stable iterator, but the type inference only improves performance rather than changing the result. (For the case of an empty iterator, collect may use Core.Compiler.return_type to return the appropriately typed empty array; the alternative would be to return Any[] and have a type-unstable collect in the common case, which would be worse.)

7 Likes

It depends. I was thinking of reductions, for example as in this PR

There, we want to use a floating point trick for types that will return an AbstractFloat, and fall back to sum(log, x) in other cases. The result wonā€™t depend on which method is called, only performance.

Thanks. @mcabbott had a similar comment, but I didnā€™t see that as completely removing the benefit of eltype returning a narrower type. Iā€™ll have a closer look and think through this some more.

By the way, @tkf has a nice pacakge BangBang.jl specifically for enabling these sorts of workflows.

E.g. here is how one might implement map using BangBang.jl

using BangBang
function mymap(f, iters...)
    out = Union{}[]
    for tup in zip(iters...)
        out = push!!(out, f(tup...))
    end
    out
end
1 Like