`eltype` could be smarter

cscherrer · May 4, 2022, 2:41pm

There have been plenty of eltype discussions before, for example "Generator - iterators should deliver accurate eltype".

I understand that this can sometimes be a hard problem. But then there are cases like this:

julia> x = rand(3)
3-element Vector{Float64}:
 0.565481
 0.965843
 0.461929

julia> gen =  (sqrt(a) for a in x)
Generator{Vector{Float64}, var"#7#8"}(#7, [0.565481, 0.965843, 0.461929])

julia> eltype(gen)
Any

julia> Core.Compiler.return_type(gen.f, Tuple{eltype(gen.iter)})
Float64

Of course Any is always a valid answer. But then, we could just change eltype to always return Any. That would be valid, just not useful.

In this case, making it more useful seems trivial. Why are we not doing this by default?
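A minimal sketch of what the question above is asking for, assuming a hypothetical helper named inferred_eltype (not part of Base). It only wraps the Core.Compiler.return_type call from the session above. Core.Compiler.return_type is a compiler internal, so the answer may differ between Julia versions.

```julia
# Hypothetical helper, not part of Base: ask type inference what the
# generator's function returns for the element type of the wrapped iterator.
function inferred_eltype(gen::Base.Generator)
    return Core.Compiler.return_type(gen.f, Tuple{eltype(gen.iter)})
end

x = rand(3)
gen = (sqrt(a) for a in x)

eltype(gen)           # Any
inferred_eltype(gen)  # Float64 in the session above; depends on inference
```

Whatever inference returns here is only an upper bound on the element types: it can be as wide as Any, but it never excludes a type the generator actually produces.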

stevengj · May 4, 2022, 2:51pm

As I understand it, the problem is that then the return value of eltype depends on internal details of the compiler (the type inference) and can’t be easily predicted or relied on from one version of Julia to the next.

cscherrer · May 4, 2022, 3:01pm

Thanks @stevengj. I’ve heard that argument too, but I don’t think it really answers the question. It seems to say we don’t want to give a good answer now, because we might later give a slightly less good answer. So let’s just always give a really bad answer.

Does anyone really think Any is a good result for practical work? When I get this, I immediately turn to Core.Compiler.return_type, since that seems to be the best way to get good performance in cases like this. I’d much rather know that eltype is already doing the best the compiler can, so that a result of Any means calling return_type myself won’t help. That would make users like me less likely to call return_type directly, which many people seem to prefer.

I mostly use eltype to get a result I can use for later dispatch in order to get better performance. Of course there might be inference regressions, but that would just mean performance regressions down the line. I’d just rather be able to trust that eltype is doing its best, so I can get the best performance and avoid digging into the compiler.

tbeason · May 4, 2022, 4:05pm

Sort of sidestepping the problem here, but in cases like the OP, where you are interested in the eltype of a pretty quick operation, wouldn’t it always work to do something like typeof(first(gen))? What is lost here?

cscherrer · May 4, 2022, 4:08pm

Often, but not in general:

julia> gen = (1 + x for x in (1, 1.0, 3+4im))
Generator{Tuple{Int64, Float64, Complex{Int64}}, var"#11#12"}(#11, (1, 1.0, 3+4im))

julia> typeof(first(gen))
Int64
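To make the counterexample above concrete, here is a small sketch that compares the type of the first element with the types of all elements. The variable names are only illustrative, and Base.typejoin is used here as one way to compute a common supertype.

```julia
gen = (1 + x for x in (1, 1.0, 3 + 4im))

# Type of the first element only
first_type = typeof(first(gen))

# One type per element
all_types = [typeof(y) for y in gen]

# Common supertype of every element type
joined = mapreduce(typeof, Base.typejoin, gen)
```

Here first_type is Int while joined is Number, so typeof(first(gen)) is too narrow to serve as an element type for this generator.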

Sukera · May 4, 2022, 4:10pm

If the iterator is stateful, you lose the first element.
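A short sketch of that pitfall, using Iterators.Stateful from Base as the stateful iterator (the variable names are illustrative):

```julia
it = Iterators.Stateful([1, 2, 3])
gen = (2a for a in it)

# Peeking at the first element consumes it from the underlying iterator.
T = typeof(first(gen))

# Only the remaining elements are left to collect.
rest = collect(gen)
```

After the peek, rest holds the doubled second and third elements only; the first one is gone.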

You can help inference along by bypassing it via a type parameter of the iterator that contains the eltype - you then of course have the problem of your users having to specify the type ahead of time. For an example on how this can be done, check out this inferrable Flatten iterator (just noticed this has a bug - I forgot a convert in the early return). In that package, I sidestep the problem by never exposing that iterator in the first place and having it “fall into place” through the API. That’s of course not possible when it’s supposed to be front-facing user API.
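As a rough illustration of the type-parameter idea (not the Flatten iterator mentioned above), here is a sketch of a hypothetical wrapper named Typed that carries the element type as a parameter and converts each element to it. The user has to supply the type up front, which is exactly the drawback described.

```julia
# Hypothetical wrapper: T is the element type promised by the user.
struct Typed{T,I}
    iter::I
end
Typed{T}(iter::I) where {T,I} = Typed{T,I}(iter)

# eltype comes from the type parameter, so inference is not needed for it.
Base.eltype(::Type{Typed{T,I}}) where {T,I} = T
# Keep the sketch simple: do not promise a length.
Base.IteratorSize(::Type{Typed{T,I}}) where {T,I} = Base.SizeUnknown()

function Base.iterate(t::Typed{T}, state...) where {T}
    y = iterate(t.iter, state...)
    y === nothing && return nothing
    # Convert so every yielded element really is a T.
    return (convert(T, y[1])::T, y[2])
end

typed = Typed{Float64}(sqrt(a) for a in [1.0, 4.0])
result = collect(typed)
```

With this wrapper, eltype(typed) is Float64 regardless of what inference can prove about the generator inside it.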

stevengj · May 4, 2022, 4:42pm

So then the result of your code (and not just the performance) depends on compiler internals. This might sometimes be okay, but it’s a fragile strategy for general use.

There are ways to get both performance and better stability. For example, look at the way collect (or map) works. It is structured so that for any non-empty iterator, it eagerly allocates the array based on the type of the first element. In the unlikely event of a type-unstable iterator where it encounters a different type later on, it re-allocates the array with a wider type. This way it is fast in the common case of a type-inferred type-stable iterator, but the type inference only improves performance rather than changing the result. (For the case of an empty iterator, collect may use Core.Compiler.return_type to return the appropriately typed empty array; the alternative would be to return Any[] and have a type-unstable collect in the common case, which would be worse.)
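The strategy described above can be sketched roughly as follows. This is a simplified illustration, not Base's actual implementation: the names collect_widening and grow are hypothetical, Base.typejoin stands in for whatever widening rule Base uses, and the empty case simply returns Any[] instead of consulting inference.

```julia
# Hypothetical sketch of "allocate from the first element, widen on demand".
function collect_widening(itr)
    y = iterate(itr)
    # Empty input: there is no element to look at, so fall back to Any[].
    y === nothing && return Any[]
    v, st = y
    return grow(typeof(v)[v], itr, st)
end

function grow(out::Vector{T}, itr, st) where {T}
    while true
        y = iterate(itr, st)
        y === nothing && return out
        v, st = y
        if v isa T
            # Common case: the element fits, so push in place.
            push!(out, v)
        else
            # Rare case: copy into a vector with a wider element type and continue.
            S = Base.typejoin(T, typeof(v))
            wider = Vector{S}(out)
            push!(wider, v)
            return grow(wider, itr, st)
        end
    end
end

stable = collect_widening(sqrt(a) for a in [1.0, 4.0])
mixed = collect_widening(1 + x for x in (1, 1.0, 3 + 4im))
```

The element type of the result is determined by the values actually produced, so inference affects only how fast the loop runs.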

cscherrer · May 4, 2022, 4:54pm

It depends. I was thinking of reductions, for example as in this PR:

github.com/JuliaStats/LogExpFunctions.jl

sumlog (JuliaStats:master ← cscherrer:master), opened 05:08PM - 02 May 22 UTC by cscherrer, +246 -2

This PR adds `sumlog`, a more efficient way to compute `sum(log, x)`. There's more discussion on this on Discourse here: https://discourse.julialang.org/t/sum-of-logs/80370

EDIT: I think we have a good enough understanding of what's possible to lay out some design criteria. That ought to be more efficient than taking each line of code in isolation. As a starting point, I suggest

1. Whenever `sum(log ∘ f, x)` is defined, `sumlog(f, x)` should give the same result (within some tolerance, etc)
2. `sumlog(x) == sumlog(identity, x)`
3. `sumlog(f, x)` should support a `dims` keyword argument whenever `sum(log ∘ f, x)` does (i.e., when `x` is an AbstractArray)
4. `sumlog` should be type-stable and compiler-friendly when possible
5. `sumlog` should use the optimized method requiring a single `log` application, whenever that's possible.

@devmotion @mcabbott thoughts on these?

There, we want to use a floating point trick for types that will return an AbstractFloat, and fall back to sum(log, x) in other cases. The result won’t depend on which method is called, only performance.
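A rough sketch of that kind of eltype-based branching, under loud assumptions: the names sumlog_sketch and sumlog_float are hypothetical, the code is not taken from the PR, and the fast path shown (accumulating significands and exponents via frexp, then taking a single log) is just one possible floating point trick. It assumes positive, finite inputs.

```julia
# Hypothetical fast path: multiply significands, add exponents, take one log.
# Assumes every element is a positive, finite float.
function sumlog_float(x)
    T = eltype(x)
    sig = one(T)  # running product of significands, renormalized each step
    ex = 0        # running sum of binary exponents
    for a in x
        s, e = frexp(a)
        sig, e2 = frexp(sig * s)  # renormalize so the product cannot underflow
        ex += e + e2
    end
    return log(sig) + ex * log(T(2))
end

# Hypothetical entry point: eltype only selects the method, not the answer.
function sumlog_sketch(x)
    if eltype(x) <: AbstractFloat
        return sumlog_float(x)
    else
        return sum(log, x)  # generic fallback, also used when eltype is Any
    end
end

x = [0.5, 2.0, 3.0, 10.0]
fast = sumlog_sketch(x)                 # eltype is Float64: fast path
slow = sumlog_sketch(a for a in x)      # eltype is Any: fallback
```

Both calls agree up to floating point tolerance. A generator whose eltype is Any only misses the fast path, which is why a narrower eltype would matter for performance in this setting.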

Thanks. @mcabbott had a similar comment, but I didn’t see that as completely removing the benefit of eltype returning a narrower type. I’ll have a closer look and think through this some more.

Mason · May 4, 2022, 7:17pm

By the way, @tkf has a nice package, BangBang.jl, specifically for enabling these sorts of workflows.

E.g. here is how one might implement map using BangBang.jl

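using BangBang

# Start from an empty vector with the narrowest possible element type and
# let push!! widen the container whenever f returns a value of a new type.
function mymap(f, iters...)
    out = Union{}[]
    for tup in zip(iters...)
        # push!! mutates `out` when it can, otherwise returns a widened copy
        out = push!!(out, f(tup...))
    end
    out
end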
