Recommended idiom for collecting narrowest type

Suppose I have a Vector with an abstract element type. What is the recommended idiom for collecting the elements into a vector with the narrowest possible element type? MWE:

struct Foo{T}
    x::T
end
X = Foo[Foo(1), Foo(2)]
collect(X)                      # Vector{Foo}
collect(x for x in X)           # Vector{Foo{Int}}

The last one works but feels a bit clunky.

(It is understood that the operation is not type stable; this happens before calling a function barrier.)

identity.(X) or map(identity, X)?
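
As a quick check against the MWE from the question, here is a small sketch (the names `A` and `B` are only for illustration):

```julia
struct Foo{T}
    x::T
end

X = Foo[Foo(1), Foo(2)]

A = identity.(X)        # broadcast: the eltype is recomputed from the values
B = map(identity, X)    # same idea via map

eltype(X)  # Foo
eltype(A)  # Foo{Int}
eltype(B)  # Foo{Int}
```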

I second jules’ suggestion, but your last line could also be [x for x in X] instead of collect.

Is it documented which construct actually tries to find the narrowest type? Empirically it works, but what does the language guarantee?

I don’t think it tries to find the narrowest type. For example:

julia> x = ["a", 2, :c]
3-element Vector{Any}:
  "a"
 2
  :c

julia> identity.(x)
3-element Vector{Any}:
  "a"
 2
  :c

Julia has some heuristics for which Union types it will choose in such scenarios and when it falls back to Any.

Also, comprehensions promote weirdly: [1, 2.0] becomes a Float64[] when you may have wanted Real.

Part of the problem is that what you want doesn’t seem uniquely defined.

For example, if you have x = Any[1.0f0, 2.0], do you want AbstractFloat[1.0f0, 2.0] (the result of map(identity, x) or [x for x in x], corresponding to typejoin)? Or do you want Float64[1.0, 2.0] (a lossless conversion, corresponding to calling promote_type on the types via mapreduce(typeof, promote_type, x), and also what you would get from [1.0f0, 2.0])? Or do you want Union{Float32,Float64}[1.0f0, 2.0], which is narrower than AbstractFloat?

What are you trying to accomplish? If it’s to improve performance, then you probably want a concrete type like Float64 or at worst a union of two or three types — whereas a typejoin like AbstractFloat[...] may be as bad as Any[...] for performance.

That’s literally what promote does (via promote_type(Int, Float64) === Float64).

It’s the difference between promotion (promote and promote_type) versus typejoin (and promote_typejoin).

You can do mapreduce(typeof, typejoin, x) versus mapreduce(typeof, promote_type, x), depending on which behavior you want, if you want to be explicit.
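
A sketch of the explicit versions, using the x from the example above. The variable names are illustrative, and the Union variant is an extra option for comparison rather than something either function computes:

```julia
x = Any[1.0f0, 2.0]

T_join    = mapreduce(typeof, typejoin, x)       # AbstractFloat
T_promote = mapreduce(typeof, promote_type, x)   # Float64
T_union   = Union{unique(map(typeof, x))...}     # Union{Float32, Float64}

# Once a target type is chosen, collect(T, x) builds the array with that eltype.
y = collect(T_promote, x)                        # Vector{Float64}
```

Note that mapreduce without an init argument throws on an empty x, so the empty case needs separate handling.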

It would be nice to explicitly document that collect on an iterator uses typejoin (this doesn’t seem to be in the current collect docstring?), whereas array literals use promote_type. (map and comprehensions are already documented to behave like collect on iterators.)
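
The difference between the constructs can be seen side by side in this small sketch (variable names are illustrative):

```julia
x = Any[1, 2.0]

from_collect       = collect(v for v in x)   # collect on an iterator
from_map           = map(identity, x)
from_comprehension = [v for v in x]
from_literal       = [1, 2.0]                # array literal

eltype(from_collect)        # Real
eltype(from_map)            # Real
eltype(from_comprehension)  # Real
eltype(from_literal)        # Float64
```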

Catching non-concrete container eltypes before a function barrier. Conversion is fine, but so is throwing an error so that I can investigate. In theory the computation is such that I can prove that all elements have the same type. MWE:

struct Foo{T}
    x::T
end
foos = [Foo(rand() < 0.5 ? rand('a':'z') : rand(0:20)) for _ in 1:100];
collect1(x) = isempty(x) ? Nothing[] : collect(typeof(first(x)), x)
#  ^ this does what I want
foos_char = collect1(filter(f -> f.x isa Char, foos))
foos_int = collect1(filter(f -> f.x isa Int, foos))
do_something_with(foos_char, foos_int)

In the actual example the types are rather complex and I would rather not compute them if it is avoidable.

I actually had issues with this about three months ago; I got weird results with collect but that’s because I wanted it to promote instead of typejoin. Is there a way to specify promote behaviour when using a comprehension, or a map, or a collect?
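
One way to get promotion explicitly is to collect first and convert afterwards. The following is only a sketch: `promote_collect` is a hypothetical helper name, not something provided by Base.

```julia
# Hypothetical helper (not part of Base): collect, then convert to the promoted eltype.
function promote_collect(itr)
    v = collect(itr)
    isempty(v) && return v                    # nothing to promote; keep collect's result
    T = mapreduce(typeof, promote_type, v)    # e.g. Int and Float64 give Float64
    return collect(T, v)                      # converts each element to T
end

r = promote_collect(x for x in Any[1, 2.0])   # Vector{Float64}
```

The cost is a second pass and a second allocation, which is usually acceptable in front of a function barrier.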

I’ve looooong wanted a better API/name/idiom for incremental widening, somewhat akin to BangBang.jl. There’s private functionality in Base that allows you to specify the allocator that collect uses, and it’d be very cool for this to also allow you to explicitly specify how it chooses the next wider value (e.g., promote, typejoin, unions, or some mix thereof).

I actually think the basics are there for this, but it needs some significant API design work.

In that case it shouldn’t matter whether you are doing promote or typejoin, since they are both the identity for homogeneous concrete types. You can just check whether the result of map or collect has a concrete eltype.
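
A minimal sketch of that check, reusing the Foo type from the MWE. The name `narrow_or_error` is hypothetical and chosen only for this illustration:

```julia
struct Foo{T}
    x::T
end

# Hypothetical helper: narrow the eltype, then insist that it is concrete.
function narrow_or_error(xs)
    ys = map(identity, xs)
    isconcretetype(eltype(ys)) ||
        throw(ArgumentError("expected a concrete element type, got $(eltype(ys))"))
    return ys
end

foos = Foo[Foo(1), Foo('a'), Foo(2)]
foos_int = narrow_or_error(filter(f -> f.x isa Int, foos))   # Vector{Foo{Int}}
```

Calling narrow_or_error(foos) on the mixed vector throws instead, which gives a place to investigate before the function barrier. An empty input probably needs its own handling, since there are no values to determine the eltype from.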

You may wish to read Jeff Bezanson’s response to a related issue in some code that I wrote:

Peer inside an abstract parametric type · Issue #46047 · JuliaLang/julia

How to collect an iterator, X, into an instance of Array, Y, such that eltype(Y) is based on the typejoin of the type of each value of X:

  • Using Collects.jl, which improves upon the interface and implementation of collect:

    using Collects: collect_as
    collect_as(Array, X)
    
  • Using collect, if you want to avoid depending on Collects.jl for some reason:

    • Using collect directly:

      collect(Iterators.map(identity, X))
      
    • Using map, which is implemented using collect:

      map(identity, X)
      

The documentation says:

The element type of the returned array is based on the types of the values collected. However, if the iterator is empty then the element type of the returned (empty) array is determined by type inference.

So the docs do not specify whether types are combined with typejoin or with promotion.

In my opinion:

  • It would be breaking for Base to change this behavior of collect regarding returned eltype.

  • However, a package adding a method to collect could do this. IMO that would be a bad practice, but allowed according to the contract as specified in the docs.

In any case, there is Collects.jl, which:

  • Generalizes the collect interface.

  • Documents the behavior in detail.

In summary: just use Collects.jl if you are worried about collect for any reason. Its topic here on Discourse:

That is just typejoin (edit: promote_typejoin).

While the syntax is similar, there are multiple distinct things colloquially referred to as “comprehension”. The form you refer to ([1, 2.0]) does promotion, while the topic here, as defined in the OP, is the form that does typejoin.

@Deduction42 @mbauman creating an issue on the Collects.jl repo:

Feel free to provide more specific suggestions.

Hm, would you say so? These are not typejoin but manually chosen exceptions (what I called heuristics, though that might be the wrong word):

julia> [1, nothing]
2-element Vector{Union{Nothing, Int64}}:
 1
  nothing

julia> [1, missing]
2-element Vector{Union{Missing, Int64}}:
 1
  missing

julia> [1, missing, nothing]
3-element Vector{Union{Missing, Nothing, Int64}}:
 1
  missing
  nothing

julia> typejoin(Int64, Missing)
Any

julia> identity.(Any[1, missing, nothing])
3-element Vector{Union{Missing, Nothing, Int64}}:
 1
  missing
  nothing

Sorry, you’re right. It is not exactly typejoin. I suppose promote_typejoin is what collect uses, then.

Right: julia/base/array.jl at 1dbc40fca56998b09fd8b005a96bc0457d585d85 · JuliaLang/julia · GitHub

(I was assuming that promote_typejoin was like promote but using typejoin instead of promote_type, but instead it is a different type-promotion function.)

The public doc string says:

Compute a type that contains both T and S, which could be either a parent of both types, or a Union if appropriate. Falls back to typejoin.

So promote_typejoin is a misnomer: it is just typejoin with additional handling for some Unions, and it has nothing to do with promotion.
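
The three functions can be compared directly. This is a small sketch; promote_typejoin is not exported, so it is qualified with Base here, and the variable names are illustrative:

```julia
tj_missing  = typejoin(Int, Missing)                 # Any
ptj_missing = Base.promote_typejoin(Int, Missing)    # Union{Missing, Int}
ptj_nothing = Base.promote_typejoin(Int, Nothing)    # Union{Nothing, Int}

# For types without Missing or Nothing, promote_typejoin agrees with typejoin...
ptj_number  = Base.promote_typejoin(Int, Float64)    # Real
tj_number   = typejoin(Int, Float64)                 # Real

# ...and differs from actual promotion.
pt_number   = promote_type(Int, Float64)             # Float64
```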