Recommended idiom for collecting narrowest type

Suppose I have a Vector with an abstract element type. What is the recommended idiom for collecting the elements into a vector with the narrowest possible element type? MWE:

struct Foo{T}
    x::T
end
X = Foo[Foo(1), Foo(2)]
collect(X)                      # Vector{Foo}
collect(x for x in X)           # Vector{Foo{Int}}

The last one works but feels a bit clunky.

(It is understood that the operation is not type stable; this happens before calling a function barrier.)

identity.(X) or map(identity, X)?


I second jules’ suggestion, but your last line could also be [x for x in X] instead of collect.
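For reference, a small sketch comparing the suggestions so far on the MWE from the question. The comments show the element types I would expect; the thread itself only reports them empirically.

```julia
struct Foo{T}
    x::T
end

X = Foo[Foo(1), Foo(2)]

# Collecting an existing array keeps its declared element type.
eltype(collect(X))         # Foo

# Each of these builds the result element by element and narrows as it goes.
eltype(identity.(X))       # Foo{Int}
eltype(map(identity, X))   # Foo{Int}
eltype([x for x in X])     # Foo{Int}
```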


Is it documented which construct actually tries to find the narrowest type? Empirically it works, but what does the language guarantee?

I don’t think it tries to find the narrowest type. For example:

julia> x = ["a", 2, :c]
3-element Vector{Any}:
  "a"
 2
  :c

julia> identity.(x)
3-element Vector{Any}:
  "a"
 2
  :c

Julia has some heuristics for deciding which Union types it will choose in such scenarios and when it falls back to Any.
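A short illustration of one such heuristic, as far as I understand the current behavior: a concrete type mixed with Missing or Nothing is kept as a small Union, while unrelated types fall back to Any. Treat this as an observation rather than a documented guarantee.

```julia
# A concrete type combined with missing or nothing stays a small Union.
with_missing = map(identity, Any[1, missing])
with_nothing = map(identity, Any[1, nothing])
eltype(with_missing)   # Union{Missing, Int}
eltype(with_nothing)   # Union{Nothing, Int}

# Unrelated types have no useful common supertype, so the result is Any.
unrelated = map(identity, Any[1, "a"])
eltype(unrelated)      # Any
```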

Also, array literals promote weirdly: [1, 2.0] becomes a Vector{Float64} when you may have wanted Real.

Part of the problem is that what you want doesn’t seem uniquely defined.

For example, if you have x = Any[1.0f0, 2.0], do you want AbstractFloat[1.0f0, 2.0] (the result of map(identity, x) or [x for x in x], corresponding to typejoin)? Or do you want Float64[1.0, 2.0] (a lossless conversion, corresponding to calling promote_type on the types via mapreduce(typeof, promote_type, x), and also what you would get from [1.0f0, 2.0])? Or do you want Union{Float32,Float64}[1.0f0, 2.0], which is narrower than AbstractFloat?
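The three candidate answers can be produced side by side. This is only a sketch; the variable names are made up for illustration, and the Union version uses a typed comprehension with the Union written out by hand.

```julia
x = Any[1.0f0, 2.0]

# 1. typejoin-style narrowing: elements are kept as-is, eltype is abstract.
joined = map(identity, x)
eltype(joined)      # AbstractFloat

# 2. Promotion: elements are converted to a common concrete type.
T = mapreduce(typeof, promote_type, x)   # Float64
promoted = convert(Vector{T}, x)
eltype(promoted)    # Float64

# 3. A small Union: elements are kept as-is, eltype is narrower than AbstractFloat.
unioned = Union{Float32,Float64}[v for v in x]
eltype(unioned)     # Union{Float32, Float64}
```

Only the second option changes the stored values (1.0f0 becomes 1.0); the other two keep the Float32 element as a Float32.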

What are you trying to accomplish? If it’s to improve performance, then you probably want a concrete type like Float64 or at worst a union of two or three types — whereas a typejoin like AbstractFloat[...] may be as bad as Any[...] for performance.

That’s literally what promote does (via promote_type(Int, Float64) === Float64).

It’s the difference between promotion (promote and promote_type) versus typejoin (and promote_typejoin).

You can do mapreduce(typeof, typejoin, x) versus mapreduce(typeof, promote_type, x), depending on which behavior you want, if you want to be explicit.
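A quick sketch of the two explicit reductions on the Int/Float64 case discussed above, next to the constructs they correspond to:

```julia
x = Any[1, 2.0]

# typejoin finds a common supertype; no values are converted.
joined_type = mapreduce(typeof, typejoin, x)        # Real
# promote_type finds a type that all values can be converted to.
promoted_type = mapreduce(typeof, promote_type, x)  # Float64

eltype(map(identity, x))   # Real     (typejoin-like behavior)
eltype([1, 2.0])           # Float64  (array literal, promotion)
```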

It would be nice to explicitly document that collect on an iterator uses typejoin (this doesn’t seem to be in the current collect docstring?), whereas array literals use promote_type. (map and comprehensions are already documented to behave like collect on iterators.)


I am trying to catch non-concrete container eltypes before a function barrier. Conversion is fine, but so is throwing an error so that I can investigate. In theory the computation is such that I can prove that all elements have the same type. MWE:

struct Foo{T}
    x::T
end
foos = [Foo(rand() < 0.5 ? rand('a':'z') : rand(0:20)) for _ in 1:100];
collect1(x) = isempty(x) ? Nothing[] : collect(typeof(first(x)), x)
#  ^ this does what I want
foos_char = collect1(filter(f -> f.x isa Char, foos))
foos_int = collect1(filter(f -> f.x isa Int, foos))
do_something_with(foos_char, foos_int)

In the actual example the types are rather complex and I would rather not compute them if it is avoidable.
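One possible sketch that avoids writing out the types: narrow with map(identity, ...) and then check that the resulting element type is concrete. The name narrow_or_throw is hypothetical (it is not part of Base), and returning an empty input unchanged is an assumption about what is wanted in that case.

```julia
struct Foo{T}
    x::T
end

# Hypothetical helper, not part of Base: narrow the element type and
# throw if the result is still not concrete.
function narrow_or_throw(xs::AbstractVector)
    isempty(xs) && return xs   # assumption: nothing to narrow, pass through
    ys = map(identity, xs)     # narrows based on the actual elements
    T = eltype(ys)
    isconcretetype(T) ||
        throw(ArgumentError("expected a concrete element type, got $T"))
    return ys
end

chars = narrow_or_throw(Foo[Foo('a'), Foo('b')])
eltype(chars)   # Foo{Char}

# Mixed element types narrow only to Foo, which is not concrete, so this throws.
mixed_failed = try
    narrow_or_throw(Foo[Foo('a'), Foo(1)])
    false
catch err
    err isa ArgumentError
end
```

Unlike collect(typeof(first(x)), x), this does not attempt any conversion; it only reports when the elements turn out not to share a single concrete type.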