Small union type failure


#1

I thought that Missing wasn’t treated specially, and that any small union of types would be kept as a Union. Have I misunderstood?

julia> struct Blag end

julia> typeof([ifelse(randn() < 0, 1.0, Missing()) for _ in 1:100])
Array{Union{Missing, Float64},1}

julia> typeof([ifelse(randn() < 0, 1.0, Blag()) for _ in 1:100])
Array{Any,1}

Why is that not Union{Blag, Float64}?


#2

Missing is not special in the sense that Vector{Union{Blag, Float64}} is stored just as efficiently as Vector{Union{Missing, Float64}}. The promotion machinery does treat Missing “specially”, in that it has this promotion rule:

promote_rule(::Type{Missing}, ::Type{T}) where {T} = Union{Missing, T}

If you define a similar promotion rule for Blag, I would have expected it to behave the same way, but for some reason that doesn’t work:

julia> Base.promote_rule(::Type{Blag}, ::Type{T}) where {T} = Union{Blag, T}

julia> typeof([ifelse(randn() < 0, 1.0, Blag()) for _ in 1:100])
Array{Any,1}

Perhaps @nalimilan, who (IIRC) implemented this special behavior, knows what’s up.
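To make the symptom concrete, here is a minimal sketch, assuming a recent Julia 1.x: the promote_rule for Blag is picked up by promote_type, but collecting a comprehension never consults promote_rule, so the eltype is still Any. The deterministic isodd pattern simply replaces randn so the result is reproducible.

```julia
struct Blag end

# The promotion rule from the post above.
Base.promote_rule(::Type{Blag}, ::Type{T}) where {T} = Union{Blag, T}

# promote_type does consult promote_rule...
pt = promote_type(Float64, Blag)

# ...but a comprehension is collected by widening the element type with
# Base.promote_typejoin, which ignores promote_rule, so we still get Any.
v = [isodd(i) ? 1.0 : Blag() for i in 1:4]

println(pt)
println(eltype(v))
```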


#3

Are you sure that promotion is actually involved in this operation?

If anybody’s interested, run the following to see the relevant @code_warntype output:

using InteractiveUtils  # provides @code_warntype outside the REPL
struct Blag end
genblag = Base.Generator(i -> ifelse(randn() < 0, Blag(), 1.0), 1:100)
@code_warntype collect(genblag)
genmissing = Base.Generator(i -> ifelse(randn() < 0, Missing(), 1.0), 1:100)
@code_warntype collect(genmissing)

Not sure why, but the difference appears to be related to collect_to_with_first!:

One interesting thing is that some code blocks seem to be reordered. Another is that the inference result for genblag is actually tighter than that for genmissing (Array vs. AbstractArray).
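The strategy behind collect_to_with_first! can be modelled roughly as "start with the type of the first element, and widen whenever an element does not fit". The sketch below is a hypothetical, much simplified helper (widening_collect is not a Base function); it only mirrors that strategy, with the join function as a keyword so the effect of swapping it out is visible.

```julia
# Hypothetical helper, NOT Base's implementation: a simplified model of the
# "start with typeof(first element), widen on mismatch" strategy used by collect.
function widening_collect(itr; jointype = Base.promote_typejoin)
    y = iterate(itr)
    # Simplification: Base would fall back to the inferred eltype here.
    y === nothing && return Any[]
    x1, state = y
    dest = [x1]                        # eltype starts as typeof(x1)
    y = iterate(itr, state)
    while y !== nothing
        x, state = y
        T = eltype(dest)
        if !(x isa T)
            # Widen: move the elements collected so far into a vector whose
            # eltype is the join of the old eltype and the new element's type.
            newdest = Vector{jointype(T, typeof(x))}(undef, length(dest))
            copyto!(newdest, dest)
            dest = newdest
        end
        push!(dest, x)
        y = iterate(itr, state)
    end
    return dest
end

struct Blag end

a = widening_collect(isodd(i) ? 1.0 : missing for i in 1:4)
b = widening_collect(isodd(i) ? 1.0 : Blag() for i in 1:4)
# A join that always keeps a Union gives the behavior asked for in the question.
c = widening_collect((isodd(i) ? 1.0 : Blag() for i in 1:4); jointype = (S, T) -> Union{S, T})

println(eltype(a))
println(eltype(b))
println(eltype(c))
```

With Base.promote_typejoin as the join, Float64 and Missing stay a small Union while Float64 and Blag fall back to Any, which matches the REPL output in the question.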


#4

Ah, found the difference I think:

julia> Base.promote_typejoin(Float64, Blag)
Any

julia> Base.promote_typejoin(Float64, Missing)
Union{Missing, Float64}

That’s being called here:

There are overloads for Missing:

So after defining

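Base._promote_typejoin(::Type{Blag}, ::Type{T}) where {T} =
    isconcretetype(T) || T === Union{} ? Union{T, Blag} : Any
Base._promote_typejoin(::Type{T}, ::Type{Blag}) where {T} =
    isconcretetype(T) || T === Union{} ? Union{T, Blag} : Any
# Resolves the method ambiguity between the two definitions above for (Blag, Blag)
Base._promote_typejoin(::Type{Blag}, ::Type{Blag}) = Blag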

you get

julia> typeof([ifelse(randn() < 0, 1.0, Blag()) for _ in 1:100])
Array{Union{Blag, Float64},1}

#5

Yes, currently Missing and Nothing are special-cased in promote_typejoin, because it wasn’t clear how to generalize it: what is a “small” type union? We could pick an arbitrary threshold, but that would give inconsistent behaviors when you add more types.

For now I guess you can overload promote_typejoin if you define a special type which needs the same treatment, but beware that it’s unexported and can therefore change at any point. It would be nice to decide whether that mechanism is there to stay, or whether we can find better, more general rules.


#6

I wonder if something along the lines of

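function Base._promote_typejoin(::Type{S}, ::Type{T}) where {S, T}
    # true if A is concrete (or Union{}) and B is a singleton type such as Missing or Nothing
    _cs(A, B) = (isconcretetype(A) || A ≡ Union{}) && Base.issingletontype(B)
    if _cs(T, S) || _cs(S, T)
        Union{T, S}     # keep a small Union, as for Missing/Nothing
    else
        typejoin(S, T)  # otherwise fall back to the plain typejoin
    end
end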

would work, generalizing the existing cases to the union of a concrete and a singleton type.
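The proposed rule can be tried out without redefining anything in Base. The sketch below is a hypothetical stand-alone function (small_union_join is an illustrative name, not an existing API) implementing the same "concrete plus singleton" condition; Base.issingletontype is unexported, as in the post.

```julia
struct Blag end

# Hypothetical stand-alone version of the rule proposed above. It is a plain
# function rather than a method of Base._promote_typejoin, so it can be tried
# out without touching Base internals.
function small_union_join(S::Type, T::Type)
    # true if A is concrete (or Union{}) and B is a singleton type
    concrete_and_singleton(A, B) =
        (isconcretetype(A) || A === Union{}) && Base.issingletontype(B)
    if concrete_and_singleton(T, S) || concrete_and_singleton(S, T)
        return Union{S, T}      # keep a small Union
    else
        return typejoin(S, T)   # otherwise the ordinary common supertype
    end
end

for (S, T) in ((Float64, Blag), (Float64, Missing), (Int, Nothing),
               (Int, Float64), (Real, Blag), (Union{Blag, Float64}, Missing))
    println(S, " , ", T, " => ", small_union_join(S, T))
end
```

As written, joining a second singleton onto an existing Union (e.g. Union{Blag, Float64} with Missing) falls back to Any, because the Union itself is not concrete. That is one possible way of bounding how "small" the union gets, though not necessarily the one Base would want.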


#7

This is probably very basic, but I can’t quite figure it out. Why is promote_typejoin necessary given that promote_type(Int, Missing) = Union{Int, Missing}?

I’m asking because I’m implementing a mechanism to collect an iterator of structs into a struct of arrays. I have been using promote_type, and it seemed to work fine for Missing as well (code here).


#8

promote_type is only used in some places, like cat, but other functions like collect and map don’t use it, since they preserve the types of the elements and therefore merely choose an eltype which is a supertype of all entries’ types. IOW no promotion/conversion happens. In that case promote_typejoin is used: it’s really just typejoin, with a special case for Missing and Nothing (the “promote” in the name is a bit misleading since it doesn’t use promotion).
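A quick way to see the difference at the type level, assuming a recent Julia 1.x (promote_typejoin is unexported, so its exact behavior may change between versions): both functions agree when Missing is involved, but disagree for ordinary numeric types.

```julia
# Both agree when Missing is involved...
println(promote_type(Int, Missing))             # via the promote_rule for Missing
println(Base.promote_typejoin(Int, Missing))    # via the Missing special case

# ...but differ for ordinary numeric types: promote_type finds a type both can
# be converted to, promote_typejoin only a common supertype (no conversion).
println(promote_type(Int, Float64))
println(Base.promote_typejoin(Int, Float64))
```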


#9

I see, so basically it’s a judgement call in the collection mechanism whether we want to automatically promote (so that mixing Int and Float64 gives Float64, like in vcat) or take the supertype (so that mixing Int and Float64 gives Real, like in collect), with the option of taking the supertype except when one of the types is Missing or Nothing (using Base.promote_typejoin). I understand that before this is explicitly documented, it’s better to be 100% sure whether this is the definitive solution or whether there is a simpler one.
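The two behaviors summarized above can be observed directly. This is a small illustration, assuming a recent Julia 1.x: concatenation and array literals convert the elements to a promoted type, while comprehensions and map keep each element’s own type and only widen the eltype.

```julia
# Concatenation promotes: elements are converted to a common type.
v_cat = vcat(1, 2.0)
v_lit = [1, 2.0]                    # array literals go through promotion too

# Collection only widens: elements keep their own type.
v_gen = [x for x in (1, 2.0)]
v_map = map(identity, Any[1, 2.0])

# Missing gets a small Union via promote_typejoin's special case.
v_mis = [x for x in (1, missing)]

for v in (v_cat, v_lit, v_gen, v_map, v_mis)
    println(typeof(v))
end
```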