Small union type failure

I thought that missing wasn’t treated specially, and that all small type unions would be Unioned. Have I misunderstood?

julia> struct Blag end

julia> typeof([ifelse(randn() < 0, 1.0, Missing()) for _ in 1:100])
Array{Union{Missing, Float64},1}

julia> typeof([ifelse(randn() < 0, 1.0, Blag()) for _ in 1:100])
Array{Any,1}

Why is that not Union{Blag, Float64}?


Missing is not special in terms of storage: Vector{Union{Blag, Float64}} is represented just as efficiently as Vector{Union{Missing, Float64}}. The promotion machinery does treat Missing “specially”, though, in that it has this promotion rule:

promote_rule(::Type{Missing}, ::Type{T}) where {T} = Union{Missing, T}

If you define a similar promotion rule for Blag then I would have thought it should behave similarly, but for some reason that doesn’t work:

julia> Base.promote_rule(::Type{Blag}, ::Type{T}) where {T} = Union{Blag, T}

julia> typeof([ifelse(randn() < 0, 1.0, Blag()) for _ in 1:100])
Array{Any,1}

Perhaps @nalimilan who (IIRC) implemented this special behavior knows what’s up.

Are you sure that promotion is actually involved in this operation?

If anybody’s interested, to see the relevant code_warntype, run:

struct Blag end
genblag = Base.Generator(i -> ifelse(randn() < 0, Blag(), 1.0), 1:100)
@code_warntype collect(genblag)
genmissing = Base.Generator(i -> ifelse(randn() < 0, Missing(), 1.0), 1:100)
@code_warntype collect(genmissing)

Not sure why, but the difference appears to be related to collect_to_with_first!:

One interesting thing is that some code blocks seem to be reordered. Another is that the inference result for genblag is actually tighter than that for genmissing (Array vs. AbstractArray).

Ah, found the difference I think:

julia> Base.promote_typejoin(Float64, Blag)
Any

julia> Base.promote_typejoin(Float64, Missing)
Union{Missing, Float64}

That’s being called here:

There are overloads for Missing:

So after defining

Base._promote_typejoin(::Type{Blag}, ::Type{T}) where {T} =
    isconcretetype(T) || T === Union{} ? Union{T, Blag} : Any
Base._promote_typejoin(::Type{T}, ::Type{Blag}) where {T} =
    isconcretetype(T) || T === Union{} ? Union{T, Blag} : Any

you get

julia> typeof([ifelse(randn() < 0, 1.0, Blag()) for _ in 1:100])
Array{Union{Blag, Float64},1}

Yes, currently Missing and Nothing are special-cased in promote_typejoin, because it wasn’t clear how to generalize it: what is a “small” type union? We could pick an arbitrary threshold, but that would give inconsistent behaviors when you add more types.

For now I guess you can overload promote_typejoin if you define a special type which needs the same treatment, but beware that it’s unexported and can therefore change at any point. It would be nice to decide whether that mechanism is there to stay, or whether we can find better, more general rules.
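If depending on an unexported function is a concern, one option that avoids it is to state the element type explicitly with a typed comprehension. This is a minimal sketch of a workaround, not something proposed earlier in this thread; it only helps when you know the union up front:

```julia
struct Blag end

# A typed comprehension fixes the element type directly, so no
# promote_typejoin call is needed to pick the eltype.
xs = Union{Blag, Float64}[ifelse(randn() < 0, 1.0, Blag()) for _ in 1:100]

typeof(xs)  # Vector{Union{Blag, Float64}} on Julia 1.6+
```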


I wonder if something along the lines of

function Base._promote_typejoin(::Type{S}, ::Type{T}) where {S, T}
    _cs(T, S) = (isconcretetype(T) || T ≡ Union{}) && Base.issingletontype(S)
    if _cs(T, S) || _cs(S, T)
        Union{T, S}
    else
        typejoin(S, T)
    end
end

would work, generalizing the existing cases to the union of a concrete and a singleton type.
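The rule above can be tried out without touching Base by writing it as an ordinary function. A minimal sketch, assuming Base.issingletontype (unexported) is available in your Julia version; proposed_join and concrete_and_singleton are hypothetical names used only for illustration:

```julia
struct Blag end

# True when T is concrete (or Union{}) and S is a singleton type.
concrete_and_singleton(T, S) =
    (isconcretetype(T) || T === Union{}) && Base.issingletontype(S)

# Stand-alone version of the proposed rule; does not modify Base.
function proposed_join(S, T)
    if concrete_and_singleton(T, S) || concrete_and_singleton(S, T)
        Union{S, T}
    else
        typejoin(S, T)
    end
end

proposed_join(Float64, Blag)     # Union{Blag, Float64}
proposed_join(Float64, Missing)  # Union{Missing, Float64}
proposed_join(Int, Float64)      # Real (neither is a singleton type)
proposed_join(Real, Blag)        # Any (Real is not concrete)
```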

This is probably very basic, but I can’t quite figure it out. Why is promote_typejoin necessary given that promote_type(Int, Missing) = Union{Int, Missing}?

I’m asking because I’m implementing the collection mechanism to collect an iterator of structs into a struct of arrays and have been using promote_type and it seemed to work fine for Missing as well (code here).

promote_type is only used in some places, like cat. Other functions like collect and map don’t use it, since they preserve the types of elements and therefore merely choose an eltype which is a supertype of all entries’ types. IOW no promotion/conversion happens. promote_typejoin is used in that case: it’s really just typejoin, with a special case for Missing and Nothing (the “promote” in the name is a bit misleading since it doesn’t use promotion).

I see, so basically it’s kind of a judgement call in the collection mechanism whether we want to automatically promote (so that mixing Int and Float64 gives Float64, like in vcat) or whether we take the supertype (so mixing Int and Float64 gives Real, like in collect), with the option of taking the supertype except when one of the types is Missing or Nothing (using Base.promote_typejoin). I understand that before this is explicitly documented it’s better to be 100% sure whether this is our definite solution or if there is somehow a simpler solution.
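The three behaviours can be compared side by side. A small sketch; Base.promote_typejoin is unexported, so its exact behaviour may differ between Julia versions:

```julia
# Promotion: values are converted to a common type.
p = promote_type(Int, Float64)            # Float64
v = vcat(1, 2.0)                          # Vector{Float64}, the 1 becomes 1.0

# Type join: values are left as they are, the eltype is a common supertype.
j = typejoin(Int, Float64)                # Real
c = collect(x for x in Any[1, 2.0])       # Vector{Real}, the 1 stays an Int

# promote_typejoin: like typejoin, but keeps a small Union for Missing/Nothing.
pj  = Base.promote_typejoin(Int, Float64) # Real
pjm = Base.promote_typejoin(Int, Missing) # Union{Missing, Int}
```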

julia> [1.0, missing, nothing]
3-element Array{Union{Missing, Nothing, Float64},1}:
 1.0     
  missing
  nothing

julia> identity.([1.0, missing, nothing])
3-element Array{Any,1}:
 1.0     
  missing
  nothing

This one is a bummer, too. It seems that broadcasting/comprehensions give up on the type whenever there are more than two types involved.

The doom bell has rung: the pull request “Preserve non-concrete types in promote_typejoin” (JuliaLang/julia #37019, by vtjnash) removed _promote_typejoin. Does anyone know what should be overloaded now? promote_typejoin?


I guess we need to find a way to allow custom types to opt in to the list of special types used by _promote_typesubtract. Not sure how to do that.

Ouch, yeah, looks like we need to pirate the base method at the moment?

Hi, using Julia 1.6.0 consider the following

julia> struct Blag end

# N.B. now `promote_typejoin` not `_promote_typejoin`
julia> Base.promote_typejoin(::Type{Blag}, ::Type{T}) where {T} =
           isconcretetype(T) || T === Union{} ? Union{T, Blag} : Any
julia> Base.promote_typejoin(::Type{T}, ::Type{Blag}) where {T} =
           isconcretetype(T) || T === Union{} ? Union{T, Blag} : Any

# we get the desired outcome
julia> typeof([ifelse(randn() < 0, 1.0, Blag()) for _ in 1:100])
Vector{Union{Blag, Float64}} (alias for Array{Union{Blag, Float64}, 1})

So, is it a bad idea to overload Base.promote_typejoin?


Avoid overloading Base.<anything> with an alternate implementation for the same signature.
If you feel you have a better way, submit a PR with the revision.


In general, overloading Base methods with custom types is very much intended. Type piracy is pretty discouraged, though. (type piracy = overloading methods owned by other modules with types that are not owned by your module)

This specific case is unfortunately bad, because it does not compose:

julia> struct Blag2 end

julia> Base.promote_typejoin(::Type{Blag2}, ::Type{T}) where {T} =
       isconcretetype(T) || T === Union{} ? Union{T, Blag2} : Any

julia> Base.promote_typejoin(::Type{T}, ::Type{Blag2}) where {T} =
       isconcretetype(T) || T === Union{} ? Union{T, Blag2} : Any

julia> typeof([ifelse(randn() < 0, Blag(), Blag2()) for _ in 1:100])
ERROR: MethodError: promote_typejoin(::Type{Blag2}, ::Type{Blag}) is ambiguous. Candidates:
  promote_typejoin(::Type{T}, ::Type{Blag}) where T in Main at REPL[4]:1
  promote_typejoin(::Type{Blag2}, ::Type{T}) where T in Main at REPL[13]:1

This is not really bad type piracy (changing behavior of unrelated code just by loading your module), but it is not pretty either (two independent modules might interact suboptimally – i.e. it is up to people who want to (transitively) import both Blag and Blag2 to resolve the conflict, and this resolution must engage in actual “officially discouraged” type piracy)
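To make the conflict concrete, here is a sketch of what resolving the ambiguity would look like, assuming Julia 1.6 or later, where Base.promote_typejoin (unexported, and therefore subject to change) is the function being extended. If Blag and Blag2 lived in two separate packages, a third party would have to add the last two methods for types it does not own:

```julia
struct Blag end
struct Blag2 end

# The per-type overloads from the posts above, generated for both types.
for S in (:Blag, :Blag2)
    @eval begin
        Base.promote_typejoin(::Type{$S}, ::Type{T}) where {T} =
            isconcretetype(T) || T === Union{} ? Union{T, $S} : Any
        Base.promote_typejoin(::Type{T}, ::Type{$S}) where {T} =
            isconcretetype(T) || T === Union{} ? Union{T, $S} : Any
    end
end

# Fully specific methods that break the (Blag, Blag2) tie. Written outside
# the modules owning Blag and Blag2, these would be type piracy.
Base.promote_typejoin(::Type{Blag}, ::Type{Blag2}) = Union{Blag, Blag2}
Base.promote_typejoin(::Type{Blag2}, ::Type{Blag}) = Union{Blag, Blag2}

Base.promote_typejoin(Blag2, Blag)  # no longer a MethodError
```

The same-type call promote_typejoin(Blag, Blag) remains ambiguous with these definitions; this sketch does not address it.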
