Missing is not special in the sense that Vector{Union{Blag, Float64}} is stored just as efficiently as Vector{Union{Missing, Float64}}. The promotion machinery does treat Missing “specially”, in that it has this promotion rule:
promote_rule(::Type{Missing}, ::Type{T}) where {T} = Union{Missing, T}
If you define a similar promotion rule for Blag then I would have thought it should behave similarly, but for some reason that doesn’t work:
julia> Base.promote_rule(::Type{Blag}, ::Type{T}) where {T} = Union{Blag, T}
julia> typeof([ifelse(randn() < 0, 1.0, Blag()) for _ in 1:100])
Array{Any,1}
Perhaps @nalimilan, who (IIRC) implemented this special behavior, knows what’s up.
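A deterministic variant of the experiment above suggests where the difference lies. It is a sketch: `isodd(i)` stands in for `randn() < 0` so the outcome doesn't depend on the random draw, and `v_blag`/`v_missing` are just illustrative names. The rule is honored by `promote_type`, yet the comprehension still widens to `Any`, while the same pattern with `missing` keeps a small `Union`.

```julia
struct Blag end

# The rule from the post: promote Blag with any T to Union{Blag, T}.
Base.promote_rule(::Type{Blag}, ::Type{T}) where {T} = Union{Blag, T}

# promote_type does pick the rule up:
promote_type(Blag, Float64)                          # == Union{Blag, Float64}

# The comprehension starts from the first element's type (Float64) and widens
# with the internal Base.promote_typejoin, which ignores promote_rule and
# falls back to typejoin(Float64, Blag), i.e. Any.
v_blag = [isodd(i) ? 1.0 : Blag() for i in 1:4]
eltype(v_blag)                                       # == Any

# Missing is special-cased in promote_typejoin, so a small Union survives.
v_missing = [isodd(i) ? 1.0 : missing for i in 1:4]
eltype(v_missing)                                    # == Union{Missing, Float64}
```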
Not sure why, but the difference appears to be related to collect_to_with_first!:
One interesting thing is that some code blocks seem to be reordered. Another is that the inference result for genblag is actually tighter than that for genmissing (Array vs. AbstractArray).
julia> Base.promote_typejoin(Float64, Blag)
Any
julia> Base.promote_typejoin(Float64, Missing)
Union{Missing, Float64}
That’s being called here:
There are overloads for Missing:
So after defining
Base._promote_typejoin(::Type{Blag}, ::Type{T}) where {T} =
isconcretetype(T) || T === Union{} ? Union{T, Blag} : Any
Base._promote_typejoin(::Type{T}, ::Type{Blag}) where {T} =
isconcretetype(T) || T === Union{} ? Union{T, Blag} : Any
you get
julia> typeof([ifelse(randn() < 0, 1.0, Blag()) for _ in 1:100])
Array{Union{Blag, Float64},1}
Yes, currently Missing and Nothing are special-cased in promote_typejoin, because it wasn’t clear how to generalize it: what is a “small” type union? We could pick an arbitrary threshold, but that would give inconsistent behaviors when you add more types.
For now I guess you can overload promote_typejoin if you define a special type which needs the same treatment, but beware that it’s unexported and can therefore change at any point. It would be nice to decide whether that mechanism is there to stay, or whether we can find better, more general rules.
function Base._promote_typejoin(::Type{S}, ::Type{T}) where {S, T}
_cs(T, S) = (isconcretetype(T) || T ≡ Union{}) && Base.issingletontype(S)
if _cs(T, S) || _cs(S, T)
Union{T, S}
else
typejoin(S, T)
end
end
would work, generalizing the existing cases to the union of a concrete and a singleton type.
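To see what that rule would do without adding methods to Base, here is a sketch of the same logic as a standalone function. `singleton_join` is a hypothetical name, and `Base.issingletontype` is an internal, unexported helper (the same one the snippet above relies on), so it may change between Julia versions.

```julia
struct Blag end
struct Blag2 end

# Hypothetical, non-pirating version of the proposed rule:
# if one side is concrete (or Union{}) and the other is a singleton type,
# keep the small Union; otherwise fall back to plain typejoin.
function singleton_join(S::Type, T::Type)
    concrete_and_singleton(a, b) =
        (isconcretetype(a) || a === Union{}) && Base.issingletontype(b)
    if concrete_and_singleton(T, S) || concrete_and_singleton(S, T)
        return Union{S, T}
    else
        return typejoin(S, T)
    end
end

singleton_join(Float64, Blag)   # == Union{Float64, Blag}: Blag is a singleton type
singleton_join(Blag, Blag2)     # == Union{Blag, Blag2}: both are singleton types
singleton_join(Int, Missing)    # == Union{Int, Missing}: matches today's special case
singleton_join(Int, Float64)    # == Real: no singleton involved, plain typejoin
singleton_join(Real, Blag)      # == Any: Real is abstract, so typejoin(Real, Blag)
```

Unlike the current Base.promote_typejoin, this sketch does not strip singletons out of an argument that is already a Union, so widening Union{Blag, Float64} with Int falls back to plain typejoin.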
This is probably very basic, but I can’t quite figure it out. Why is promote_typejoin necessary given that promote_type(Int, Missing) == Union{Int, Missing}?
I’m asking because I’m implementing the collection mechanism to collect an iterator of structs into a struct of arrays and have been using promote_type and it seemed to work fine for Missing as well (code here).
promote_type is only used in some places, like cat, but other functions like collect and map don’t use it, since they preserve the types of elements and therefore merely choose an eltype which is a supertype of all entries’ types. IOW no promotion/conversion happens. promote_typejoin is used in that case: it’s really just typejoin, with a special case for Missing and Nothing (the “promote” in the name is a bit misleading since it doesn’t use promotion).
I see, so basically it’s kind of a judgement call in the collection mechanism whether we want to automatically promote (so that mixing Int and Float64 gives Float64, like in vcat) or whether we take the supertype (so mixing Int and Float64 gives Real, like in collect), with the option of taking the supertype except when one of the types is Missing or Nothing (using Base.promote_typejoin). I understand that before this is explicitly documented it’s better to be 100% sure whether this is our definitive solution or if there is somehow a simpler solution.
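A quick REPL-style check of the two strategies, as a sketch. `Base.promote_typejoin` is internal and unexported, so its exact results may change between Julia versions.

```julia
# Promotion: a common type the values are *converted* to (array literals, vcat, cat).
promote_type(Int, Float64)                       # == Float64
eltype([1, 2.0])                                 # == Float64: the Int 1 becomes 1.0
eltype(vcat([1], [2.0]))                         # == Float64

# promote_typejoin: no conversion, just a common supertype (collect, map).
Base.promote_typejoin(Int, Float64)              # == Real
eltype(collect(x for x in Any[1, 2.0]))          # == Real: elements keep their own types

# ...except that Missing (and Nothing) are kept as a small Union.
Base.promote_typejoin(Int, Missing)              # == Union{Missing, Int}
eltype(collect(x for x in Any[1, missing]))      # == Union{Missing, Int}
```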
I guess we need to find a way to allow custom types to opt in to being included in the list of special types used by _promote_typesubtract. Not sure how to do that.
julia> struct Blag end
# N.B. now `promote_typejoin` not `_promote_typejoin`
julia> Base.promote_typejoin(::Type{Blag}, ::Type{T}) where {T} =
isconcretetype(T) || T === Union{} ? Union{T, Blag} : Any
julia> Base.promote_typejoin(::Type{T}, ::Type{Blag}) where {T} =
isconcretetype(T) || T === Union{} ? Union{T, Blag} : Any
# we get the desired outcome
julia> typeof([ifelse(randn() < 0, 1.0, Blag()) for _ in 1:100])
Vector{Union{Blag, Float64}} (alias for Array{Union{Blag, Float64}, 1})
So, is it a bad idea to overload Base.promote_typejoin?
Avoid overloading Base.<anything> with an alternate implementation for the same signature.
If you feel you have a better way, submit a PR with the revision.
In general, overloading Base methods for your own custom types is very much intended. Type piracy is pretty discouraged, though (type piracy = overloading methods owned by other modules for types that are not owned by your module).
This specific case is unfortunately bad, because it does not compose:
julia> struct Blag2 end
julia> Base.promote_typejoin(::Type{Blag2}, ::Type{T}) where {T} =
isconcretetype(T) || T === Union{} ? Union{T, Blag2} : Any
julia> Base.promote_typejoin(::Type{T}, ::Type{Blag2}) where {T} =
isconcretetype(T) || T === Union{} ? Union{T, Blag2} : Any
julia> typeof([ifelse(randn() < 0, Blag(), Blag2()) for _ in 1:100])
ERROR: MethodError: promote_typejoin(::Type{Blag2}, ::Type{Blag}) is ambiguous. Candidates:
promote_typejoin(::Type{T}, ::Type{Blag}) where T in Main at REPL[4]:1
promote_typejoin(::Type{Blag2}, ::Type{T}) where T in Main at REPL[13]:1
This is not really the bad kind of type piracy (changing the behavior of unrelated code just by loading your module), but it is not pretty either: two independent modules might interact suboptimally. It is up to whoever (transitively) imports both Blag and Blag2 to resolve the conflict, and that resolution must engage in actual “officially discouraged” type piracy.
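A sketch of what that resolution might look like, assuming both packages ship the promote_typejoin methods from the posts above. Whoever loads both adds methods for the exact pairs (Blag, Blag2) and (Blag2, Blag); these are more specific than either package's generic methods, so they break the tie. Because the resolving code owns neither Blag nor Blag2, these two methods are exactly the discouraged kind of piracy just described.

```julia
struct Blag end
struct Blag2 end

# Methods "owned" by the (hypothetical) package that defines Blag.
Base.promote_typejoin(::Type{Blag}, ::Type{T}) where {T} =
    isconcretetype(T) || T === Union{} ? Union{T, Blag} : Any
Base.promote_typejoin(::Type{T}, ::Type{Blag}) where {T} =
    isconcretetype(T) || T === Union{} ? Union{T, Blag} : Any

# Methods "owned" by the (hypothetical) package that defines Blag2.
Base.promote_typejoin(::Type{Blag2}, ::Type{T}) where {T} =
    isconcretetype(T) || T === Union{} ? Union{T, Blag2} : Any
Base.promote_typejoin(::Type{T}, ::Type{Blag2}) where {T} =
    isconcretetype(T) || T === Union{} ? Union{T, Blag2} : Any

# Resolution by code that owns neither type (i.e. type piracy):
# exact-pair methods are more specific than both candidates, so no ambiguity.
Base.promote_typejoin(::Type{Blag}, ::Type{Blag2}) = Union{Blag, Blag2}
Base.promote_typejoin(::Type{Blag2}, ::Type{Blag}) = Union{Blag, Blag2}

mixed = [isodd(i) ? Blag() : Blag2() for i in 1:4]
eltype(mixed)                                    # == Union{Blag, Blag2}
```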