I want to annotate a type-unstable variable with a type-parameter constraint to improve the chance of union splitting during compilation. In general, should I do Tuple of Union or Union of Tuple?
For example, which of the following struct is better, assuming T1 and T2 have different underlying data structures?
struct TofU{T1, T2}
a::NTuple{2, Union{T1, T2}}
end
struct UofT{T1, T2}
a::Union{Tuple{T1, T2}, Tuple{T2, T1}, NTuple{2, T1}, NTuple{2, T2}}
end
Note that for any given T1 and T2 (T1 != T2), though NTuple{2, Union{T1, T2}} and Union{Tuple{T1, T2}, Tuple{T2, T1}, NTuple{2, T1}, NTuple{2, T2}} represent two “equal” types, they are not identical (egal):
For some silly reason, tuples in Julia are covariant. Hence, Tuple{Union{A, B}} is a UnionAll type. I think this is basically a straight up design flaw in Julia (see Invariant Tuples · Issue #24614 · JuliaLang/julia · GitHub), which stems from the inadvisable idea of having function parameters (which must be covariant) be the same thing as tuples.
So use unions of tuples? Well, those suffer from combinatorial explosions. If your tuple have 6 fields each with 3 possibilities, the resulting union contains 3^6 = 729 concrete tuple types. In this case, you should use named tuples, which, unlike ordinary tuples, are invariant.
So:
Use named tuples of unions if individual fields of your union are union-typed
Use unions of tuples if you really have distinct tuple types, and not the same tuple type with union-typed fields
Not to mention that despite making this weird choice, function parameters are not the same as Tuples in at least a few observable ways! e.g. function parameters can specialize on Type{<:T}, whereas Tuple only specializes to DataType:
julia> f(::Type{T}) where {T} = T;
julia> f(Int)
Int64
julia> typeof((Int,))
Tuple{DataType}
julia> let args = (Int,)
@btime f($args...)
end
157.771 ns (1 allocation: 32 bytes)
Int64
For a given pair of (A::DataType, B::DataType), I think Tuple{Union{A, B}} is just a Tuple rather than a UnionAll.
Personally, I’m fine with having Tuple being covariant by construction, but it would be nice if Tuple of Union can be automatically transpiled into Union of Tuple for better type inference and optimization. But then again, you have pointed out the risk/footgun of such an exponentially expressive type construction enabled by Tuple and Union:
I’m struggling to see how the covariance or combinatorial explosions are relevant here. The small union optimizations only handle small unions — N=4 or less — which is just fitting the use-case here. It’s not going to be advantageous to do a big union anyhow.
Doing better for tuples of small unions (which, yes, would necessarily need to net to small combinatorics) is:
I agree that the discussion of “combinatorial explosions” is not directly related to the OP if we focus only on small unions.
If we do want to extend the discussion, I have a slightly more relevant question. Say we do have a Tuple of Union that, when rewritten into Union of Tuple, will result in a Union of more than four concrete types–meaning we can no longer utilize the benefit of union splitting (by default). Is it possible to directly optimize the code stability with a Tuple of Union?
It’s not a UnionAll in the sense that it’s an instance of UnionAll, but it behaves exactly like one (i.e. it has an equivalent UnionAll, as it refers to the same set of types) :
julia> UA = Tuple{T} where T <: Union{Int, UInt};
julia> UA isa UnionAll
true
julia> UA == Tuple{Union{Int, UInt}}
true
As you very well pointed out, tuples of unions cannot be converted to unions of tuples without a combinatorial explosion. Note that this problem is unique for tuples, which is one reason the covariance is such a bad idea.
It’s the covariance that causes tuples of unions to be non-concrete types, which is what causes the whole issue of OP in the first place. If they were invariant, like almost every other type in Julia[1]NTuple{2, Union{T1, T2}} would be concrete, and there would be no problem. Issue 57054 would not be a thing.
You’re right that the OP example case happens to lie just on the threshold for (the internal compiler heuristic of) union splitting, but in the general case, which I what I assumed OP was interested in, one cannot rely on union splitting, as the number of possible types quickly exceeds the threshold.