Non-deterministic normalization of Tuples of Unions

One problem I wanted to point at is:

julia> sumN(n) = Union{[ Tuple{[ifelse(i & (1<<j) != 0, Int8, UInt8 ) for j=0:n-1]...}  for i = 0:(2^n-1)]...}
sumN (generic function with 1 method)

julia> prodN(n) = NTuple{n, Union{Int8, UInt8}}
prodN (generic function with 1 method)

julia> A0=Vector{sumN(3)}(undef, 1)
1-element Vector{Union{Tuple{Int8, Int8, Int8}, Tuple{Int8, Int8, UInt8}, Tuple{Int8, UInt8, Int8}, Tuple{Int8, UInt8, UInt8}, Tuple{UInt8, Int8, Int8}, Tuple{UInt8, Int8, UInt8}, Tuple{UInt8, UInt8, Int8}, Tuple{UInt8, UInt8, UInt8}}}:
 (0, 0, 0)

julia> A1=Vector{prodN(3)}(undef, 1)
1-element Vector{Union{Tuple{Int8, Int8, Int8}, Tuple{Int8, Int8, UInt8}, Tuple{Int8, UInt8, Int8}, Tuple{Int8, UInt8, UInt8}, Tuple{UInt8, Int8, Int8}, Tuple{UInt8, Int8, UInt8}, Tuple{UInt8, UInt8, Int8}, Tuple{UInt8, UInt8, UInt8}}}:
 (0, 0, 0)

vs

julia> sumN(n) = Union{[ Tuple{[ifelse(i & (1<<j) != 0, Int8, UInt8 ) for j=0:n-1]...}  for i = 0:(2^n-1)]...}
sumN (generic function with 1 method)

julia> prodN(n) = NTuple{n, Union{Int8, UInt8}}
prodN (generic function with 1 method)

julia> A1=Vector{prodN(3)}(undef, 1)
1-element Vector{Tuple{Union{Int8, UInt8}, Union{Int8, UInt8}, Union{Int8, UInt8}}}:
 #undef

julia> A0=Vector{sumN(3)}(undef, 1)
1-element Vector{Tuple{Union{Int8, UInt8}, Union{Int8, UInt8}, Union{Int8, UInt8}}}:
 #undef

julia> r = Ref{sumN(3)}()
Base.RefValue{Union{Tuple{Int8, Int8, Int8}, Tuple{Int8, Int8, UInt8}, Tuple{Int8, UInt8, Int8}, Tuple{Int8, UInt8, UInt8}, Tuple{UInt8, Int8, Int8}, Tuple{UInt8, Int8, UInt8}, Tuple{UInt8, UInt8, Int8}, Tuple{UInt8, UInt8, UInt8}}}((0, 0, 0))

From that example we learn that the data layout of arrays / types is not deterministic in julia (e.g. suppose that the first compile/call events for A0/A1 raced against each other). Unfortunate!

It depends on the description that is first seen and then cached in the method tables. If we later come along with a different description, the compiler must check whether it already knows an equivalent description of the type, and use the data layout it has already fixed. It must do this because “equivalent type” is, to some extent, spec’d in stone – we can’t really decide before 2.0 that sumN(10) != prodN(10).
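
A minimal way to reproduce this from a fresh session is sketched below. Base.allocatedinline is an internal (but long-standing) helper that reports whether array elements of a given type are stored inline; the exact results depend on the Julia version, so treat this as a probe rather than a spec:

sumN(n)  = Union{[Tuple{[ifelse(i & (1 << j) != 0, Int8, UInt8) for j = 0:n-1]...} for i = 0:(2^n - 1)]...}
prodN(n) = NTuple{n, Union{Int8, UInt8}}

# Whichever equivalent description gets instantiated as an array element type
# first is the one the session caches; rerun with the two constructions swapped.
A0 = Vector{sumN(3)}(undef, 1)
A1 = Vector{prodN(3)}(undef, 1)

@show eltype(A0) eltype(A1)              # both should report the same description
@show Base.allocatedinline(eltype(A0))   # expected: true for the Union-of-Tuples layout, false for Tuple-of-Unions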


Is there an issue that tracks this behavior?


No idea how this was related to the other thread, but I can replicate this on v1.10.0. FWIW, regardless of which order the methods were called, this happens:

julia> prodN(3)
Tuple{Union{Int8, UInt8}, Union{Int8, UInt8}, Union{Int8, UInt8}}

julia> sumN(3)
Union{Tuple{Int8, Int8, Int8}, Tuple{Int8, Int8, UInt8}, Tuple{Int8, UInt8, Int8}, Tuple{Int8, UInt8, UInt8}, Tuple{UInt8, Int8, Int8}, Tuple{UInt8, Int8, UInt8}, Tuple{UInt8, UInt8, Int8}, Tuple{UInt8, UInt8, UInt8}}

julia> prodN(3) === prodN(3), sumN(3) === sumN(3)
(true, true)

julia> prodN(3) === sumN(3), prodN(3) == sumN(3)
(false, true)

So it seems to me the compiler picks either the Tuple of Unions or an equivalent but not identical Union of Tuples, no matter which was actually provided, but only the latter gets the isbits Union inline optimization.
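
A quick way to tell which layout a given element type actually got in the current session, based on the #undef behavior above (a hedged heuristic using the sumN/prodN definitions from above, not an official API: with inline isbits-Union storage every element of a freshly allocated undef array is already defined, while boxed storage leaves the elements #undef):

# true  => elements stored inline (isbits / isbits-Union layout)
# false => elements stored as boxed references (the #undef case above)
got_inline_layout(T) = isassigned(Vector{T}(undef, 1), 1)

got_inline_layout(sumN(3)), got_inline_layout(prodN(3))   # should agree with each other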

Afaiu not, but I thought this was more or less known? Like the examples of undecidable type inequality that I can’t remember? An unfortunate fact that is unlikely to be fixed anytime soon, but quite educational about how julia works internally.

But there probably should be an issue; a lot of typing-related code has been cleaned up in recent years.

I don’t think this is a bug (I struggle to formulate a desired behavior!), but it is especially unfortunate with respect to the nagging fear of performance issues: package A works fine and fast in isolation, but becomes dog slow when package Y does some work beforehand – an interaction that is super hard to debug.

Ah, it is the struggle between the 100% intensional === between types and the semi-extensional == (it is extensional on tuple/union types). Dispatch uses the semi-extensional ==, but data layout uses the intensional variant, leading to these inconsistencies that don’t blow up correctness, thanks to the magnificence of the dispatch system / caching.

We’re going to have to agree to disagree on our usage of extensional vs intensional; I don’t even think those are related to what’s going on here.

julia> Union{sumN(3), prodN(3)} === Union{prodN(3), sumN(3)} === prodN(3)
true

julia> foo(::Type{prodN(3)}) = "success"
foo (generic function with 1 method)

julia> foo(sumN(3))
"success"

I think this at least suggests that switching one type out for the other won’t affect instances and dispatch. So the fix is probably just making the compiler more aggressively infer and implement the Union of Tuples under the optimization limit. I think I’d prefer it to print the shorter version regardless of which one is implemented.

The underlying problem is that we do not have a good normalization / canonicalization algorithm for types.

That is, a function normalizeType that, given a description of a type, produces a normalized description: the function needs to be deterministic, and must satisfy normalizeType(A) === normalizeType(B) if and only if A == B.

For tuple/unions this is equivalent to normalizing multivariate polynomials (Union means +, and each element of the union in each position gets a variable). Detecting whether two such types are equivalent is simple (e.g. polynomial identity testing), but normalizing seems harder! It is not trivial to find something subexponential.
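
To make the analogy concrete, here is the 2-element case spelled out, using the internal helper Base.uniontypes to split a Union into its components: Tuple{Union{Int8, UInt8}, Union{Int8, UInt8}} plays the role of (a + b) * (c + d), and expanding the product gives the four monomials, i.e. the four concrete tuple types:

parts = Base.uniontypes(Union{Int8, UInt8})                      # the "sum" in each position
expanded = Union{[Tuple{a, b} for a in parts, b in parts]...}    # distribute: 2 x 2 = 4 concrete tuples
@assert expanded == Tuple{Union{Int8, UInt8}, Union{Int8, UInt8}}
# ...but, as the REPL session above shows for the 3-element case, they need not be ===.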

But maybe it is actually trivial for people who know more algebra?

So what julia does is give up on the “deterministic” part. Then identity testing is good enough to ensure consistency within the same program run – just cache all the answers we have given so far and check whether any of them applies. This scales, in the worst case, with the number of actual types encountered during program execution, with no exponentials in sight.
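
In user-level terms the scheme looks roughly like the toy sketch below. decide_layout is a hypothetical stand-in for whatever decision the runtime makes for the first description it sees; the real cache lives in the C runtime, and this is only meant to show the linear-scan-plus-equivalence-test idea:

# Keep every decision made so far; for a new type description, reuse the
# decision of the first ==-equal entry we find (equivalence, not identity).
const LAYOUT_CACHE = Vector{Pair{Any, Symbol}}()

function layout_for(T)
    for (S, layout) in LAYOUT_CACHE
        S == T && return layout
    end
    layout = decide_layout(T)      # hypothetical: the first-seen description wins
    push!(LAYOUT_CACHE, T => layout)
    return layout
end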

If this approach stays, then the bar for the “more aggressive tuple optimization” idea is really high: since we are not fixing the general problem that causes the nagging fear of bad package interactions, we would need to demonstrate first that the specific tuple case is a real-world problem.

@mbauman I found some old issues. One is that hash breaks the contract that isequal implies same hash:

Typing unsoundness · Issue #32085 · JuliaLang/julia · GitHub
Some concrete types are not properly interned · Issue #31696 · JuliaLang/julia · GitHub

I opened the original bug report on that on 0.62 :wink:

julia> hash(sumN(4))
0xa486830eb3c30e23

julia> hash(prodN(4))
0x6e366bbdda0de960

Is improving normalization for types in general necessary for this specific case though? Considering other types doesn’t seem necessary, because Tuples are unique in having covariant parameters (well, as far as I know), e.g. Tuple{Bool, Int8} and Tuple{Bool, UInt8} are the concrete subtypes of Tuple{Bool, Union{Int8, UInt8}}, so Union{Tuple{Bool, Int8}, Tuple{Bool, UInt8}} is equivalent. It may be evident from that example, but to be clear: when I say Tuple of Unions I don’t mean every parameter is a Union. Tuple of Unions versus Union of Tuples doesn’t seem to matter except for applying the isbits Union optimization to array elements or struct fields, so we only need to construct the Union of Tuples in those cases. Tuple types associated with call signatures should definitely be excepted; I don’t know exactly how the type caches work.
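
For what it’s worth, the claimed equivalence for the Bool example is easy to check directly: == on types is mutual subtyping, and subtyping distributes the Union across the covariant Tuple positions:

# Both directions of subtyping hold, so the two descriptions are ==-equal,
# even though they need not be ===-identical.
@assert Union{Tuple{Bool, Int8}, Tuple{Bool, UInt8}} <: Tuple{Bool, Union{Int8, UInt8}}
@assert Tuple{Bool, Union{Int8, UInt8}} <: Union{Tuple{Bool, Int8}, Tuple{Bool, UInt8}}
@assert Tuple{Bool, Union{Int8, UInt8}} == Union{Tuple{Bool, Int8}, Tuple{Bool, UInt8}}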

Even before constructing the Union, we can go through a Tuple once to check whether it’s worth it. I think a Tuple is only isbits if its type parameters are (the docs say so for concreteness, at least), so we can bail if we find one non-isbits, non-Union type. The type tag for isbits Union elements is a UInt8, so we can also bail if the product of the sizes of the Unions ever exceeds 256, or possibly a lower implemented limit. If the Tuple ends in Vararg, we can bail if it is actually variadic; if possible, we should probably start the checks from the end. If we never bail, then the Union of Tuples is the Cartesian product of the Tuple of Unions.
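
A rough sketch of those checks, under the stated assumptions (the 256 limit from the UInt8 type tag and the blanket bail-out on a trailing Vararg are my guesses, not the actual layout code; Base.uniontypes and Core.TypeofVararg are internals):

# Decide whether a Tuple-of-Unions is worth expanding into a Union-of-Tuples
# for the isbits-Union layout; returns nothing on bail-out.
const MAX_UNION_COMBINATIONS = 256      # assumed limit implied by the UInt8 type tag

function try_expand(T::Type{<:Tuple})
    params = collect(T.parameters)
    isempty(params) && return T                             # Tuple{} is already concrete
    params[end] isa Core.TypeofVararg && return nothing     # bail on a trailing Vararg
    combos = 1
    columns = Vector{Vector{Any}}()
    for P in reverse(params)                                 # start the checks from the end
        if P isa Union
            parts = Base.uniontypes(P)
            all(isbitstype, parts) || return nothing         # non-isbits component: bail
            combos *= length(parts)
            combos > MAX_UNION_COMBINATIONS && return nothing
            pushfirst!(columns, parts)
        elseif isbitstype(P)
            pushfirst!(columns, Any[P])
        else
            return nothing                                   # one non-isbits, non-Union parameter: bail
        end
    end
    Union{[Tuple{c...} for c in Iterators.product(columns...)]...}
end

# try_expand(prodN(3)) should come out ==-equal to sumN(3) from above.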