Type promotion: Why not use more unions?

question

#1

I wanted to ask why type promotion/widening doesn’t use more unions. For example (on 0.62):

x=Any[Int32(0), 0];
typeof(collect(i for i in x))
Array{Signed,1}

More specifically: Widening takes the most specific abstract supertype. The in-memory representation is, as far as I understand, not more efficient than taking any; but dispatch on elements is faster, due to the fact that we already know better which method tables to consult. (here I must admit that I don’t fully understand the details there)

Of course a much more specific supertype is Union{Int32, Int64}. As far as I understood, union selectors are fast for up to 127 different types. So a possible different promotion rule could be “take unions of types if all are concrete, and less than 127 concrete types were encountered; otherwise use the old abstract promotion rule”, for a possibly much faster dispatch and more efficient storage.

I think we can never enumerate all subtypes of abstract types: The already allocated objects and compiled code need to stay valid if a new subtype is defined afterwards.

In this direction: Union{Int,Any} could have a place as a different thing from Any (branch on whether we have an Int; if not, dispatch via Any) in cases where we have many Int and few outliers. Of course Union{Int,Any} !== Any would look pretty weird.

edit: Of course one should cache the abstract promotion for dispatch on eltype(some_array); the unions are just supposed to speed up dispatch on typeof(some_array[i]).


#2

See also the discussion here: https://github.com/JuliaLang/julia/issues/22630#issuecomment-314947604


Best approach for runtime dispatching inside a hot loop (heterogeneous tree structure)
#3

So, you’re saying that post https://github.com/JuliaLang/julia/pull/25553 I could just give this a try; neat! I’ll see what happens and possibly reply either here or submit a PR for discussion.

(realistically I would probably take a more conservative limit on union size, e.g. 16, and remove the special handling of Missing/Nothing unions)


#4

The compiler is not currently equipped to handle such unions well.


#5

Thanks for the warning!

Simple testing showed that the compiler can currently handle Unions with 2 elements, and not three.

typecol = [ComplexF32, UInt8, Int8, UInt16, Int16, Int64, UInt64, Float64]

tp = typecol[1:2]; 
A_u = Vector{Union{tp...}}(10000);
for i=1:length(A_u) A_u[i]=rand(rand(tp)) end; 
A_uu=similar(A_u);
broadcast!(+, A_uu, A_u, A_u); @time broadcast!(+, A_uu, A_u, A_u);
#  0.000624 seconds (4 allocations: 160 bytes)

tp = typecol[1:3]; 
A_u = Vector{Union{tp...}}(10000);
for i=1:length(A_u) A_u[i]=rand(rand(tp)) end; 
A_uu=similar(A_u);
broadcast!(+, A_uu, A_u, A_u); @time broadcast!(+, A_uu, A_u, A_u);
  0.003153 seconds (19.71 k allocations: 308.078 KiB)

So the proper limit for union sizes is probably currently 2, and this could replace the special-casing for Missing and Nothing.

In other words, no PR (efficient (inline) storage makes limited sense if we can’t get non-allocating access, probably because inference doesn’t like larger unions).


#6

My example was of course terrible because I forgot the quadratic blow-up. The limit is 4.

Ab=Vector{Bool}(length(A_u));
tp = typecol[1:5];
...
@time broadcast!(iszero, Ab, A_u);
  0.001001 seconds (15.32 k allocations: 239.406 KiB)

tp = typecol[1:4];
@time broadcast!(iszero, Ab, A_u);
  0.000566 seconds (4 allocations: 160 bytes)

This does not change the fact that a promote-as-union with more than 2 elements makes limited sense.


#7

I had the same position as you before @jeff.bezanson said he would prefer special-casing Nothing and Missing, which I implemented.

I guess one of the reasons not to use Unions is that (as you’ve shown) it’s not clear where the threshold should be: currently the compiler is optimized for 2 types, which could change in the future, and yet the widening behavior (actually, that of promote_typejoin now in 0.7) cannot change because it could break existing code. Of course we could start with a higher threshold even if it doesn’t improve performance at the moment, but a high threshold could actually slow things down by forcing the compilation of methods for all these weird Unions, instead of always using methods for Any. Maybe we should pick a low threshold like 2 or 3. It feels weird to have such an arbitrary rule in user-facing APIs, though.

Maybe another way to look at the problem is to look for concrete use cases. I’m not sure it frequently happens in practice to mix Int and Int32 values in an array. Most of the time, you promote them to a common type (Int), or you want to be able to accept any kind of values, so Real, Number or Any are more appropriate. I’m not saying that there are no situations where Union is appropriate, but so far I haven’t found convincing examples to support generalizing the approach used for Nothing and Missing. Do you have ideas?


#8

Small composites are expensive to allocate; better make them bitstype and stuff them into a flat array, using e.g. Int32-offsets as pointers (e.g.: a tree where nodes are <64 bytes; bonus: I control spatial cache locality and can experiment with stuff like Van Emde Boas layouts).

If I have multiple types of nodes, with comparable sizes, then I can make it a Vector{Union{node_types...}}. If I encode the type into the pseudo-pointer, then I would need to use unsafe_load to access without consulting type-tags; but it would be quite convenient if I also could use the built-in multiple dispatch plus union tags for easier-to-read code.

That being said, I am not sure I would ever collect on such an array.

Basically, I was thinking about Cap’n Proto and how interlinked structures with small members currently kinda suck in julia (compared to C).

edit: A possible compromise could be threshold of two, as a const union_thresh=2, with the remark that an upgrade of this constant is an expected minor breaking change (that is; there are breaking changes that need major porting, and there are changes that are breaking in principle but unobserved in 99% of the code). Regardless, I now understand why it currently works the way it does (and have no objections from the peanut gallery).