How to tell if a type is an efficient small type union?


#1

Vector{Union{Missing, Int}} is efficiently represented, but Vector{Union{Int, Float64}} isn’t. Is there a predicate to distinguish between the two, similar to isconcretetype?


#2

How are you determining what is or is not “efficiently represented”?


#3

I was under the impression that the optimizations for isbits Unions apply to Vector{Union{Int, Float64}} too, since both Int and Float64 are bits types. Is this not the case?


#4

It is the case. Missing and Nothing behave a little differently in those Unions if they are present as missing or nothing in a vector (see below) – but not if they are in the Union type and not present in the vector. That’s why I asked “How is this being determined”?

julia> Base.summarysize(Union{Missing,Int32}[1, 2])
56

julia> Base.summarysize(Union{Missing,Int32}[1, missing])
52

julia> Base.summarysize(Union{Nothing,Int32}[1, nothing])
52

#5

TIL, thank you for pointing me to the docs. So I suppose it boils down to checking if all the unioned types are isbits?

Maybe the more straightforward question is: how do I know if Array{T} is represented internally as an array of pointers?


#6

Do you mean whether it should be, or how to introspect what happens for a particular T in practice?

I think the current situation is that for the union of two (or more?) bits types, it is guaranteed to be represented efficiently, but this is an implementation detail and may be expanded later on.


#7

Right, but for context, I’d like to fix this performance warning: https://github.com/JuliaMath/Interpolations.jl/issues/287


#8

Base.isbitsunion(T)
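
A minimal sketch of how this could be wrapped into a predicate for the "array of pointers" question. `is_stored_inline` is a hypothetical helper name, not part of Base, and `Base.isbitsunion` is unexported, so its behavior may change between Julia versions. Because `Base.isbitsunion` returns `false` for a plain non-Union type such as `Int`, the sketch also checks `isbitstype`:

```julia
# Hypothetical helper (not part of Base): true when Vector{T} is expected to
# store its elements inline rather than as pointers to boxed values.
# Conservative sketch: it only recognizes bits types and isbits Unions, so it
# may return false for other types that a given Julia version stores inline.
is_stored_inline(::Type{T}) where {T} = isbitstype(T) || Base.isbitsunion(T)

is_stored_inline(Int)                   # plain bits type
is_stored_inline(Union{Missing, Int})   # isbits Union
is_stored_inline(Union{Int, Float64})   # isbits Union
is_stored_inline(Union{Int, String})    # String is not a bits type
is_stored_inline(Any)                   # abstract element type
```

The first three calls should return `true` and the last two `false`, which matches the point made earlier in the thread that `Union{Int, Float64}` gets the same storage treatment as `Union{Missing, Int}`.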


#9

It also seems that unions are only optimized if they span fewer than five types:

julia> f(a) = first(a)/2
f (generic function with 1 method)

julia> types = filter(isbitstype,subtypes(Signed))
5-element Array{Any,1}:
 Int128
 Int16 
 Int32 
 Int64 
 Int8  

julia> using BenchmarkTools

julia> for i in eachindex(types)
           T = Union{types[1:i]...}
           @show T
           @show Base.isbitsunion(T)
           arr = T[one(first(types))]
           println(@btime f($arr))
       end
T = Int128
Base.isbitsunion(T) = false
  7.103 ns (0 allocations: 0 bytes)
0.5
T = Union{Int128, Int16}
Base.isbitsunion(T) = true
  9.050 ns (0 allocations: 0 bytes)
0.5
T = Union{Int128, Int16, Int32}
Base.isbitsunion(T) = true
  9.349 ns (0 allocations: 0 bytes)
0.5
T = Union{Int128, Int16, Int32, Int64}
Base.isbitsunion(T) = true
  8.787 ns (0 allocations: 0 bytes)
0.5
T = Union{Int128, Int16, Int32, Int64, Int8}
Base.isbitsunion(T) = true
  33.701 ns (2 allocations: 48 bytes)
0.5


#10

What is an efficient way to determine the number of constituents in a Union of concrete types? Or, better yet, get them as a tuple quickly.

Just found the answer in a post by TPapp: Base.uniontypes(x)

How about:

# true for an isbits Union with fewer than five member types
isfastunion(::Type{T}) where {T} =
    Base.isbitsunion(T) && length(Base.uniontypes(T)) < 5
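
For illustration, here is a sketch of the same idea with the cutoff exposed as a parameter, since the limit observed in the benchmark above is an implementation detail and may differ between Julia versions. The name `is_small_bits_union` and the keyword `max_members` are hypothetical; only `Base.isbitsunion` and `Base.uniontypes` come from Base, and both are unexported internals.

```julia
# Hypothetical helper; `max_members` is an assumed cutoff taken from the
# benchmark in this thread, not a documented constant.
function is_small_bits_union(::Type{T}; max_members::Integer = 4) where {T}
    members = Base.uniontypes(T)   # Vector{Any} holding the constituent types
    return Base.isbitsunion(T) && length(members) <= max_members
end

# The constituents as a tuple, as asked above.
members = Tuple(Base.uniontypes(Union{Int8, Int16, Int32}))

is_small_bits_union(Union{Missing, Int})
is_small_bits_union(Union{Int128, Int16, Int32, Int64, Int8})
is_small_bits_union(Int)   # false: isbitsunion is false for a non-Union type
```

Keeping the cutoff as a keyword argument makes it easy to adjust if the compiler's limit changes, without touching call sites.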

#11

Why is that?


#12

Because the cutoff has to be somewhere: if not 5, then 4 or 6 or so. My guess:
it has to be less than the number of bits in a byte, for bitset speed without too much type-related memory overhead (which would slow things down overall). As we do not yet know how best to use the remaining possibilities, it was deemed smart to hold a bit or two in reserve until a better use comes to the fore. There is more involved than altering a constant. TBD.