Why are unions of concrete types not concrete?

I am looking to use Julia for a high-performance application where I need to iterate through many items. The items can be of various types, but those types are all concrete, so I can make a union of them. It appears to me that Julia optimizes this data structure and uses C unions to make access fast. Moreover, if I allocate a vector of a union of concrete types using the undef constructor, I get a vector with data (which indicates to me that Julia is using C unions and not storing pointers). However, if I check whether a vector of unions is isbits, it returns false. Same for isconcretetype. Why is this the case?

julia> using InlineStrings

julia> #show that it is allocated when using undef
       println(Vector{Union{Int64, String31}}(undef, 10))
Union{Int64, String31}[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

julia> println(isconcretetype(Union{Int64, String31}))
false

julia> println(isbitstype(Union{Int64, String31}))
false

julia> println(isbits(Vector{Union{Int64, String31}}))
false

Have you tried this?

isconcretetype(Vector{Union{Int64, String31}})

Hello and welcome!

Don’t worry, your code probably works as expected. The problem lies in what you assume your checks are testing. Let’s take them apart:

  1. Quoting the manual of isconcretetype: Determine whether type T is a concrete type, meaning it could have direct instances (values x such that typeof(x) === T). No specific value (either 12 or String31("foo")) will have the type Union{Int64, String31}. Either one, yes, but not the union (e.g., typeof(12) == Int64). So a union is never a concrete type.
  2. For the same reason, it will never be a bitstype. Its instances, however, can be values of bitstypes, just like 12 and String31("foo") are bits values:
julia> isbits(12) && isbits(String31("foo"))
  3. As for the last test, Vector{Union{Int64, String31}} is itself a type, and a type is never a bits value, so it is no wonder that isbits returns false. The individual items in a vector of that type can and will be bits values:
julia> v = Vector{Union{Int64, String31}}(undef, 10);
julia> all(isbits, v)

So overall, you don’t need to worry about the performance of your code when using such a vector. You just need to be more careful with your checks 🙂
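The distinction between a union type, a container of that union, and the values stored in it can be checked directly. The following is a minimal sketch that assumes Union{Int64, Float64} as a stand-in for Union{Int64, String31}, so that no package is needed. Base.isbitsunion is an internal, unexported function, so treat its use here as an assumption that may change between Julia versions.

```julia
# Stand-in for Union{Int64, String31}; both members are bits types.
U = Union{Int64, Float64}

union_is_concrete  = isconcretetype(U)          # no value has typeof(x) === U
union_is_bitstype  = isbitstype(U)              # a Union is not a DataType
vector_is_concrete = isconcretetype(Vector{U})  # the container type is concrete
stored_inline      = Base.isbitsunion(U)        # internal: elements stored inline

v = U[1, 2.5]
element_types  = (typeof(v[1]), typeof(v[2]))   # each element has a concrete type
elements_bits  = all(isbits, v)                 # the values themselves are bits values

println((union_is_concrete, union_is_bitstype, vector_is_concrete, stored_inline))
println(element_types, " ", elements_bits)
```

The point of the sketch is that the union itself fails both checks, while the vector type is concrete and every stored value is a bits value.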



Hey HTH! A follow-up to this would be: how could I compress such a vector? Previously I was using Blosc to compress large vectors for file storage, but now with the Union type I can no longer do that, as the union type is not an isbits type.

Keep in mind that the performance of working with unions depends on the optimizations that the compiler performs, which can vary a lot depending on the size of the union (and the Julia version).

Working with small unions can be very fast, but AFAIK large unions almost always incur a loss of performance. This is for example the time it takes to sum 1000 floats, displayed as a function of the element type of the vector they are stored in:

                                  ┌                                        ┐ 
                          Float64 ┤ 8.39198e⁻⁷                               
               Union{T1, Float64} ┤ 8.89393e⁻⁷                               
           Union{T1, T2, Float64} ┤ 8.9998e⁻⁷                                
       Union{T1, T2, T3, Float64} ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 7.8316e⁻⁵   
   Union{T1, T2, T3, T4, Float64} ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 7.9389e⁻⁵   
                                  └                                        ┘

So it seems that in this instance, working with unions of 3 types or fewer is optimized. (This is with Julia 1.8.5; I seem to remember that not so long ago only unions of 2 types were optimized.)

Complete benchmarking code

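x = rand(1000);

# Plain loop-based sum, so the element type of xs drives the generated code
function my_sum(xs)
    res = 0.0
    for x in xs
        res += x
    end
    return res
end

struct T1 end
struct T2 end
struct T3 end
struct T4 end
struct T5 end

# Element types to compare; Float64 is the baseline shown in the plot
types = [
    Float64,
    Union{Float64, T1},
    Union{Float64, T1, T2},
    Union{Float64, T1, T2, T3},
    Union{Float64, T1, T2, T3, T4},
]

using BenchmarkTools
times = map(types) do t
    y = t[]          # empty vector with element type t
    append!(y, x)    # fill it with the same 1000 floats
    @belapsed my_sum($y)
end

using UnicodePlots
barplot(string.(types), times)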


Regarding the compression:

Maybe you could try JLSO instead

julia> using JLSO, InlineStrings

julia> v = Vector{Union{Int, String31}}(undef, 1000000);

julia> JLSO.save("large_vector.dat", :v => v);

julia> Base.summarysize(v)

julia> filesize("large_vector.dat")

julia> JLSO.load("large_vector.dat")[:v] == v

# Or keeping it in memory
julia> buf = IOBuffer();

julia> JLSO.save(buf, :v => v);

julia> JLSO.load(seekstart(buf))[:v] == v

Note that the giant compression ratio in this case is just because the vector with undefs contains all zeros 😅
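If you would rather keep a Blosc-style workflow that needs arrays of a bits type, one possible approach is to split the union vector into plain arrays and compress each of them separately. This is only a sketch: split_union and rebuild are hypothetical helper names, Union{Int64, Float64} is used as a stand-in for Union{Int64, String31} so the example needs no packages, and the compression step itself is left out.

```julia
# Stand-in element type; the thread uses Union{Int64, String31}.
const IntOrFloat = Union{Int64, Float64}

# Hypothetical helper: split a union vector into a Bool mask plus one
# concretely typed vector per member type. All three have bits-type elements.
function split_union(v::Vector{IntOrFloat})
    is_int = Bool[x isa Int64 for x in v]
    ints   = Int64[x for x in v if x isa Int64]
    floats = Float64[x for x in v if x isa Float64]
    return is_int, ints, floats
end

# Hypothetical helper: inverse of split_union, restoring the original order.
function rebuild(is_int, ints, floats)
    out = Vector{IntOrFloat}(undef, length(is_int))
    i = 0
    j = 0
    for k in eachindex(is_int)
        if is_int[k]
            i += 1
            out[k] = ints[i]
        else
            j += 1
            out[k] = floats[j]
        end
    end
    return out
end

v = IntOrFloat[1, 2.5, 3, 4.0]
is_int, ints, floats = split_union(v)
roundtrip = rebuild(is_int, ints, floats)
println(roundtrip == v)
```

Each of the three arrays can then be handed to whatever compressor accepts bits-type arrays, at the cost of one extra pass over the data when saving and loading.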

Thank you so much for the recommendation! I’ll give it a shot now and will report back.
