Why does `Union{Int,...}` in a `Vector` not cause allocations but `Union{Float64,...}` does?

I have a custom missing type which stores a symbol and a reference to an object whose type is not predetermined, so it’s an Any field. I need to store vectors of these, some values will be missing but most won’t be. Here’s a test function that looks at the allocations that happen when writing values and missings into such a vector.

The peculiar thing to me is that when I write ten missings unioned with Int, I get around 10 allocations, so one per missing. But if I union with Float64, every value in the vector needs an allocation. Why is the storage behavior different if both Int and Float64 are primitives with 8 bytes size?

struct MyMissing
  location::Symbol
  reason
end

function missing_vector(T, n)
  v = Vector{Union{MyMissing,T}}(undef, n)
  for i in 1:n
    v[i] = i > 10 ? T(i) : MyMissing(:missing_vector, "under ten")
  end
  return count(x -> x isa MyMissing, v)
end
julia> @time missing_vector(Int, 100)
  0.000008 seconds (12 allocations: 1.219 KiB)
10

julia> @time missing_vector(Float64, 100)
  0.000018 seconds (102 allocations: 2.625 KiB)
10

Julia seems to have special codegen for boxing integers specifically: The Float64 calls ijl_gc_small_alloc, the Int case calls ijl_box_int64.

3 Likes

Does this mean that boxed integers aren’t counted for allocations?

Oh yeah I heard somewhere that Julia has a cache of small Ints pre-boxed as an optimisation. So in your code no allocation needs to happen for the Ints.

However if you replace the code with T(100000+i) or make the loop bigger the difference should go away.

5 Likes

Yeah, indeed the difference goes away when bigger integers are used:

julia> function missing_vector(T, n)
         v = Vector{Union{MyMissing,T}}(undef, n)
         for i in 10000000:n+10000000-1
           v[i - 10000000 + 1] = i > 10 ? T(i) : MyMissing(:missing_vector, "under ten")
         end
       end
missing_vector (generic function with 1 method)

julia> @time missing_vector(Int, 100)
  0.000003 seconds (101 allocations: 2.438 KiB)

Oh interesting, pre-boxed integers.. Well then I picked a really unfortunate example when I checked early on if my union approach was ok in terms of allocation behavior. I thought that julia could store unions inline with type tags but maybe only if they are all isbits.

But overall it’s still very fast even allocating millions of these small values. So practically it probably doesn’t matter in my case.

As always, profile realistic uses of your code to identify the actual bottlenecks before trying to optimise anything.

Vector does have a more efficient storage for unions than boxing, but only for really small unions like Union{Nothing,Int}. I think it internally stores just the Ints inline plus has a BitVector to store which ones are actually Nothing.

That’s only for small unions where every subtype is isbits. It would be really nice to have it for pointer-ful (i.e. GC-tracked) types also. I believe that would enable implementing something like smallvec.

A related limitation is that a function that returns a union with a pointer-ful type can’t store its return values on the stack, but must heap-allocate it. This can be a problem if you lean heavily into error return types such that your code often returns unions.

I’m not sure whether these two limitations (Memory storage versus ABI for pointerful unions) are the same.

You might be interested in this package by @nalimilan