# Allocation with Union Nothing

Hello,

I discovered this behavior that created some weird allocations in my code. I feel like I get it now, but I would be interested in some relevant comments/explanations.

``````f_allocates(x) = isempty(x) ? nothing : (0, x)
f_does_not_allocate(x) = isempty(x) ? nothing : x
f_does_not_allocate_either(x) = (0, x)
``````

The first one allocates when x is heap allocated.
The @code_warntype is not very explicit (or do I read it wrong?). The respective outputs are,

``````Variables
#self#::Core.Const(f_allocates)
x::Vector{Int64}

Body::Union{Nothing, Tuple{Int64, Vector{Int64}}}
1 β %1 = Main.isempty(x)::Bool
βββ      goto #3 if not %1
2 β      return Main.nothing
3 β %4 = Core.tuple(0, x)::Core.PartialStruct(Tuple{Int64, Vector{Int64}}, Any[Core.Const(0), Vector{Int64}])
βββ      return %4
``````
``````Variables
#self#::Core.Const(f_does_not_allocate)
x::Vector{Int64}

Body::Union{Nothing, Vector{Int64}}
1 β %1 = Main.isempty(x)::Bool
βββ      goto #3 if not %1
2 β      return Main.nothing
3 β      return x
``````
``````Variables
#self#::Core.Const(f_does_not_allocate_either)
x::Vector{Int64}

Body::Tuple{Int64, Vector{Int64}}
1 β %1 = Core.tuple(0, x)::Core.PartialStruct(Tuple{Int64, Vector{Int64}}, Any[Core.Const(0), Vector{Int64}])
βββ      return %1
``````

My understanding is that, the output of `f_allocates` has to be heap allocated to count one more reference to its input. But this is not true for `f_does_not_allocate_either` and `f_does_not_allocate` thanks to some compiler optimizations. Is that correct?

How do those optimization work? I would be curious to understand if there is an intrinsic reason that makes `f_allocates` non allocation free.

Thanks,

3 Likes

This is a good question, and hereβs my understanding:

If `x` is a heap-allocated that means itβs garbage-collected. And that the garbage collector needs to keep a track of all the references that might point to it to ensure it doesnβt trash something that can still be reachedβ¦ and it used to be that the only way for the GC to keep tabs on such a reference was to make the container GCβable by heap allocating it. The compiler team recently changed this to allow putting such structs (including tuples) on the stack in 1.5 or so. But you canβt use the stack for something that returns from function!

But, Matt, you say, why doesnβt `f_does_not_allocate_either ` β returning `(0, x)` β allocate then? Ohhh, but it does!

``````julia> x = [1]
1-element Vector{Int64}:
1

julia> @allocated f_does_not_allocate_either(x)
32
``````

Itβs tricky to measure these things; I bet the compiler is being overly zealous in optimizing away the benchmark itself in such a trivial case.

And then it turns out that `f_allocates` wonβt allocate if itβs inside another function that works directly with that return value without returning it itself.

8 Likes

Itβs worth emphasizing that βdoes this function allocate?β or βdoes this function optimize nicely?β often canβt be answered without knowing how the function will be called.

This is because the compiler is very happy to do a mixture of constant propagation, inlining, and extra type inference if some of the function arguments are constant or of known type. This can easily result in eliding allocations and boxing of variables which might otherwise be required.

Implementing `getproperty` is a good example. Consider an example like this where we add a synthetic `y` property to a structure `A`:

``````struct A
x::Int
end

function Base.getproperty(a::A, fieldname::Symbol)
if fieldname === :y
x = getfield(a, :x)
sqrt(x)
else
getfield(a, fieldname)
end
end
``````

If we look at the `@code_warntype` for `getproperty`, itβll show a bunch of branches and a Union output type, which seems like quite a mess. Itβs not at all what you want for simple field access!

``````julia> @code_typed getproperty(A(2), :x)
CodeInfo(
1 β %1  = (fieldname === :y)::Bool
βββ       goto #7 if not %1
2 β %3  = Main.getfield(a, :x)::Int64
β   %4  = Base.sitofp(Float64, %3)::Float64
β   %5  = Base.lt_float(%4, 0.0)::Bool
βββ       goto #4 if not %5
3 β       invoke Base.Math.throw_complex_domainerror(:sqrt::Symbol, %4::Float64)::Union{}
βββ       unreachable
4 β %9  = Base.Math.sqrt_llvm(%4)::Float64
βββ       goto #5
5 β       goto #6
6 β       return %9
7 β %13 = Main.getfield(a, fieldname)::Int64
βββ       return %13
) => Union{Float64, Int64}
``````

On the other hand, the `fieldname` is nearly always constant in the caller of `getproperty`, so this code misrepresents the actual code that will be generated in practice. Letβs define a wrapper function which uses `getproperty` in the typical way it will actually be used:

``````julia> foo(a) = a.x
foo (generic function with 1 method)

julia> @code_typed foo(A(2))
CodeInfo(
1 β %1 = Main.getfield(a, :x)::Int64
βββ      return %1
) => Int64
``````

Much better! The compiler has constant propagated the field name into `getproperty`, figured out that one of the branches is dead code, removed that branch and updated the type inference result to reflect the type of the `x` field.

Similarly for the access to the `y` property:

``````julia> bar(a) = a.y
bar (generic function with 1 method)

julia> @code_typed bar(A(2))
CodeInfo(
1 β %1 = Main.getfield(a, :x)::Int64
β   %2 = Base.sitofp(Float64, %1)::Float64
β   %3 = Base.lt_float(%2, 0.0)::Bool
βββ      goto #3 if not %3
2 β      invoke Base.Math.throw_complex_domainerror(:sqrt::Symbol, %2::Float64)::Union{}
βββ      unreachable
3 β %7 = Base.Math.sqrt_llvm(%2)::Float64
βββ      goto #4
4 β      goto #5
5 β      goto #6
6 β      return %7
) => Float64
``````
7 Likes