Allocation with Union Nothing

Hello,

I discovered this behavior that created some weird allocations in my code. I feel like I get it now, but I would be interested in some relevant comments/explanations.

f_allocates(x) = isempty(x) ? nothing : (0, x)
f_does_not_allocate(x) = isempty(x) ? nothing : x
f_does_not_allocate_either(x) = (0, x)

The first one allocates when x is heap allocated.
The @code_warntype is not very explicit (or do I read it wrong?). The respective outputs are,

Variables
  #self#::Core.Const(f_allocates)
  x::Vector{Int64}

Body::Union{Nothing, Tuple{Int64, Vector{Int64}}}
1 ─ %1 = Main.isempty(x)::Bool
└──      goto #3 if not %1
2 ─      return Main.nothing
3 ─ %4 = Core.tuple(0, x)::Core.PartialStruct(Tuple{Int64, Vector{Int64}}, Any[Core.Const(0), Vector{Int64}])
└──      return %4
Variables
  #self#::Core.Const(f_does_not_allocate)
  x::Vector{Int64}

Body::Union{Nothing, Vector{Int64}}
1 ─ %1 = Main.isempty(x)::Bool
└──      goto #3 if not %1
2 ─      return Main.nothing
3 ─      return x
Variables
  #self#::Core.Const(f_does_not_allocate_either)
  x::Vector{Int64}

Body::Tuple{Int64, Vector{Int64}}
1 ─ %1 = Core.tuple(0, x)::Core.PartialStruct(Tuple{Int64, Vector{Int64}}, Any[Core.Const(0), Vector{Int64}])
└──      return %1

My understanding is that, the output of f_allocates has to be heap allocated to count one more reference to its input. But this is not true for f_does_not_allocate_either and f_does_not_allocate thanks to some compiler optimizations. Is that correct?

How do those optimization work? I would be curious to understand if there is an intrinsic reason that makes f_allocates non allocation free.

Thanks,

3 Likes

This is a good question, and here’s my understanding:

If x is a heap-allocated that means it’s garbage-collected. And that the garbage collector needs to keep a track of all the references that might point to it to ensure it doesn’t trash something that can still be reached… and it used to be that the only way for the GC to keep tabs on such a reference was to make the container GC’able by heap allocating it. The compiler team recently changed this to allow putting such structs (including tuples) on the stack in 1.5 or so. But you can’t use the stack for something that returns from function!

But, Matt, you say, why doesn’t f_does_not_allocate_either β€” returning (0, x) β€” allocate then? Ohhh, but it does!

julia> x = [1]
1-element Vector{Int64}:
 1

julia> @allocated f_does_not_allocate_either(x)
32

It’s tricky to measure these things; I bet the compiler is being overly zealous in optimizing away the benchmark itself in such a trivial case.

And then it turns out that f_allocates won’t allocate if it’s inside another function that works directly with that return value without returning it itself.

9 Likes

To add to Matt’s answer a little…

It’s worth emphasizing that β€œdoes this function allocate?” or β€œdoes this function optimize nicely?” often can’t be answered without knowing how the function will be called.

This is because the compiler is very happy to do a mixture of constant propagation, inlining, and extra type inference if some of the function arguments are constant or of known type. This can easily result in eliding allocations and boxing of variables which might otherwise be required.

Implementing getproperty is a good example. Consider an example like this where we add a synthetic y property to a structure A:

struct A
    x::Int
end

function Base.getproperty(a::A, fieldname::Symbol)
    if fieldname === :y
        x = getfield(a, :x)
        sqrt(x)
    else
        getfield(a, fieldname)
    end
end

If we look at the @code_warntype for getproperty, it’ll show a bunch of branches and a Union output type, which seems like quite a mess. It’s not at all what you want for simple field access!

julia> @code_typed getproperty(A(2), :x)
CodeInfo(
1 ─ %1  = (fieldname === :y)::Bool
└──       goto #7 if not %1
2 ─ %3  = Main.getfield(a, :x)::Int64
β”‚   %4  = Base.sitofp(Float64, %3)::Float64
β”‚   %5  = Base.lt_float(%4, 0.0)::Bool
└──       goto #4 if not %5
3 ─       invoke Base.Math.throw_complex_domainerror(:sqrt::Symbol, %4::Float64)::Union{}
└──       unreachable
4 ─ %9  = Base.Math.sqrt_llvm(%4)::Float64
└──       goto #5
5 ─       goto #6
6 ─       return %9
7 ─ %13 = Main.getfield(a, fieldname)::Int64
└──       return %13
) => Union{Float64, Int64}

On the other hand, the fieldname is nearly always constant in the caller of getproperty, so this code misrepresents the actual code that will be generated in practice. Let’s define a wrapper function which uses getproperty in the typical way it will actually be used:

julia> foo(a) = a.x
foo (generic function with 1 method)

julia> @code_typed foo(A(2))
CodeInfo(
1 ─ %1 = Main.getfield(a, :x)::Int64
└──      return %1
) => Int64

Much better! The compiler has constant propagated the field name into getproperty, figured out that one of the branches is dead code, removed that branch and updated the type inference result to reflect the type of the x field.

Similarly for the access to the y property:

julia> bar(a) = a.y
bar (generic function with 1 method)

julia> @code_typed bar(A(2))
CodeInfo(
1 ─ %1 = Main.getfield(a, :x)::Int64
β”‚   %2 = Base.sitofp(Float64, %1)::Float64
β”‚   %3 = Base.lt_float(%2, 0.0)::Bool
└──      goto #3 if not %3
2 ─      invoke Base.Math.throw_complex_domainerror(:sqrt::Symbol, %2::Float64)::Union{}
└──      unreachable
3 ─ %7 = Base.Math.sqrt_llvm(%2)::Float64
└──      goto #4
4 ─      goto #5
5 ─      goto #6
6 ─      return %7
) => Float64
7 Likes