Zero allocation: tuple subsetting?

The following function allocates:

function  h()
     tup = (:a,:b,:c)
     return (getfield(tup, 1), getfield(tup, 3))
end;

See:

julia> @btime h()
  6.637 ns (1 allocation: 32 bytes)
(:a, :c)

I can’t workout how to make it not allocate.
I’ve had much more complex functions that didn’t allocate, like merging namedtuples.

This function AFAICT should be very ameniable to constant propergation.
It doesn’t even use any generic functions,
everything in it is built-ins,
as I understand it it is completely pure.

But for some reason I can’t get it to solve at compile time.

Symbols are not isbits, unfortunately. IIRC there’s a package by eschnetter which provides a work-around.

1 Like

Does getfield allocate?

As far as I understand it, it’s not getfield which allocates, it’s returning a non-isbits structure. However, note that through inlining, the compiler can sometimes omit the allocation. It’s why @view now has reasonable performance (sometimes?)

julia> @btime h()
  8.952 ns (1 allocation: 32 bytes)
(:a, :c)

julia> @btime first(h())
  1.843 ns (0 allocations: 0 bytes)
:a
3 Likes

To explain a bit more why I find it surprising that the above code allocates.

Here is some code that I think is far more complex,
that does not allocate:

julia> function findfirst_ind(names, name)
         ndims = length(names)
         this_namemap = NamedTuple{(name,), Tuple{Int}}((0,))  # 0 is default

         full_namemap=NamedTuple{names, NTuple{ndims, Int}}(1:ndims)
         return first(merge(this_namemap, full_namemap))
       end
findfirst_ind (generic function with 1 method)

julia> @btime findfirst_ind((:a, :b), (:b))
  0.029 ns (0 allocations: 0 bytes)
2

julia> @btime findfirst_ind((:a, :b), (:c))
  0.029 ns (0 allocations: 0 bytes)
0

Notice that it needs to construct a tuple (:a, :b, :c) in order to run the second, yet it still doesn’t allocate.

I guess a key difference is that it never returns that tuple.
I guess it is fair enough that functions which return a result must allocate that result.

Except if I remove the first from the above, well the following also does not allocate:

julia> function foo(names, name)
         ndims = length(names)
         this_namemap = NamedTuple{(name,), Tuple{Int}}((0,))  # 0 is default

         full_namemap=NamedTuple{names, NTuple{ndims, Int}}(1:ndims)
         return merge(this_namemap, full_namemap)
       end
foo (generic function with 1 method)

julia>

julia> @btime foo((:a, :b), (:c))
  0.028 ns (0 allocations: 0 bytes)
(c = 0, a = 1, b = 2)

julia> @btime foo((:a, :b), (:b))
  0.029 ns (0 allocations: 0 bytes)
(b = 2, a = 1)

Like has already been said, constructing the returned escaping non-isbits value is what is allocating. If this function gets inlined, the allocation will likely be elided.

That does not return a non-isbits value.

2 Likes

While this is true, the missing optimization is that the function could figure out that the return-value is Const immutable, allocate it during compilation and just return the pointer. For literals, this is done:

julia> h2()=:(:a, :c);
julia> @btime h2();
  2.205 ns (0 allocations: 0 bytes)

julia> @code_warntype h()
Body::Tuple{Symbol,Symbol}
1 ─ %1 = π ((:a, :b, :c), Core.Compiler.Const((:a, :b, :c), false))
│   %2 = (Main.getfield)(%1, 1)::Core.Compiler.Const(:a, false)
│   %3 = π ((:a, :b, :c), Core.Compiler.Const((:a, :b, :c), false))
│   %4 = (Main.getfield)(%3, 3)::Core.Compiler.Const(:c, false)
│   %5 = (Core.tuple)(%2, %4)::Core.Compiler.Const((:a, :c), false)
└──      return %5

julia> @code_warntype h2()
Body::Tuple{Symbol,Symbol}
1 ─     return (:a, :c)

We see that the compiler is aware that h() is a constant; it just fails to push its allocation to compile-time:

julia> unsafe_load(convert(Ptr{Ptr{Nothing}}, pointer([h()])))
Ptr{Nothing} @0x00007fa5dd357210

julia> unsafe_load(convert(Ptr{Ptr{Nothing}}, pointer([h()])))
Ptr{Nothing} @0x00007fa5ddb2edd0

julia> unsafe_load(convert(Ptr{Ptr{Nothing}}, pointer([h2()])))
Ptr{Nothing} @0x00007fa5ddb3afd0

julia> unsafe_load(convert(Ptr{Ptr{Nothing}}, pointer([h2()])))
Ptr{Nothing} @0x00007fa5ddb3afd0

So this clearly is a missed optimization opportunity. I doubt that this is high-priority, though.

1 Like

Ok, I see that my function h (revised to take an arguement as h1)
does indeed not allocated if it is working on bitstypes.

function h1(tup)
  return (getfield(tup, 1), getfield(tup, 3))
end;

julia> @btime h1((10,20,30))
  0.029 ns (0 allocations: 0 bytes)
(10, 30)

julia> @btime h1((:a, :b, :c))
  6.717 ns (1 allocation: 32 bytes)
(:a, :c)

If we are willing to be convoluted, this can be made nonallocating by putting everything into type params.
Putting things into typeparams just to make them isbits feels odd,
and I think is only doable with Symbols?

function  h_t(tup)
     return Val{(getfield(tup, 1), getfield(tup, 3))}
end;

h_i(::Type{Val{L}}) where L = L
h2(tup) = h_i(h_t(tup))


julia> @btime h2((10,20,30))
  0.029 ns (0 allocations: 0 bytes)
(10, 30)

julia> @btime h2((:a, :b, :c))
  1.246 ns (0 allocations: 0 bytes)
(:a, :c)

In the microbench mark it works out to be faster.
But I am dubious as to if it will in practice.
The @code_typed for h2 is just a worse version of that for h1)

julia> @code_typed h1((:a, :b, :c))
CodeInfo(
1 ─ %1 = (Main.getfield)(tup, 1)::Symbol
│   %2 = (Main.getfield)(tup, 3)::Symbol
│   %3 = (Core.tuple)(%1, %2)::Tuple{Symbol,Symbol}
└──      return %3
) => Tuple{Symbol,Symbol}

julia> @code_typed h2((:a, :b, :c))
CodeInfo(
1 ─ %1 = (Main.getfield)(tup, 1)::Symbol
│   %2 = (Main.getfield)(tup, 3)::Symbol
│   %3 = (Core.tuple)(%1, %2)::Tuple{Symbol,Symbol}
│   %4 = (Core.apply_type)(Main.Val, %3)::Type{Val{_1}} where _1
│   %5 = (Main.h_i)(%4)::Any
└──      return %5
) => Any

but it seems inpractice that h2 is actually better.

julia> @btime (()->hash(h1((:a, :b, :c))))()
  13.260 ns (1 allocation: 32 bytes)
0x3098f6b4cf50f6cc

julia> @btime (()->hash(h2((:a, :b, :c))))()
  8.031 ns (0 allocations: 0 bytes)
0x3098f6b4cf50f6cc
1 Like

I benchmarked the above code @oxinabox and it seems fixed. Your code on version 1.9.2, both the h1 and h2 functions runs 0 allocation and same speed.

julia> @btime h1((10,20,30))
  1.242 ns (0 allocations: 0 bytes)
(10, 30)

julia> @btime h1((:a, :b, :c))
  1.242 ns (0 allocations: 0 bytes)
(:a, :c)

julia> @btime h2((10,20,30))
  1.232 ns (0 allocations: 0 bytes)
(10, 30)

julia> @btime h2((:a, :b, :c))
  1.242 ns (0 allocations: 0 bytes)
(:a, :c)

I just stumbled uppon this code from: https://github.com/invenia/NamedDims.jl/blob/cd36392e440b339ae85966646554a7b7f0fc199b/src/name_core.jl#L30
Thought I note that this issue changed.

julia> function  h()
            tup = (:a,:b,:c)
            return (getfield(tup, 1), getfield(tup, 3))
       end;

julia> using BenchmarkTools

julia> @btime h()
  1.958 ns (0 allocations: 0 bytes)
(:a, :c)

julia> isbits((:a, :b))
false

I’m really curious to hear what changed since 2019. Why is this no longer allocating? Is it the way BenchmarkTools works?

julia> @allocated xx = h()
0

julia> xx
(:a, :c)

How is this even possible to be non-allocating if it’s not isbits??