Allocations from selecting fields of a named tuple

There seems to be a large difference in the number of allocations when selecting fields from a named tuple via two different approaches.

using BenchmarkTools
nt = (a=1, b=2, c=3)
t = (:b, :c)

If I use getindex, I get 6 allocations:

julia> @btime $nt[$t];
  478.969 ns (6 allocations: 224 bytes)

But if I use one of the NamedTuple constructors, I get zero allocations:

julia> @btime NamedTuple{(:b, :c)}($nt);
  2.551 ns (0 allocations: 0 bytes)

Does anyone know why there is a difference in the number of allocations between these two approaches?

(I also tried wrapping the nt[t] in a function, but it didn’t seem to make a difference.)

Version Info
julia> versioninfo()
Julia Version 1.9.3
Commit bed2cd540a1 (2023-08-24 14:43 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (x86_64-apple-darwin22.4.0)
  CPU: 4 × Intel(R) Core(TM) i5-7360U CPU @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, skylake)
  Threads: 1 on 4 virtual cores

Seems to be due to constant propagation (a @code_warntype check is sketched after the list):

  1. @less nt[t] shows that getindex is implemented as
    @inline getindex(t::NamedTuple, idxs::Tuple{Vararg{Symbol}}) = NamedTuple{idxs}(t)
    
  2. nt[(:b, :c)] (a literal tuple instead of nt[t]) does not allocate:
    julia> @btime $nt[(:b, :c)];
       4.207 ns (0 allocations: 0 bytes)
    
    whereas the following, where t is not a constant, does allocate:
    julia> @btime NamedTuple{$t}($nt);
       775.243 ns (6 allocations: 224 bytes)
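
A quick way to see the instability (using a hypothetical helper, select, just to force the names through a function boundary so they are not constants):

using InteractiveUtils  # provides @code_warntype outside the REPL
select(nt, names) = nt[names]  # names arrives as a plain runtime value here
@code_warntype select((a=1, b=2, c=3), (:b, :c))

The return type should be reported as an abstract NamedTuple, since inference only sees the type Tuple{Symbol, Symbol} for names, not the symbol values themselves.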
    

If you wrap it all in a function, it doesn’t show any allocations either:

julia> function foo()
           nt = (a=1, b=2, c=3)
           t = (:b, :c)
           nt[t]
       end
foo (generic function with 1 method)

julia> @btime foo()
  0.791 ns (0 allocations: 0 bytes)
(b = 2, c = 3)

<1 ns timings suggest the code was either run at compile time or hoisted out of the benchmark loop. It’s the former:

julia> @code_llvm foo()
;  @ REPL[14]:1 within `foo`
define void @julia_foo_896([2 x i64]* noalias nocapture sret([2 x i64]) %0) #0 {
top:
;  @ REPL[14]:4 within `foo`
  %1 = bitcast [2 x i64]* %0 to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* noundef nonnull align 8 dereferenceable(16) %1, i8* noundef nonnull align 8 dereferenceable(16) bitcast ([2 x i64]* @_j_const1 to i8*), i64 16, i1 false)
  ret void
}

julia> @eval bar() = $(foo())
bar (generic function with 1 method)

julia> @code_lowered bar()
CodeInfo(
1 ─     return (b = 2, c = 3)
)

julia> @code_llvm bar()
;  @ REPL[20]:1 within `bar`
define void @julia_bar_920([2 x i64]* noalias nocapture sret([2 x i64]) %0) #0 {
top:
  %1 = bitcast [2 x i64]* %0 to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* noundef nonnull align 8 dereferenceable(16) %1, i8* noundef nonnull align 8 dereferenceable(16) bitcast ([2 x i64]* @_j_const1 to i8*), i64 16, i1 false)
  ret void
}

As @bertschi said, the difference is whether constant folding happens or not.
The particular problem is that unless the value of t is constant-folded, the return type of the expression cannot be known:
if t is (:a,) the result is a @NamedTuple{a::Int64}, whereas if it is (:a, :b) it is a @NamedTuple{a::Int64, b::Int64}, and so on.
That makes the expression type-unstable, and the compiler needs to heap-allocate this object of unknown size.
Whereas if we do know the value, it can be allocated on the stack (which doesn’t show up as an allocation at all).
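
If the names genuinely aren’t compile-time constants at the call site, one common workaround (a sketch, not something from this thread) is to lift them into the type domain with Val, so the return type becomes inferable again:

# Hypothetical helper: the names travel in the *type* Val{(:b, :c)},
# so inference can compute the concrete NamedTuple return type.
select(nt::NamedTuple, ::Val{names}) where {names} = NamedTuple{names}(nt)

nt = (a=1, b=2, c=3)
select(nt, Val((:b, :c)))  # (b = 2, c = 3)

Note that constructing the Val from a runtime value still goes through dynamic dispatch, so this only pays off when Val((:b, :c)) is itself a constant the compiler can propagate.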

This constant folding can be seen in @Albert_de_montserrat's answer, because constant folding is only performed in functions, since the optimizer only runs on functions.
At that point the compiler also has a constant for the value of nt, so it can constant-fold all the way to an answer, as @Benny said.

julia> function foo()
                 nt = (a=1, b=2, c=3)
                 t = (:b, :c)
                 nt[t]
             end
foo (generic function with 1 method)

julia> @code_llvm foo()
; Function Signature: foo()
;  @ REPL[1]:1 within `foo`
define void @julia_foo_1754([2 x i64]* noalias nocapture noundef nonnull sret([2 x i64]) align 8 dereferenceable(16) %sret_return) #0 {
top:
  %0 = bitcast [2 x i64]* %sret_return to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* noundef nonnull align 8 dereferenceable(16) %0, i8* noundef nonnull align 8 dereferenceable(16) bitcast ([2 x i64]* @"_j_const#1" to i8*), i64 16, i1 false)
  ret void
}
julia> using BenchmarkTools
Precompiling BenchmarkTools
  5 dependencies successfully precompiled in 11 seconds. 32 already precompiled.

julia> @btime foo()
  0.855 ns (0 allocations: 0 bytes)
(b = 2, c = 3)

If we instead pass nt in as an interpolated value, we get a more representative timing:

julia> function bar(nt)
                 t = (:b, :c)
                 nt[t]
             end
bar (generic function with 1 method)

julia> @btime bar($(;a=1,b=2,c=3))
  1.498 ns (0 allocations: 0 bytes)
(b = 2, c = 3)
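
Conversely (using a hypothetical baz for contrast), if the names are also passed in as a runtime argument, the constants are gone again and the allocations come back:

using BenchmarkTools

baz(nt, t) = nt[t]  # now the names are a runtime value inside baz as well
@btime baz($(; a=1, b=2, c=3), $((:b, :c)))

This should report allocations comparable to the original $nt[$t] benchmark, since inference again only sees Tuple{Symbol, Symbol} for t.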