Why do large NTuples Allocate?

jg-854 · December 15, 2021, 1:44pm

I was surprised to find that generating Tuples via generators causes allocations for large generators…

Using BenchmarkTools 

@btime NTuple{5,Float64}(1.0 for i=1:5)
  1.500 ns (0 allocations: 0 bytes)

@btime NTuple{50,Float64}(1.0 for i=1:50)
  1.633 μs (52 allocations: 1.70 KiB)

 @btime NTuple{500,Float64}(1.0 for i=1:500)
  15.500 μs (502 allocations: 15.88 KiB)

I would’ve thought each of the cases would not heap-allocate as the compiler knows both the length of the Tuple as well as the type of its elements. I am hoping to use this for a real-time processing application and ideally I would like 0 heap-allocations during processing.

I am using Julia 1.6.3 on an M1-MBP.

Thanks!

lmiq · December 15, 2021, 1:54pm

I’m not sure if I can answer the “why”, but for large collections as those the best option is to use preallocated vectors.

There is work in progress to allow for that pattern to be used more often: https://github.com/JuliaLang/julia/pull/41777

jg-854 · December 15, 2021, 1:57pm

Thank you for your reply! Yes, currently, I have been using StaticArrays (specifically SVectors), but the documentation mentions that they are not useful for vectors of length > 100. This is quite a restriction for my application.

lmiq · December 15, 2021, 1:59pm

Indeed, static arrays are wrappers around tuples, so the problem is the same. The standard way to deal with large collections is preallocation.

stevengj · December 15, 2021, 2:30pm

For large arrays, you are better off with Array anyway. (The whole point of StaticArray is to inline and unroll code for small vectors — this is counterproductive if the vectors are not small.)

jg-854 · December 15, 2021, 2:36pm

I see, though isn’t Array heap-allocated? I am/was under the impression it can never be pre-allocated on the stack. For real-time purposes, heap-allocation isn’t ideal.

lmiq · December 15, 2021, 3:13pm

Yes, array is heap-allocated, but the problem is having arrays being heap allocated and garbage collected all the time in functions critical for performance, and loops. If you preallocate the arrays and just use them in the critical parts of the code, that is fine.

Playing a little. In the example below, the variable under study is the aux array, which in one case will be a static array, and in the other just a standard array which is preallocated.

Using a static array, using a trick to allow the compiler to optimize for the size of the array to be generated, which is a variable of the function:

julia> using StaticArrays
       function g(x,::Val{N}) where N
           aux = zeros(SVector{N,Float64})
           for i in 1:length(x)-N # critical loop
               aux += @view(x[i:i+N-1])
           end
           return sum(aux)
       end
g (generic function with 3 methods)

julia> @btime g($x,Val(10));
  25.066 μs (0 allocations: 0 bytes)

julia> @btime g($x,Val(100));
  74.327 μs (0 allocations: 0 bytes)

julia> @btime g($x,Val(1000));
  848.970 μs (0 allocations: 0 bytes)

julia> @btime g($x,Val(5000));
  5.077 ms (0 allocations: 0 bytes)

So far so good, no allocations happening anywhere, but you will note the huge compiler time taken for the largest values of N.

Now let us simplify things and just us an allocated aux:

julia> function f(x,n)
           aux = zeros(n)
           for i in 1:length(x)-n # critical loop
               aux .+= @view(x[i:i+n-1])
           end
           return sum(aux)
       end
f (generic function with 2 methods)

julia> @btime f($x,10);
  87.694 μs (1 allocation: 144 bytes)

julia> @btime f($x,100);
  137.613 μs (1 allocation: 896 bytes)

julia> @btime f($x,1000);
  956.585 μs (1 allocation: 7.94 KiB)

julia> @btime f($x,5000);
  3.895 ms (2 allocations: 39.11 KiB)

There are allocations now (of course, because aux is being allocated). There is an important performance difference for small sizes, but for larger sizes the performance of the heap allocated array can be better (in this example the tradeoff is above 1000, but that is not typical I think). There is no compilation overhead in this second case either.

Topic		Replies	Views
Static array or tuple allocate at construction Performance memory-allocation , staticarrays , ntuple	2	253	March 11, 2024
Why StaticArrays are not recommended for large Arrays? General Usage question	2	565	November 13, 2019
Allocations in construction of composite type with StaticArray field Performance question	2	543	July 18, 2018
Allocation with StaticArrays New to Julia memory-allocation , staticarrays	19	698	September 15, 2021
Memory allocations when converting from NamedTuples to DataFrame Performance dataframes	4	880	July 17, 2020

Why do large NTuples Allocate?

Related topics