How come this is non-allocating?

If found an example which challenges my understanding of what allocations are or are not. If
one creates a vector of tuples of different lengths, it is possible to mutate the smaller ones into larger ones without allocating memory:

julia> v = [ isodd(i) ? (0,0) : (0,0,0) for i in 1:1_000_000 ];

julia> function mutate_vec!(v)
         for i in 1:length(v)
           v[i] = (1,1,1)
         end
       end
mutate_vec! (generic function with 1 method)

julia> @btime mutate_vec!($v)
  371.502 μs (0 allocations: 0 bytes)

julia> v[1]
(1, 1, 1)

julia> v[2]
(1, 1, 1)

julia>

What does that mean? If that is possible, what is not??

All tuples are and continue to be on the stack? When one mutates a tuple for another tuple of the same size, it does not remain at the same place in the stack? Is the complete pile of tuples moved - a new one created and the older one discarded - on the stack?

2 Likes

Uhm… One part of the story is that the last element of the tuple becomes a Vararg, which allows the tuple to be changed in size. Otherwise, if the vector is created with tuples of the same length, one gets an error:

julia> v = [ (0,0), (0,0,0) ]
2-element Array{Tuple{Int64,Int64,Vararg{Int64,N} where N},1}:
 (0, 0)
 (0, 0, 0)

julia> v = [ (0,0), (0,0,0) ]
2-element Array{Tuple{Int64,Int64,Vararg{Int64,N} where N},1}:
 (0, 0)
 (0, 0, 0)

julia> v[1] = (0,0,0,0)
(0, 0, 0, 0)

julia> v = [ (0,0), (0,0) ]
2-element Array{Tuple{Int64,Int64},1}:
 (0, 0)
 (0, 0)

julia> v[1] = (0,0,0,0)
ERROR: MethodError: Cannot `convert` an object of type 
  Tuple{Int64,Int64} to an object of type 
  Tuple{}

There is also one point where it seems that the compiler gives up and allocates:

This allocates:

julia> v = [ (0,0), (0,0,0) ]
2-element Array{Tuple{Int64,Int64,Vararg{Int64,N} where N},1}:
 (0, 0)
 (0, 0, 0)

julia> function increase_tuple(v)
          v[1] = (0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)
       end
increase_tuple (generic function with 1 method)

julia> @btime increase_tuple($v)
  20.771 ns (1 allocation: 336 bytes)

While this does not:

julia> v = [ (0,0), (0,0,0) ]
2-element Array{Tuple{Int64,Int64,Vararg{Int64,N} where N},1}:
 (0, 0)
 (0, 0, 0)

julia> function increase_tuple(v)
          v[1] = (0,0,0,0)
       end
increase_tuple (generic function with 1 method)

julia> @btime increase_tuple($v)
  2.029 ns (0 allocations: 0 bytes)
(0, 0, 0, 0)

1 Like

My guess is that Vararg is an array underneath and has some pre-allocated memory even when nothing has been added to it. That’s probably a performance thing, create the array with some memory since the most likely next step would be to add elements to the array. Therefor the tuples can be increased without any memory allocations until that buffer fills up.

Something interesting:

julia> v=Vector{Int}()
Int64[]

julia> @btime push!(v, 1)
  20.269 ns (0 allocations: 0 bytes)
10470502-element Array{Int64,1}:

I highly doubt that array was able to hit 10 million elements without ANY allocations. On the other hand:

julia> @btime begin
           for i in 1:167
              push!(x, 3)
           end
       end setup=(x=Vector{Int}())
  991.043 ns (0 allocations: 2.80 KiB)

julia> @btime begin
           for i in 1:168
              push!(x, 3)
           end
       end setup=(x=Vector{Int}())
  1.110 μs (1 allocation: 3.24 KiB)
1 Like

Since eltype(v) is abstract (and not a small union), the underlying data is stored as an array of pointers to tuples. And since tuples are immutable, when you assign them all to (1,1,1) the compiler is allowed to store the same pointer for every element:

julia> v = [ isodd(i) ? (0,0) : (0,0,0) for i in 1:5 ];

julia> mutate_vec!(v)

julia> p = unsafe_wrap(Array, Ptr{Ptr{Int}}(pointer(v)), length(v))
5-element Array{Ptr{Int64},1}:
 Ptr{Int64} @0x000000011655e470
 Ptr{Int64} @0x000000011655e470
 Ptr{Int64} @0x000000011655e470
 Ptr{Int64} @0x000000011655e470
 Ptr{Int64} @0x000000011655e470

julia> unsafe_wrap(Array, p[1], 3)
3-element Array{Int64,1}:
 1
 1
 1

Furthermore, the compiler may not need to allocate the (1,1,1) tuple on the heap — since it is constant and immutable, it might just get stored in a data segment of the compiled object code.

10 Likes

Thank you very much! I keep getting tricked by the compilers, but I guess that is because they much smarter than me :slightly_smiling_face:

1 Like

To illustrate what is going on, it is useful to also consider two variations on your code.

First, try using v[i] = (i,i+1,i+2) instead of v[i] = (1,1,1) in mutate_vec! — if you assign a different tuple for each element, then it indeed requires one heap allocation per element.

Second, if you create v as Union{NTuple{2,Int},NTuple{3,Int}}[.....], then because the eltype is a small union the tuples get stored inline rather than as pointers to objects. In this case, there are no allocations in mutate_vec! even for v[i] = (i,i+1,i+2).

2 Likes

Yes, I have checked that mutation to different values allocates.

In the case of Unions, is the initial allocation majored to the largest type? Or something else (like new allocations on the stack) are happening?

Also thanks for pointing out some other tools like unsafe_wrap to explore these things in detail. I will play with those to try to answer my own doubts :slight_smile: