Why is the array version allocating, whereas the tuple version is not?

Consider the following two functions with the following difference:

In test1, Iterators.product is called with a Tuple... and in test2, it is called with a Vector.... Benchmarking shows that test1 does not allocate any memory, whereas test2 does. Can someone kindly help me with this?

using BenchmarkTools
using FFTW
function test1(N)
    kiter = Iterators.product((rfftfreq(N,N), rfftfreq(N,N))...)
    for k in kiter
        if (k[1] < 0)
            print("hello")
        end
    end
end
function test2(N)
    kiter = Iterators.product([rfftfreq(N,N), rfftfreq(N,N)]...)
    for k in kiter
        if (k[1] < 0)
            print("hello")
        end
    end
end

test1(8)
test2(8)
@benchmark test1(512)
@benchmark test2(512)

Don’t you actually just want Iterators.product(rfftfreq(N,N), rfftfreq(N,N))

In this case, yes. But say if I want to call it in D dimensions, then I would do something like Iterators.product(kvec...) where I build kvec = (k1,k2,...,kD) in a loop.

Any help on this?

If you check the output of @code_warntype, you’ll find

julia> @code_warntype test2(2)
MethodInstance for test2(::Int64)
  from test2(N) in Main at REPL[5]:1
Arguments
  #self#::Core.Const(test2)
  N::Int64
Locals
  @_3::Union{Nothing, Tuple{Any, Union{Bool, Tuple}}}
  kiter::Base.Iterators.ProductIterator
  k::Any
Body::Nothing
1 ─ %1  = Base.getproperty(Main.Iterators, :product)::Core.Const(Base.Iterators.product)
│   %2  = Main.rfftfreq(N, N)::Frequencies{Float64}
│   %3  = Main.rfftfreq(N, N)::Frequencies{Float64}
│   %4  = Base.vect(%2, %3)::Vector{Frequencies{Float64}}
│         (kiter = Core._apply_iterate(Base.iterate, %1, %4))
│   %6  = kiter::Base.Iterators.ProductIterator
│         (@_3 = Base.iterate(%6))
│   %8  = (@_3 === nothing)::Bool
│   %9  = Base.not_int(%8)::Bool
└──       goto #6 if not %9
2 ┄ %11 = @_3::Tuple{Any, Union{Bool, Tuple}}
│         (k = Core.getfield(%11, 1))
│   %13 = Core.getfield(%11, 2)::Union{Bool, Tuple}
│   %14 = Base.getindex(k, 1)::Any
│   %15 = (%14 < 0)::Any
└──       goto #4 if not %15
3 ─       Main.print("hello")
4 ┄       (@_3 = Base.iterate(%6, %13))
│   %19 = (@_3::Union{Nothing, Tuple{Any, Tuple{Any, Vararg{Any}}}} === nothing)::Bool
│   %20 = Base.not_int(%19)::Bool
└──       goto #6 if not %20
5 ─       goto #2
6 ┄       return nothing

so the types of k and kiter are not being inferred, and also there’s the allocation of the array in the first place. You may try this instead:

julia> function test3(N, ndims) 
           kiter = Iterators.product(ntuple(_ -> rfftfreq(N,N), ndims)...)
           for k in kiter
               if (k[1] < 0)
                   print("hello")
               end
           end
       end
test3 (generic function with 2 methods)

This should be almost as fast as test1 if the number of dimensions is known at compile time:

julia> @btime test1(10);
  38.884 ns (0 allocations: 0 bytes)

julia> @btime test3(10, Val(2));
  49.298 ns (0 allocations: 0 bytes)

Otherwise, if the number of dimensions is only known at runtime, it should be about as fast as test2:

julia> @btime test2(10);
  15.831 μs (149 allocations: 5.92 KiB)

julia> @btime test3(10, 2);
  15.756 μs (149 allocations: 5.88 KiB)
1 Like

so the types of k and kiter are not being inferred, and also there’s the allocation of the array in the first place. You may try this instead:

Thank you so much for pointing this and the ntuple() method out. I understand it from the output of @code_warntype, but can you explain why it leads to allocation? Or point me to a resource where I can read about it?

Essentially, allocation usually happens if types aren’t concretely inferred, which is what happens in test2. Using the length as a compile-time constant helps avoid this.