Where do these allocations come from?

I want to write a function g(w, x...) with the following properties: If none of the arguments following w is a vector, then the tuple x is appended to w (a vector). For each of the other vector arguments, the function should loop over it, replacing the vector in turn by each of its elements. (This is a toy example.) For instance,

julia> w = Vector{NTuple{3,Int}}()
julia> g(w, [1,2],3,[4,5])
4-element Vector{Tuple{Int64, Int64, Int64}}:
 (1, 3, 4)
 (1, 3, 5)
 (2, 3, 4)
 (2, 3, 5)

Instead of writing a @generated function that produces the right number of loops for all possible inputs, I’m trying to have the compiler generate efficient code via multiple dispatch:

f(w, t) = push!(w, t)
f(w, t, x, y...) = f(w, (t..., x), y...)
f(w, t, v::Vector, y...) = for x in v ; f(w, t, x, y...) end
g(w, y...) = begin f(w, (), y...); w end

However, this leads to more allocations than a hard-coded version. For computing

v = [1,2]
w = Vector{NTuple{3,Int}}()
g(w, v,v,v)   # as above
w = Vector{NTuple{3,Int}}()
g0(w, v,v,v)  # hard-coded, see below

here is the result of using --track-allocation=user:

        - f(w, t) = push!(w, t)
       48 f(w, t, x, y...) = f(w, (t..., x), y...)
      336 f(w, t, v::Vector, y...) = for x in v ; f(w, t, x, y...) end
       48 g(w, y...) = begin f(w, (), y...); w end

        - function g0(w, v1::Vector, v2::Vector, v3::Vector)
        0     for x1 in v1, x2 in v2, x3 in v3
      208         push!(w, (x1,x2,x3))
        0     end
        0     w
        - end

Where do the additional allocations come from and how can they be avoided?

push! may need to allocate memory, because the input vector may not have enough space reserved. There’s n way around ultimately allocating that memory. If you know the target size ahead of time, you can suggest that the array holds enough space reserved with sizehint!.

This type of allocation is the same for g and g0 because they push the same number of elements of the same type. This should account for the 208 allocated bytes one can see for g0. But where do the 224 additional bytes for g come from?

It’s possible that these come from the allocation of intermediaries and dispatch.

https://docs.julialang.org/en/v1/manual/performance-tips/#Be-aware-of-when-Julia-avoids-specializing

1 Like

Thanks, that’s it! For the record, here is a version without additional allocations:

f(w, t) = push!(w, t)
f(w, t, x, y::Vararg{Any,N}) where N = f(w, (t..., x), y...)
f(w, t, v::Vector, y::Vararg{Any,N}) where N = for x in v ; f(w, t, x, y...) end
g(w, y::Vararg{Any,N}) where N = begin f(w, (), y...); w end