I want to write a function g(w, x...)
with the following properties: If none of the arguments following w
is a vector, then the tuple x
is appended to w
(a vector). For each of the other vector arguments, the function should loop over it, replacing the vector in turn by each of its elements. (This is a toy example.) For instance,
julia> w = Vector{NTuple{3,Int}}()
julia> g(w, [1,2],3,[4,5])
4-element Vector{Tuple{Int64, Int64, Int64}}:
(1, 3, 4)
(1, 3, 5)
(2, 3, 4)
(2, 3, 5)
Instead of writing a @generated
function that produces the right number of loops for all possible inputs, I’m trying to have the compiler generate efficient code via multiple dispatch:
f(w, t) = push!(w, t)
f(w, t, x, y...) = f(w, (t..., x), y...)
f(w, t, v::Vector, y...) = for x in v ; f(w, t, x, y...) end
g(w, y...) = begin f(w, (), y...); w end
However, this leads to more allocations than a hard-coded version. For computing
v = [1,2]
w = Vector{NTuple{3,Int}}()
g(w, v,v,v) # as above
w = Vector{NTuple{3,Int}}()
g0(w, v,v,v) # hard-coded, see below
here is the result of using --track-allocation=user
:
- f(w, t) = push!(w, t)
48 f(w, t, x, y...) = f(w, (t..., x), y...)
336 f(w, t, v::Vector, y...) = for x in v ; f(w, t, x, y...) end
48 g(w, y...) = begin f(w, (), y...); w end
- function g0(w, v1::Vector, v2::Vector, v3::Vector)
0 for x1 in v1, x2 in v2, x3 in v3
208 push!(w, (x1,x2,x3))
0 end
0 w
- end
Where do the additional allocations come from and how can they be avoided?