# Summing a vector of vectors in a non-allocating way

What’s going wrong here, I get a DimensionMismatch error when I run this:

``````input = repeat([ones(2)], 4)
output = zeros(2)
sum(input) # Gives a 2-element Vector{Float64}
sum!(output, input) # Gives a DimensionMismatch error
``````

The error is:

``````DimensionMismatch("reduction on array with indices (Base.OneTo(4),) with output with indices (Base.OneTo(2),)")
``````

I can write a fix function for my specific case:

``````function sum_fixed!(output, input::Vector{Vector{T}}, n, m) where T
for j in 1:m
for i in 1:n
output[j] += input[i][j]
end
end
end
``````

which is (sometimes) non-allocating:

``````@btime sum_fixed!(output, input, 4, 2)
34.747 ns (0 allocations: 0 bytes)
``````

but I’m still curious as to why the first doesn’t work and why my function sometimes allocates memory as in the case of my function in the context that I want to use it:

``````using BenchmarkTools

nagents = 4
f_αβ = reshape(repeat([[0.0, 0.0]], nagents * nagents), nagents, nagents)
sum_f_αβ = [0.0, 0.0]
F_α0 = repeat([[0.0, 0.0]], nagents)

@btime begin
for i = 1:\$nagents
sum_fixed!(\$sum_f_αβ, \$f_αβ[:, i], \$nagents, \$2) # 4 (nagents) allocations
@. \$F_α[i] = \$F_α0[i] + \$sum_f_αβ # 0 allocations
end
end
``````

which then gives

``````297.729 ns (4 allocations: 448 bytes)
``````

Can anyone tell me why there are 4 allocations in that loop?

For the first question, the `sum!` doc suggests that `input` should be transformed first:

``````sum!(output, hcat(input...))
``````

I would not do that (since I learned that last week ) because `hcat(input...)` also allocates quite a bit

``````julia> using BenchmarkTools

julia> input = repeat([ones(2)], 100000); # make it larger

julia> output = zeros(2);

julia> @btime sum!(\$output, hcat(\$input...))
1.884 ms (7 allocations: 3.05 MiB)
2-element Vector{Float64}:
100000.0
100000.0

julia> @btime sum!(\$output, reduce(hcat, input))
801.142 μs (2 allocations: 1.53 MiB)
2-element Vector{Float64}:
100000.0
100000.0
``````

Certainly the different approach would be to use a matrix instead of a `Vector{Vector{Float32}}`
Is that possible?

Probably like this

``````julia> input = repeat([1.0, 2.0], outer=(1,2))
2×2 Matrix{Float64}:
1.0  1.0
2.0  2.0

julia> output = sum(input, dims=2) # to see the shape, it is 2x1!
2×1 Matrix{Float64}:
2.0
4.0

julia> @btime sum!(\$output, \$input)
14.685 ns (0 allocations: 0 bytes)
2×1 Matrix{Float64}:
2.0
4.0
``````

I don’t know why `sum!` doesn’t work in this case.
Here is an allocation-free solution without manually implementing the sum:

``````using SplitApplyCombine
sum!(output, combinedimsview(input))
``````

It has a somewhat worse perfromance than your `sum_fixed!` though, about a factor of 1.5 for me.

In any case, be aware of this:

``````julia> x = repeat([[0,0]],3)
3-element Vector{Vector{Int64}}:
[0, 0]
[0, 0]
[0, 0]

julia> x[1][1] = 1
1

julia> x
3-element Vector{Vector{Int64}}:
[1, 0]
[1, 0]
[1, 0]

``````

use a comprehension probably:

``````julia> x = [ [0,0] for _ in 1:3 ]
3-element Vector{Vector{Int64}}:
[0, 0]
[0, 0]
[0, 0]

julia> x[1][1] = 1
1

julia> x
3-element Vector{Vector{Int64}}:
[1, 0]
[0, 0]
[0, 0]

``````
4 Likes

Oh my goodness I had no idea that this was the behaviour…

I’ve been looking for the bug in my code for days and it was caused by this all along.

2 Likes

Using an array of Static Vectors as input, a performant solution could also be obtained:

``````using StaticArrays

sum_reinterpret!(output, input, n, m) = sum!(output, reshape(reinterpret(Float64,input), (n, m)))

n = 2; m = 1_000
input = [ones(SVector{n, Float64}) for _ in 1:m]
output = zeros(n)

using BenchmarkTools
@btime sum_reinterpret!(\$output, \$input, \$n, \$m)  # 2.333 μs (0 allocations: 0 bytes)

# which compares to the loops on standard array of arrays input:
@btime sum_fixed!(\$output, \$input, 1_000, 2)      # 2.089 μs (0 allocations: 0 bytes)
``````

Can anyone tell me why there are 4 allocations in that loop?

To partially answer my own question, it’s because doing `vec[i][j]` is an allocation from `j`.

`vec[i]` doesn’t allocate, but accessing the `j`th element sometimes does.

`f_αβ[:, i]` also seems to allocate because it constructs a vector when called.

Not entirely allocation free but perhaps still useful:

``````using RecursiveArrayTools
sum!(output, VectorOfArray(input))
``````

Just need to modify your code to use a view at this point:
` sum_fixed!(sum_f_αβ, view(f_αβ, :, i), nagents, 2)`

and maybe relax the type definition in ` sum_fixed!()`

Then 0 allocations are obtained.
``````function sum_fixed!(output, input, n, m)
for j in 1:m, i in 1:n
output[j] += input[i][j]
end
end

function jac!(sum_f_αβ, F_α, F_α0, f_αβ, nagents)
for i in 1:nagents
sum_fixed!(sum_f_αβ, view(f_αβ, :, i), nagents, 2)
@. F_α[i] = F_α0[i] + sum_f_αβ
end
end

nagents = 4
f_αβ = [[0.0, 0.0] for _ in 1:nagents, _ in 1:nagents]
sum_f_αβ = [0.0, 0.0]
F_α0 = [[0.0, 0.0] for _ in 1:nagents ]
F_α = copy(F_α0)

using BenchmarkTools
@benchmark jac!(sum_f_αβ, F_α, F_α0, f_αβ, nagents)

@benchmark jac!(sum_f_αβ, F_α, F_α0, f_αβ, nagents)
BenchmarkTools.Trial: 10000 samples with 957 evaluations.
Range (min … max):   86.102 ns … 401.985 ns  ┊ GC (min … max): 0.00% … 0.00%
Time  (median):      90.073 ns               ┊ GC (median):    0.00%
Time  (mean ± σ):   100.627 ns ±  26.710 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

█▆▄▁▂▇▃▂▃▂▁▂       ▂▁▁▁                                       ▁
█████████████▇▇▆▇▇███████▆▇▆▆▆▇▆▆▆▆▅▅▆▇▇▇▇▅▇▆▆▇▇▆▆▆▅▆▆▆▅▆▆▅▄▅ █
86.1 ns       Histogram: log(frequency) by time        213 ns <

Memory estimate: 0 bytes, allocs estimate: 0.
``````
1 Like

Thanks, I tried views but forgot that running my code outside a function would lead to spurious allocations!