I’ve run into a kind-of counterintuitive behavior when coding some example. I have an array A_field
, which I want to update by adding certain values in certain positions. These are provided by a an array of arrays of (flattened) indices Part
and an array of arrays of values A
, as follows:
N = 10
shapeA = (N, N)
A_field = zeros(shapeA)
Part = [[i, i+1] for i in 1:2:prod(shapeA)]
A = [rand(length(J)) for J in Part]
You may want to think of this as having split some big problem for A_field
into small chunks, solved them independently and then wanting to join the results.
The most straightforward implementation of the updating I can think of is the following:
function update1!(A_field, Part, A)
for (i, J) in enumerate(Part)
A_field[J] += A[i]
end
end
But this strangely results in as many allocations as entries of A_field
:
julia> @time update1!(A_field, Part, A)
0.000010 seconds (100 allocations: 9.375 KiB)
Which of course is not nice for big problems. (Disclaimer: I used all combinations of @views
, using .+=
instead of +=
and so on I could think of, and this is the one that allocated the least). A second possibility is to introduce a second loop over each J
:
function update2!(A_field, Part, A)
for (k, J) in enumerate(Part)
for (i, j) in enumerate(J)
A_field[j] += A[k][i]
end
end
end
Which is slightly more cumbersome to read and understand. This of course introduces no allocations:
julia> @time update2!(A_field, Part, A)
0.000006 seconds
Finally, reshaping the array and using views seems to get rid of the allocations (apart from the ones needed for the reshaping, of course):
function update3!(A_field, Part, A)
A_temp = reshape(A_field, :)
for (i, J) in enumerate(Part)
@views A_temp[J] .+= A[i]
end
end
julia> @time update3!(A_field, Part, A)
0.000009 seconds (2 allocations: 80 bytes)
So at this point I am a bit confused and have a lot of questions.
- Why does the 3rd version work well and not the 1st one?
- Is there any way (macro, trick) to obtain the performance of the 2nd version without sacrificing the readability of the 1st version?
Thanks!