I’ve run into a kind-of counterintuitive behavior when coding some example. I have an array `A_field`

, which I want to update by adding certain values in certain positions. These are provided by a an array of arrays of (flattened) indices `Part`

and an array of arrays of values `A`

, as follows:

```
N = 10
shapeA = (N, N)
A_field = zeros(shapeA)
Part = [[i, i+1] for i in 1:2:prod(shapeA)]
A = [rand(length(J)) for J in Part]
```

You may want to think of this as having split some big problem for `A_field`

into small chunks, solved them independently and then wanting to join the results.

The most straightforward implementation of the updating I can think of is the following:

```
function update1!(A_field, Part, A)
for (i, J) in enumerate(Part)
A_field[J] += A[i]
end
end
```

But this strangely results in as many allocations as entries of `A_field`

:

```
julia> @time update1!(A_field, Part, A)
0.000010 seconds (100 allocations: 9.375 KiB)
```

Which of course is not nice for big problems. (Disclaimer: I used all combinations of `@views`

, using `.+=`

instead of `+=`

and so on I could think of, and this is the one that allocated the least). A second possibility is to introduce a second loop over each `J`

:

```
function update2!(A_field, Part, A)
for (k, J) in enumerate(Part)
for (i, j) in enumerate(J)
A_field[j] += A[k][i]
end
end
end
```

Which is slightly more cumbersome to read and understand. This of course introduces no allocations:

```
julia> @time update2!(A_field, Part, A)
0.000006 seconds
```

Finally, reshaping the array and using views seems to get rid of the allocations (apart from the ones needed for the reshaping, of course):

```
function update3!(A_field, Part, A)
A_temp = reshape(A_field, :)
for (i, J) in enumerate(Part)
@views A_temp[J] .+= A[i]
end
end
```

```
julia> @time update3!(A_field, Part, A)
0.000009 seconds (2 allocations: 80 bytes)
```

So at this point I am a bit confused and have a lot of questions.

- Why does the 3rd version work well and not the 1st one?
- Is there any way (macro, trick) to obtain the performance of the 2nd version without sacrificing the readability of the 1st version?

Thanks!