Why is a manual in-place addition so much faster than += and .+= on range-indexed arrays?

Try @views A[I,J] .+= subA.

In Julia 0.5, A[I,J] .+= subA is equivalent to A[I,J] = A[I,J] .+ subA. This first makes a copy of A[I,J] on the right-hand side (slicing makes a copy in Julia), then allocates a new array for A[I,J] .+ subA, and finally assigns the result to A[I,J].

In Julia 0.6, A[I,J] .+= subA is equivalent to A[I,J] .= A[I,J] .+ subA. The slice A[I,J] still allocates a new array for a copy of the slice. Because the .= and .+ are fusing operations in Julia 0.6, however, no new array is allocated for the result of .+, and instead it is written in-place into A[I,J].
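To make the difference concrete, here is a rough sketch that spells out the two behaviours step by step. The array sizes, the index ranges, and the names tmp and res are made up for illustration; this mimics what the text above describes and is not the actual lowered code.

```julia
# Illustrative data (sizes and ranges are arbitrary assumptions).
A = zeros(4, 4)
I, J = 1:2, 2:3
subA = ones(2, 2)

# Julia 0.5 behaviour, written out by hand:
tmp = A[I, J]           # slicing copies the block (first temporary)
res = tmp .+ subA       # the sum allocates a second temporary
A[I, J] = res           # the result is copied back into A

# Julia 0.6 behaviour, written out by hand:
tmp = A[I, J]           # the slice on the right-hand side is still a copy
A[I, J] .= tmp .+ subA  # fused: the sum is written directly into A
```

Either way the right-hand-side slice costs one allocation; fusion only removes the temporary for the result.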

If you use @views A[I,J] .+= subA, then the slice A[I,J] on the right-hand side will instead produce a view, so the entire operation will occur in a single in-place loop with no temporary arrays allocated. See the "Consider using views for slices" section of the performance tips in the Julia manual.
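As a small sketch of what @views does here (again with made-up sizes and ranges), the macro turns the slicing expressions into calls to view, so the two updates below should be equivalent:

```julia
# Illustrative data (sizes and ranges are arbitrary assumptions).
A = zeros(6, 6)
I, J = 2:4, 3:5
subA = ones(length(I), length(J))

# With @views, the A[I, J] on the right-hand side is a view, not a copy.
@views A[I, J] .+= subA

# Roughly the same thing written with explicit views.
V = view(A, I, J)
V .= V .+ subA
```

After both updates, every entry of the block A[I, J] has been incremented twice and the rest of A is untouched.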

That being said, the "manual" loops will still be a bit faster, even with @views. First, they avoid the overhead of allocating the small view object. More importantly, in your manual loop the A[i,j] on the left- and right-hand sides of the assignment is visibly the same element, so the compiler can probably share more of the index computation between the load and the store. The array accesses may also be slightly more efficient than going through a view object.
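For reference, a manual loop of the kind discussed might look like the sketch below. The function name add_block! is hypothetical, and the loop in the original question may differ in its details.

```julia
# Hypothetical helper: add subA into the block A[I, J] element by element.
function add_block!(A, I, J, subA)
    @assert size(subA) == (length(I), length(J))
    # Columns in the outer loop, rows in the inner loop (column-major order).
    for (jj, j) in enumerate(J)
        for (ii, i) in enumerate(I)
            # The same element A[i, j] is read and then written.
            A[i, j] += subA[ii, jj]
        end
    end
    return A
end

B = zeros(4, 4)
add_block!(B, 2:3, 1:2, ones(2, 2))
```

No slice or view object is created here, which is the source of the small remaining advantage over the @views version.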
