Most efficient way of adding elements within matrices in loops

BMI_OR · November 18, 2024, 9:14pm

I have a vector A and I need to update its values with the ones of matrix B. Each element of B needs to be added to some values in A. The mapping of indeces from matrix to vector is given by a matrix of vectors C. I have a piece of code that currently does the work but it is inefficient:

for i in 1:I, j in 1:J, k in 1:K
    vector_indices_to_update = C[i,j,k];
    A[vector_indices_to_update] .+= B[i,j,k]
end

What is the most efficient way to implement this operation?
As additional info, the length of vector A is significantly larger that the size of the dimensions of the matrices,
Thanks!

foobar_lv2 · November 18, 2024, 9:41pm

Welcome to this forum!

We can better engage with your question if you follow the PSA on questions.

The thing that your example is missing is “self contained runnable + mimics your true usecase”. To give an example in style how this could look like:

using BenchmarkTools

I=10;J=11;K=12; N=500; A = rand(N); C=rand(1:N, I, J, K); B=rand(I,J,K);

function update_indices!(A, B, C)
  for i in 1:I, j in 1:J, k in 1:K
    vector_indices_to_update = C[i,j,k];
    A[vector_indices_to_update] += B[i,j,k]
  end
end

@btime update_indices!($A, $B, $C)

What you need to do in order to get useful help is write code snippets that initialize A,B,C. “the length of A is significantly larger…” is all fine and dandy, but it is better to simply incorporate that into the sample problem you show us.

Please also think about all the details, and make sure that they all mirror your true usecase. If your real data is not sorted, then you SHOULD NOT give sorted samples; if your real data is Float32 then you give us Float32, etc etc.

Don’t be go :surprised_pikachu: if you post collect(1:10_000) as a test vector and proposed solutions delete the collect, a la

julia> @btime sum($(1:1000_000))
  2.016 ns (0 allocations: 0 bytes)

sgaure · November 18, 2024, 10:38pm

BMI_OR:

I have a piece of code that currently does the work but it is inefficient:
for i in 1:I, j in 1:J, k in 1:K
    vector_indices_to_update = C[i,j,k];
    A[vector_indices_to_update] .+= B[i,j,k]
end
What is the most efficient way to implement this operation?

This looks very efficient to me, provided I, J, K, A, B, and C are either local variables, or const global variables, and A and B have the same element type. Otherwise, there may be considerable performance penalties. Preferably, the vectors in C should be sorted, but I’m uncertain how much that matters. If you are sure no index is out of bounds you can add @inbounds before the for.

Just one thing. The A[v] .+= B[i,j,k] translates to A[v] .= A[v] .+ B[i,j,k], so when v is a vector, there will be a copy on the right hand side. You can do @view(A[v]) .+= B[i,j,k] to avoid that.

DNF · November 18, 2024, 11:27pm

If each vector_indices_to_update is a vector, then defitely add @views or iterate over it in a loop.

Also, your iteration order is suboptimal, since Julia arrays are column major. Either change the loop order, or, even better, get the correct order automatically:

for i in eachindex(B, C) 
    vector_indices_to_update = C[i]
    @views A[vector_indices_to_update] .+= B[i]
end

or

for i in (val, inds) in zip(B, C)
    @views A[inds] .+= val
end

DNF · November 18, 2024, 11:31pm

I don’t think that is going to be safe in this case, since loaded indices don’t have the same guarantees as the ones coming from eachindex.

Topic		Replies	Views
High performance vector/matrix/tensor linear algebra operations Performance question , performance , linearalgebra	9	543	January 20, 2023
Add rows from one matrix to another by index New to Julia	3	675	September 29, 2019
Adding vectors of different lengths New to Julia performance	12	2246	February 8, 2017
Best practice for element-wise vector value update without memory allocation Numerics question	3	291	March 12, 2023
Matrix-vector multiplication slower than a 'naive' for loop? Performance vector	7	1653	July 30, 2020

Most efficient way of adding elements within matrices in loops

Related topics