Parallelization of for loop

Germinator · October 28, 2021, 2:36pm

Dear Julia users,

I am trying to parallelize the following loop, what do you think is the best approach?

         for n = 1:N
             kron!(temp,A[n,:],B[n,:])
             C.+= temp.*temp'
             Cx .+= temp.*x[n]
         end

I tried using FLoops, but the outputs differ, and the code is way slower:

        for n = 1:N
            kron!(temp,A[n,:],B[n,:])
            @reduce(C= zeros(M,M) + temp*temp', Cx = zeros(M) + temp*x[n])
        end

I am stuck and would appreciate any kind of help!

carstenbauer · October 29, 2021, 6:23am

Generally, before you parallelize a code, you want to make sure that the serial performance is decent. In particular, you want to avoid unnecessary allocations. Matrix slices allocate in Julia by default, so you want to put @views in front of the kron!.

Also you might want to transpose A and B, since operating on a sliced column should be faster than operating on a sliced row. Like Fortran/MATLAB and in contrast to C/Python, Julia is column major order.

mcabbott · October 29, 2021, 12:00pm

The difficulty with parallelising the loop over n is that you need somewhere to accumulate the results of each thread, to combine at the end.

I think it’ll work better to do the other loops in parallel. And to make the one that is summing the innermost. In which case this layout of the arrays probably isn’t a bad thing. But you will want to avoid making all these slices, and deal directly with the elements.

Here’s one way:

N = 5; M = 2;
A = rand(N,M); B = rand(N,M); x = rand(N); 
C = zeros(M^2, M^2); Cx = zeros(M^2); temp = rand(M^2);

function orig!(C,Cx,A,B,temp)
    for n in 1:size(A,1)
        @views kron!(temp,A[n,:],B[n,:])
        C.+= temp.*temp'
        Cx .+= temp.*x[n]
    end
end
orig!(C,Cx,A,B,temp)

using Tullio  # should parallelise, almost 4x quicker for me
# using LoopVectoriztion  # another 3x or so speed up
@tullio C4[j,i,l,k] := A[n,i] * B[n,j] * A[n,k] * B[n,l];
@tullio Cx4[j,i] := A[n,i] * B[n,j] * x[n];

reshape(C4, size(C)) ≈ C
reshape(Cx4, size(Cx)) ≈ Cx

function alt!(C,Cx,A,B)
    C4 = reshape(C, M,M,M,M)  # writing into original C
    Cx4 = reshape(Cx, M,M)
    @tullio C4[j,i,l,k] = A[n,i] * B[n,j] * A[n,k] * B[n,l]
    @tullio Cx4[j,i] = A[n,i] * B[n,j] * x[n]
    nothing
end

Germinator · October 31, 2021, 7:52pm

Amazing, thanks a lot!

Topic		Replies	Views
Parallelizing a for-loop with a matrix Performance	2	339	June 2, 2021
How to execute tasks in parallel in a for loop Performance parallel , multithreading , juliapro , optimization	27	1955	November 29, 2023
Trying to write a parallel for loop in Julia New to Julia	2	3165	June 6, 2020
How to code faster parallel for loop New to Julia parallel	8	7259	January 18, 2019
Need help understanding how to run a for loop in parallel General Usage parallel	2	446	July 27, 2020

Parallelization of for loop

Related topics