Have a timing issue, need advice on my parallel code

Hello everyone.

I am working on a computational fluid dynamics code which needs to solve some subdomains in parallel, as defined in the function ASPINS1Sub.
To do this, I wrote the code given below, which consists of many if statements.

g1 = @spawnat 2 ASPINS1Sub(COMP, u, Fparam, N, Tln, UTln, MIln, SOLll,
                           Tll, UTll, Sis, Sin, 1, G, INs1)
if nsubs > 1
    g2 = @spawnat 3 ASPINS1Sub(COMP, u, Fparam, N, Tln, UTln, MIln, SOLll,
                               Tll, UTll, Sis, Sin, 2, G, INs1)
end
if nsubs > 2
    g3 = @spawnat 4 ASPINS1Sub(COMP, u, Fparam, N, Tln, UTln, MIln, SOLll,
                               Tll, UTll, Sis, Sin, 3, G, INs1)
end
if nsubs > 3
    g4 = @spawnat 5 ASPINS1Sub(COMP, u, Fparam, N, Tln, UTln, MIln, SOLll,
                               Tll, UTll, Sis, Sin, 4, G, INs1)
end
.
.
.

nsubs is the number of subdomains, so I write many if statements like this.
At the end I collect the data and sum it as:

G = fetch(g1)
if nsubs > 1; G += fetch(g2); end
if nsubs > 2; G += fetch(g3); end
if nsubs > 3; G += fetch(g4); end
if nsubs > 4; G += fetch(g5); end
.
.
.

I also have 2 more functions following this pattern, with many if statements.
However, having so many if statements is causing a problem:
it takes a long time, 30-60 seconds, just to enter the 2 outer functions.

There should be an easier/alternative way to do this.
I need advice on how to make it better.

It’s hard to determine how to structure the parallelism without knowing what your solver’s doing. I’m having a hard time imagining a fluids code where you’d simply sum the results from each subdomain. Can you put together a minimal working example?

This section of the manual may be helpful: https://docs.julialang.org/en/stable/manual/parallel-computing#man-shared-arrays-1

If you really want a parallel reduction, you can use @parallel or pmap.

See https://docs.julialang.org/en/stable/manual/parallel-computing#Parallel-Map-and-Loops-1

Something like:

function f(nsubs)
    # Parallel for-loop with a (+) reduction: each iteration runs on a
    # worker process and the results are summed into Gsum.
    Gsum = @parallel (+) for subdomain in 1:nsubs
        ASPINS1Sub(COMP, u, Fparam, N, Tln, UTln, MIln, SOLll, Tll, UTll, Sis, Sin, subdomain, G, INs1)
    end
    return Gsum
end

function g(nsubs)
    # pmap sends one subdomain to each available worker and collects the
    # results in order; sum them afterwards.
    Gs = pmap(subdomain -> ASPINS1Sub(COMP, u, Fparam, N, Tln, UTln, MIln, SOLll, Tll, UTll, Sis, Sin, subdomain, G, INs1), 1:nsubs)
    return sum(Gs)
end

Thank you. I have checked that page, and this worked well. I finally got rid of all those if statements.

The code is a kind of nonlinear preconditioner. I have fixed that issue thanks to greg_plowman.

On the other hand, I also need advice on shared arrays.
In one routine, I am generating a very large, sparse matrix in a for loop. I tried to parallelise this but could not, due to the fact that in Julia there is no shared sparse matrix.
I have seen that a package (ParallelSparseMatMul) supports this, but the package did not work for me.

Any ideas on this?

Is your plan to assemble the sparse matrix in parallel? I advise against it (sorry if I misread your question).

Remember that changing the sparsity structure of a sparse matrix is O(N); hence, you get an O(N^2) algorithm if you generate your matrix by something like

A = sparse(Int[], Int[], Float64[], 100, 100)
for (idx, val) in keyvals
    A[idx...] = val    # each insertion can shift all stored entries
end

You should rather collect the keys and values and call a sparse matrix constructor, sparse(Is, Js, Vs, m, n), on them. I think there is no value in trying to assemble in parallel; the built-in single-threaded algorithm should be able to fill all your available memory very rapidly. If the computation of the coefficients is the expensive part, then just store them somewhere and join the sets before calling the constructor; if the computation of the sparsity structure is cheap, you can also use the approach in the next paragraph.
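For illustration, a minimal sketch of this triplet-based construction, assuming keyvals holds ((i, j), v) pairs as in the snippet above and reusing its 100x100 size (on Julia >= 1.0 this needs the SparseArrays stdlib):

using SparseArrays

Is, Js, Vs = Int[], Int[], Float64[]
for ((i, j), val) in keyvals
    push!(Is, i); push!(Js, j); push!(Vs, val)
end
A = sparse(Is, Js, Vs, 100, 100)   # one single construction at the end

By default, sparse sums the values of duplicate (i, j) pairs, which is usually what you want when assembling.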

Of course, mutating the sparse matrix post-construction in multiple threads makes perfect sense, as long as you never modify the sparsity structure. In this case, make sure to create slots for entries which are initially zero but might become nonzero later (by having an i,j,0.0 entry during construction). I am not sure about modifying values in different processes (@parallel instead of @threads).
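A minimal sketch of that idea (the 3x3 matrix here is a made-up example; sparse keeps explicitly supplied zeros as stored entries, and Threads.@threads needs Julia to be started with more than one thread):

using SparseArrays, Base.Threads

# Entry (3,1) is stored as an explicit 0.0 so it has a slot in the structure.
A = sparse([1, 2, 3, 3], [1, 2, 1, 3], [1.0, 2.0, 0.0, 3.0], 3, 3)

vals = nonzeros(A)        # the vector of stored values
@threads for k in eachindex(vals)
    vals[k] *= 2.0        # values change in place; the sparsity pattern does not
end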

If you are instead asking for parallel sparse BLAS, sorry, no idea.

Dagger (https://github.com/JuliaParallel/Dagger.jl, a framework for out-of-core and parallel execution) has support for distributed sparse matrices. Also, JuliaDB.jl has a distributed n-dimensional sparse array, but I don't think it has the typical array operations implemented on it, since it's used in the context of databases.

I have checked the Dagger package. I saw that the only usage shown is s1 = sprand(Blocks(4000,4000), 10^4, 10^4, 0.01).
I could not implement it for my use case, which is:

A = spzeros(N,N)

then filling some of the elements of A within a for loop.

The pattern of the A matrix is not known and changes with the grid size.

Typically, when constructing sparse matrices, you would accumulate the indices and values in separate arrays and then construct the matrix at the end, like foobar said above. But if you want to construct a sparse array using getindex and setindex!, the SparseMatrixCSC data structure will be slow, since setindex! is an O(nnz) operation for it. I made a sparse matrix data structure (https://github.com/vvjn/SparseMatrixLILs.jl) a while ago that does setindex! in O(d), where d is the average number of elements in a column. It's good for constructing sparse matrices conveniently. I guess if Dagger had setindex!, it could do what you're thinking of.

As others suggested, you should assemble the [i, j, v] triplets (i and j are indices, v are values) in parallel (one matrix on each processor), then vertically concatenate them on the master process and convert the result into a sparse matrix; see the sketch below. Thus, most of the sparse matrix creation can be done in parallel, while only the final conversion is done by a single processor. (This last part could be made more efficient if there is a particular known sparsity structure, such as the matrix being block-diagonal.)
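A rough sketch of that workflow; local_triplets is a placeholder for your per-subdomain assembly, and nsubs/N are as in the earlier posts. Note that sparse sums duplicate (i, j) entries, which takes care of nodes shared between subdomains:

using Distributed
addprocs(4)                        # or however many workers you use
@everywhere using SparseArrays

# Placeholder: compute the (i, j, v) triplets of one subdomain.
@everywhere function local_triplets(subdomain, N)
    Is, Js, Vs = Int[], Int[], Float64[]
    # ... push!(Is, i); push!(Js, j); push!(Vs, v) for each nonzero ...
    return Is, Js, Vs
end

function assemble(nsubs, N)
    parts = pmap(s -> local_triplets(s, N), 1:nsubs)   # triplets in parallel
    Is = reduce(vcat, (p[1] for p in parts))
    Js = reduce(vcat, (p[2] for p in parts))
    Vs = reduce(vcat, (p[3] for p in parts))
    return sparse(Is, Js, Vs, N, N)   # single serial conversion on the master
end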

Thank you all.

I'd better try this. I understand what you mean.