Advice for parallel computation


I am trying to solve a big PDE, and I want to parallelize a finite difference computation. I am seeking some advice on how to do this.

At some point I do the following

using PDEPackage
# struct that contains the parameters
pde = PDEPackage.PDE()
# compute difference
sol1 = rand(100); sol2 = rand(100)
dsol = pde(sol1) - pde(sol2) 

The computation of pde(sol) takes minutes, so I am considering distributed computing. One issue I have is that pde is a callable struct whose internal fields are overwritten on each call to pde(sol). So I will need to duplicate the internal data.

I am considering doing the following

pde1 = pde
pde2 = deepcopy(pde)
data = [(pde1, sol1), (pde2, sol2)]
result = pmap(x -> x[1](x[2]), data)
dsol = result[1] - result[2]
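For reference, here is a self-contained sketch of that pmap approach. Since I don't know the internals of PDEPackage, a toy mutable callable (ToyPDE, made up for this example) stands in for the real struct:

```julia
using Distributed
addprocs(2)  # one worker per independent evaluation

# Toy stand-in for the real PDE struct: a callable that overwrites an
# internal scratch field on every call, like the real pde(sol) does.
@everywhere begin
    struct ToyPDE
        scratch::Vector{Float64}
    end
    function (p::ToyPDE)(sol::Vector{Float64})
        p.scratch .= 2 .* sol   # mutate the internal field
        return copy(p.scratch)
    end
end

pde = ToyPDE(zeros(100))
sol1 = rand(100); sol2 = rand(100)

# pmap serializes each (pde, sol) pair to its worker, so each worker ends
# up with its own copy of the struct and the mutation cannot race.
data = [(deepcopy(pde), sol1), (deepcopy(pde), sol2)]
result = pmap(x -> x[1](x[2]), data)
dsol = result[1] - result[2]
```

Note that because pmap serializes its arguments to the workers, each worker already receives an independent copy of the struct; the deepcopy mainly protects you if you later switch to a shared-memory scheme.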

Does anyone have a better suggestion?

Threaded Assembly · Ferrite.jl has a small example of a parallel FEM computation in which each thread has its own ScratchValues, i.e. data that the thread owns and can mutate freely. So it is quite similar to what you are considering.
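The pattern boils down to something like this minimal sketch (the :static scheduler and the toy sum-of-squares reduction are my own additions here, not Ferrite's code):

```julia
using Base.Threads

n = 1_000
x = rand(n)

# One scratch buffer and one accumulator slot per thread; each thread
# only ever touches its own entries, so no locks are needed.
scratches = [zeros(100) for _ in 1:nthreads()]
partial = zeros(nthreads())

@threads :static for i in 1:n       # :static keeps threadid() stable per iteration
    buf = scratches[threadid()]     # this thread's private scratch
    buf[1] = x[i]^2                 # free to mutate without synchronization
    partial[threadid()] += buf[1]
end

total = sum(partial)                # combine the per-thread results at the end
```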


Here are some course notes on doing finite differences on the GPU (or threaded CPU): GitHub - luraess/geo-hpc-course: Parallel CPU and GPU high-performance computing - short course. Maybe good for inspiration.


Thank you for your suggestions!

It took me some time to analyse your answers. I forgot to mention that I am targeting distributed computing because each pde call is itself threaded. I guess I will have to modify @kristoffer.carlsson's suggestion.

Just FYI, the idiom scratches[Threads.threadid()] only works with Threads.@threads for (i.e., not with Threads.@spawn), and it also relies on an implementation detail of Threads.@threads. It's possible that this idiom stops working at some point (though maybe we'd need to keep it as-is, given that many programs rely on it). I discussed this a bit in FAQ · FLoops.

For new library functions, I suggest avoiding this idiom.
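For example, one scheduler-agnostic alternative is to allocate the scratch inside each task, so ownership follows the task rather than whichever thread it happens to run on. A sketch (tasked_sum_of_squares is a made-up illustration, not a FLoops API):

```julia
using Base.Threads

# Split the work into chunks and spawn one task per chunk. Each task
# creates its own scratch, so this is safe under any scheduler and can
# itself be nested inside other parallel code.
function tasked_sum_of_squares(x::AbstractVector{<:Real}; ntasks::Integer = nthreads())
    chunks = Iterators.partition(eachindex(x), cld(length(x), ntasks))
    tasks = map(chunks) do idxs
        @spawn begin
            scratch = zeros(1)      # task-local: no threadid() indexing needed
            acc = 0.0
            for i in idxs
                scratch[1] = x[i]^2
                acc += scratch[1]
            end
            acc
        end
    end
    return sum(fetch.(tasks))       # combine the per-task results
end
```

Because nothing here depends on Threads.@threads internals, the function composes with @spawn, pmap, or other parallel higher-order functions.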

From what I understand, you will be able to set the scheduler to :static, "which creates one task per thread and divides the iterations equally among them."


Yes, that's why I said library code. Library functions cannot use :static since they can be called from any task. I think it's OK to use scratches[Threads.threadid()] in application/script code, since you can control how the functions using @threads :static are called. But I think it also makes sense to avoid scratches[Threads.threadid()] even in script code, since @threads :static makes it impossible to nest your function inside parallel higher-order functions (e.g., you may want to do a parallel map over some inputs using your parallel function).