I am trying to solve a large PDE, and I want to parallelize a finite-difference computation. I am looking for advice on how to do this.
At some point I do the following:

```julia
using PDEPackage

# struct that contains the parameters
pde = PDEPackage.PDE()

# compute difference
sol1 = rand(100); sol2 = rand(100)
dsol = pde(sol1) - pde(sol2)
```
The computation of `pde(sol)` takes minutes, hence I am considering distributed computing. One issue I have is that `pde` is a callable struct whose internal fields are overwritten on each call `pde(sol)`, so I will need to duplicate that internal data.
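To make the issue concrete, here is a minimal sketch (the `MyPDE` struct and its body are hypothetical stand-ins, not the real `PDEPackage` API): a callable struct with internal scratch data is duplicated with `deepcopy` once per task, so each parallel evaluation mutates only its own copy.

```julia
# Hypothetical stand-in for the callable PDE struct: calling it
# overwrites an internal scratch field, so two concurrent calls on
# the same instance would race.
mutable struct MyPDE
    scratch::Vector{Float64}   # internal field, overwritten on each call
end

function (pde::MyPDE)(sol)
    pde.scratch .= 2 .* sol    # stand-in for the minutes-long computation
    return sum(pde.scratch)
end

pde = MyPDE(zeros(100))
sol1, sol2 = rand(100), rand(100)

# Duplicate the struct (and hence its internal data) once per task,
# so each parallel evaluation mutates its own private copy.
copies = [deepcopy(pde) for _ in 1:2]
sols = (sol1, sol2)
results = zeros(2)
Threads.@threads for i in 1:2
    results[i] = copies[i](sols[i])
end
dsol = results[1] - results[2]
```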
Threaded Assembly · Ferrite.jl has a small example of a parallel FEM computation in which each thread has its own `ScratchValues`: data that the thread owns and can mutate freely. So it is quite similar to what you are considering.
It took me some time to analyse your answers. I forgot to mention that I am targeting distributed computing because each `pde` call is itself threaded. I guess I will have to adapt @kristoffer.carlsson's suggestion.
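For the distributed case, one common pattern (sketched here with a hypothetical `MyPDE` stand-in, not the real package) is to define one PDE instance per worker process with `@everywhere`; each process has its own copy of the global state, so internal mutation is safe, and each call is free to use threads internally.

```julia
using Distributed
addprocs(2)                      # one worker per concurrent pde evaluation

@everywhere begin
    # Hypothetical stand-in for the PDE struct. Each worker process gets
    # its own copy of this global, so internal mutation is safe, and the
    # call body could itself use Threads.@threads.
    mutable struct MyPDE
        scratch::Vector{Float64}
    end
    function (pde::MyPDE)(sol)
        pde.scratch .= 2 .* sol  # stand-in for the threaded computation
        return sum(pde.scratch)
    end
    const pde = MyPDE(zeros(100))
end

sol1, sol2 = rand(100), rand(100)
r1, r2 = pmap(s -> pde(s), [sol1, sol2])  # each call runs on its own worker
dsol = r1 - r2
```

To give each worker its own thread pool, you can (if I remember the keyword correctly) start them with something like `addprocs(2; exeflags="-t 4")`.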
Just FYI, the idiom `scratches[Threads.threadid()]` only works with `Threads.@threads for` (i.e., not with `Threads.@spawn`), and it also relies on an implementation detail of `Threads.@threads`. It's possible that this idiom stops working at some point (though maybe it will have to be kept as-is, given that many programs rely on it). I discussed this a bit in FAQ · FLoops.
For new library functions, I suggest avoiding this idiom.
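A task-local alternative (a toy sketch, not code from this thread, and assuming equal-length inputs): allocate the scratch buffer inside each spawned task instead of indexing by `threadid()`, so the code is correct under `Threads.@spawn` and arbitrary nesting.

```julia
# Each spawned task allocates and owns its scratch buffer, so nothing
# relies on Threads.threadid() staying constant during execution.
function tasklocal_sums(inputs::Vector{Vector{Float64}})
    nchunks = min(Threads.nthreads(), length(inputs))
    chunks = collect(Iterators.partition(eachindex(inputs), cld(length(inputs), nchunks)))
    tasks = map(chunks) do idxs
        Threads.@spawn begin
            scratch = similar(inputs[first(idxs)])  # task-owned, mutated freely
            out = Float64[]
            for i in idxs
                scratch .= 2 .* inputs[i]
                push!(out, sum(scratch))
            end
            out
        end
    end
    return reduce(vcat, fetch.(tasks))
end
```

Because it never pins work to a thread id, a function written this way can itself be called from inside another parallel region.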
From what I understand, you will be able to set the scheduler to `:static`, “which creates one task per thread and divides the iterations equally among them.”
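As a small sketch of that option (assuming a Julia version where `@threads` accepts the `:static` scheduler argument): with `:static`, `threadid()` stays fixed for the whole loop body, so per-thread scratch indexing is safe.

```julia
# :static pins one task per thread, so Threads.threadid() is stable
# inside the loop body and per-thread scratch indexing is safe.
scratches = [zeros(100) for _ in 1:Threads.nthreads()]
results = zeros(8)
Threads.@threads :static for i in 1:8
    s = scratches[Threads.threadid()]   # this thread's own buffer
    s .= float(i)                       # mutate thread-owned scratch freely
    results[i] = sum(s)
end
```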
Yes, that’s why I said library code. Library functions cannot use `:static` since they can be called from any task. I think it’s OK to use `scratches[Threads.threadid()]` in application/script code, since you control how the functions using `@threads :static` are called. But I think it also makes sense to avoid `scratches[Threads.threadid()]` even in script code, since `@threads :static` makes it impossible to nest your function inside parallel higher-order functions (e.g., you may want to parallel-map over some inputs using your parallel function).