I am learning the distributed computing capabilities of Julia as applied to numerical methods. For now I am interested in using the standard Distributed capabilities rather than solutions from MPI.jl and other packages. (Later on I may do a comparison with packages.) To this end, I am trying to solve the Poisson differential equation on a two-dimensional structured grid.
If I were to solve this problem using one process, I would typically assemble a sparse matrix A representing the discretized Poisson operator and a vector b representing the discretized source term. The discrete solution u is then obtained by solving the linear system A*u = b.
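For reference, the single-process approach described above can be sketched as follows. The Kronecker-product assembly is one common way to build the 2D five-point operator; the grid size and constant source term are illustrative choices, not from the original post:

```julia
using SparseArrays, LinearAlgebra

# Assemble the standard 5-point Laplacian on an n×n interior grid
# (Dirichlet boundary values folded into b) via Kronecker products.
function poisson2d(n)
    T = spdiagm(-1 => fill(-1.0, n - 1), 0 => fill(2.0, n), 1 => fill(-1.0, n - 1))
    In = sparse(1.0I, n, n)
    kron(In, T) + kron(T, In)
end

n = 5                  # matches the 5×5 = 25 global degrees of freedom
A = poisson2d(n)
b = ones(n^2)          # a simple constant source term (illustrative)
u = A \ b              # sparse direct solve
```

The same `A` and `b` are what each worker later assembles only a block of.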
In a parallel context, my approach is as follows. Consider, for example, a master process with 4 worker processes. The master process decomposes the domain into 4 blocks:
The global domain has 5 × 5 = 25 degrees of freedom, and each worker's local domain has 3 × 3 = 9 degrees of freedom. Assume that every worker has a local array u which contains its chunk of the solution. Further, each worker can assemble a local sparse matrix representing its portion of the linear operator.
If I were solving this problem iteratively, for example with Jacobi iterations, every worker can perform a local matrix-vector multiplication. Since the Poisson operator is a nearest-neighbor operation, for the nodes on a boundary between workers, a worker has to communicate with its neighbors to get the values of the corresponding nodes in the neighboring process before it can update its own values. For example, in the figure above, boundary entries of u on worker 2 coincide with boundary entries of u on worker 3.
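A serial sketch of one Jacobi sweep over a grid that carries a one-cell boundary/ghost layer may make the nearest-neighbor dependence concrete. The grid size, spacing, and source term here are illustrative, not taken from the post:

```julia
# One Jacobi sweep for -Δu = f on a uniform grid with spacing h.
# `u` and `unew` carry a one-cell outer layer; only interior cells are
# updated, so boundary (or ghost) values are read but never written.
function jacobi_sweep!(unew, u, f, h)
    n1, n2 = size(u)
    for j in 2:n2-1, i in 2:n1-1
        unew[i, j] = 0.25 * (u[i-1, j] + u[i+1, j] +
                             u[i, j-1] + u[i, j+1] + h^2 * f[i, j])
    end
    unew
end

# Iterate to (near) convergence on a 3×3 interior block with f ≡ 1, h = 1.
function solve_jacobi(n, iters)
    u = zeros(n + 2, n + 2); unew = zeros(n + 2, n + 2)
    f = ones(n + 2, n + 2)
    for _ in 1:iters
        jacobi_sweep!(unew, u, f, 1.0)
        u, unew = unew, u   # swap buffers
    end
    u
end

u = solve_jacobi(3, 200)
```

In the distributed setting, the sweep over a worker's block looks exactly like this; only the refresh of the outer layer changes.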
How do I go about having worker 2 access the relevant boundary entries of u from worker 3 and add them into the corresponding entries of its own local vector, for example?
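With the Distributed standard library, one way to pull values from another worker is `remotecall_fetch`. A minimal sketch; the vector contents and the index set `[3, 6, 9]` are made-up stand-ins for the real local solution and boundary indices:

```julia
using Distributed

addprocs(1)                      # spawn one worker just for this demo
w = first(workers())

# Ask worker `w` to evaluate a closure and ship back the result.
ghost = remotecall_fetch(w) do
    u_local = collect(1.0:9.0)   # stand-in for the worker's local solution
    u_local[[3, 6, 9]]           # hypothetical boundary entries a neighbor needs
end

# "Worker 2" can now add these into its own boundary entries:
u2 = zeros(9)
u2[[3, 6, 9]] .+= ghost
```

In a real solver the remote side would read from a persistent per-worker array (e.g. one stored in a global on the worker, or communicated through a `RemoteChannel`) rather than rebuilding it inside the closure.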
I realized it would be easier to assemble my local sparse operator if I used “ghost nodes”: basically, expanding the grid by one position in each direction so that the values from a neighboring process can be copied into the local solution vector prior to performing a matrix-vector multiplication. From the perspective of “worker 2” in the above figure, this would turn into:
where the nodes have been renumbered and the corresponding solution from a neighboring process is shown with the appropriate color.
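The padded local array can be set up like this; a sketch where the block contents and the received ghost values are illustrative numbers:

```julia
# Local 3×3 block padded with a one-node ghost layer on each side.
# Interior indices 2:4 hold this worker's own data; the outermost
# rows/columns receive copies of neighboring workers' boundary values.
n = 3
u_pad = zeros(n + 2, n + 2)
u_pad[2:n+1, 2:n+1] .= reshape(1.0:9.0, n, n)   # this worker's chunk

# Values received from the right-hand neighbor (hypothetical numbers):
right_ghost = [10.0, 11.0, 12.0]
u_pad[2:n+1, n+2] .= right_ghost                # fill the right ghost column
```

After all ghost columns/rows are filled, the local matrix-vector product needs no further communication.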
However, the question in general remains the same: how do I perform the “get from worker xyz” part of the operation?
DistributedArrays.jl is one answer. If your real question is that you want to know what’s happening under the hood so you can implement these things yourself, looking at its source code is a great way to learn.
Keep in mind also that if all nodes are on the same machine, SharedArrays will be considerably more efficient.
I did have a look at SharedArrays. Since I am doing this as part of a project for a class, I would like to first have a version where I explicitly handle most of the communication calls. Subsequently I would like to compare it to a version which uses SharedArrays or DistributedArrays.
Since getindex is the function that gets called to access elements, you can use the techniques in the Parallel Computing chapter. And, as suggested above, you can read through the source code of DistributedArrays to learn how it works.
First of all, sincere congratulations on your second post if you found that out by yourself! What you are describing and showing in your nice figure is most fundamental for scientific supercomputing. It is commonly referred to as a ghost node update or halo update.
I think it is absolutely good that you start your project by doing all the communication explicitly yourself. This way, you will really understand how it works and learn the most. Then, in a second step, you can do the same with
SharedArrays and compare performance etc.
The obvious choice for doing all the communication for the halo updates explicitly is MPI. You need to do essentially the following:
1. Set up a Cartesian process topology with MPI. From the resulting communicator, you will be able to retrieve for each process its coordinates in the process grid, its neighboring processes, and so on…
2. Do a time loop where you do:
2.1 Do a time step to solve the Poisson equation.
2.2 Update the halo with MPI: each process gets from its neighbors the values it could not compute itself.
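The topology bookkeeping can also be done by hand when staying with the Distributed standard library instead of MPI. A minimal sketch, mapping linear ranks onto a process grid and finding each rank's neighbors; the 2×2 grid and the function names are illustrative:

```julia
# Map linear ranks 1:4 onto a 2×2 process grid and find each rank's
# neighbors. A value of 0 means "no neighbor" (a physical boundary).
dims = (2, 2)
grid = CartesianIndices(dims)   # rank -> grid coordinates
lin  = LinearIndices(dims)      # grid coordinates -> rank

function neighbors(rank, grid, lin)
    c  = grid[rank]
    nb = Dict{Symbol,Int}()
    for (name, d) in ((:left,  CartesianIndex(-1, 0)),
                      (:right, CartesianIndex( 1, 0)),
                      (:down,  CartesianIndex( 0, -1)),
                      (:up,    CartesianIndex( 0, 1)))
        t = c + d
        nb[name] = checkbounds(Bool, grid, t) ? lin[t] : 0
    end
    nb
end

nb1 = neighbors(1, grid, lin)   # rank 1 sits at grid corner (1, 1)
```

This is exactly the information an MPI Cartesian communicator would hand you, computed with plain index arithmetic.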
As Julia is a very young language, documentation is often only partially available; look up on www.mpi-forum.org what the different MPI functions do.
PS: I would even start with a fake distributed parallelization. That means, inside the time loop, do the following:

```julia
# solve the Poisson equation on each fake process
for p = 1:nprocs
    # ...solve the Poisson equation on process p's block...
end

# update the halo on each fake process
for p = 1:nprocs
    # ...copy boundary values between neighboring blocks...
end
```
This way you can first learn, in the simplest manner, which process has to get which data from where, without facing the difficulties of debugging a distributed parallelization. Once this works as it should, you can use MPI to do things really in parallel.
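The fake-process idea can be made concrete with local arrays standing in for processes. A minimal 1D sketch with two "processes" exchanging one ghost cell each; all values are illustrative:

```julia
# Two fake processes, each owning 3 cells plus one ghost cell on the
# shared side. Process 1 is on the left, process 2 on the right.
u1 = [1.0, 2.0, 3.0, 0.0]   # last entry  = ghost cell (comes from process 2)
u2 = [0.0, 4.0, 5.0, 6.0]   # first entry = ghost cell (comes from process 1)

# Halo update: copy each neighbor's boundary value into the ghost cell.
function halo_update!(u1, u2)
    u1[end] = u2[2]          # process 1 gets process 2's leftmost owned cell
    u2[1]   = u1[end-1]      # process 2 gets process 1's rightmost owned cell
    nothing
end

halo_update!(u1, u2)
```

Replacing these in-memory copies with actual sends and receives (or `remotecall_fetch` calls) is then the only step left to make it truly distributed.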
Check out the Life cellular automata example in the DistributedArrays documentation. Doing this yourself is just a lot of bookkeeping. Note that you don’t need MPI to do this — the primitives in the Distributed standard library are all you need.