Running a process on several nodes of a cluster

Hi everyone,

I am trying to run a process that requires a huge amount of memory across several nodes of a cluster. I am trying to use the Distributed.jl package, but it does not seem to work, so I would like to seek some advice on how to do this properly.

A MWE is as follows. What I want to do is to construct an array a recursively, say, via a function f. In this example, I append an extra element to the vector a in each step of the for loop. Step i of the loop depends on step i-1, so the steps are not independent of each other.

The reason that I need to use several nodes on the cluster is that the actual code I am running requires so much memory that a single node does not have enough to run the entire task. The actual task is more complicated than what I show below, but I think the example captures the main idea.

A related question is: I have been using qsub to submit jobs on a cluster. If I need to use multiple nodes with the Distributed.jl package, would I need to use another command?

using Distributed

@everywhere function f(a, i)
  b = [a; 0.0]
  b[i] = i + b[i - 1]
  return b
end

a = [0.0]
@distributed for i = 2:5
  a = f(a, i)
end

Any thoughts are appreciated. Thanks!!


I would take a look at ClusterManagers.jl to see if there is a configuration there that matches your system. Otherwise, you will need to tell us more about your cluster.
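For example, if your scheduler turns out to be PBS/Torque, a minimal sketch with ClusterManagers.jl could look like the one below. The worker count is just an illustration, and the exact function name for your scheduler (addprocs_pbs, addprocs_slurm, addprocs_sge, etc.) and its keyword arguments are worth checking against the ClusterManagers.jl README:

using Distributed, ClusterManagers

# Ask the scheduler for 8 worker processes (illustrative count);
# they may be placed on several nodes, each with its own memory.
addprocs_pbs(8)

# Quick sanity check of where the workers ended up.
@everywhere println("worker $(myid()) on $(gethostname())")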

Thanks @mkitti! I will take a look at ClusterManagers.jl.

But I think the main issue here is that the MWE I attached does not work on my computer. So, I am wondering whether there is something wrong with how I have set up the code for the problem at hand. Thanks!


Hi. Are you still working on this problem? I understand you are on PBS/Torque. Also, as I read the code you provided, the way it is currently written it would only use the cores available on one of your nodes. In order to utilize several nodes, you first have to start worker processes on all of them.

I fully agree with @mkitti that more info would be useful, as the differences between clusters are sometimes significant. Based on my own experience with PBS, for example, I was not able to use the PBS features of ClusterManagers.jl because of the way my login node was configured. I had more luck with the julia --machine-file option that is listed here: GitHub - Arpeggeo/julia-distributed-computing: The ultimate guide to distributed computing in Julia, but it still was not a smooth ride, as my cluster used csh as the login shell.

So, to sum up, nothing is easy in the Julia HPC world, but it is definitely worth giving it a try. I believe a kind of authoritative guide on running distributed jobs is provided by @vchuravy here: Struggling to Run Distributed I/O Operations on SLURM Cluster - #2 by vchuravy. Unfortunately, it uses SLURM, not PBS.

This is certainly easier on SLURM, to the extent that packages such as GitHub - kleinhenz/SlurmClusterManager.jl: julia package for running code on slurm clusters and GitHub - jishnub/SlurmAddAllocatedProcs.jl: Julia package to easily add workers while using Slurm in batch mode exist to make adding processes convenient. PBS, from what I understand, is not as well supported, but I have had luck using julia --machine-file in the past, so it's worth exploring that option. Note that in any case, you need to allocate the cores through your job scheduler's batch script, not through Julia. Once the cores have been allocated, Julia may create processes that use them.
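For reference, a minimal SLURM sketch with SlurmClusterManager.jl, assuming the script is launched from inside an sbatch allocation, looks roughly like this (one worker is started per allocated task):

using Distributed, SlurmClusterManager

# Start one worker per task in the current SLURM allocation.
addprocs(SlurmManager())

# Each worker reports which node it landed on.
@everywhere println("worker $(myid()) on $(gethostname())")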

I am going to go out on thin ice here…
First, talk to your friendly cluster admins to understand how to set up a job in PBS.

If you cannot make progress, there is a facility to run an interactive job in PBS: use qsub -I.
You can reserve a certain number of nodes/CPU cores that way.
Then I think you should be good to go: ssh into the first node and run julia --machine-file.
You can get the machine file from PBS as $PBS_NODEFILE.
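If you would rather do it from inside Julia than on the command line, a rough equivalent of the machine-file approach (assuming passwordless ssh between the allocated nodes) is:

using Distributed

# Hosts that PBS allocated to this job, one line per slot.
hosts = readlines(ENV["PBS_NODEFILE"])

# Start one worker per listed host entry over ssh.
addprocs(hosts)

# Sanity check: which node is each worker on?
@everywhere println("worker $(myid()) on $(gethostname())")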

Do not complain if your friendly cluster admins become unfriendly and come after you with a pitchfork.


It would be best to focus on a single question per post. What I do not understand about your problem is how much information step i needs from step i-1. If it needs all of the information, I'm not sure how you are going to save memory by going distributed. You would need to do some sort of chunking and/or serialization to disk. As of right now, your algorithm is inherently serial.
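For the serialization idea, a very rough sketch using the Serialization standard library is below. It assumes, like your MWE, that step i only needs the value from step i-1, so completed chunks can be flushed to disk and only the current chunk stays in memory; the chunk size and file names are placeholders:

using Serialization

function run_chunked(nsteps; chunksize = 1000, dir = "chunks")
  mkpath(dir)
  prev = 0.0               # value produced at step i-1
  chunk = [prev]
  for i in 2:nsteps
    val = i + prev         # same recursion as in the MWE
    push!(chunk, val)
    prev = val
    if length(chunk) == chunksize
      serialize(joinpath(dir, "chunk_$i.jls"), chunk)
      chunk = Float64[]    # drop the old chunk so it can be garbage collected
    end
  end
  isempty(chunk) || serialize(joinpath(dir, "chunk_final.jls"), chunk)
  return prev
end

# Saved chunks can be read back later with deserialize.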


Thanks @j_u! Yes, I am on PBS.

I am trying to use the cores on two nodes, if that makes sense. This is because my jobs require a lot of memory and the memory available on one node is not enough.

I will read the links that you've shared! 🙂


Thanks @jishnub! Correct me if I am wrong, but reading the package descriptions, am I right that the SLURM packages require the jobs on each node to be independent of each other?

Thanks! Yes, that's right. My algorithm is serial. The problem is that when i gets large, the memory used becomes very large. That's why I am trying to see whether it is possible to use memory from more than one node so the job can keep running, if that makes sense.

No, this isn’t necessary. MPI-based or MPI-style programs with data transfer across nodes are certainly permitted.

I see. I am not familiar with MPI. Would the package MPI.jl be the way to get started? Thanks!!

Yes, that's the package for MPI.
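In case it helps to see the flavor of it, here is a tiny MPI.jl sketch of explicit data transfer between two ranks. It would be launched with something like mpiexec -n 2 julia script.jl (or your cluster's MPI launcher), and the exact Send/Recv! signatures are worth double-checking against the docs for the MPI.jl version you install:

using MPI

MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)

if rank == 0
  # Rank 0 owns some data and sends it to rank 1,
  # which may live on a different node.
  data = collect(1.0:4.0)
  MPI.Send(data, comm; dest = 1)
elseif rank == 1
  buf = zeros(4)
  MPI.Recv!(buf, comm; source = 0)
  println("rank 1 received ", buf)
end

MPI.Finalize()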