Which workflow to launch jobs on a cluster?


#1

Hi, I am currently learning Julia and looking at the ecosystem to understand how to rewrite some existing code. I launch my code on a cluster, and therefore one part I’m curious about is how to use Julia features to simplify my current workflow.

What I’m currently doing (with Matlab code) is pretty laborious:

  1. modify the code on my laptop, and generate the list of jobs/parameters to launch;
  2. run a script to rsync the code to a location available to the cluster’s nodes;
  3. ssh to the frontend (no direct access to the nodes);
  4. launch from the frontend an interactive session on a node, “compile” the code, and run a script to generate a wrapper setting the environment;
  5. from the frontend, call a launcher script that submits a job for each desired set of parameters. Since some jobs can crash for various reasons, the script checks the output files and the running jobs to detect which ones should be (re)launched;
  6. wait for the jobs to be scheduled and run;
    (6bis. realize there was a human error somewhere and go back to 1.)
  7. run a script to fetch the results (csv files) on the laptop.
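
To make step 5 concrete, here is a minimal Julia sketch of the "which jobs should be (re)launched" check. The names (`result_path`, `is_done`, `jobs_to_launch`) and the `job_<id>.csv` layout are hypothetical. It assumes a job is finished exactly when its csv file exists and is non-empty, which may be too naive for real runs.

```julia
# Hypothetical layout: each job writes its result to <outdir>/job_<id>.csv.
result_path(outdir, id) = joinpath(outdir, "job_$(id).csv")

# Assumption: a job is finished when its output file exists and is non-empty.
is_done(outdir, id) = (p = result_path(outdir, id); isfile(p) && filesize(p) > 0)

# Ids that should be (re)submitted: not finished, and not currently queued
# or running (`active` would be filled from the scheduler's job list).
function jobs_to_launch(ids, outdir; active = Set{Int}())
    return [id for id in ids if !is_done(outdir, id) && !(id in active)]
end

# Small self-check in a temporary directory.
outdir = mktempdir()
write(result_path(outdir, 1), "a,b\n1,2\n")   # job 1 finished
touch(result_path(outdir, 2))                 # job 2 crashed and left an empty file
todo = jobs_to_launch(1:4, outdir; active = Set([3]))   # job 3 is still running
println(todo)   # jobs 2 and 4 remain to be launched
```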

Now, I guess I could have a similar workflow with Julia, but it looks like some Julia features/packages might help to simplify that. How would you do that?

For instance, I read about ClusterManager, which I understand as an abstraction layer for launching jobs on various clusters (step 6.). Would it make sense, or even be possible, to use that directly from the laptop and do everything from there? Are these features only meant to be useful when each job is distributed over multiple cpus/cores, or does it also make sense to use them when each job runs on 1 cpu?


#2

@dapz I work in HPC. You have a classic workflow there. I hope I can be of some help. Maybe not!

You could only submit jobs directly from your laptop if the laptop is registered as a submit node in your batch system, i.e. the cluster will only ‘recognize’ certain hosts as valid hosts to submit jobs from. There is also the problem of synchronising user IDs: your numeric UID on your laptop is probably not the same as the one on the cluster.
(True - you could write a ‘wrapper script’ which would ssh into the cluster frontend and submit the jobs)

(True - the UID problem is fixable, by using Samba or Centrify or sssd)

You do not say what OS is running on your laptop.
To help with the ‘transfer CSV files back and forth’ part of the workflow there are utilities available.
For Windows I cannot recommend MobaXTerm highly enough https://mobaxterm.mobatek.net/
The graphical SFTP browser is what you are looking for.

Oh, and you should be able to mount the storage from the HPC cluster directly as either a Samba share or an NFS share (Windows comes with NFS these days).
If not, you have ssh available, so you can mount it as an ssh filesystem. Message me off list and I can help.

Can you say which OS on the laptop please?


#3

In a second answer to your question, yes you can automate the submission of jobs.
Firstly, if you are submitting a lot of jobs you should be aware of array jobs, which are a feature of most batch systems.
Secondly, there is a rather old API for job submission called DRMAA:
http://drmaa-python.readthedocs.io/en/latest/
Most batch systems will understand DRMAA.
May I ask which batch system is used on your cluster?


#4

And, sorry to be such a boring person, there are web front ends available for many batch systems.
These allow you to submit jobs and to look at results files while the job is running.
For instance:
https://pbsworks.com/PBSProduct.aspx?n=Altair-Access&c=Overview-and-Capabilities

None of my replies actually mention Julia… but I hope I can help simplify things for you.
And yes, ClusterManagers.jl will be useful to you. It depends again on which exact batch system you use.


#5

This is interesting, but probably IPython-specific.

There is an adaptation layer for magics in Julia: https://github.com/JuliaLang/IJulia.jl/blob/master/src/magics.jl


#6

Thank you for your answers. My question was indeed only half julia-specific.

The laptop is running linux (fedora).
The cluster is OAR-based (https://oar.imag.fr/).
It looks like the UID is the same. (This is a professional laptop and the same LDAP account is used. I am not sure exactly how it works, but “id -u …” gives the same number. I am also not sure what happens in asynchronously launched jobs.) But for sure it would be better to have a generic solution, in case I need to launch everything from a different computer someday.

The ‘transfer the code’ and ‘transfer the results’ parts are not complex, in the sense that each is one line of rsync in a script (and I could indeed maybe even mount the storage). I am just wondering what would be the best way to automate them (at least for sending the code; for the results I have to wait for the jobs to finish anyway, so I guess doing it by hand is the best option).
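
For the "sending the code" half, a minimal Julia sketch could wrap that one rsync line in a function, so that it can be called from the REPL before submitting. The function name, the host, the paths and the exclude patterns are all hypothetical placeholders. The command is only built here, not run.

```julia
# Build (but do not run) the rsync command that pushes the code to the cluster.
# The exclude patterns keep results and VCS data out of the transfer.
function push_code_cmd(src::AbstractString, remote::AbstractString;
                       excludes = [".git", "results"])
    exclude_flags = ["--exclude=$e" for e in excludes]
    # Trailing slash on the source: copy the directory contents, not the directory itself.
    source = rstrip(src, '/') * "/"
    # Interpolating a vector into a command splices one argument per element.
    return `rsync -az --delete $exclude_flags $source $remote`
end

# Hypothetical source directory and remote location.
cmd = push_code_cmd("/home/me/project", "frontend:project/")
println(cmd)
# run(cmd)   # uncomment to actually transfer the files
```

Note that `--delete` removes remote files that no longer exist locally, which is why the results directory is excluded in this sketch.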

To come back to ClusterManager: suppose that there is a ClusterManager for OAR clusters, which part of the process would be easier using it? (I’m not sure I fully understand what the use cases are, except that it is more abstract, and this is probably desirable if someone else wants to run your code on a different type of cluster.)

What would be interesting would be to make the connection through the frontend “invisible”. I guess the ideal goal would be to build a julia function taking as argument one function and one object representing a parameter space, which for each set of parameters, submits a job on a node of the cluster (through the frontend) calling the given function with the given parameters, without having to exit julia or open any other terminal? Such a function would start by copying the “last” version of the code to the cluster.
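
A rough Julia sketch of what I have in mind, under several assumptions: the host name, script path and function names are hypothetical, each job is started as `julia <script> --name=value ...` rather than by shipping a Julia function, and submission goes through OAR's `oarsub` over ssh. By default it only builds the commands (`dryrun = true`).

```julia
# All names below (host, script path, functions) are hypothetical placeholders.

# Expand a parameter space (name = values, ...) into one NamedTuple per job.
function parameter_sets(space::NamedTuple)
    ks = keys(space)
    combos = vec(collect(Iterators.product(values(space)...)))
    return [NamedTuple{ks}(vals) for vals in combos]
end

# Command asking the frontend, over ssh, to submit one job with `oarsub`.
function submit_cmd(frontend, script, params::NamedTuple)
    args = join(["--$k=$v" for (k, v) in pairs(params)], " ")
    job = "julia $script $args"
    # Single quotes so the remote shell hands the job to oarsub as one argument.
    # Assumption: the job string itself contains no single quote.
    quoted = "'" * job * "'"
    return `ssh $frontend oarsub $quoted`
end

# Build one submission command per parameter set; run them unless dryrun is set.
function submit_all(frontend, script, space; dryrun = true)
    cmds = [submit_cmd(frontend, script, p) for p in parameter_sets(space)]
    dryrun || foreach(run, cmds)
    return cmds
end

cmds = submit_all("frontend", "project/run.jl", (alpha = [0.1, 0.2], n = [10, 100]))
foreach(println, cmds)   # four commands, one per (alpha, n) combination
```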

I have heard of job arrays but never used them; I should definitely look into that. (I’m frequently running 100s or 1000s of independent jobs at once.)
Regarding DRMAA, I’m not sure I understand how it fits in the workflow. This is an API designed to unify the commands used by the different cluster managers, is that it?
The “Altair Access” that you suggested seems powerful, but I actually don’t even need a GUI; just launching everything from the REPL would be great.
Regarding magics, again I’m not sure I understand exactly where it fits in the workflow. Is it a collection of “macros” to automate some tasks (possibly related to a cluster)?


#7

@dapz I do not know OAR. But it does look interesting.

DRMAA yes is designed to unify the commands used by different cluster managers.

Regarding ClusterManagers.jl, there is no interface between ClusterManagers and OAR.
Here now is where I make a stupid statement - perhaps we could collaborate and work on ClusterManager and adapt it to OAR? It may be easy to do that.

I would like a Julia project, and as I know cluster managers this would be a good idea.


#8

Indeed writing a ClusterManager interface for OAR would be a requirement and I can try to do that, but before doing so I want to understand how to use it properly (i.e. understand why it is actually helpful).

After reading a bit more, here is what I have in mind:

  1. have a function/script to copy the relevant parts of the code from the laptop to the cluster;
  2. on the laptop, call addprocs through ssh to have a worker on the frontend;
  3. then have a function submitting all the jobs on the cluster (using the correct interface of ClusterManager). This function would be called on the frontend worker using @spawnat.
  4. have a function fetching the useful information about the submitted jobs (e.g. scheduled time), to have an idea of what is happening on the cluster without exiting the laptop’s REPL.
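
Steps 2 to 4 could look roughly like the following sketch. It assumes passwordless ssh to the frontend and a `julia` executable on the frontend's PATH. The host name and the helper names (`with_frontend`, `submit`, `status`) are hypothetical. `oarsub` and `oarstat` are OAR's submission and status commands, but I have not tried this on a real cluster.

```julia
using Distributed

# Hypothetical host name for the cluster frontend.
const FRONTEND = "me@frontend"

# Step 2: start one worker on the frontend through ssh, pass its id to `f`,
# and always remove the worker afterwards.
function with_frontend(f; host = FRONTEND)
    pid = first(addprocs([host]))
    try
        return f(pid)
    finally
        rmprocs(pid)
    end
end

# Step 3: submit a job from the frontend worker and return oarsub's output.
submit(pid, job::String) = fetch(@spawnat pid read(`oarsub $job`, String))

# Step 4: ask the scheduler for the current job list.
status(pid) = fetch(@spawnat pid read(`oarstat`, String))

# Usage (not run here, since it needs access to the cluster):
# with_frontend() do pid
#     println(submit(pid, "julia project/run.jl --alpha=0.1"))
#     println(status(pid))
# end
```

Note that the frontend worker is only used here as a remote shell: the submitted jobs are independent processes and never become workers of the laptop session.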

Now I still don’t really understand part 3: if I use addprocs, all the new workers will be connected to the current process, right? What I need to do is, for each job, to launch/submit a new process (possibly itself launching several workers if it is distributed). This should be done asynchronously: I just want to add the jobs to the queue, not wait for the results. So does it still make sense to use addprocs in this context? If it does, then what should the manage() function of ClusterManager do in this case, nothing?

Sorry if my questions sound a bit silly, but this is new for me.
I’m starting to think that the previous workflow was not so laborious in the end :).


#9

@dapz I am going to be a bit rude in my answer. Please forgive me - very often on forums the emotion of a reply is not transmitted.

I do actually think your workflow is laborious. As I tried to say, especially the part regarding file transfer backwards and forwards to the cluster.
Does it really not work for you to ssh into the cluster and keep all your data files there? You can use a remote X display for any graphical tools. There are ways to speed up remote X if you use 3D, and to use remote GPUs.

Giving you an answer on the use of ClusterManagers will mean me reading your response more closely!
I think we are slightly merging together two topics.
One - how to use your laptop as a ‘front end’ to computational resources, i.e. the laptop assembles and launches compute jobs. This is great, and what we should be doing.

Two - how to run compute jobs on the OAR cluster.

As I say, sorry if I am sounding rude. I really do not mean to be.


#10

Have a look at this documentation for Slurm
http://jpfairbanks.net/2017/12/27/running-julia-on-slurm-cluster/
The ‘addprocs’ is called within the job script.
That job script slightly hard wires in 4, but we can work on that.
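
As a starting point for removing the hard-wired 4, here is a minimal sketch that reads the worker count from the environment instead. `SLURM_NTASKS` is set by Slurm inside a job. The function name `worker_count` and the fallback of 4 are my own choices, not taken from that blog post.

```julia
# Number of workers: Slurm's SLURM_NTASKS when it is set, otherwise a default.
# `env` is a parameter so the function can be tried outside a Slurm job.
function worker_count(env = ENV; default = 4)
    return parse(Int, get(env, "SLURM_NTASKS", string(default)))
end

n = worker_count(Dict("SLURM_NTASKS" => "16"))
println(n)

# Inside a real job script, with ClusterManagers.jl loaded, one would then
# call something like:
# addprocs(SlurmManager(worker_count()))
```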

I think no.
Searching around, it looks like there is a set of Docker images which will build a test OAR cluster.
It might be worth playing around with these.


#11

Looking at OAR more closely, there is a project called Taktuk

https://www.grid5000.fr/mediawiki/index.php/Using_TakTuk

I am not sure how that would integrate with Julia parallelism.


#12

Indeed, I think these are two different problems. Thank you for your suggestions.

Regarding the first one (improving the workflow), there will always be some code/data transfer as I code locally on my laptop. It doesn’t make sense for me to work directly on the cluster through ssh if this is what you meant (slower, not always available, etc.), although it is indeed technically possible. I think this part is just a matter of putting things and scripts together, it should not be too technical.

The second part is probably more challenging. I think I’m not familiar enough with cluster systems to understand what is really specific to OAR and what is not. In my mind, once the resources were allocated by the cluster manager, the job had direct access to the allocated resources, but I understand it might be trickier if the goal of the ClusterManager interface is (among other things) to deal with that and connect properly to other cores/nodes. I might come back with a suggestion of an interface for OAR, but I’m afraid I have other issues to fix before that, so it might not be right now.