How to set up / use a Kubernetes cluster for distributed computing?

I want to use the nodes of a Kubernetes cluster as remote workers for my Julia script.

Coming from a workplace where we mostly develop enterprise software, I thought that virtually every cluster is based on Kubernetes nowadays. I know about Julia's distributed computing capabilities and wanted to learn how to use them in a Kubernetes context.

However, I can’t really find documentation on how to do that! I read the README in this repository, which suggests using SLURM, PBS or LSF as a job scheduler. Also, there’s K8sClusterManager.jl, which seems like it could do what I want. I’m just surprised that it is such a small project! I expected distributed computing via Kubernetes to be a big topic in Julia, yet I can’t seem to find good documentation on how to actually set this up.
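
For reference, here is a minimal sketch of the worker pattern in question, using only Julia's `Distributed` standard library. It starts local worker processes as a stand-in for cluster nodes; the assumption is that on Kubernetes the `addprocs` call would instead receive a cluster manager object (for example one provided by K8sClusterManager.jl, whose exact constructor is not shown here). The function `square` is a hypothetical placeholder for real work.

```julia
using Distributed

# Start two local worker processes as a stand-in for remote workers.
# Assumption: on a cluster, `addprocs` would be given a cluster manager
# object instead of a plain worker count.
addprocs(2)

# Define the (hypothetical) work function on every process.
@everywhere square(x) = x^2

# Distribute the calls across the workers and collect the results in order.
results = pmap(square, 1:4)
println(results)

# Shut the workers down again.
rmprocs(workers())
```

The rest of the script stays the same regardless of where the workers run; only the `addprocs` call depends on the cluster setup.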

I already asked this on Reddit, but since I didn’t receive any answers there, I figured it might be a good question for the official forum.

@tim-hilt this is going to be a good discussion. HPC at the moment is running on two tracks: the traditional job schedulers and the new kids on the block with Kubernetes. Personally, I say don't discard the traditional job schedulers. They are under active development and the community there is vibrant.
Also worth saying that Docker is not the only kid on the block.
HPC users also use Singularity containers and Charliecloud.

https://hpc.github.io/charliecloud/

If I may be allowed a small plug for the company I work for, Dell’s Omnia system provides a flexible infrastructure that can deploy OpenHPC with job schedulers or Kubernetes.

https://dellhpc.github.io/omnia/


Nice! That’s already a great response. Thank you very much! I didn’t know about Singularity containers.

Maybe Kubernetes just isn’t the right tool for HPC with Julia, but maybe it should become one! Kubernetes is such a big player, and there’s so much know-how in its community compared to “traditional job schedulers”. But then again, maybe that’s just my view because of the filter bubble I currently live in. I was never an active researcher, apart from a bit of work for my bachelor’s thesis, which didn’t even require a GPU to compute the models I needed, so I never had to use distributed HPC in the first place.