How to set up / use a Kubernetes-cluster for distributed-computing?

tim-hilt · February 17, 2022, 8:55am

I want the nodes of a Kubernetes cluster to act as remote workers for my Julia script.

Coming from a workplace where we mostly develop enterprise software, I assumed that virtually every cluster is based on Kubernetes nowadays. I know about Julia's distributed-computing capabilities and wanted to learn how to use them in a Kubernetes context.

However, I can't really find documentation on how to do that! I read the README in this repository, which suggests using SLURM, PBS or LSF as a job scheduler. There's also K8sClusterManager.jl, which seems like it could do what I want - I'm just surprised that it is such a small project! I expected distributed computing via Kubernetes to be a big topic in Julia, yet I can't seem to find good documentation on how to actually set this up.

I already asked this on Reddit, but since I didn't receive any answers there, I figured it might be a good question for the official forum.

johnh · February 17, 2022, 9:07am

@tim-hilt this is going to be a good discussion. HPC at the moment is running on two tracks - the traditional job schedulers and the new kids on the block with Kubernetes. Me, I say don't discard the traditional job schedulers. They are under active development and the community there is vibrant.
Also worth saying that Docker is not the only kid on the block. HPC users also use Singularity containers and Charliecloud.

https://hpc.github.io/charliecloud/

If I may be allowed a small plug for the company I work for, Dell's Omnia system provides a flexible infrastructure which can deploy OpenHPC with job schedulers or Kubernetes.

https://dellhpc.github.io/omnia/

tim-hilt · February 17, 2022, 9:18am

Nice! That's already a great response. Thank you very much! I didn't know about Singularity containers.

Maybe Kubernetes just isn't the right tool for HPC with Julia, but maybe it should become one! Kubernetes is such a big player, and there's so much know-how in the community compared to "traditional job schedulers". But then again, maybe that's just my view because of the filter bubble I currently live in. I never was an active researcher; I only did a bit of research for my bachelor's thesis, and the models I needed didn't even require a GPU, so I never had to use distributed HPC in the first place.
