Julia cluster using docker

torce · December 5, 2016, 5:51pm

Hi,
I asked this on stackoverflow.com and was advised to post it here:

I am trying to connect to docker containers using the default SSHManager. These containers only have a running sshd, with public key authentication, and julia installed.

Here is my dockerfile:

FROM rastasheep/ubuntu-sshd
RUN apt-get update && apt-get install -y julia
RUN mkdir -p /root/.ssh
ADD id_rsa.pub /root/.ssh/authorized_keys

I am running the container using:

sudo docker run -d -p 3333:22 -it --name julia-sshd julia-sshd

Then, on the host machine, using the Julia REPL, I get the following error:

julia> import Base:SSHManager
julia> addprocs(["root@localhost:3333"])
stdin: is not a tty
Worker 2 terminated.
ERROR (unhandled task failure): EOFError: read end of file
Master process (id 1) could not connect within 60.0 seconds.
exiting.

I have tested that I can connect to the container via ssh without a password.

I have also tested that in julia repl I can add a regular machine with julia installed to the cluster and it works fine.

But I cannot get these two things working together. Any help or suggestions will be appreciated.

lage · September 8, 2018, 4:22am

Hello @torce,

I recommend also deploying the Master in a Docker container, which makes your environment easily and fully reproducible.

I’m working on a way of deploying Workers in Docker containers on demand. That is, the Master deployed in a container can deploy further DockerizedJuliaWorkers. It is similar to Infra.jl (gsd-ufal/Infra.jl on GitHub, a Julia interface to launch cloud workers through Azure VM), but it assumes that the Master and Workers run on the same host, which makes things simpler.

It is ongoing work and I plan to finish it in the coming weeks. In a nutshell:

You’ll need a simple DockerBackend and a wrapper to transparently run containers, set up SSH, call addprocs with all the low-level parameters, get container runtime info, etc. (i.e., the DockerizedJuliaWorker.jl file).
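As a rough illustration of what such a wrapper does, here is a minimal sketch. It is not the contents of DockerizedJuliaWorker.jl: the helper names docker_run_cmd, machine_spec and add_docker_worker are hypothetical, and the sketch assumes Julia 0.7 or later, a docker CLI that can be run without sudo, and an image like the julia-sshd one from the first post (sshd on container port 22, key-based login as root). The tunnel, exename and dir keywords are standard addprocs options for SSH workers. Whether tunnel=true is required depends on the network setup, but it may help when only port 22 of the container is published.

```julia
using Distributed  # on Julia 0.5/0.6, addprocs lives in Base instead

# Build (but do not run) the command that starts one worker container.
# Container port 22 (sshd) is published on `host_port` of the host.
function docker_run_cmd(image::AbstractString, name::AbstractString, host_port::Integer)
    publish = "$(host_port):22"
    return `docker run -d -p $publish --name $name $image`
end

# Machine specification in the "[user@]host[:port]" form accepted by addprocs.
machine_spec(user, host, port) = "$(user)@$(host):$(port)"

# Hypothetical helper: start a container, then attach it as a Julia worker.
function add_docker_worker(image, name, host_port; user="root", exename="julia")
    run(docker_run_cmd(image, name, host_port))
    sleep(2)  # crude wait for sshd to come up; a retry loop would be more robust
    return addprocs([machine_spec(user, "localhost", host_port)];
                    tunnel=true,      # reach the worker through the SSH connection
                    exename=exename,  # julia executable inside the container
                    dir="/root")      # a working directory that exists in the container
end
```

Calling add_docker_worker("julia-sshd", "julia-sshd", 3333) would roughly mirror the docker run and addprocs calls from the first post. Setting dir explicitly matters because addprocs otherwise tries to use the Master's current directory on the worker, and that directory may not exist inside the container.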

Read here how to build the Docker image (Dockerfile is included):

Please tell me if you have any suggestions on how to improve it.

Best,

André Lage.

Mikkel-Holm · October 19, 2018, 10:55am

Hi André Lage,
I have been looking at your project on GitHub. I have been thinking about such a setup ever since I started using Julia for my heavy compute workloads.

I am curious to get your thoughts on using your setup or Infra.jl on docker swarm.
Should one expect your scripts to work inside a docker swarm cluster?

Looking forward to getting your thoughts on this

Best Mikkel Holm
Data Scientist.

lage · October 21, 2018, 1:53pm

Hi Mikkel,

I would divide the whole problem into three parts:

1. Infrastructure: VMs, bare metal servers, etc. for deploying containers.
2. Docker Backend: code for deploying Docker containers and making them reach each other (complete graph virtual network topology).
3. Dockerized Worker: create each Julia Worker in a container, which means using addprocs with specific parameters and providing a set of handy management functions.

Docker Swarm Clusters on Azure addresses 1 and 2.

Infra.jl addresses 1, 2, and 3, but it works on Julia 0.4 (it is the result of Raphael Ribeiro’s bachelor thesis). Moreover, if you want to make Infra.jl scalable, you’ll need to decentralize its HTTP server, which provides single ports for configuring SSH servers for each deployed container (or think of a smarter solution for this). I don’t know how Azure addresses this, but they might have a good decentralized solution, as the demo video says that one can deploy hundreds of Docker containers in 5 minutes.

DockerizedJuliaWorker.jl addresses 1, 2, and 3, but it works only for a single machine (VM, bare metal server, etc.).

Now, answering your question “Should one expect your scripts to work inside a docker swarm cluster?”: Yes, provided that one implements a specific DockerBackend.jl by changing only specific lines in the existing one. For instance, to implement an AzureDockerBackend.jl which runs Docker containers, one should only need to change this line.
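One way to picture that kind of small, backend-specific change is a sketch where each backend only decides how the docker CLI is addressed, while the rest of the container-launching code is shared. Everything here is hypothetical and is not the actual DockerBackend.jl code: the type names, the docker_cli and run_container_cmd functions, and the example daemon address are assumptions for illustration. The only real interface used is the docker CLI's -H flag, which points the client at a different daemon. The sketch assumes Julia 0.6 or later.

```julia
abstract type DockerBackend end

# Containers run on the local Docker daemon.
struct LocalDockerBackend <: DockerBackend end

# Containers run on a daemon reachable at `daemon`, e.g. "tcp://10.0.0.5:2376".
struct RemoteDockerBackend <: DockerBackend
    daemon::String
end

# The only backend-specific piece: how the docker CLI is invoked.
docker_cli(::LocalDockerBackend) = `docker`
docker_cli(b::RemoteDockerBackend) = `docker -H $(b.daemon)`

# Shared by every backend: build the command that starts a detached container.
function run_container_cmd(b::DockerBackend, image, name)
    return `$(docker_cli(b)) run -d --name $name $image`
end
```

With this layout, supporting another environment means adding one type and one docker_cli method, and the code that later calls addprocs on the started containers does not need to change.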

I wrote a long answer to welcome your and others' comments and/or further contributions on the problem.

Best,

André Lage.
