HPC workflow with the VS Code Remote extension

Hi all,

I would like to hear about people's workflows in VS Code when working with an HPC cluster. I use VS Code (with the Remote extension) to edit my code, and I share the code via Git (and GitHub) so that I have access to it on all my machines.

The issue I run into is that the integrated REPL of the Julia VS Code extension runs on the login node, which is not desirable. Instead, I launch an interactive session via Slurm in one of the terminals and run my code there. This has the significant disadvantage of having no connection to the Plots pane or the other Julia extension features.

Does anyone have a way of connecting the VS Code extension to a REPL running on a compute node?


After reading this discussion, I’ve tweaked the content of my .ssh/config file:

Host hpc
    HostName hpc.yourorganisation.com
    User username

Host hpc-job
    ForwardAgent yes
    StrictHostKeyChecking no
    UserKnownHostsFile=/dev/null
    ProxyCommand ssh hpc "salloc --nodes=1 --ntasks-per-node=8 --mem=16G --time=00:00:00 /bin/bash -c 'nc \$SLURM_NODELIST 22'"
    User username

This way, when I connect to the hpc-job host in VSCode, it allocates resources on a compute node and starts the VSCode session on that node.

This is still not ideal for a couple of reasons:

  • you’d want to use srun instead of salloc to better restrict the available resources
  • you still need to manually create a scratch directory, then transfer the files back and forth, so that you don’t write in home which is usually a network drive.
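
The second point could be scripted roughly as follows. This is a minimal sketch, not a tested cluster recipe: SCRATCH_ROOT, stage_in and stage_out are hypothetical names, and it assumes plain cp is good enough for the transfer (a real cluster may have its own scratch location and preferred copy tool).

```shell
#!/usr/bin/env bash
# Sketch only: SCRATCH_ROOT, stage_in and stage_out are hypothetical names.

# Copy a project directory into a fresh scratch directory and print its path.
stage_in() {
    local src=$1
    local root=${SCRATCH_ROOT:-/tmp}   # assumed scratch location, adjust per cluster
    local dest
    dest=$(mktemp -d "$root/job.XXXXXX") || return 1
    cp -R "$src/." "$dest/" || return 1
    printf '%s\n' "$dest"
}

# Copy everything from the scratch directory back to a destination directory.
stage_out() {
    local scratch=$1 dest=$2
    mkdir -p "$dest" || return 1
    cp -R "$scratch/." "$dest/"
}
```

On the compute node this would be used as `work=$(stage_in ~/myproject)`, then running the code inside `$work`, and finally `stage_out "$work" ~/myproject` before the job ends.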

There are more complete solutions linked in the VSCode issue, however they seem quite complex to adapt to a generic HPC.


I just tried this and it doesn’t seem to work for me. This could be for a number of reasons:

  1. nc command is not available on the nodes
  2. I’m using Windows as the host machine
  3. The HPC I’m using has 2FA enabled

After some more investigation, I found that nc is not available on the compute nodes, but it is on the login node.

I haven’t tested on Windows, but I don’t think that is why it doesn’t work. Note that you need to have set up passwordless ssh login on hpc for this to work. I’m not sure about 2FA though.

What happens if you run ssh hpc-job from a terminal on your machine?

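Can you also try salloc /bin/bash -c 'nc $SLURM_NODELIST 22' from the login node? (Typed directly in a shell, the $ must not be escaped; the backslash is only needed inside the double quotes of the ProxyCommand.)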
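
To narrow down whether nc is the problem, a small check like the one below can be run on the login node and again inside an allocation. It is only a sketch: have_cmd and report_cmd are hypothetical helper names, and the srun line in the comment assumes default partition settings.

```shell
#!/usr/bin/env bash
# Sketch only: have_cmd and report_cmd are hypothetical helper names.

# Succeeds if the given command can be found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Print whether a command is available on the current host.
report_cmd() {
    if have_cmd "$1"; then
        echo "$1: available on $(uname -n)"
    else
        echo "$1: missing on $(uname -n)"
    fi
}

# On the login node:   report_cmd nc
# On a compute node:   srun --nodes=1 bash -c 'command -v nc || echo "nc: missing"'
```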

I don't fully understand your problem, as I do not have much experience with Slurm, and on my PBS/TORQUE cluster I have no problems at all with interactive connections. So I will just mention two things that may be of interest to you and that you may want to investigate further:

a) The Connect to External REPL feature of the VS Code Julia extension, which, as the name suggests, lets you connect to an external Julia REPL session (Vscode 'send to active terminal' for persistence and access to scheduler-controlled HPC nodes? - #2 by pfitzseb)

and

b) Quick Cloudflared Tunnels, which let you ssh to any machine without a public IP (Quick Tunnels · Cloudflare Zero Trust docs)

Hope this helps.

Thanks for your insight! To answer this:
a) This helps when just running the code, but being able to do this is not an issue for me, as I usually just include the file and then type out the functions I want anyway, or just copy and paste. It’s nice to know there is a feature for this built in natively.
b) I am not sure this is an option at all as I imagine they (the HPC admins) would view this as a bit of a security risk and would likely not allow this.

The main feature I would like is to be able to connect the process to the extension, to allow for debugging, the plot viewer, etc. You lose all of these features when not connected. However, I will try the “Connect to external REPL” feature mentioned in your first link!

“Connect External REPL” looks like the right feature to use; however, it fails on my HPC:

julia> pushfirst!(LOAD_PATH, raw"/home/p/USERNAME/.vscode-server/extensions/julialang.language-julia-1.6.10/scripts/packages");using VSCodeServer;popfirst!(LOAD_PATH);VSCodeServer.serve(raw"/tmp/vsc-jl-repl-94200bcf-7554-4b0b-9a6f-80e42c63f4b8"; is_dev = "DEBUG_MODE=true" in Base.ARGS, crashreporting_pipename = raw"/tmp/vsc-jl-cr-3f8b0bc5-8d82-422c-9318-3684a8b173fa");nothing # re-establishing connection with VSCode
ERROR: IOError: connect: no such file or directory (ENOENT)
Stacktrace:
 [1] wait_connected(x::Base.PipeEndpoint)
   @ Sockets /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Sockets/src/Sockets.jl:532
 [2] connect
   @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Sockets/src/Sockets.jl:567 [inlined]
 [3] connect
   @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Sockets/src/PipeServer.jl:97 [inlined]
 [4] serve(args::String; is_dev::Bool, crashreporting_pipename::String)
   @ VSCodeServer ~/.vscode-server/extensions/julialang.language-julia-1.6.10/scripts/packages/VSCodeServer/src/VSCodeServer.jl:101
 [5] top-level scope
   @ REPL[1]:1

I think the issue is that we don't have access to /tmp, only to the directory on our shared drive (i.e. our home directory). An easy fix would be an option to place this pipe in the home directory instead.

EDIT: After some research, it seems like this has already been discussed: https://github.com/julia-vscode/julia-vscode/issues/2423
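
A quick way to check this from the compute node is to test whether the pipe path printed by the extension exists there at all. The helper below is a sketch (check_pipe is a hypothetical name). A "not found" result would be consistent with the ENOENT above, for example if /tmp is local to each node, though that cause is an assumption and not something confirmed here.

```shell
#!/usr/bin/env bash
# Sketch only: check_pipe is a hypothetical helper name.

# Report whether a path exists and is a Unix socket on the current host.
check_pipe() {
    local path=$1
    if [ -S "$path" ]; then
        echo "socket present: $path"
    elif [ -e "$path" ]; then
        echo "exists but is not a socket: $path"
        return 1
    else
        echo "not found on $(uname -n): $path"
        return 1
    fi
}

# Example: check_pipe /tmp/vsc-jl-repl-94200bcf-7554-4b0b-9a6f-80e42c63f4b8
```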

Even though the two options I mentioned are non-standard, I have used them successfully in various circumstances. As for the cloudflared tunnel, in general it is quite secure …, however, I do agree with you that it's always better to discuss it with the admins. Unfortunately, nothing else comes to mind at the moment. I hope you overcome the problems soon.


The following thread discusses manually starting vscode-server from a compute node: Cluster workflow feature to allow shell commands or script to run before remote server setup (e.g. slurm) (wrap install script) · Issue #1722 · microsoft/vscode-remote-release · GitHub

I’m currently trying out a workflow on a Slurm-based cluster, but I’m having difficulty getting the Julia extension to start.

The workflow consists of these steps: ssh into the HPC, run srun to allocate a job through the Slurm job manager, install the VS Code server with the shell script (once only), run module load julia to make Julia available, and then start the VS Code server with code-server. This server can then be accessed through a browser or through a local VS Code environment using the Remote Tunnels extension.

Detailed steps in the documentation at Visual Studio Code Server
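
As a rough outline, the sequence looks like the commands printed by the sketch below. print_workflow is a hypothetical helper, the hpc host alias is taken from the ssh config earlier in this thread, and the srun resource values are placeholders; the exact code-server invocation depends on the cluster documentation.

```shell
#!/usr/bin/env bash
# Sketch only: print_workflow is a hypothetical helper; resource values are placeholders.

# Print the commands of the workflow in the order they are run.
print_workflow() {
    local time_limit=${1:-01:00:00}   # assumed wall-time limit for the interactive job
    cat <<EOF
ssh hpc
srun --nodes=1 --time=${time_limit} --pty bash
module load julia
code-server
EOF
}

print_workflow 02:00:00
```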
