Reclaiming a worker on a long running process

pearcemc · January 19, 2017, 5:13pm

Is the following possible:

1. Set a long-running Julia process going on a server.
2. Log off the server, move on with life.
3. SSH back in at a later date.
4. Fire up a Julia REPL and add the long-running process as a worker or similar, e.g. to extract ad hoc information from data structures in memory without restarting.

I’m not sure how to attack this use case. Is it possible? If so, which docs/source should I read?

adamslc · January 19, 2017, 5:39pm

I’m not sure about an entirely Julia based solution, but when I need to run long simulations on a cluster, I use the screen command. Maybe that would work for you as well?

ChrisRackauckas · January 19, 2017, 5:44pm

SSH + screen works for this. On a cluster you can use the job scheduler instead, but you usually can’t make that interactive. Personally, I just use VNC on my own lab computers, since when you log back in you get the same screen. This solves “getting back to the same process”.

However, for intermediate modifications and saving… what types of problems do you plan on solving? If it’s differential equations, the obstacle to restarting right now is that JLD has trouble saving functions. Otherwise, the DifferentialEquations.jl integrator interface with a callback for intermediate saves to JLD would handle this just fine.

If it’s for optimization, you’d need to find a way to save some of the intermediate data. I believe the iterator interfaces which are being worked on in Optim and JuliaML have a way of letting you save and modify state.

But as for the details of “extract ad hoc information from data structures in memory without restarting”, this is highly problem-dependent and we’d have to know what you’re doing.

pearcemc · January 19, 2017, 9:39pm

The screen solution looks useful and wasn’t something I was aware of - might give that a go for some of my usage.

What I was imagining was more like when you call addprocs(n). In that case some Julia processes get created and we take ownership of them. Is there some way to not kill them off when exiting the REPL, and instead be able to reclaim them later?
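
For anyone unfamiliar with the `addprocs` workflow I mean, here is a minimal sketch, assuming a recent Julia (1.x) where these functions live in the `Distributed` standard library. The variable names are just for illustration. It shows the "ownership" part, not a way to reclaim workers later.

```julia
using Distributed

# Start two local worker processes owned by this session.
pids = addprocs(2)

# Ask each worker for its operating-system process id.
os_pids = Dict(p => remotecall_fetch(getpid, p) for p in pids)

for (p, os_pid) in os_pids
    println("worker ", p, " runs as OS process ", os_pid)
end

# Shut the workers down explicitly.
rmprocs(pids)
```

What I'm after is the reverse of that last line: leave the workers alive and attach to them again from a new session.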

I’m largely doing MCMC, and save my results to disk at each iteration anyhow for memory reasons. But I’m aiming for a more generic solution.
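
The save-every-iteration approach looks roughly like the sketch below. This is an illustration only, assuming a recent Julia (1.x) with the `Serialization` standard library. `ChainState`, `save_checkpoint`, `load_checkpoint`, `run_chain!` and the file name `chain.jls` are made-up names, and the sampler is a toy random-walk Metropolis step, not my actual model.

```julia
using Serialization

# Hypothetical sampler state; the fields are illustrative only.
mutable struct ChainState
    iteration::Int
    current::Float64
    samples::Vector{Float64}
end

# Serialize to a temporary file, then rename it over the target,
# so a reader never sees a half-written checkpoint.
function save_checkpoint(path::AbstractString, state::ChainState)
    tmp = path * ".tmp"
    serialize(tmp, state)
    mv(tmp, path; force=true)
end

# Load the last checkpoint, or start a fresh chain if none exists.
load_checkpoint(path::AbstractString) =
    isfile(path) ? deserialize(path) : ChainState(0, 0.0, Float64[])

function run_chain!(state::ChainState, n::Integer; path::AbstractString="chain.jls")
    for _ in 1:n
        # Toy random-walk Metropolis step targeting a standard normal.
        proposal = state.current + randn()
        if log(rand()) < (state.current^2 - proposal^2) / 2
            state.current = proposal
        end
        state.iteration += 1
        push!(state.samples, state.current)
        save_checkpoint(path, state)
    end
    return state
end
```

A second Julia session that defines the same `ChainState` type can call `load_checkpoint("chain.jls")` to inspect the latest state while the long-running process keeps going. That covers inspection, but it is not the same as reclaiming the live process.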
