Kill a (distributed) child process

Let’s say I’ve spawned a distributed process, and this process has made a particular ccall that, for whatever reason, refuses to die when I call interrupt() on the worker. For example, the following fails to interrupt it:

using Distributed

addprocs(1)

f = @spawnat 2 begin
    @ccall sleep(20::Cuint)::Cuint  # Pretend this is an uninterruptible C call (C's sleep takes and returns unsigned int)
end

println("Waiting...")
interrupt(2)
fetch(f)

I don’t control the C code, and I’m not 100% sure why it won’t interrupt. However, killing the worker process with kill -9 works just fine.

My question: what’s a nice way to send a SIGKILL signal to the child process?

To reply to myself: this does work. However, it relies on a non-public, unstable function and field inside Distributed to get the ::Process of the worker:

w = Distributed.worker_from_id(2)
kill(w.config.process, Base.SIGKILL)

Is there a way to get the worker process object without raiding internal variables?
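
One possibility that avoids the internals, as a rough sketch: ask the worker for its OS pid with getpid() before it enters the blocking call, then send SIGKILL from the master using the system kill command. This assumes the worker runs on the same machine as the master and that a Unix-style kill is on the PATH. The names wid and ospid and the one-second pause are only for illustration, and this does not give you the ::Process object itself.

```julia
using Distributed

addprocs(1)
wid = first(workers())  # worker id (2 in the example above)

# Record the worker's OS-level pid *before* it gets stuck in the C call.
# getpid() is a public Base function; here it runs on the worker.
ospid = remotecall_fetch(getpid, wid)

f = @spawnat wid begin
    @ccall sleep(20::Cuint)::Cuint  # stand-in for the uninterruptible C call
end

sleep(1)  # give the remote task a moment to start

# Assumption: local worker on a Unix-like OS with a `kill` executable.
run(`kill -9 $ospid`)

# Calling fetch(f) at this point should raise an error (the worker is gone)
# instead of waiting for the C call to finish.
```

For workers on other machines, the same pid would have to be killed on the remote host (for example over ssh), which this sketch does not cover.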

For reference, there’s a PR for this issue that appears to have stalled: "[Distributed] `kill(::LocalManager, ...)` should actually call `kill()`" by staticfloat (Pull Request #45801, JuliaLang/julia on GitHub). If it is eventually merged, the right way to do this would be via rmprocs(pid).
