Should Julia be able to (really) kill a worker?

There is a Base.kill method that kills a Process. Distributed has an rmprocs method that tries to kill a worker, but it fails in many situations.

An rmprocs call cannot kill the worker if it is running Julia code that does not yield, or if it is running C code underneath. There is no way (AFAIK) of getting a Process object from a worker. Base.Libc.getpid can be run on the worker, but there is no way (AFAIK) to get a Process from the resulting pid, so Base.kill cannot be called with it.
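To make the pid part concrete, here is a minimal sketch of asking a worker for its OS process id. It assumes a single local worker started with addprocs; the names wid and worker_pid are only illustrative.

```julia
using Distributed

# Start one local worker (assumption: a local machine, default cluster manager).
wid = first(addprocs(1))

# Run Base.Libc.getpid on the worker and bring the result back to the master.
# The result is a plain integer pid, not a Process object.
worker_pid = remotecall_fetch(Base.Libc.getpid, wid)

println("worker ", wid, " has OS pid ", worker_pid)

# Clean up; this works here because the worker is idle and responsive.
rmprocs(wid)
```

The point is that worker_pid is just a number, so there is nothing to hand to Base.kill, which expects a Process.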

Underneath Base.kill there is a ccall(:uv_process_kill, Int32, (Ptr{Cvoid}, Int32), p.handle, signum), which seems to be equivalent to ccall(:uv_kill, Cint, (Cint, Cint), worker_pid, signum) (where worker_pid is the result of calling Base.Libc.getpid inside the worker). The ccall(:uv_kill, ...) already works in Julia today because libuv is used internally, but this is an implementation detail, so it should not be relied upon.
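Put together, the workaround looks roughly like the sketch below. It assumes one local worker and relies on the libuv symbol uv_kill being reachable through ccall, which is exactly the implementation detail mentioned above. The names wid, worker_pid, sigkill and rc are illustrative, and the signal number 9 (SIGKILL) is written out as a literal on purpose.

```julia
using Distributed

# Start one local worker and record its OS pid while it is still responsive.
wid = first(addprocs(1))
worker_pid = remotecall_fetch(Base.Libc.getpid, wid)

# Put the worker into a loop that never yields; a worker in this state is the
# kind that rmprocs is described as being unable to stop.
remote_do(() -> (while true end), wid)

# Signal 9 is SIGKILL on POSIX systems (assumption: literal used for clarity).
sigkill = 9

# Call libuv directly. uv_kill returns 0 on success, a negative error code otherwise.
# NOTE: this depends on Julia linking libuv internally and is not a public API.
rc = ccall(:uv_kill, Cint, (Cint, Cint), worker_pid, sigkill)

println("uv_kill returned ", rc)
```

After the signal is delivered, the master may print a message about the worker having terminated, since it only notices the loss through the closed connection.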

My question (finally) is: since Julia already has machinery to kill a process in an OS-independent way, and that machinery will probably stay (Base.kill has to be implemented somehow), would it not be nice to extend it just a little so that it can kill Distributed workers, or any process in which you are able to run Base.Libc.getpid?

Sorry if I am missing something obvious.
