I have a Julia program that performs a long-running operation on a variable. When the program is killed by a job management system such as SLURM, I would like it to save the current local state of that variable to disk. Here is a minimal (somewhat trivial) demonstration of what I want to do:
function main(x::Int, step::Int)
    @assert step > 0
    for i in 1:step
        x += 1
        sleep(1)
    end
    return x
end

function cleanup()
    println(x)
    return nothing
end

atexit(cleanup)

x = 0
println("PID = ", getpid())
for step in [10, 12, 14, 16]
    global x = main(x, step)
    println("Current x = ", x)
end
In this example, the function `main` adds `step` to `x`. I hope that when the program is terminated by a `SIGTERM`, it can save the local value of `x` inside the function `main`. If I put the `atexit` call outside `main`, as shown in the example, I get the following error when I `kill` the program from a terminal:
[19672] signal 15: Terminated: 15
in expression starting at test_atexit.jl:18
kevent at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0x0)
Allocations: 10826481 (Pool: 10826120; Big: 361); GC: 9
22schedule: Task not runnable
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:35
[2] schedule(t::Task, arg::Any; error::Bool)
@ Base ./task.jl:884
[3] schedule
@ ./task.jl:876 [inlined]
[4] uv_writecb_task(req::Ptr{Nothing}, status::Int32)
@ Base ./stream.jl:1200
[5] poptask(W::Base.IntrusiveLinkedListSynchronized{Task})
@ Base ./task.jl:1012
[6] wait()
@ Base ./task.jl:1021
[7] uv_write(s::Base.TTY, p::Ptr{UInt8}, n::UInt64)
@ Base ./stream.jl:1081
[8] unsafe_write(s::Base.TTY, p::Ptr{UInt8}, n::UInt64)
@ Base ./stream.jl:1154
[9] write
@ ./strings/io.jl:248 [inlined]
[10] show
@ ./show.jl:1247 [inlined]
[11] print(io::Base.TTY, x::Int64)
@ Base ./strings/io.jl:35
[12] print(::Base.TTY, ::Int64, ::String)
@ Base ./strings/io.jl:46
[13] println(io::Base.TTY, xs::Int64)
@ Base ./strings/io.jl:75
[14] println(xs::Int64)
@ Base ./coreio.jl:4
[15] cleanup()
@ Main ~/test_atexit.jl:11
[16] _atexit(exitcode::Int32)
@ Base ./initdefs.jl:459
I think this is because `cleanup` does not have access to the local `x` in `main`. However, if I put it inside `main` and register it there:
function main(x::Int, step::Int)
    function cleanup()
        println(x)
        return nothing
    end
    atexit(cleanup)
    @assert step > 0
    for i in 1:step
        x += 1
        sleep(1)
    end
    return x
end
the for-loop in the `Main` module registers `cleanup` once per call to `main`, so when the program is not killed it gives the following output, which is not what I want:
PID = 19709
Current x = 10
Current x = 22
Current x = 36
Current x = 52
52
36
22
10
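The reversed order of the last four lines is consistent with two things: each call to `main` creates a new closure over its own local `x`, and Julia runs `atexit` hooks in last-in-first-out order. A minimal sketch that mimics this without exiting the process, assuming a hypothetical `hooks` vector as a stand-in for the `atexit` list (the `sleep` is dropped to keep it fast):

```julia
# Hypothetical stand-in for the list of registered atexit hooks.
hooks = Function[]

function main(x::Int, step::Int)
    # Each call registers a new closure that captures this call's local x.
    push!(hooks, () -> x)
    for i in 1:step
        x += 1
    end
    return x
end

function run_steps()
    x = 0
    for step in [10, 12, 14, 16]
        x = main(x, step)
    end
    return x
end

run_steps()

# Call the hooks in reverse registration order, as atexit does.
printed = [h() for h in reverse(hooks)]
println(printed)
```

Each closure reports the final value of the `x` it captured, so four registrations produce four values.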
What I want is that whenever the program is killed, it prints the current local value of `x` inside the `main` function. How can I achieve this? Thank you!
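One possible direction, offered as a sketch rather than a confirmed fix: keep the value in a mutable container that both `main` and the hook can see, and register the hook exactly once. The names `STATE`, `CHECKPOINT`, `main!`, `run_demo` and the `delay` keyword below are assumptions introduced for illustration. The hook writes to a file instead of the terminal, since the trace above fails inside a write to `Base.TTY`. Whether `atexit` hooks run at all on `SIGTERM` may depend on the Julia version and on how the signal is delivered, so this should be tested on the target system.

```julia
# Shared mutable container (hypothetical name) visible to both the worker and the hook.
const STATE = Ref(0)
# Hypothetical checkpoint location.
const CHECKPOINT = "x_checkpoint.txt"

function main!(state::Ref{Int}, step::Int; delay::Real=1)
    @assert step > 0
    for i in 1:step
        state[] += 1      # update in place so the hook always sees the latest value
        sleep(delay)
    end
    return state[]
end

function cleanup()
    # Save the most recent value to disk.
    open(CHECKPOINT, "w") do io
        println(io, STATE[])
    end
    return nothing
end

atexit(cleanup)           # registered exactly once, at top level

function run_demo(; delay::Real=1)
    println("PID = ", getpid())
    for step in [10, 12, 14, 16]
        main!(STATE, step; delay=delay)
        println("Current x = ", STATE[])
    end
    return STATE[]
end
```

Calling `run_demo()` at the end of the script reproduces the driver loop from the question. Because `main!` mutates `STATE` on every iteration, the hook sees the in-progress value rather than the value from the last completed call.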