I have a Julia program that performs a long-running operation on a variable. When the program is killed by a job management system such as SLURM, I would like it to save the current local state of that variable to disk. Here is a minimal (somewhat trivial) demonstration of what I want to do:
```julia
function main(x::Int, step::Int)
    @assert step > 0
    # Increment x once per second, `step` times in total.
    for i in 1:step
        x += 1
        sleep(1)
    end
    return x
end

# Exit hook: prints the global x, not the local x inside main.
function cleanup()
    println(x)
    return nothing
end

atexit(cleanup)

x = 0
println("PID = ", getpid())
for step in [10, 12, 14, 16]
    global x = main(x, step)
    println("Current x = ", x)
end
```
In this example, the function main increments x once per second, step times in total. When the program is terminated by SIGTERM, I would like it to save the local value of x inside main. If I register cleanup with atexit outside main, as shown in the example, I get the following error when I kill the program from the terminal:
```
[19672] signal 15: Terminated: 15
in expression starting at test_atexit.jl:18
kevent at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0x0)
Allocations: 10826481 (Pool: 10826120; Big: 361); GC: 9
22schedule: Task not runnable
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] schedule(t::Task, arg::Any; error::Bool)
    @ Base ./task.jl:884
  [3] schedule
    @ ./task.jl:876 [inlined]
  [4] uv_writecb_task(req::Ptr{Nothing}, status::Int32)
    @ Base ./stream.jl:1200
  [5] poptask(W::Base.IntrusiveLinkedListSynchronized{Task})
    @ Base ./task.jl:1012
  [6] wait()
    @ Base ./task.jl:1021
  [7] uv_write(s::Base.TTY, p::Ptr{UInt8}, n::UInt64)
    @ Base ./stream.jl:1081
  [8] unsafe_write(s::Base.TTY, p::Ptr{UInt8}, n::UInt64)
    @ Base ./stream.jl:1154
  [9] write
    @ ./strings/io.jl:248 [inlined]
 [10] show
    @ ./show.jl:1247 [inlined]
 [11] print(io::Base.TTY, x::Int64)
    @ Base ./strings/io.jl:35
 [12] print(::Base.TTY, ::Int64, ::String)
    @ Base ./strings/io.jl:46
 [13] println(io::Base.TTY, xs::Int64)
    @ Base ./strings/io.jl:75
 [14] println(xs::Int64)
    @ Base ./coreio.jl:4
 [15] cleanup()
    @ Main ~/test_atexit.jl:11
 [16] _atexit(exitcode::Int32)
    @ Base ./initdefs.jl:459
```
I think this is because cleanup cannot access the local x inside main. However, if I define cleanup inside main and register it there:
```julia
function main(x::Int, step::Int)
    function cleanup()
        println(x)
        return nothing
    end
    atexit(cleanup)
    @assert step > 0
    for i in 1:step
        x += 1
        sleep(1)
    end
    return x
end
```
the top-level for loop calls main four times, so cleanup is registered four times. When the program runs to completion without being killed, it gives the following output, which is not what I want:
```
PID = 19709
Current x = 10
Current x = 22
Current x = 36
Current x = 52
52
36
22
10
```
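The four trailing numbers appear in reverse because each call to main creates a new closure over that call's own x, and Julia runs exit hooks in last-in-first-out order. The following is only a minimal sketch of that behavior: it uses a plain vector named hooks as a hypothetical stand-in for the exit-hook list (it is not part of Base), and drops the sleep so the effect can be observed without exiting the process.

```julia
# Hypothetical stand-in for the exit-hook list; `hooks` is an illustrative
# name, not a Base API. Newest entries go to the front, mirroring the
# last-in-first-out order in which atexit hooks run.
const hooks = Function[]

function main(x::Int, step::Int)
    # Each call to main creates a fresh closure bound to this call's x.
    cleanup() = x
    pushfirst!(hooks, cleanup)
    @assert step > 0
    for i in 1:step
        x += 1          # the closure sees the updated value of this local x
    end
    return x
end

function run_all()
    x = 0
    for step in [10, 12, 14, 16]
        x = main(x, step)
    end
    # Call every registered hook, front to back (newest first).
    return [h() for h in hooks]
end

results = run_all()
println(results)
```

Each closure reports the final x of the call that registered it, so four registrations give four values, newest first.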
What I want is for the program, whenever it is killed, to print the current local value of x inside main. How can I achieve this? Thank you!