Looking for the correct way to do a clean Julia process shutdown

I’m not able to correctly capture the SIGINT signal and trigger a controlled shutdown.

It seems that atexit should be the answer, but look at my (simplified) minimal example using FileWatching:

using FileWatching

function watch(dir::String)
    try
        while true
            filename, event = watch_folder(dir, -1.0)
            @info "[$filename] event: $event"
        end
    catch e
        @warn "error watchdir: $e"
    end
end

function shutdown(dir::String)
    @info "stop watching $dir"
    unwatch_folder(dir)
end

dir = "/tmp"

shtdown() = shutdown(dir)
atexit(shtdown)

watch(dir)

I expect that, before exiting, the process will invoke shutdown, unwatch the folder, and then exit.

Instead, this is what happens with Ctrl-C:

signal (2): Interrupt
in expression starting at /home/adona/dev/SINT/sint/sintjl/Sint/mve.jl:25
epoll_pwait at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
uv__io_poll at /workspace/srcdir/libuv/src/unix/linux-core.c:270
uv_run at /workspace/srcdir/libuv/src/unix/core.c:359
jl_task_get_next at /buildworker/worker/package_linux64/build/src/partr.c:473
poptask at ./task.jl:704
wait at ./task.jl:712 [inlined]
wait at ./condition.jl:106
take_buffered at ./channels.jl:387
take! at ./channels.jl:381 [inlined]
wait at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/FileWatching/src/FileWatching.jl:620
watch_folder at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/FileWatching/src/FileWatching.jl:747
watch at /home/adona/dev/SINT/sint/sintjl/Sint/mve.jl:6
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2231 [inlined]
...
Allocations: 2544 (Pool: 2533; Big: 11); GC: 0
[ Info: stop watching /tmp
┌ Error: Exception while generating log record in module Main at /home/adona/dev/SINT/sint/sintjl/Sint/mve.jl:15
│   exception =
│    schedule: Task not runnable
│    Stacktrace:
│     [1] error(::String) at ./error.jl:33
│     [2] schedule(::Task, ::Any; error::Bool) at ./task.jl:591
│     [3] schedule at ./task.jl:586 [inlined]
│     [4] uv_writecb_task(::Ptr{Nothing}, ::Int32) at ./stream.jl:1051
│     [5] poptask(::Base.InvasiveLinkedListSynchronized{Task}) at ./task.jl:704
│     [6] wait at ./task.jl:712 [inlined]
│     [7] uv_write(::Base.TTY, ::Ptr{UInt8}, ::UInt64) at ./stream.jl:933
│     [8] unsafe_write(::Base.TTY, ::Ptr{UInt8}, ::UInt64) at ./stream.jl:1005
│     [9] unsafe_write at ./io.jl:622 [inlined]
│     [10] write(::Base.TTY, ::Array{UInt8,1}) at ./io.jl:645
│     [11] handle_message(::Logging.ConsoleLogger, ::Base.CoreLogging.LogLevel, ::String, ::Module, ::Symbol, ::Symbol, ::String, ::Int64; maxlog::Nothing, kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Logging/src/ConsoleLogger.jl:161
│     [12] handle_message at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Logging/src/ConsoleLogger.jl:100 [inlined]
│     [13] macro expansion at ./logging.jl:332 [inlined]
│     [14] shutdown(::String) at /home/adona/dev/SINT/sint/sintjl/Sint/mve.jl:15
│     [15] shtdown() at /home/adona/dev/SINT/sint/sintjl/Sint/mve.jl:21
│     [16] _atexit() at ./initdefs.jl:316
└ @ Main ~/dev/SINT/sint/sintjl/Sint/mve.jl:15

FileWatching is just my first problem; after that there will be DB connections and sockets to close…

How do I correctly manage an orderly shutdown in this scenario?


The above example works once SIGINT is made catchable with Base.exit_on_sigint:

Base.exit_on_sigint(false)
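With `Base.exit_on_sigint(false)`, Ctrl-C surfaces as an `InterruptException`, so an alternative to `atexit` is ordinary `try`/`catch`/`finally` around the main loop, which keeps the cleanup in the main task. A minimal sketch, assuming the script is run non-interactively (`julia script.jl`); `run_until_interrupted` is a hypothetical helper name, not a Base function:

```julia
# Make Ctrl-C throw an InterruptException instead of killing the process.
Base.exit_on_sigint(false)

# Hypothetical helper: run `work()`, treat an InterruptException as a normal
# stop request, and always run `cleanup()` afterwards (even on other errors,
# which are rethrown after cleanup).
function run_until_interrupted(work::Function, cleanup::Function)
    try
        work()
        return :finished
    catch e
        e isa InterruptException || rethrow()
        @info "interrupted, shutting down"
        return :interrupted
    finally
        cleanup()   # runs on every exit path, in the calling task
    end
end
```

In the first example this would replace the `atexit` registration with something like `run_until_interrupted(() -> watch(dir), () -> shutdown(dir))`. Note that `watch` as written catches every exception itself, so the interrupt would be logged there and `watch` would return normally before `cleanup` runs.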

But a more convoluted example, closer to my real use case, still raises an uncaught exception:

using FileWatching

Base.exit_on_sigint(false)

function cb(args)
    @info "hello from timer cb"
end

function callme()
    Timer(cb, 3)
end

function watch(dir::String, callback::Function)
    try
        while true
            @info "again ..."
            #sleep(3)
            filename, event = watch_folder(dir, -1.0)
            @info "[$filename] event: $event"
            if filename !== ""
                callback()
            end
        end
    catch e
        @warn "error watchdir $e"
    end
end

dir = "/tmp"

# Timer(cb, 3)
watch(dir, callme)

After a file event is captured and the timer callback has run, Ctrl-C gives:

^Cfatal: error thrown and no exception handler available.
InterruptException()
jl_mutex_unlock at /buildworker/worker/package_linux64/build/src/locks.h:143    [inlined]
jl_task_get_next at /buildworker/worker/package_linux64/build/src/partr.c:476

I have the same problem here. I also use a Timer, which I think could be the cause of the problem. I’m quite new to the Julia language. This is my minimal example:

Base.exit_on_sigint(false)

loop = true

atexit() do
    @info "cleaning before exit..."
    global loop = false  # without `global`, this assignment would only create a local variable
end

function wkcb(args)
    @info "Wk CB"
end

function worker()
    try
        @info "Worker"
        t = Timer(wkcb, 2)
        wait(t)
        sleep(0.5)
        close(t)
        @info "Worker end"
    catch e
        @warn "worker error: $e"
    end
end

function run()
    try
        while loop
            @info "next start"
            @async worker()
            sleep(10)
        end
    catch e
        @warn "run error: $e"
    end
end

run()

When I press Ctrl-C I get:

❯ julia mydem.jl
[ Info: next start
[ Info: Worker
[ Info: Wk CB
[ Info: Worker end
^Cfatal: error thrown and no exception handler available.
InterruptException()
jl_mutex_unlock at /buildworker/worker/package_linux64/build/src/locks.h:143 [inlined]
jl_task_get_next at /buildworker/worker/package_linux64/build/src/partr.c:476
poptask at ./task.jl:704
wait at ./task.jl:712 [inlined]
task_done_hook at ./task.jl:442
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2214 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2398
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1690 [inlined]
jl_finish_task at /buildworker/worker/package_linux64/build/src/task.c:198
start_task at /buildworker/worker/package_linux64/build/src/task.c:717
unknown function (ip: (nil))
[ Info: cleaning before exit...
┌ Error: Exception while generating log record in module Main at /home/claudio/SD/jl/mydem.jl:7
│   exception =
│    schedule: Task not runnable
│    Stacktrace:
│     [1] error(::String) at ./error.jl:33
│     [2] schedule(::Task, ::Any; error::Bool) at ./task.jl:586
│     [3] schedule at ./task.jl:586 [inlined]
│     [4] uv_writecb_task(::Ptr{Nothing}, ::Int32) at ./stream.jl:1051
│     [5] poptask(::Base.InvasiveLinkedListSynchronized{Task}) at ./task.jl:704
│     [6] wait at ./task.jl:712 [inlined]
│     [7] uv_write(::Base.TTY, ::Ptr{UInt8}, ::UInt64) at ./stream.jl:933
│     [8] unsafe_write(::Base.TTY, ::Ptr{UInt8}, ::UInt64) at ./stream.jl:1005
│     [9] unsafe_write at ./io.jl:622 [inlined]
│     [10] write(::Base.TTY, ::Array{UInt8,1}) at ./io.jl:645
│     [11] handle_message(::Logging.ConsoleLogger, ::Base.CoreLogging.LogLevel, ::String, ::Module, ::Symbol, ::Symbol, ::String, ::Int64; maxlog::Nothing, kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Logging/src/ConsoleLogger.jl:161
│     [12] handle_message at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Logging/src/ConsoleLogger.jl:100 [inlined]
│     [13] macro expansion at ./logging.jl:332 [inlined]
│     [14] (::var"#1#2")() at /home/claudio/SD/jl/mydem.jl:7
│     [15] _atexit() at ./initdefs.jl:316
└ @ Main ~/SD/jl/mydem.jl:7
Exception handling log message: ⏎

Any suggestion? Thanks.


Below are my findings about the cause of the "fatal: error thrown and no exception handler available" error when catching the SIGINT signal.

The problem is not specific to Timer objects; it comes from the design of Julia's task system and the way it handles OS signals.

As the docs say, it is possible to catch a SIGINT signal as an InterruptException using:

Base.exit_on_sigint(false)

But what is not clearly documented is that the InterruptException is delivered to only one task: the task currently marked as “active” by the internal task scheduler.

The problem arises when the active task is one that has already finished executing: the runtime delivers the SIGINT to a terminated task, the "fatal: error thrown and no exception handler available" error shows up, and the process terminates abnormally.

This is the simplest example that exposes this behavior:

Base.exit_on_sigint(false)

function one_shot_task()
    println("doing something and terminate task")
end

try
    while true
        @async one_shot_task()
        sleep(2)
    end
catch e
    @info "captured $e"
end

A Timer callback also runs in its own task, so if the timer callback is the last task run by the scheduler, it is not possible to catch the SIGINT.

Now that the cause of the problem is known, a solution for catching SIGINT in all circumstances may be worked out using an appropriate design pattern.

One such design pattern may be to put a message on a Channel before a one-shot task exits, forcing the scheduler to switch to a task that has not terminated.
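A sketch of that Channel idea, built on the one-shot example above. It rests on the assumption from this post (observed behavior, not a documented guarantee) that the InterruptException lands in whichever task the scheduler was running when it went idle. `supervise` is a hypothetical name, and the buffered channel and the ordering of `take!` before `sleep` are my own choices:

```julia
Base.exit_on_sigint(false)

function one_shot_task(id::Int, done::Channel{Int})
    println("task $id: doing something")
    # With a buffered channel, put! wakes the waiting main task without
    # blocking, so this task can finish right afterwards.
    put!(done, id)
end

# Hypothetical supervisor loop. `rounds` defaults to "effectively forever".
function supervise(done::Channel{Int}; rounds::Integer = typemax(Int), interval::Real = 2.0)
    finished = Int[]
    try
        for i in 1:rounds
            @async one_shot_task(i, done)
            # Wait for the worker's message; once the worker ends, the
            # scheduler switches straight back to this still-running task.
            push!(finished, take!(done))
            # The idle period now happens in the main task, not in a
            # terminated worker.
            sleep(interval)
        end
    catch e
        e isa InterruptException || rethrow()
        @info "captured $e"
    end
    return finished
end
```

Usage would be `supervise(Channel{Int}(1))`. Output written by the worker (e.g. `println` to a TTY) can itself yield, so this narrows the window rather than provably closing it.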


I have the same issue. Thanks to the previous post from @attdona, I understand that it happens because my application runs multiple tasks, which prevents it from shutting down properly when SIGINT is received. Are there any solutions or workarounds for this?

Someone asked me this question offline and directed my attention here, so I thought I would post a reply here too. This exact problem came up in Julia’s base test/runtests.jl driver script. I’ve extracted the relevant code and modified it somewhat for clarity. Modify it to meet your needs, if it applies to you.

The key pieces are:
- raw! – tells the OS to send the ^C keypresses synchronously instead of interrupting the currently running task with a signal
- user_close_function – whatever you want it to be

using REPL
...
monitor = @async begin
    term = REPL.Terminals.TTYTerminal("xterm", stdin, stdout, stderr)
    try
        REPL.Terminals.raw!(term, true)
        while !eof(term)
            c = read(term, Char)
            if c == '\x3' # ^C
                print("^C\n")
                break
            elseif c == '\x4' # ^D
                break
            else
                Base.escape_string(stdout, string(c))
            end
        end
    finally
        REPL.Terminals.raw!(term, false)
    end
    user_close_function()
end
...
wait(monitor)
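For the original question (FileWatching first, then DB connections and sockets), one possible shape for the placeholder `user_close_function` is a registry that closes resources in reverse order of registration. This is a sketch only: `CLEANUP` and `register_cleanup!` are hypothetical names, and a failing step is logged so the remaining steps still run:

```julia
# Hypothetical registry of cleanup steps, run last-registered-first (LIFO).
const CLEANUP = Function[]

register_cleanup!(f::Function) = (push!(CLEANUP, f); f)

function user_close_function()
    while !isempty(CLEANUP)
        step = pop!(CLEANUP)
        try
            step()
        catch e
            # Keep going: one failed close should not skip the others.
            @warn "cleanup step failed" exception = e
        end
    end
end
```

Resources would be registered as they are opened, e.g. `register_cleanup!(() -> unwatch_folder(dir))` and later `register_cleanup!(() -> close(sock))`, so the socket is closed before the folder watch is removed.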