Will Julia exit only after all threads finish their tasks?

From this link
Can the threads in the :interactive pool also undertake heavy computing tasks? - #9 by WalterMadelim


I find this behavior puzzling.
Define this file:

s.jl

function bzwt(t)
    tdue = time_ns() + 1e9t
    while true
        try
            time_ns() < tdue || error()
        catch
            return
        end
    end
end;
Threads.@spawn(bzwt(3.9))

Then execute this file in the shell

❯ julia --threads=2,2 s.jl
~/somedir                        some@some
❯ julia --threads=2,1 s.jl
~/somedir                        some@some
❯ julia --threads=1,2 s.jl
~/somedir                     4s some@some
❯ julia --threads=1,1 s.jl
~/somedir                     4s some@some
❯ 

Notice that the first two runs exit immediately, whereas the last two don’t exit until bzwt(3.9) is done.
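
One way to make the script independent of the thread configuration is to keep the handle that Threads.@spawn returns and wait on it before the script ends. A minimal sketch, assuming you do want the process to stay alive until the task has finished (the 0.2 s duration and the name `task` are only for illustration):

```julia
# Spin until `t` seconds have elapsed, then return `t`.
function bzwt(t)
    tdue = time_ns() + 1e9t
    while time_ns() < tdue
        GC.safepoint()   # let the GC run while we spin
    end
    return t
end

task = Threads.@spawn bzwt(0.2)   # keep the Task handle
wait(task)                        # block here until the task has finished
```

With the explicit wait, the script no longer relies on whatever Julia does with unfinished tasks at exit.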

Hmm, there’s something slightly wrong here. I modified the spin wait:

function bzwt(t)
    @ccall printf("%s\n"::Cstring; string(Threads.threadpool())::Cstring)::Cint
    tdue = time_ns() + 1e9t
    while time_ns() < tdue
        GC.safepoint()
    end
end;
Threads.@spawn(bzwt(3.9))
@ccall printf("done\n"::Cstring)::Cint

and it sometimes aborts with an uncaught C++ exception:

$ time julia --startup=no -t 2,1 s.jl
default
done

real    0m0,267s
user    0m1,240s
sys     0m0,102s
$ time julia --startup=no -t 2,1 s.jl
done
terminate called after throwing an instance of 'std::system_error'
  what():  Invalid argument

[18879] signal 6 (-6): Aborted
in expression starting at none:0

[18879] signal 11 (1): Segmentation fault
pthread_kill at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
gsignal at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
in expression starting at none:0
unknown function (ip: 0x7afde6cab05c) at /lib/x86_64-linux-gnu/libstdc++.so.6
unknown function (ip: 0x7afde6cc11e9) at /lib/x86_64-linux-gnu/libstdc++.so.6
_ZSt9terminatev at /lib/x86_64-linux-gnu/libstdc++.so.6 (unknown line)
__cxa_throw at /lib/x86_64-linux-gnu/libstdc++.so.6 (unknown line)
unknown function (ip: 0x7afde70b138d) at /lib/x86_64-linux-gnu/libc.so.6
_ZSt20__throw_system_errori at /lib/x86_64-linux-gnu/libstdc++.so.6 (unknown line)
lock at /usr/local/x86_64-linux-gnu/include/c++/9.1.0/mutex:110 [inlined]
lock at /usr/local/x86_64-linux-gnu/include/c++/9.1.0/bits/unique_lock.h:141 [inlined]
unique_lock at /usr/local/x86_64-linux-gnu/include/c++/9.1.0/bits/unique_lock.h:71 [inlined]
Lock at /cache/build/tester-amdci4-14/julialang/julia-release-1-dot-12/usr/include/llvm/ExecutionEngine/Orc/ThreadSafeModule.h:42 [inlined]
getLock at /cache/build/tester-amdci4-14/julialang/julia-release-1-dot-12/usr/include/llvm/ExecutionEngine/Orc/ThreadSafeModule.h:69
unknown function (ip: 0x7afde70b14c4) at /lib/x86_64-linux-gnu/libc.so.6
jl_compile_codeinst_now at /cache/build/tester-amdci4-14/julialang/julia-release-1-dot-12/src/jitlayers.cpp:623
jl_compile_codeinst_impl at /cache/build/tester-amdci4-14/julialang/julia-release-1-dot-12/src/jitlayers.cpp:824
unknown function (ip: 0x7afde70b24ac) at /lib/x86_64-linux-gnu/libc.so.6
jl_compile_method_internal at /cache/build/tester-amdci4-14/julialang/julia-release-1-dot-12/src/gf.c:3524
unknown function (ip: 0x7afde70b2a70) at /lib/x86_64-linux-gnu/libc.so.6
_jl_invoke at /cache/build/tester-amdci4-14/julialang/julia-release-1-dot-12/src/gf.c:4002 [inlined]
ijl_apply_generic at /cache/build/tester-amdci4-14/julialang/julia-release-1-dot-12/src/gf.c:4210
jl_apply at /cache/build/tester-amdci4-14/julialang/julia-release-1-dot-12/src/julia.h:2391 [inlined]
start_task at /cache/build/tester-amdci4-14/julialang/julia-release-1-dot-12/src/task.c:1249
Allocations: 1 (Pool: 1; Big: 0); GC: 0
Aborted (core dumped)

real    0m1,409s
user    0m1,254s
sys     0m0,128s

I think this is a consequence of the sad fact that there’s no reliable way to stop a task.


I cannot reproduce this on my machine (zsh, Linux).

How can that be? Is this by design in Julia’s multithreading/Task module?

I’m not sure why it’s difficult to implement; it is probably a combination of how Julia’s task scheduling and libuv’s thread handling interact.

It’s somewhat similar with, e.g., Linux threads. The Linux kernel can’t just tear down a thread/process from the outside, because cleanup needs the thread’s memory mappings and other resources, and those belong to the thread. So when a SIGKILL is sent, the kernel marks it as pending, and the teardown runs in the target thread’s own context the next time that thread handles pending signals (in kernel code, since SIGKILL cannot be caught by a user-level handler).

To implement something similar for Julia tasks would require some kind of signaling, but one can’t use OS signals directly because those are sent to a thread, not to a Julia task. Also, Julia does not have a separate “kernel” that schedules independently. Scheduling is cooperative: it happens only when a task yields, for example by calling yield() or by blocking, and it is that call which finds a new task to run on the thread and does the required context switch.
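
To illustrate the cooperative part: a task can only be stopped at points where it chooses to check for a stop request. A minimal sketch, assuming a shared atomic flag (the names `stop` and `spin_until` are made up for this example and are not part of any Julia API):

```julia
# Shared flag the spawning code can flip to request a stop.
stop = Threads.Atomic{Bool}(false)

# Spin for at most `t` seconds, but give up early when `stop` is set.
function spin_until(stop, t)
    tdue = time_ns() + 1e9t
    while time_ns() < tdue
        stop[] && return :stopped   # cooperative check of the stop request
        yield()                     # let the scheduler run other tasks
    end
    return :timed_out
end

task = Threads.@spawn spin_until(stop, 10.0)
sleep(0.1)            # give the task some time to start spinning
stop[] = true         # request the stop
result = fetch(task)  # wait for the task and get its return value
```

If the loop never checked the flag, nothing outside the task could make it return early.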


You can achieve that by adding a lightweight wrapper around a Julia Task that supports signaling, for example, via a Channel.

I’ve built Visor.jl, a package inspired by this principle.

# s.jl
using Visor

function bzwt(pd, t)
    t0 = time_ns()
    while true
        delta = time_ns() - t0
        # isshutdown(pd) becomes true when Ctrl-C is pressed
        if isshutdown(pd) || delta > 1e9t
            println("delta: $delta")
            break
        end
    end
end;

supervise(process(bzwt; args=(10,)))
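
For comparison, the Channel-based idea can also be sketched with Base alone. This is not how Visor.jl is implemented; it is a rough illustration, and `StoppableTask`, `spawn_stoppable`, `stop!` and `worker` are hypothetical names invented for this example:

```julia
# A Task bundled with a control channel used to signal it.
struct StoppableTask
    task::Task
    control::Channel{Symbol}
end

# Spawn `f(control)` on a thread and return the wrapper.
function spawn_stoppable(f)
    control = Channel{Symbol}(1)   # buffered, so put! does not block
    task = Threads.@spawn f(control)
    return StoppableTask(task, control)
end

# Send a shutdown message and wait for the task to finish.
function stop!(st::StoppableTask)
    put!(st.control, :shutdown)
    wait(st.task)
end

# Example worker: loops until a message arrives on the control channel.
function worker(control)
    while !isready(control)
        yield()                    # cooperative scheduling point
    end
    return take!(control)          # return the message that stopped us
end

st = spawn_stoppable(worker)
sleep(0.1)
stop!(st)
msg = fetch(st.task)
```

The worker still has to poll the channel itself, so this only works for code that checks for messages regularly.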
