I wanted to create a task that runs in parallel with the rest of the code. Seems pretty straightforward, but I’m running into issues I don’t understand. For some reason the sleep in the following code gives an exception, which I then cannot print. Additionally, the code behaves quite differently inside the REPL and non-interactively. Seems like a scheduling issue…
function memusage()
    l = ReentrantLock()
    peak_usage = 0
    done = Threads.Atomic{Bool}(false)
    t = Threads.@spawn begin
        try
            while !done[]
                try
                    lock(l)
                    peak_usage = max(peak_usage, get_memusage())
                    # ccall(:printf, Int, (Ptr{UInt8},Int64), "%lu\n", peak_usage)
                finally
                    unlock(l)
                end
                # sleep(0.01) # does not work, scheduling issue?
                ccall(:usleep, Int, (Int32,), 1000) # works but ugly and stalls thread
            end
        catch e # exception from the sleep
            ccall(:printf, Int, (Ptr{UInt8},), "error\n")
            # println(e) # does not work
        end
    end
    function query_memusage(stop::Bool=false)
        rv = try
            lock(l)
            rv = peak_usage
            peak_usage = get_memusage()
            if stop
                done[] = true
            end
            rv
        finally
            unlock(l)
        end
        rv
    end
end
Of course, get_memusage (below) doesn’t seem to contribute to the problem. It’s mostly sleep throwing some unknown exception. BTW: printf also fails to print the exception type. I guess the interpolation fails.
function get_memusage(pid=getpid())
    open("/proc/$pid/statm") do io
        line = readline(io)
        parts = split(line)
        vmrss = parse(Int64, parts[2])
        4096 * vmrss
    end
end
If I define get_memusage at the top, this works perfectly fine for me with sleep and println with any number of threads. How are you running this, and on what Julia version? I’m on the 1.6 release branch.
If you keep a reference to the task t somewhere (e.g., save it to a global variable with global TASK = t), you can later display it in the REPL (or call display(TASK) when not using the REPL). If it failed with an error, you should be able to see the stacktrace. Sharing it can help people here debug it. Demo:
julia> TASK = Threads.@spawn error("hello");
julia> # Then later see the stacktrace:
julia> TASK
Task (failed) @0x00007f66678e9e40
hello
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:33
 [2] (::var"#53#54")()
   @ Main ./threadingconstructs.jl:169
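When running non-interactively, the same information can be pulled out of the task programmatically. The following is only a minimal sketch: it assumes that waiting on a failed task rethrows the failure as a TaskFailedException (true for recent Julia 1.x releases) and renders it to a string with showerror. The variable names t and msg are just for illustration.

```julia
# Minimal sketch: surface the error of a failed task without relying on the REPL.
t = Threads.@spawn error("hello")

msg = try
    wait(t)                 # rethrows the failure wrapped in a TaskFailedException
    "task finished without error"
catch err
    sprint(showerror, err)  # render the exception, including the nested one, as a String
end

println(msg)
println("task failed: ", istaskfailed(t))
```

Printing msg (or writing it to a log file) should give the same kind of detail that displaying the task in the REPL does.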
Also, let me note that calling usleep blocks an entire worker thread in the Julia runtime. While it sleeps, that thread can’t execute any other Julia Tasks.
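For comparison, here is a rough sketch of a sampler that uses Julia’s own sleep, which yields to the scheduler instead of blocking the thread. It is not the original code: start_sampler and its sample argument are hypothetical names, and it swaps the ReentrantLock for Threads.Atomic with atomic_max! so that the peak value is not a captured, reassigned local. Whether this avoids the exception reported above is untested.

```julia
# Hypothetical sketch: track the peak of a user-supplied measurement function
# from a background task that sleeps cooperatively.
function start_sampler(sample::Function; interval::Real = 0.01)
    peak = Threads.Atomic{Int}(0)
    done = Threads.Atomic{Bool}(false)
    task = Threads.@spawn begin
        while true
            Threads.atomic_max!(peak, Int(sample()))  # keep the largest value seen so far
            done[] && break
            sleep(interval)  # yields to the scheduler; the thread stays usable by other tasks
        end
    end
    # The returned closure stops the sampler and reports the peak.
    stop() = (done[] = true; wait(task); peak[])
    return stop
end

# Usage with a dummy measurement; get_memusage could be passed instead.
stop = start_sampler(() -> 42; interval = 0.001)
sleep(0.05)
peak = stop()
println(peak)
```

Because the loop samples once before checking done, stop() always returns at least one measurement.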
REPL creates an additional task so something like this can happen sometimes.
I should clarify: with usleep it “works” (but rather badly) and with sleep I sometimes get errors.
Looking further into this, it looks like the main task doing actual work (reading/computing) is blocking the spawned task (which sits in jl_task_get_next) from progressing. It gets stuck in sleep for hundreds of seconds, as long as it takes for the main task to do some IO.
gdb backtrace of the worker thread:
futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7efe534970c0) at ../sysdeps/nptl/futex-internal.h:183
__pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7efe53497070, cond=0x7efe53497098) at pthread_cond_wait.c:508
__pthread_cond_wait (cond=0x7efe53497098, mutex=0x7efe53497070) at pthread_cond_wait.c:638
0x00007efe674330d8 in uv_cond_wait (cond=0x7efe53497098, mutex=0x7efe53497070) at src/unix/thread.c:847
0x00007efe67348fac in jl_task_get_next (trypoptask=0x7efe5a985d90 <jl_system_image_data+15342288>, q=0x7efe549a2970) at /buildworker/worker/package_linux64/build/src/partr.c:508
0x00007efe590333d2 in julia_poptask_29151 () at task.jl:760
0x00007efe594b19c2 in wait () at task.jl:768
julia_wait_33659 () at condition.jl:106
0x00007efe58eeb14c in japi1__trywait_30682 () at asyncevent.jl:111
0x00007efe58eeb45a in wait () at asyncevent.jl:129
julia_sleep_29475 () at asyncevent.jl:214
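The kind of blocking described above can be reproduced in miniature. This sketch is an illustration of cooperative scheduling only, not a diagnosis of the backtrace: it assumes the background task shares a thread with the task doing the work, which is forced here by using @async (it pins the new task to the current thread). The function name starvation_demo is made up for the example.

```julia
# Sketch: a task on the same thread cannot run until the current task yields.
function starvation_demo()
    ran = Threads.Atomic{Bool}(false)
    t = @async (ran[] = true)   # scheduled on the current thread, not started yet

    # Busy work with no yield point (no I/O, no sleep, no wait).
    acc = 0
    for i in 1:1_000_000
        acc += i
    end
    before = ran[]              # the background task has had no chance to run

    wait(t)                     # waiting yields to the scheduler, so the task runs now
    after = ran[]
    return before, after
end

before, after = starvation_demo()
println("ran before yielding: ", before, ", ran after waiting: ", after)
```

If the working task calls yield() (or anything else that yields, such as sleep) at regular intervals, the background task gets a chance to run in between.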