How to create a parallel task?

I wanted to create a task that runs in parallel with the rest of the code. Seems pretty straightforward, but I’m running into issues I don’t understand. For some reason the sleep in the following code gives an exception, which I then cannot print. Additionally, the code behaves quite differently inside the REPL and non-interactively. Seems like a scheduling issue…

function memusage()
    l = ReentrantLock()
    peak_usage = 0 
    done = Threads.Atomic{Bool}(false)

    t = Threads.@spawn begin
        try
            while !done[]
                try
                    lock(l)
                    peak_usage = max(peak_usage, get_memusage())
                    # ccall(:printf, Int, (Ptr{UInt8},Int64), "%lu\n", peak_usage)
                finally
                    unlock(l)
                end

                # sleep(0.01) # does not work, scheduling issue?
                ccall(:usleep, Int, (Int32,), 1000) # works but ugly and stalls thread
            end
        catch e # exception from the sleep
            ccall(:printf, Int, (Ptr{UInt8},), "error\n")
            # println(e) # does not work
        end
    end

    function query_memusage(stop::Bool=false)
        rv = try 
            lock(l)
            rv = peak_usage
            peak_usage = get_memusage()

            if stop
              done[] = true
            end 

            rv  
        finally
            unlock(l)
        end
        rv  
    end
end

Any help would be useful. Thanks

Can you also provide the definition of get_memusage()? Otherwise it’s not clear where the problem would be.

Of course, this one doesn’t seem to contribute to the problem. It’s mostly sleep throwing some unknown exception. BTW: printf also fails to print the exception type. I guess the interpolation fails.

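function get_memusage(pid=getpid())
    # /proc/<pid>/statm reports sizes in pages; the second field is the resident set size.
    open("/proc/$pid/statm") do io
        line = readline(io)
        parts = split(line)
        vmrss = parse(Int64, parts[2])
        4096 * vmrss # bytes, assuming 4 KiB pages
    end
end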

If I define get_memusage at the top, this works perfectly fine for me with sleep and println with any number of threads. How are you running this, and on what Julia version? I’m on the 1.6 release branch.

hmm, can you uncomment the ccall to printf to see if the thread didn’t quietly die as it does for me?

I’m running 1.6.3 and 1.6.1

Yeah, they work fine for me, including the call to usleep. What do you see that makes you think sleep is throwing an error?

If you keep the reference to the task t somewhere (e.g., save it to a global variable global TASK = t), you can later display it on the REPL (or call display(TASK) when not using REPL). If it had an error, it should be possible to see the stacktrace. Sharing it can help people here debug it. Demo:

julia> TASK = Threads.@spawn error("hello");

julia> # Then later see the stacktrace:

julia> TASK
Task (failed) @0x00007f66678e9e40
hello
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:33
 [2] (::var"#53#54")()
   @ Main ./threadingconstructs.jl:169
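
When not running in the REPL, the same information can be pulled out of the task programmatically. The following is only a minimal sketch: the failing task is a stand-in for the background task from the question, and the names `t`, `caught` and `failed` are made up for illustration. It relies on `fetch` rethrowing a failure wrapped in a `TaskFailedException`.

```julia
# Hypothetical stand-in for the background task from the question.
t = Threads.@spawn error("hello")

# wait/fetch on a failed task rethrow the failure wrapped in a TaskFailedException.
caught = try
    fetch(t)
    nothing
catch e
    e
end

if caught isa TaskFailedException
    # showerror prints the wrapper plus the nested exception and its stack trace.
    showerror(stderr, caught)
    println(stderr)
end

# istaskfailed reports whether the task ended with an exception.
failed = istaskfailed(t)
```

Calling `istaskfailed(t)` from time to time in the main code is a cheap way to notice that the monitoring task died quietly.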

Also, let me note that calling usleep blocks an entire worker thread in the Julia runtime. While it sleeps, that thread can’t execute any other Julia Tasks.
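
A small experiment can make the difference visible. This is a sketch under a few assumptions: it needs a platform whose libc provides `usleep` (Linux or macOS), it uses `@async` so that the hypothetical `ticker` task shares a thread with the main task, and the names `ticks`, `stop` and `ticker` are made up for illustration.

```julia
ticks = Ref(0)
stop = Ref(false)

# @async tasks run on the same thread as the task that created them.
ticker = @async while !stop[]
    ticks[] += 1
    sleep(0.001)
end

# Blocking C call: the OS thread is stuck in libc, so the scheduler
# never gets a chance to run the ticker.
ccall(:usleep, Cint, (Cuint,), 200_000)
ticks_during_usleep = ticks[]

# Julia's sleep yields to the scheduler, so the ticker runs meanwhile.
sleep(0.2)
ticks_during_sleep = ticks[] - ticks_during_usleep

stop[] = true
wait(ticker)

println("ticks during usleep: ", ticks_during_usleep)
println("ticks during sleep:  ", ticks_during_sleep)
```

The ticker should make no progress during the `usleep` call and some progress during `sleep`, which is the behaviour described above.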

The REPL runs an additional task of its own, so scheduling differences like this can show up sometimes.

I should clarify: with usleep it “works” (but rather badly) and with sleep I sometimes get errors.

Looking further into this, it looks like the main task doing actual work (reading/computing) is blocking the task (in jl_task_get_next) from progressing. It gets stuck in sleep for hundreds of seconds, as long as it takes for the main task to do some IO.

gdb backtrace of the worker thread:

futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7efe534970c0) at ../sysdeps/nptl/futex-internal.h:183
__pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7efe53497070, cond=0x7efe53497098) at pthread_cond_wait.c:508
__pthread_cond_wait (cond=0x7efe53497098, mutex=0x7efe53497070) at pthread_cond_wait.c:638
0x00007efe674330d8 in uv_cond_wait (cond=0x7efe53497098, mutex=0x7efe53497070) at src/unix/thread.c:847
0x00007efe67348fac in jl_task_get_next (trypoptask=0x7efe5a985d90 <jl_system_image_data+15342288>, q=0x7efe549a2970) at /buildworker/worker/package_linux64/build/src/partr.c:508
0x00007efe590333d2 in julia_poptask_29151 () at task.jl:760
0x00007efe594b19c2 in wait () at task.jl:768
julia_wait_33659 () at condition.jl:106
0x00007efe58eeb14c in japi1__trywait_30682 () at asyncevent.jl:111
0x00007efe58eeb45a in wait () at asyncevent.jl:129
julia_sleep_29475 () at asyncevent.jl:214

No idea if it will solve the sleep problem, but you call lock(l) inside the try. The documentation for ReentrantLock warns against that:

https://docs.julialang.org/en/v1/base/parallel/#Base.ReentrantLock

"... but beware of inverting the try/lock order or missing the try block entirely (e.g. attempting to return with the lock still held):"

lock(l)
try
    <atomic work>
finally
    unlock(l)
end
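
Applied to a shared peak value, the pattern could look like the sketch below. The names `peak`, `record!` and `record_manual!` are hypothetical stand-ins for the closure state in the question, not the original code. The do-block form `lock(f, l)` takes and releases the lock for you, which makes it hard to get the order wrong.

```julia
# Hypothetical shared state, standing in for peak_usage from the question.
const peak = Ref(0)
const l = ReentrantLock()

# lock(f, l) acquires l, runs f, and releases l even if f throws.
function record!(sample::Int)
    lock(l) do
        peak[] = max(peak[], sample)
    end
end

# The same thing written out by hand: lock first, then try/finally.
function record_manual!(sample::Int)
    lock(l)
    try
        peak[] = max(peak[], sample)
    finally
        unlock(l)
    end
end

# Update the peak from many tasks at once, then once more by hand.
tasks = [Threads.@spawn(record!(i)) for i in 1:100]
foreach(wait, tasks)
record_manual!(250)

println("peak = ", peak[])
```

With `lock(l)` inside the `try`, a failure while acquiring the lock would still run `unlock(l)` in the `finally` clause on a lock the task does not hold, which is why the docs put `lock` first.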