Multithreaded program hangs without explicit GC.gc()

danielwe · July 20, 2023, 12:22am

Coming in late here, but I think this is a textbook case of what the docs for GC.safepoint are talking about. The solution is to insert a safepoint within the loop in f_wait (not within test), as follows:

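```julia
using Base.Threads
using ProgressBars

# Busy-wait until `a` is set. The loop body neither allocates nor does I/O or
# task switches, so without GC.safepoint() this thread would never stop for GC.
function f_wait(a, b)
    while !a[]
        GC.safepoint()
    end
    return a[] && b[]
end

function test(n, x = Atomic{Bool}(false), y = Atomic{Bool}(false))
    for i in ProgressBar(1:n)
        x[] = y[] = false
        t_wx = @spawn f_wait(x, y)  # may start spinning on another thread immediately
        t_wy = @spawn f_wait(y, x)  # creating this task allocates, which can trigger GC
        x[] = y[] = true            # release both waiters

        wait.([t_wx, t_wy])
    end
    return true
end

test(100000)
```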

In more detail: the problem is that the loop in f_wait can spin indefinitely without allocating, doing I/O, or switching tasks, so it never reaches an implicit GC safepoint and blocks garbage collection for as long as it keeps spinning. Meanwhile, your test function allocates when it creates the t_wy task, and this happens after t_wx has been scheduled but before its termination condition x[] is set to true. Whenever that particular allocation triggers a GC run, you get a deadlock: the main thread is waiting for every other thread to reach a safepoint so the GC can run, while t_wx is waiting for the main thread to set x[] to true and never reaches a safepoint in the meantime. Inserting an explicit GC.safepoint() in the loop, as shown, breaks the cycle: t_wx pauses at the safepoint, the collection completes, and the main thread goes on to set x[].
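To make that interleaving reproducible instead of luck-dependent, here is a minimal sketch that replaces the "unlucky allocation" with an explicit `GC.gc(false)` call issued while the waiter may already be spinning. The helper names `spin_until` and `repro` are hypothetical, chosen only for this illustration, and the sketch assumes a Julia version with `Threads.@spawn` (1.3 or later). With the `GC.safepoint()` line in place it should finish; if you delete that line and start Julia with more than one thread (e.g. `julia --threads=2`), it is likely to hang within a few iterations.

```julia
using Base.Threads

# Hypothetical helper: busy-wait until `flag` becomes true. The loop body does
# no allocation, I/O, or task switching, so the explicit GC.safepoint() is the
# only point where this thread can pause for a stop-the-world collection.
function spin_until(flag::Atomic{Bool})
    while !flag[]
        GC.safepoint()
    end
    return true
end

# Hypothetical driver: spawn a waiter, then demand a collection *before*
# releasing it, mirroring the allocation in `test` that happens to trigger GC.
function repro(n::Int)
    for _ in 1:n
        flag = Atomic{Bool}(false)
        t = @spawn spin_until(flag)  # may start spinning on another thread
        GC.gc(false)                 # explicit (non-full) collection stands in for the unlucky allocation
        flag[] = true                # release the waiter
        fetch(t) || return false
    end
    return true
end

println("threads = ", nthreads(), ", repro ok = ", repro(100))
```

With a single thread the spawned task only runs once the main task blocks in `fetch`, so the hang cannot occur there; the scenario needs at least two threads.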