When can tasks yield? Or, how to use SpinLocks safely

The following causes a deadlock when running Julia with one thread:

l = Threads.SpinLock()

function f(l)
    lock(l)     # acquire the spinlock
    sleep(5)    # yields; the scheduler switches back to the main task
    unlock(l)   # with one thread, this line is never reached
end

Threads.@spawn f(l)
sleep(1)        # give the spawned task time to take the lock
lock(l)         # spins forever: the task holding the lock can never run to release it

I understand that this is because the thread switches tasks when sleep is called: the spawned task yields while holding the SpinLock, and the main task then tries to lock it and spins forever, since the only thread can never run the task that would unlock it.

Is there a list of operations that can cause a task to yield? For example, I know that sleep and I/O operations often cause yields, but to write any code that uses SpinLocks safely, I need to guarantee that the thread will not switch tasks between locking and unlocking.

Use case is here: https://github.com/JuliaPOMDP/MCTS.jl/pull/86

1 Like

I’d say avoid spinning as much as possible, especially when it’s in a library function. I believe there are almost always better approaches for async-safe programming in Julia. It’s also tedious to use it correctly, as you’ve suggested in the OP; you have to look at all possible code paths and make sure that they never yield. The only option at the moment is to read the source code.

That said, if you want to discuss a spin-based approach, I think it is helpful to explain why you think you need/want to use a spinlock.

If it is for performance, you shouldn’t use a simple spinlock. You should at least spin only for some bounded initial duration and then fall back to the cooperative approach that ReentrantLock takes. It’s also important to benchmark this in various settings, preferably in a larger application combined with other components using Julia tasks. Spin-based solutions are very easy to over-optimize: they may look good in microbenchmarks, but spinning can hurt performance when there are other tasks that want the CPU.
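
For example, something like this (an untested sketch; spin_then_lock and the spin count are just illustrative, not an existing API): spin on trylock for a bounded number of attempts, then fall back to the cooperative lock:

function spin_then_lock(l::ReentrantLock; spins::Int = 1_000)
    for _ in 1:spins
        trylock(l) && return      # fast path: acquired without blocking
    end
    lock(l)                       # slow path: cooperative wait (this may yield)
    return
end

# usage: spin_then_lock(l); try ... finally unlock(l) end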

If it is for complex interop with an external (non-Julia) library, it’s conceivable that there are cases where this is unavoidable. But I’d imagine there is also a large class of programs where you can still avoid spinlocks by using some kind of lock-free algorithm. If you can drop Julia ≤ 1.6 support, have a look at https://github.com/JuliaConcurrent/ConcurrentCollections.jl
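
For instance (a minimal sketch, assuming Julia 1.7+ @atomic fields; the struct and field names are only for illustration), per-node statistics can often be updated without any lock at all:

mutable struct Stats
    @atomic visits::Int
    @atomic total_reward::Float64
end

function record!(s::Stats, reward)
    @atomic s.visits += 1            # lock-free atomic increment
    @atomic s.total_reward += reward # atomic read-modify-write on the reward sum
end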

1 Like

@tkf Thanks for the info!

I realize that spinning is rarely the right approach. Our application is parallel Monte Carlo tree search (see link in OP), where each lock is only held for a very short time while a node in the tree is modified, so spinning seemed like it might make sense. In one test we saw a 2x speedup using SpinLocks over ReentrantLocks. But, as you mentioned, it is complicated, and this is probably a premature optimization.
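
For reference, the kind of pattern I mean is roughly this (a hedged sketch only; the node fields and the update rule are illustrative, not the actual code in the PR): each node carries its own SpinLock, and the critical section does nothing but in-memory updates, released in a finally block so the lock can never leak.

mutable struct TreeNode
    lock::Threads.SpinLock
    visits::Int
    value::Float64
end

TreeNode() = TreeNode(Threads.SpinLock(), 0, 0.0)

function update!(n::TreeNode, reward)
    lock(n.lock)
    try
        # pure computation only: nothing here may yield
        n.visits += 1
        n.value += (reward - n.value) / n.visits   # running-average backup (illustrative)
    finally
        unlock(n.lock)
    end
end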

you have to look at all possible code paths and make sure that they never yield. The only option at the moment is to read the source code.

In this case, what would I be looking for deep in the code? Is it only yield(), sleep(), and I/O that can yield, or can memory allocations, for instance, also yield?

Thanks also for your work on ConcurrentCollections!

A 2x speedup is not negligible, so I totally understand that you’d want to grab that. I think we need more tooling around synchronization.

I don’t think memory allocations yield. I believe functions that can yield call Base.set_next_task(task), typically via the zero-argument wait(), somewhere in their implementation. For example, sleep calls it via wait(::Threads.Condition), which in turn calls wait() internally.
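
If you want to check a particular call yourself, reading the source from the REPL is straightforward (this just illustrates the chain described above; the exact methods differ a bit between Julia versions):

using InteractiveUtils   # for @less / @edit (already loaded in the REPL)

@less sleep(1)           # shows that sleep waits on a Timer
@less wait(Timer(1))     # shows the wait on the timer's condition variable
@less wait()             # the zero-argument wait() that switches to the next task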

1 Like

One thing worth considering for parallel MCTS specifically is that there are a few lockless approaches that may be worth exploring. Specifically, there are both atomics-based lock-free algorithms and channel-based algorithms that avoid locks by having one thread handle all the updates near the root of the tree and having workers do updates in their own individual parts of the tree.
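
A rough sketch of the channel-based idea (names and message layout are hypothetical, just to show the shape): workers never touch the shared root statistics directly; they put! update messages on a Channel that a single task consumes, so that data needs no lock.

struct RootUpdate
    action::Int
    reward::Float64
end

updates = Channel{RootUpdate}(1024)
visits  = zeros(Int, 10)
rewards = zeros(10)

# the one task that owns the root statistics
updater = Threads.@spawn for u in updates       # blocks until a message arrives
    visits[u.action] += 1
    rewards[u.action] += u.reward
end

# a worker simulates in its own part of the tree and only sends the result up
Threads.@spawn put!(updates, RootUpdate(3, 1.0))

# close(updates) when the search is done; the updater task then finishes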

2 Likes

Thanks @Oscar_Smith, we are aware of the lockless approaches (which are not guaranteed to update the nodes correctly, but work well in practice).

channel-based algorithms that avoid locks by having one thread handle all the updates near the root of the tree and having workers do updates in their own individual parts of the tree.

We were considering something like this, but I am not familiar with any previous implementations. Do you know of a paper or implementation that uses this approach specifically?

1 Like