Readers–writer lock using DataFrame

JesperMartinsson · July 2, 2024, 9:16pm

Hi,

I have an application where one async task updates a DataFrame with new data every minute, while multiple tasks read the DataFrame at random times. I need to make this thread-safe, possibly by using a lock, to prevent reads during updates.

A ReentrantLock() could work, but it would also block concurrent reads, which DataFrame can handle inherently. I’m considering using semaphores or perhaps a suitable readers-writer lock to address this.

Any suggestions, pointers, or examples to achieve this without compromising the performance of concurrent reads?

era127 · July 2, 2024, 9:59pm

Here is an example of concurrent readers and writers with duckdb in case you want to persist the data as well.

ericphanson · July 2, 2024, 10:47pm

I needed a RW lock for AllocArrays.jl and settled on the one in ConcurrentUtilities.jl
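
For readers who just want to see the shape of such a lock, here is a minimal sketch in plain Base Julia. It is not the ConcurrentUtilities.jl implementation. The names `SimpleRWLock`, `with_read` and `with_write` are made up for illustration, and a NamedTuple of vectors stands in for the DataFrame so the snippet has no dependencies. It lets many readers in at once and gives the writer exclusive access, but it makes no attempt at fairness, so a steady stream of readers can starve the writer.

```julia
# Hypothetical minimal readers-writer lock built on Threads.Condition.
mutable struct SimpleRWLock
    cond::Threads.Condition
    readers::Int    # number of readers currently inside
    writer::Bool    # true while a writer holds the lock
end
SimpleRWLock() = SimpleRWLock(Threads.Condition(), 0, false)

# Run f() as a reader: wait only while a writer is active.
function with_read(f, rw::SimpleRWLock)
    lock(rw.cond)
    try
        while rw.writer
            wait(rw.cond)          # releases the lock while waiting
        end
        rw.readers += 1
    finally
        unlock(rw.cond)
    end
    try
        return f()
    finally
        lock(rw.cond)
        try
            rw.readers -= 1
            rw.readers == 0 && notify(rw.cond)   # wake a waiting writer
        finally
            unlock(rw.cond)
        end
    end
end

# Run f() as the writer: wait until there is no writer and no reader.
function with_write(f, rw::SimpleRWLock)
    lock(rw.cond)
    try
        while rw.writer || rw.readers > 0
            wait(rw.cond)
        end
        rw.writer = true
    finally
        unlock(rw.cond)
    end
    try
        return f()
    finally
        lock(rw.cond)
        try
            rw.writer = false
            notify(rw.cond)        # wake all waiting readers and writers
        finally
            unlock(rw.cond)
        end
    end
end

rw = SimpleRWLock()
table = (; t = Int[], value = Float64[])   # stand-in for the DataFrame

with_write(rw) do
    push!(table.t, 1)
    push!(table.value, 0.5)
end

nrows = with_read(() -> length(table.t), rw)
```

The update task would wrap its once-a-minute modification in `with_write`, and every reader would wrap its access in `with_read`.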

JesperMartinsson · July 3, 2024, 8:10am

Thanks for the pointer. That looks very interesting.

JesperMartinsson · July 3, 2024, 8:14am

Thanks, that looks very interesting. I see that Julia v1.11 may have some of this functionality in Base, but perhaps not the ReadWriteLock() part.

foobar_lv2 · July 3, 2024, 10:47am

Depending on your specific requirements, also consider copy-on-write.

The idea of copy-on-write is that on updates, you construct a new dataframe (reusing unchanged arrays of the old one), and you have some object like

mutable struct AtomicContainer
    @atomic contents::Any
end

and after your new dataframe is constructed, you insert it into the AtomicContainer. Readers have a long-lived reference to the AtomicContainer, and can then read the contents and dispatch (function barrier to cure the type instability!) with the current immutable snapshot of the data.
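
A rough sketch of that flow, under a few assumptions: a NamedTuple of vectors stands in for the dataframe so the snippet is dependency-free, and the helper names `with_new_prices`, `total_price` and `read_total` are invented for illustration. The writer builds a new snapshot that shares the unchanged column, then publishes it with one atomic store. Readers load the current snapshot once and pass it through a function barrier.

```julia
mutable struct AtomicContainer
    @atomic contents::Any
end

# Writer side: build a new snapshot, reusing the unchanged `id` column.
# Nothing in the old snapshot is mutated.
function with_new_prices(snap, prices)
    return (; id = snap.id, price = prices)
end

# Function barrier: compiled for the concrete type of `snap`.
total_price(snap) = sum(snap.price)

# Reader side: one atomic load, then work on that immutable snapshot.
function read_total(box::AtomicContainer)
    snap = @atomic box.contents
    return total_price(snap)
end

box = AtomicContainer((; id = [1, 2, 3], price = [1.0, 2.0, 3.0]))

old = @atomic box.contents                    # a reader's snapshot
new_snap = with_new_prices(old, [10.0, 20.0, 30.0])
@atomic box.contents = new_snap               # publish the update

stale_total = total_price(old)   # the old snapshot is still consistent
fresh_total = read_total(box)    # new readers see the new snapshot
```

A reader that grabbed `old` before the update keeps working on a consistent but stale version, which is the trade-off raised in the considerations that follow.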

The relevant considerations for this are:

1. If you have a reader who wants to do stuff and an update is underway, is it preferable to block until the update is done or is it preferable to do your stuff on a consistent-but-potentially-stale version?
2. What is your relation between readers and writers, in terms of volume?
3. Can you afford the additional GC pressure from copy-on-write? Especially, how real-time-ish are your readers?

The big problems with readers-writer locks are that:

- Multiple concurrent readers don’t block each other. But the responsible CPU cores still need to play tug-of-war over which core owns the cache line of the lock.
- If your update is big, then you either have a long critical section, i.e. long blocking of readers, or your readers can see inconsistent states (because you relinquish the lock in the middle). Whether this is a problem depends on your answer to (1).
- There is a big question for your readers: take the lock for a long time (long critical section) or take it often for short times. Taking it for a long time may block the writer, while taking it often causes tug-of-war on the cache line between multiple readers. Your critical section of course needs to be long enough to span the required consistency (i.e. the entire “transaction”).

PS. The above example uses Any as type for the contents. This is super defensive programming of me, because consider the following:

mutable struct AtomicContainerYolo{T}
    @atomic contents::T
end

can introduce a lock in AtomicContainerYolo((1,2,3)) in some julia versions. The reason is that your hardware only supports atomics up to 128 bits wide, and julialang made the imo extremely ill-considered design decision to imitate C++ and implement hardware-impossible atomic variables with hidden locks instead of boxing / copy-on-write. C++ has the excuse of “no GC in the runtime, cannot cow”, but julialang doesn’t. Making the contents field abstractly typed forces the compiler to box it, which is exactly what you want for anything larger than 128 bits.
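
To make the size argument concrete, here is a small sketch that only checks sizes and field types. It does not inspect whether a given Julia version actually takes a lock. The tuple is written with explicit Int64 values so the arithmetic does not depend on the platform.

```julia
mutable struct AtomicContainer
    @atomic contents::Any
end

mutable struct AtomicContainerYolo{T}
    @atomic contents::T
end

payload = (Int64(1), Int64(2), Int64(3))

# Three 64-bit integers: 3 * 64 = 192 bits, more than a 128-bit atomic can hold.
payload_bits = 8 * sizeof(payload)
fits_in_128_bits = payload_bits <= 128

boxed = AtomicContainer(payload)
typed = AtomicContainerYolo(payload)

# The abstractly typed field holds a reference to a boxed value;
# the parametric field stores the 192-bit tuple inline.
boxed_field = fieldtype(typeof(boxed), :contents)
typed_field = fieldtype(typeof(typed), :contents)
```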

PPS. I am not recommending persistent (aka functional, aka non-overwriting) data structures. Instead, exploit the fact that your problem is very specific: your updates/writes come in large batches. So for every update/write you need to identify which parts of your dataframe are modified, and you might want to change your data layout to minimize the modified parts.

quinnj · July 4, 2024, 4:49am

Yeah, we only added Lockable to Base, but not the ReadWriteLock.
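
For reference, a minimal sketch of what Lockable gives you, assuming Julia 1.11 or later where `Base.Lockable` is available. A NamedTuple of vectors again stands in for the DataFrame. Note that this is plain mutual exclusion: readers go through the same lock as the writer, so they also exclude each other.

```julia
# Assumes Julia >= 1.11. Lockable pairs a value with the lock that guards it.
table = Base.Lockable((; t = Int[], value = Float64[]))

# Writer: the do-block receives the wrapped value while the lock is held.
lock(table) do tbl
    push!(tbl.t, 1)
    push!(tbl.value, 0.5)
end

# Reader: takes the same lock, so concurrent readers are serialized.
latest = lock(tbl -> last(tbl.value), table)
```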
