Cached data and thread safety

Pavel_Kalouguine · September 8, 2023, 1:28pm

Consider the situation where numerous lightweight objects all depend on a heavyweight context (something akin to the Flyweight pattern in OOP). Here’s what it looks like in my projects:

The heavyweight context:

using StaticArrays # provides SVector

mutable struct LiftedComplex{N,D}
    d::Vector{Matrix{Tuple{Int, SVector{N, Int}}}} # Some important data
    # More data...
    cache::Dict{Symbol, Any} # Lazily filled precomputed data, keyed by name
end

The lightweight object:

struct LiftedSimplex{Ds, N, D}
    lc::LiftedComplex{N, D} # The parent LiftedComplex
    idx::Int # The index of the simplex
    trans::SVector{N, Int} # The translation
end

Some functions involving the lightweight objects can be accelerated by precomputing more data in the heavyweight context. The dictionary cache is intended to store this redundant data. When such a function is called, the dictionary is checked for the corresponding key. If the key exists, the stored value is used; otherwise the data is computed first and stored in the dictionary.
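The lookup-or-compute step described above can be sketched as follows. This is a minimal illustration, not the code from the project: `Context` is a hypothetical stand-in for LiftedComplex, and `cached!` is a made-up helper name. It relies on Base's `get!(f, dict, key)`, which calls `f` only when the key is missing.

```julia
# Hypothetical stand-in for the heavyweight context; only the cache field matters here.
mutable struct Context
    data::Vector{Int}
    cache::Dict{Symbol, Any}
end

# Return the value stored under `key`; if it is absent, call the zero-argument
# function `compute`, store its result, and return it.
# Not thread-safe: nothing prevents two tasks from mutating the Dict at once.
function cached!(compute, ctx::Context, key::Symbol)
    return get!(compute, ctx.cache, key)
end

ctx = Context([3, 1, 2], Dict{Symbol, Any}())
sorted = cached!(() -> sort(ctx.data), ctx, :sorted)        # computed and stored
again = cached!(() -> error("not reached"), ctx, :sorted)   # served from the cache
```

The second call never runs its function, because the key is already present.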

My question concerns thread safety. I reckon that, as described, the code is not thread safe, and I am thinking of adding a lock to the heavyweight context, to be acquired before checking for the key and released after the data is created or used. The question is: are there caveats to this approach?

vchuravy · September 8, 2023, 7:33pm

Yeah the cache insertion/lookup would need to be protected by a lock.
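A minimal sketch of such a lock-protected cache, assuming a `ReentrantLock` stored next to the dictionary. `LockedContext` and `cached!` are hypothetical names used for illustration only; `lock(f, l)` and `get!(f, dict, key)` are from Base.

```julia
# Hypothetical context with a lock guarding the cache.
struct LockedContext
    data::Vector{Int}
    cache::Dict{Symbol, Any}
    lock::ReentrantLock
end
LockedContext(data) = LockedContext(data, Dict{Symbol, Any}(), ReentrantLock())

function cached!(compute, ctx::LockedContext, key::Symbol)
    # lock(f, l) acquires l, runs f, and releases l even if f throws.
    # Both the lookup and the insertion happen while the lock is held.
    lock(ctx.lock) do
        get!(compute, ctx.cache, key)
    end
end

ctx = LockedContext(collect(1:1000))
results = Vector{Int}(undef, 8)
Threads.@threads for i in 1:8
    # Every iteration asks for the same key; only the first one computes it.
    results[i] = cached!(() -> sum(ctx.data), ctx, :total)
end
```

One caveat of this simple form is that the lock is held while the value is computed, so threads asking for any key wait until the computation finishes. A `ReentrantLock` can be re-acquired by the task that already holds it, so a compute function that itself calls `cached!` on the same context should not deadlock.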

cstjean · September 8, 2023, 7:58pm

What happens if you don’t put a lock? I always figured that in the worst case, the function would be evaluated more than once, but can you get worse than that? Are concurrent dictionary writes going to segfault or something?

Pavel_Kalouguine · September 9, 2023, 10:17am

With unprotected shared mutable state, anything can happen. Imagine the scheduler switching threads at the precise moment when the dictionary is being rehashed. The code of Base.Dict contains plenty of blocks marked with @inbounds, which disables bounds checking, so any inconsistency in the data could lead to an out-of-bounds memory access and a segfault in the Julia process.

Pavel_Kalouguine · September 9, 2023, 10:37am

My question was about best practices for situations where performance can be improved by caching some data. Using a dictionary allows very flexible caching. For instance, a new method requiring a new format of cached data does not require modifying the definition of the structure; a new key in the dictionary suffices. But maybe there is a better (and still thread-safe) way to achieve this goal.
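The flexibility mentioned above can be sketched like this: two unrelated methods each use their own key and their own value format, without any change to the struct. All names here (`FlexContext`, `cached!`, `total`, `positions`) are hypothetical. The extra type argument is an assumption of this sketch, added so that callers get a concretely typed result back from the `Dict{Symbol, Any}`.

```julia
struct FlexContext
    data::Vector{Int}
    cache::Dict{Symbol, Any}
    lock::ReentrantLock
end
FlexContext(data) = FlexContext(data, Dict{Symbol, Any}(), ReentrantLock())

# Locked lookup-or-compute. The `::T` assertion checks the stored value's type
# and lets the compiler infer a concrete return type.
function cached!(compute, ctx::FlexContext, key::Symbol, ::Type{T}) where {T}
    value = lock(ctx.lock) do
        get!(compute, ctx.cache, key)
    end
    return value::T
end

# First method: caches a single Int under :total.
function total(ctx::FlexContext)
    return cached!(() -> sum(ctx.data), ctx, :total, Int)
end

# Second method, added later: caches a value => index map under a new key.
function positions(ctx::FlexContext)
    return cached!(ctx, :positions, Dict{Int, Int}) do
        Dict(v => i for (i, v) in enumerate(ctx.data))
    end
end

ctx = FlexContext([30, 10, 20])
t = total(ctx)
p = positions(ctx)
```

The type assertion throws if a key is ever reused with a different value format, which catches accidental key collisions between methods.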
