Cached data and thread safety

Consider a situation where numerous lightweight objects all depend on a heavyweight context (something akin to the Flyweight pattern in OOP). Here's how it looks in my projects:

The heavy-weight context:

```julia
mutable struct LiftedComplex{N,D}
    d::Vector{Matrix{Tuple{Int, SVector{N, Int}}}} # Some important data (SVector is from StaticArrays.jl)
    # More data...
    cache::Dict{Symbol, Any}
end
```

The lightweight object:

```julia
struct LiftedSimplex{Ds, N, D}
    lc::LiftedComplex{N, D} # The parent LiftedComplex
    idx::Int # The index of the simplex
    trans::SVector{N, Int} # The translation
end
```

Some functions involving the lightweight objects can be accelerated by precomputing more data in the heavyweight context. The dictionary `cache` is intended to store this redundant data. When such a function is called, the dictionary is checked for the corresponding key: if the key exists, the stored value is used; otherwise the data is first precomputed and stored in the dictionary.
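The lookup-or-compute step can be sketched with `Dict`'s `get!`; everything here (`Ctx`, `expensive_compute`) is a hypothetical stand-in for the real context and functions:

```julia
# Single-threaded sketch of the caching pattern, with a simplified context.
struct Ctx
    cache::Dict{Symbol, Any}
end

# Hypothetical stand-in for an expensive precomputation.
expensive_compute(ctx::Ctx) = 42

function cached_value(ctx::Ctx, key::Symbol)
    # get! returns the stored value if the key exists; otherwise it calls
    # the function, stores the result under the key, and returns it.
    get!(() -> expensive_compute(ctx), ctx.cache, key)
end

ctx = Ctx(Dict{Symbol, Any}())
cached_value(ctx, :foo)  # computes and stores
cached_value(ctx, :foo)  # cache hit, no recomputation
```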

My question concerns thread safety. I reckon that, as described, the code is not thread safe, and I am thinking about adding a lock to the heavyweight context, to be locked before checking for the key and unlocked after the data is created or used. The question is: are there caveats in this approach?
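A minimal sketch of the locked variant, assuming a `ReentrantLock` field is added to the context (the field name `lk` and the helper function are hypothetical):

```julia
mutable struct LockedCtx
    cache::Dict{Symbol, Any}
    lk::ReentrantLock  # protects `cache`
end

LockedCtx() = LockedCtx(Dict{Symbol, Any}(), ReentrantLock())

function cached_value!(compute::Function, ctx::LockedCtx, key::Symbol)
    # lock(f, l) acquires the lock, runs f, and releases the lock
    # even if f throws.
    lock(ctx.lk) do
        get!(compute, ctx.cache, key)
    end
end

ctx = LockedCtx()
cached_value!(() -> 7, ctx, :k)              # computes under the lock
cached_value!(() -> error("never runs"), ctx, :k)  # cache hit
```

One caveat of this simple form: the computation runs while the lock is held, so every other cache user is blocked during a long precomputation. Because the lock is a `ReentrantLock`, a computation that recursively calls `cached_value!` on the same context will not deadlock, but it still serializes everything.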

Yeah, the cache insertion/lookup would need to be protected by a lock.

What happens if you don’t put a lock? I always figured that in the worst case, the function would be evaluated more than once, but can you get worse than that? Are concurrent dictionary writes going to segfault or something?

With unprotected shared mutable state, anything can happen. Imagine the scheduler switching threads at the precise moment the dictionary is being rehashed. The code of `Base.Dict` contains plenty of blocks marked with `@inbounds`, so any inconsistency in the data could indeed cause a segfault of the Julia process.
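If duplicated evaluation is acceptable but corruption is not, a safe variant keeps the critical sections short and lets the computation run outside the lock. A sketch under that assumption (all names hypothetical; it also assumes cached values are never `nothing`, which is used as the miss sentinel):

```julia
mutable struct Ctx2
    cache::Dict{Symbol, Any}
    lk::ReentrantLock
end
Ctx2() = Ctx2(Dict{Symbol, Any}(), ReentrantLock())

function cached_value_short_lock!(compute::Function, ctx::Ctx2, key::Symbol)
    # Fast path: only the lookup happens under the lock.
    v = lock(ctx.lk) do
        get(ctx.cache, key, nothing)
    end
    v !== nothing && return v
    # Slow path: compute outside the lock. Several tasks may race here,
    # in which case the work is duplicated but nothing is corrupted.
    val = compute()
    lock(ctx.lk) do
        # get! makes the first writer win, so all tasks see one value.
        get!(ctx.cache, key, val)
    end
end
```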


My question was about best practices for situations where performance can be improved by caching some data. Using a dictionary allows for very flexible caching: for instance, a new method requiring a new format of cached data does not force a change to the structure definition, since a new key in the dictionary suffices. But maybe there is a better (and still thread-safe) way to achieve this goal.
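One way to keep the dictionary's flexibility while making the locking hard to forget is to bundle the `Dict` and its lock into one small wrapper type, so every access goes through a locked `get!`. A sketch of such a wrapper (the type name and design are my own, not an established API):

```julia
struct SafeCache
    d::Dict{Symbol, Any}
    lk::ReentrantLock
end
SafeCache() = SafeCache(Dict{Symbol, Any}(), ReentrantLock())

# Locked version of get!: look up `k`, or compute, store, and return.
function Base.get!(f::Function, c::SafeCache, k::Symbol)
    lock(c.lk) do
        get!(f, c.d, k)
    end
end

# The heavyweight context would then hold `cache::SafeCache` instead of a
# bare Dict; a new method still only needs a new key, not a new field.
cache = SafeCache()
get!(() -> "precomputed", cache, :newkey)
```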