Best practice approach for caching data in objects

Something that’s come up repeatedly for me this last week in different contexts (and also many times before) is: how do you approach the case where you sometimes want to be able to store intermediate results?

For a clearly contrived MWE, let’s say we have a type wrapping an Array and in certain contexts we often evaluate the extrema. Let’s say we don’t have to worry about cache invalidation (e.g. either 1) the values of the array don’t change or 2) we can only change the value using a setter function that flushes the cache). In a real setting the function (here extrema) might be expensive.

I see a number of design patterns. The most obvious are to never cache or always cache (not what I’m after):

# 1. just the type
struct Mytype
   v::Vector
end
lims(m::Mytype) = extrema(m.v)

# 2. type and cache
struct Mytype{T}
   v::Vector{T}
   lims::Tuple{T,T}
   Mytype(a::AbstractVector{T}) where T = new{T}(a, extrema(a))
end
lims(m::Mytype) = m.lims

But what if I want to cache the object? I see two possibilities. Either I cache it the first time it is called:

# 3. Cache at first request
struct Mytype{T}
   v::Vector{T}
   lims::Ref{Union{Nothing, Tuple{T,T}}}
   Mytype(v::AbstractVector{T}) where T = new{T}(v, Ref{Union{Nothing, Tuple{T,T}}}(nothing))
end
function lims(m::Mytype)  #not `lims!` I think as the mutation doesn't change behaviour
    isnothing(m.lims[]) && (m.lims[] = extrema(m.v))
    m.lims[]
end
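To see that approach 3 really evaluates the expensive function only once, here is a self-contained sketch with a call counter (the `CachedVec` name, the `NCALLS` counter, and `expensive_extrema` are illustrative additions, not part of the original example):

```julia
# Count how often the "expensive" function actually runs.
const NCALLS = Ref(0)
expensive_extrema(v) = (NCALLS[] += 1; extrema(v))

struct CachedVec{T}
    v::Vector{T}
    lims::Ref{Union{Nothing, Tuple{T,T}}}
    CachedVec(v::AbstractVector{T}) where T =
        new{T}(v, Ref{Union{Nothing, Tuple{T,T}}}(nothing))
end

function lims(c::CachedVec)
    # Compute and store on first request; return the stored value afterwards.
    isnothing(c.lims[]) && (c.lims[] = expensive_extrema(c.v))
    c.lims[]
end

c = CachedVec([3, 1, 4, 1, 5])
lims(c)  # first call computes: NCALLS[] == 1
lims(c)  # second call is cached: NCALLS[] is still 1
```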

Or have a separate cached type that one can construct at will, giving the user full control but at the cost of increased coding complexity for little gain:

# 4. Different types to signify cache

abstract type Myabstracttype end

struct Mytype <: Myabstracttype
   v::Vector
end

struct Mytype_cached{T} <: Myabstracttype
   v::Vector{T}
   lims::Tuple{T,T}
end

lims(m::Mytype) = extrema(m.v)
lims(m::Mytype_cached) = m.lims
cachelims_mytype(m::Mytype) = Mytype_cached{eltype(m.v)}(m.v, lims(m))

I can see pros and cons of all (as outlined above) and I do realize that coding style is a preference, but:

  1. Which of these design patterns are preferred?
  2. Is this even an idiomatic thing to do in Julia?

I would go with the second approach (separate cached type), simply because calculating the result type for the cached object (for use in the Ref) can, in the general case (e.g. something more complex than extrema), be a very difficult problem if one does not want to rely on inference (Base.return_types).

Ok great. In my specific cases I always know the type of the cached object though?

Then (1) I envy you :wink:, and (2) if the calculation itself is expensive and you really want to hide this from the user, the approach with Ref could be fine.

The Matrix package in R is an example of this kind of interface, caching various decompositions after first use.

That said, I always prefer to make precalculated values explicit somehow, and from a design perspective I would still prefer the second approach (trading off user convenience for a more transparent design).


This type of caching is also called lazy initialization. I would go with something similar to your approach 4, but extract the lazy logic to be fully independent/reusable; something like this perhaps:

mutable struct Lazy{T}
   f::Function
   value::Union{Nothing, T}
   calculated::Bool
   Lazy{T}(f) where T = new{T}(f, nothing, false)
end

function value(z::Lazy{T}) where T
    z.calculated || (z.value = z.f(); z.calculated = true)
    z.value::T
end

I used a calculated field instead of isnothing, in case T is Nothing (which could make sense if the lazy function doesn’t need a return value, e.g. if modifying some global state). Note that this particular implementation is not thread safe.
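If thread safety is needed, one possibility (a sketch of my own, not from the post above) is to guard initialization with a `ReentrantLock` and re-check the flag after acquiring it, so concurrent callers compute `f` at most once:

```julia
# Thread-safe variant of Lazy: the lock serializes initialization, and the
# re-check of `calculated` inside the lock makes the computation run only once.
mutable struct LockedLazy{T}
    f::Function
    value::Union{Nothing, T}
    calculated::Bool
    lock::ReentrantLock
    LockedLazy{T}(f) where T = new{T}(f, nothing, false, ReentrantLock())
end

function value(z::LockedLazy{T}) where T
    lock(z.lock) do
        if !z.calculated
            z.value = z.f()
            z.calculated = true
        end
        z.value::T
    end
end
```

Always taking the lock keeps the sketch simple and correct; a lock-free fast path on `calculated` would need an `@atomic` field (Julia ≥ 1.7) to avoid a formal data race.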

Now your example can be written like this:

struct MyType{T}
   v::Vector{T}
   lims::Lazy{Tuple{T,T}}
   MyType(v::AbstractVector{T}) where T = new{T}(v, Lazy{Tuple{T,T}}(() -> extrema(v)))
end

lims(m::MyType) = value(m.lims)

Testing it:

julia> m = MyType([11, 77, 58, 4, 25, 78, 63, 5, 97, 40]);

julia> lims(m)
(4, 97)

julia> @code_warntype lims(m)
Body::Tuple{Int64,Int64}
1 ─ %1 = (Base.getfield)(m, :lims)::Lazy{Tuple{Int64,Int64}}
│   %2 = invoke Main.value(%1::Lazy{Tuple{Int64,Int64}})::Tuple{Int64,Int64}
└──      return %2

That’s nice, I like the concept. I can also use this for another case I had where I take a lazy subset of an object. This is actually my approach #3, though, right, not #4?


But what is the best-practice approach for caching data if the data should be updated?

What about the following solution:

mutable struct Cache
    values_from_db::Vector{Int}
    Cache() = new(Vector{Int}())
end

struct MyData
    values_calculated::Vector{Int}
    _cache::Cache
    MyData() = new(Vector{Int}(), Cache())
end

function get(d::MyData, idx::Int)
    if idx > length(d.values_calculated)
        update!(d)
    end
    d.values_calculated[idx]
end

function update!(d::MyData)
    # update d._cache from the database
    # then recompute d.values_calculated in place from the new cached values
end
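A runnable sketch of this update pattern, with the database simulated by a global vector (`FAKE_DB`, `CachedData`, `getvalue`, and the "derived value = 2x" rule are all assumptions made up for the demo):

```julia
# Stand-in for an external database.
const FAKE_DB = [10, 20, 30, 40]

mutable struct CachedData
    values_from_db::Vector{Int}
    values_calculated::Vector{Int}
    CachedData() = new(Int[], Int[])
end

function update!(d::CachedData)
    copy!(d.values_from_db, FAKE_DB)             # refresh the cache
    d.values_calculated = 2 .* d.values_from_db  # recompute derived values
    return d
end

function getvalue(d::CachedData, idx::Int)
    # Refresh the cache lazily when the requested index is not yet available.
    idx > length(d.values_calculated) && update!(d)
    d.values_calculated[idx]
end

d = CachedData()
getvalue(d, 3)  # triggers update! and returns 60
```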

I think it is more common to use

value::Union{Nothing, Some{T}}

and

z.value === nothing && (z.value = Some(z.f()))
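For completeness, here is how the Lazy type from above could look with the `Some`-based convention (a sketch; `SomeLazy` is my own name to avoid clashing with the earlier definition). Wrapping in `Some` removes the need for a separate `calculated` flag, because `Some(nothing) !== nothing` even when `T` is `Nothing`:

```julia
# Lazy initialization using the Union{Nothing, Some{T}} convention;
# `something` unwraps the Some once the value has been computed.
mutable struct SomeLazy{T}
    f::Function
    value::Union{Nothing, Some{T}}
    SomeLazy{T}(f) where T = new{T}(f, nothing)
end

function value(z::SomeLazy{T}) where T
    z.value === nothing && (z.value = Some(z.f()))
    something(z.value)::T
end
```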