Best practice approach for caching data in objects

mkborregaard · February 4, 2019, 9:20am

Something that’s come up repeatedly for me this last week in different contexts (and also many times before) is how do you approach the case where you sometimes want to be able to store intermediary results?

For a clearly contrived MWE, let’s say we have a type wrapping an Array and in certain contexts we often evaluate the extrema. Let’s say we don’t have to worry about cache invalidation (e.g. either 1) the values of the array don’t change or 2) we can only change the value using a setter function that flushes the cache). In a real setting the function (here extrema) might be expensive.

I see a number of design patterns. The most obvious are to never cache or always cache (not what I’m after):

# 1. just the type
struct Mytype
   v::Vector
end
lims(m::Mytype) = extrema(m.v)

# 2. type and cache
struct Mytype{T}
   v::Vector{T}
   lims::Tuple(T, T)
   Mytype(a::AbstractVector{T}) where T = new{T}(a, extrema(a))
end
lims(m::Mytype) = m.lims

But what if I want to cache the object? I see two possibilities. Either I cache it the first time it is called:

# 3. Cache at first request
struct Mytype{T}
   v::Vector{T}
   lims::Ref{Union{Nothing, Tuple{T,T}}}
   Mytype(v::AbstractVector{T}) where T = new{T}(v, nothing)
end
function lims(m::Mytype)  #not `lims!` I think as the mutation doesn't change behaviour
    isnothing(m.lims[]) && (m.lims[] = extrema(m.v))
    m.lims[]
end

Or have a special type that one can invoke at will, giving full control to the user but at the cost of increased coding complexity for little gain:

# 4. Different types to signify cache

abstract type Myabstracttype end

struct Mytype <: Myabstracttype
   v::Vector
end

struct Mytype_cached{T} <: Myabstracttype
   v::Vector{T}
   lims::Tuple(T, T)
end

lims(m::Mytype) = extrema(m.v)
lims(m::Mytype_cached) = m.lims
cachelims_mytype(m::Mytype) = Mytype_cached{eltype(m.v)}(v, lims(v))

I can see pros and cons of all (as outlined above) and I do realize that coding style is a preference, but:

Which of these design patterns are preferred?
Is this even an ideomatic thing to do in Julia?

Tamas_Papp · February 4, 2019, 9:25am

I would go with the second approach (separate cached type), simply because calculating the result type for the cached object (for use in the Ref) is, in the general case (eg something more complex than extrema) can be a very difficult problem, if one does not want to rely on inference (Base.return_types).

mkborregaard · February 4, 2019, 9:43am

Ok great. In my specific cases I always know the type of the cached object though?

Tamas_Papp · February 4, 2019, 9:49am

Then (1) I envy you , and (2) if the calculation itself is expensive and you really want to hide this from the user, the approach with Ref could be fine.

The Matrix package in R is an example of this kind of interface, caching various decompositions after first use.

That said, I always prefer to make precalculated values explicit somehow, and from a design perspective I would still prefer the second approach (trading off user convenience for a more transparent design).

bennedich · February 4, 2019, 10:55pm

This type of caching is also called lazy initialization. I would go with something similar to your approach 4, but extract the lazy logic to be fully independent/reusable; something like this perhaps:

mutable struct Lazy{T}
   f::Function
   value::Union{Nothing, T}
   calculated::Bool
   Lazy{T}(f) where T = new{T}(f, nothing, false)
end

function value(z::Lazy{T}) where T
    z.calculated || (z.value = z.f(); z.calculated = true)
    z.value::T
end

I used a calculated field instead of isnothing, in case T is Nothing (which could make sense if the lazy function doesn’t need a return value, e.g. if modifying some global state). Note that this particular implementation is not thread safe.

Now your example can be written like this:

struct MyType{T}
   v::Vector{T}
   lims::Lazy{Tuple{T,T}}
   MyType(v::AbstractVector{T}) where T = new{T}(v, Lazy{Tuple{T,T}}(() -> extrema(v)))
end

lims(m::MyType) = value(m.lims)

Testing it:

julia> m = MyType([11, 77, 58, 4, 25, 78, 63, 5, 97, 40]);

julia> lims(m)
(4, 97)

julia> @code_warntype lims(m)
Body::Tuple{Int64,Int64}
1 ─ %1 = (Base.getfield)(m, :lims)::Lazy{Tuple{Int64,Int64}}
│   %2 = invoke Main.value(%1::Lazy{Tuple{Int64,Int64}})::Tuple{Int64,Int64}
└──      return %2

mkborregaard · February 4, 2019, 11:14pm

That’s nice, I like the concept. I can also use this for another case I had where I take a lazy subset of an object. This is actually my approach #3, though, right, not #4?

Ihar_Kudraucau · September 9, 2019, 9:18am

But what is best practice approach for caching data if the data should be updated?

What is about follow solution:

mutable struct Cache
    values_from_db::Vector{Int}
    Cache() = new(Vector{Int}())
end

struct MyData
    values_calculated::Vector{Int}
    _cache::Cache
    MyData() = new(Vector{Int}(), Cache())
end

function get(d::MyData, idx::Int)
    if idx > length(d.values_calculated)
        d.values_calculated = update(d)
    end
    d.values_calculated[idx]
end

function update(d::MyData)
    # update d._cache from database
    # update d.values_calculated according to new cached values
end

kristoffer.carlsson · September 9, 2019, 9:45am

I think more common is to use

value::Union{Nothing, Some{T}}

and

z.value === nothing && z.value = Some(z.f())

Topic		Replies	Views
Caching the results of a factory method New to Julia	4	673	March 29, 2019
Design patterns for generic code with reusable caches for intermediate values General Usage design-pattern	3	534	March 29, 2020
Typestable caching of multiple arrays with different eltype Performance diffeq	3	467	July 7, 2021
Recommended way to cache results of expensive functions? General Usage performance , design , optimization	6	4635	February 14, 2021
Type stable generic memoisation Performance memoisation	23	1298	February 22, 2024

Best practice approach for caching data in objects

Related topics