Struct containing a dictionary of cache buffers?

I want to have a few buffers and/or dicts that serve as caches for various methods I’m implementing on my struct. This struct will basically be alive for the lifetime of the program as it’s a wrapper containing information about an MCMC simulation. I was wondering which of these two methods is more GC friendly?

Method 1: Dictionary of buffers

mutable struct Lattice{BuffType, DictType}
    # ...configuration and state fields...
    cache::Dict{Symbol, Union{BuffType, DictType}}
end

And then, to get a specific buffer and/or dict, I can just index into it. The problem with this is that BuffType and DictType need not be the same for the various caches I require. For instance, one cache has type Vector{Tuple{Int, Tuple{Float32, Float32}, Tuple{Float32, Float32}}} while another is just Vector{Float32}.
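To make this concrete, here is a minimal standalone sketch of Method 1 with the two buffer types above (just a dict, not the full Lattice struct; the names are made up for illustration):

```julia
# Hypothetical alias for the "complicated" buffer's element type.
const Complicated = Vector{Tuple{Int, Tuple{Float32, Float32}, Tuple{Float32, Float32}}}

# The dict's value type is a small Union of the two concrete buffer types.
buffers = Dict{Symbol, Union{Vector{Float32}, Complicated}}()
buffers[:simple]      = zeros(Float32, 4)
buffers[:complicated] = Complicated(undef, 4)

# A lookup is inferred as the small Union, not as the concrete type:
v = buffers[:simple]
```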

Method 2: Dictionary of closures that capture buffers

mutable struct Lattice
    # ...configuration and state fields...

    cache::Dict{Symbol, Function}
end

L = Lattice(Dict{Symbol, Function}())

function complicatedcachegen(L::Lattice)
    # Computes size of cache using configuration in L
    cachesize = 10
    c = Vector{Tuple{ Int, Tuple{Float32, Float32}, Tuple{Float32, Float32} }}(undef, cachesize)

    # Capture `c` for future use.
    return () -> c
end

L.cache[:complicated] = complicatedcachegen(L)

# Can be used later like so
complicatedcache = L.cache[:complicated]()

This solves the problem of different types of caches while making sure that everything in the struct has a concrete type. My question would be - is this memory/GC friendly? Or is this a footgun for memory and/or performance? If I try to use this cache in a function, I get a few ::Any types in @code_warntype.

julia> function usecache(L::Lattice)
           c = L.cache[:complicated]()
           c[1] = (1, (2., 3.), (4., 5.))
           c[2] = (2, (3., 4.), (5., 6.))
           return c[1:2]
       end

julia> @code_warntype usecache(L)
MethodInstance for usecache(::Lattice)
  from usecache(L::Lattice) @ Main REPL[5]:1
Arguments
  #self#::Core.Const(Main.usecache)
  L::Lattice
Locals
  c::Any
Body::Any
1 ─ %1  = Base.getproperty(L, :cache)::Dict{Symbol, Function}
│   %2  = Base.getindex(%1, :complicated)::Function
│         (c = (%2)())
│   %4  = Core.tuple(2.0, 3.0)::Core.Const((2.0, 3.0))
│   %5  = Core.tuple(4.0, 5.0)::Core.Const((4.0, 5.0))
│   %6  = Core.tuple(1, %4, %5)::Core.Const((1, (2.0, 3.0), (4.0, 5.0)))
│   %7  = c::Any
│         Base.setindex!(%7, %6, 1)
│   %9  = Core.tuple(3.0, 4.0)::Core.Const((3.0, 4.0))
│   %10 = Core.tuple(5.0, 6.0)::Core.Const((5.0, 6.0))
│   %11 = Core.tuple(2, %9, %10)::Core.Const((2, (3.0, 4.0), (5.0, 6.0)))
│   %12 = c::Any
│         Base.setindex!(%12, %11, 2)
│   %14 = c::Any
│   %15 = Main.:(:)::Core.Const(Colon())
│   %16 = (%15)(1, 2)::Core.Const(1:2)
│   %17 = Base.getindex(%14, %16)::Any
└──       return %17

Is the compiler unable to determine the type of c even though it’s concretely defined? Should I be worried?


Edit: Here’s a simpler working example where the Body::Any situation comes up.

julia> function A()
          x = Vector{Tuple{Int, Tuple{Int, Int}}}(undef, 4)
          return () -> x
       end
A (generic function with 1 method)

julia> @code_warntype A()
MethodInstance for A()
  from A() @ Main REPL[15]:1
Arguments
  #self#::Core.Const(Main.A)
Locals
  #7::var"#7#8"{Vector{Tuple{Int64, Tuple{Int64, Int64}}}}
  x::Vector{Tuple{Int64, Tuple{Int64, Int64}}}
Body::var"#7#8"{Vector{Tuple{Int64, Tuple{Int64, Int64}}}}
1 ─ %1  = Main.Vector::Core.Const(Vector)
│   %2  = Main.Tuple::Core.Const(Tuple)
│   %3  = Main.Int::Core.Const(Int64)
│   %4  = Core.apply_type(Main.Tuple, Main.Int, Main.Int)::Core.Const(Tuple{Int64, Int64})
│   %5  = Core.apply_type(%2, %3, %4)::Core.Const(Tuple{Int64, Tuple{Int64, Int64}})
│   %6  = Core.apply_type(%1, %5)::Core.Const(Vector{Tuple{Int64, Tuple{Int64, Int64}}})
│   %7  = Main.undef::Core.Const(UndefInitializer())
│         (x = (%6)(%7, 4))
│   %9  = Main.:(var"#7#8")::Core.Const(var"#7#8")
│   %10 = x::Vector{Tuple{Int64, Tuple{Int64, Int64}}}
│   %11 = Core.typeof(%10)::Core.Const(Vector{Tuple{Int64, Tuple{Int64, Int64}}})
│   %12 = Core.apply_type(%9, %11)::Core.Const(var"#7#8"{Vector{Tuple{Int64, Tuple{Int64, Int64}}}})
│   %13 = x::Vector{Tuple{Int64, Tuple{Int64, Int64}}}
│         (#7 = %new(%12, %13))
│   %15 = #7::var"#7#8"{Vector{Tuple{Int64, Tuple{Int64, Int64}}}}
└──       return %15


julia> D = Dict{Symbol, Function}(:A => A())
Dict{Symbol, Function} with 1 entry:
  :A => #7

Using the cache outside a function vs inside, we have

julia> @code_warntype D[:A]()
MethodInstance for (::var"#7#8"{Vector{Tuple{Int64, Tuple{Int64, Int64}}}})()
  from (::var"#7#8")() @ Main REPL[15]:3
Arguments
  #self#::var"#7#8"{Vector{Tuple{Int64, Tuple{Int64, Int64}}}}
Body::Vector{Tuple{Int64, Tuple{Int64, Int64}}}
1 ─ %1 = Core.getfield(#self#, :x)::Vector{Tuple{Int64, Tuple{Int64, Int64}}}
└──      return %1


julia> @code_warntype ( () -> D[:A]()[1:2] )()
MethodInstance for (::var"#9#10")()
  from (::var"#9#10")() @ Main REPL[23]:1
Arguments
  #self#::Core.Const(var"#9#10"())
Body::Any
1 ─ %1 = Base.getindex(Main.D, :A)::Any
│   %2 = (%1)()::Any
│   %3 = (1:2)::Core.Const(1:2)
│   %4 = Base.getindex(%2, %3)::Any
└──      return %4


julia> @code_warntype ( (d) -> d[:A]()[1:2] )(D)
MethodInstance for (::var"#11#12")(::Dict{Symbol, Function})
  from (::var"#11#12")(d) @ Main REPL[24]:1
Arguments
  #self#::Core.Const(var"#11#12"())
  d::Dict{Symbol, Function}
Body::Any
1 ─ %1 = Base.getindex(d, :A)::Function
│   %2 = (%1)()::Any
│   %3 = (1:2)::Core.Const(1:2)
│   %4 = Base.getindex(%2, %3)::Any
└──      return %4

Any way to make the last two calls to D[:A]() type stable?

First, I’ll remark that Function is an abstract type (i.e., definitely not concrete), so your closure attempt is not remotely type stable. This explains why you see Any pop out of the type-inferred code.
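You can check both facts directly:

```julia
f = () -> 1
g = () -> 2

@assert !isconcretetype(Function)   # Function is abstract
@assert typeof(f) <: Function       # each closure subtypes Function...
@assert typeof(f) !== typeof(g)     # ...but every closure has its own concrete type
```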

Second, there is no way that you can have dict[key] be type stable if dict contains elements of different types (note that different functions have different types). The best you can do is have a small Union of types that it will union-split for you. But if you have more than 4(ish) types, it won’t even do that.
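As a sketch of what a small Union buys you (toy types, not your buffer types): the lookup infers as the Union, and uses of the value get split into concretely typed branches.

```julia
# Values form a small Union{Int, Float64}; d[k] infers as that Union.
d = Dict{Symbol, Union{Int, Float64}}(:a => 1, :b => 2.5)

function total(d)
    s = 0.0
    for k in (:a, :b)
        s += d[k]   # union-split: one branch for Int, one for Float64
    end
    return s
end
```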

Which is to say that you are left with things like function barriers or call-site annotations. These aren’t the end of the world and, depending on the situation, may suffer only a negligible performance penalty.
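Both workarounds sketched against your Dict{Symbol, Function} setup (assuming :complicated always holds the element type below; names are made up):

```julia
const CEl = Tuple{Int, Tuple{Float32, Float32}, Tuple{Float32, Float32}}

# Call-site annotation: assert the closure's return type, so everything
# after the assertion is concretely typed.
function use_annotated(cache::Dict{Symbol, Function})
    c = cache[:complicated]()::Vector{CEl}
    c[1] = (1, (2.0f0, 3.0f0), (4.0f0, 5.0f0))
    return c[1]
end

# Function barrier: one dynamic dispatch at the call to `kernel`, which then
# compiles a specialized method for the runtime type of `c`.
function kernel(c)
    c[1] = (1, (2.0f0, 3.0f0), (4.0f0, 5.0f0))
    return c[1]
end
use_barrier(cache) = kernel(cache[:complicated]())
```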

Another option, if you insist on full type stability, is to use a NamedTuple (or a custom struct) instead of a Dict. A NamedTuple has keys just like a Dict but can encode every entry with its own type (and similarly for the fields of a struct). With many fields it might get unwieldy, though, and with very many fields its performance might suffer.
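A sketch of the NamedTuple approach (the makecache and Lattice2 names are hypothetical): each entry keeps its own concrete type, and the NamedTuple's concrete type flows into the struct as a type parameter, so every access is fully inferred.

```julia
function makecache(cachesize)
    (complicated = Vector{Tuple{Int, Tuple{Float32, Float32}, Tuple{Float32, Float32}}}(undef, cachesize),
     simple      = zeros(Float32, cachesize))
end

struct Lattice2{C}   # C is the NamedTuple's concrete type
    cache::C
end

lat = Lattice2(makecache(10))
lat.cache.complicated   # inferred as Vector{Tuple{Int, ...}}, no Any
```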
