Approach to avoid memory allocation with struct with abstract type fields

I am working on some molecular simulation code, and am hitting a road block on figuring on a good design structure to avoid memory allocation for a container that will store different concrete subtypes of a single abstract supertype. This is related to evaluation of potential energy functions within the code base.

For instance, when evaluating the energy for various molecular interactions, the type structure I have implemented looks something like:

abstract type PairPotential end

struct LJPair <: PairPotential end
struct MiePair <: PairPotential end

And then I would like to store these concrete structs in a single place to be used for potential evaluation later on. My approach would look something like this, where a Dict with integer keys holds vectors of PairPotential for given particle types:

struct InteractionInfo
   pairs :: Dict{Int, Vector{PairPotential}}
end

However, as is well known, if I were to extract the vector of PairPotentials for a given key and call some methods on those concrete structs (e.g. LJPair, MiePair, etc…), I run into memory allocation issues since the field is a vector of abstract types.

My question is, is there a straightforward design approach in Julia to avoid this type of issue? I would have a difficult time templating the InteractionInfo struct on the various potentials since a variable number may be present for any given run. From my somewhat limited C++ experience, it seems like a solution to this issue would be to store a vector of pointers to classes that all inherit from a similar base-class. Is this type of setup possible within Julia to avoid the extra memory allocations?

Thanks!

1 Like

A small Union, eg

Dict{Int, Union{LJPair,MiePair}}

may be better than an abstract type. Try and benchmark.

This might be one of the (rare) cases, where a traditional OOP language with vtable based single dispatch is is better suited then julia. If you need only a single method of LJPair... in your hot loop, you can store it in the dict instead using FunctionalWrappers.jl. This is essentially the store pointers to methods approach you sketched in your question.
See also Helping julia with runtime dispatch/arrowtypes.

This may work for the case where only two types can appear in the table. But would not be extendable to an arbitrary number of subtypes of PairPotential in the vector.

AFAIK small union optimizations work for more than 2 types.

I am not sure what “arbitrary” is in this context, for lots of different types I guess you would just have to live with dynamic dispatch and trust that a function barrier will regain some of the performance loss.

You might also find https://github.com/tkoolen/TypeSortedCollections.jl useful if the total number of distinct types is relatively small.

3 Likes

+1 to the type-sorted approach.

Dynamic dispatch is not very cheap (and has nontrivial cost even with vtable), so one way is to hoist the dispatch out of loops, making it static. That could work via typesortedcollections, or alternatively Dict{Int, Tuple{Vector{LJPair}, Vector{MiePair}}}, or alternatively Tuple{Dict{Int, Vector{LJPair}}, Dict{Int, Vector{MiePair}}}.

A very situational approach I made ok experiences with is the following: Have Dict{Int, Vector{UInt32}} (choose integer size depending on alignment of your real data), and just serialize and parse my data into this vector. In my application, the true type would have been Dict{Int, Vector{NTuple{N, Int32} where N}}, i.e. inhomogeneous in the tuple length. Since I did not need the full 32 bits everywhere, there was the possibility of stealing some leading bits to signify the lengths of tuples, and also compress the serialized storage.

While I agree with the spirit of your approach and had some similar experience when optimizing code, I would be cautious here since we know very little about the context:

  1. @sfriedowitz mentioned an “arbitrary number of subtypes”, while the actual number (or at least the ballpark) would heavily influence the design of the code
  2. we don’t know if the kernel functions called after the dispatch are trivial or computationally intensive (if yes, they will dominate dispatch so the whole issue may not be important).

I know MWEs are difficult in these contexts, but one would be useful before micro-optimizing this.

2 Likes

While I previously said “arbitrary”, I came to realize this is in the range of 4-5 different types at most, in most cases. So I’ve opted to use a Union type for the different potentials to try to mitigate dynamic dispatch.

It seems like this is working very well, actually. The difference in implementation is

struct InteractionTable
   pairs :: Dict{NTuple{2,Int}, PairInteraction}
end

versus

struct InteractionTable
   pairs :: Dict{NTuple{2,Int}, PairUnion}
end

where const PairUnion = Union{PairInt1, PairInt2, ....} and so forth.

Here is a crude, toy example of what the main loop looks like, with irrelevant pieces cut out. The InteractionTable is queried by a tuple key at each iteration to pull out various concrete PairInteraction substypes stored in the dictionary.

en = 0.0
for other in # Iterator over particles here #
    atypes = normalize_group( (atom.tid, other.tid) )
    int = pair_interaction(itx, atypes) # Query the InteractionTable itx for the PairInteraction
    # Call energy method on `int` here
    en += energy(int, atom, other)
end

The benchmarks when using the abstract PairInteraction type in the dictionary are:

@benchmark pair_energy(atom)

BenchmarkTools.Trial: 
  memory estimate:  7.42 KiB
  allocs estimate:  475
  --------------
  minimum time:     34.000 μs (0.00% GC)
  median time:      42.000 μs (0.00% GC)
  mean time:        49.120 μs (16.11% GC)
  maximum time:     61.200 ms (99.90% GC)
  --------------
  samples:          10000
  evals/sample:     1

And when using the PairUnion type, we get:

@benchmark pair_energy(atom)

BenchmarkTools.Trial: 
  memory estimate:  16 bytes
  allocs estimate:  1
  --------------
  minimum time:     25.000 μs (0.00% GC)
  median time:      26.000 μs (0.00% GC)
  mean time:        26.274 μs (0.00% GC)
  maximum time:     53.000 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

So the union approach works to avoid memory allocation, and looking at @code_warntype also shows that the operation is now type stable. The relatively small difference in overall speed is likely due to the cost of the actual methods being called, which overshadow alot of the dynamic dispatch penalty.

1 Like