Identifying performant data structures

Hi, I am coding a simulation and have a couple of possibilities for structuring the data. I’d like to sort out which is likely to be the most efficient. I’ve written toy examples for some of the options, but the results have not been super informative.

The model accrues ~100k agents over time, each of which has some mutable and some immutable characteristics. The mutable ones are updated several thousand times during a run. The final number of agents is known prior to the run. I don’t have a ton of experience with Julia, so I am wondering if there are options other than those mentioned below that I should be aware of for structuring the data, or performance details related to some of these approaches that would be good to know about.

Some of the possible strategies, which could be implemented on their own or in combination:

  • Make Agent a mutable struct, and mark the fields that never change as const
  • Make Agent an immutable struct and use @reset (from Accessors.jl) to replace the values that need to change
  • Make an immutable struct for the unchanging values, a mutable one for the others, and link/filter them with a unique id
  • Start the run with an empty array of agents, and push agents into it as they enter the model
  • Start the run with a full array of agents and designate some as active as they enter the model
  • Put the whole thing in a DataFrame or some other tabular structure.
  • Make an immutable struct for the changing values, and simply add a new element/record to the struct array each time an agent’s characteristics change
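To make the first two options concrete, here is roughly the kind of toy code I’ve been comparing (the struct and field names are just placeholders):

```julia
# Option 1: mutable struct; `const` fields (Julia 1.8+) mark the data
# that never changes after construction.
mutable struct MutAgent
    const id::Int        # immutable: set once, never updated
    energy::Float64      # mutable: updated during the run
end

# Option 2: fully immutable struct; an "update" builds a replacement.
struct ImmAgent
    id::Int
    energy::Float64
end

m = MutAgent(1, 10.0)
m.energy = 5.0               # in-place mutation

i = ImmAgent(1, 10.0)
i = ImmAgent(i.id, 5.0)      # rebind to a new value (what @reset expands to)
```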

Thanks for any knowledge you can share!

I only have experience with very small systems with a handful of objects. But it’s the next day and still no responses, so I’ll offer my advice.

I find that immutable objects are much (much!) easier to work with than mutable ones. Replacing objects is much easier to organize and reason about than mutation. A Vector, Dict, or other suitable container of immutable objects is generally a nice way to keep a lot of things around.
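As a minimal sketch of that pattern (names invented for illustration): “updating” an agent means overwriting its slot in the container with a fresh immutable value.

```julia
struct Particle
    id::Int
    x::Float64
end

particles = [Particle(i, 0.0) for i in 1:3]

# "Update" particle 2 by replacing the element, not mutating it:
particles[2] = Particle(particles[2].id, 1.5)
```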

I also find getter/setter functions to be nicer than object.field access. They make it much easier to make changes later (especially changes related to polymorphism). So while @reset is very nice, I would recommend you wrap it in setter functions in most cases.
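For instance, a setter can simply return a replacement object; this hand-written version is essentially what an Accessors.jl @set would expand to (the Agent struct and field names here are made up):

```julia
struct Agent
    id::Int
    energy::Float64
end

# Getter/setter pair: call sites no longer depend on the field layout,
# so changing the struct later only touches these two functions.
energy(a::Agent) = a.energy
with_energy(a::Agent, e::Float64) = Agent(a.id, e)

a = Agent(1, 10.0)
a = with_energy(a, 3.0)
```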

Mutable objects are best when you actually require mutable semantics (mostly in data structures), or when you have very large objects (hundreds of bytes, at least) where you frequently change only a small fraction of the data (e.g., matrix factorization algorithms), because in that regime replacing the whole object can be noticeably slower than mutating it. Hopefully the compiler continues to get better at the @reset pattern and can efficiently update larger immutable objects as time goes on.

I’ve done some benchmarking: if your code is type-stable and you need to modify some fields of a struct, then using a mutable struct is the fastest. An immutable struct combined with @reset is a bit slower.

As mikmoore mentioned, if the fields you need to modify are themselves mutable objects, you can simply use an immutable struct. If the fields you need to modify are immutable objects, then I think this post might help you choose an appropriate data structure.
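To illustrate the first point, a small sketch (the Counter struct and its fields are invented for illustration): an immutable struct can still carry mutable state, because immutability applies to the field bindings, not to the objects the fields reference.

```julia
struct Counter
    id::Int
    hits::Vector{Int}   # the binding is fixed, but the Vector's contents are not
end

c = Counter(1, Int[])
push!(c.hits, 42)       # legal: mutates the field's contents, not the struct
```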

IMO: write your simulation in the most natural way first. Often that will help you get the data structures right. Then benchmark and see if/where it’d make the most sense to optimize.