When to use a Dictionary versus a Struct in Julia?

I have been doing my best to exorcise some bad habits from the Python world, like the common use of Dictionaries. As I understand Julia, Structs and NamedTuples seem to perform a lot better because of the Julia type system. These data structures also seem a bit more Julian, since there is less ambiguity about the data types of variables.

However, I was looking at the DrWatson project from JuliaDynamics, and noticed that they use Dictionaries to store parameter values and such. So I was trying to figure out some rules of thumb for when to use Dictionaries versus when to use Structs. This question partly boils down to: when does using a Dictionary not impair other optimizations in the code, and how would you know if those optimizations were impaired?

Any suggestions on when to use Dictionaries in the “right” way, meaning no impairment in the performance of the rest of your code?

Oh yes, and I don’t mean to disparage DrWatson, which seems like a great package. It just made me think about using Dictionaries, but otherwise the developers of that package know a lot more about Julia programming than I do.


I believe the recommendation is to use Dictionaries when you don’t know the set of keys ahead of time. Basically, if you know what the keys are going to be when writing the code, use a struct or a NamedTuple. If the key names are going to be based on some input, then a dictionary would be the way to go.
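A minimal sketch of that rule of thumb. The names `SimParams`, `alpha`, `steps`, and `count_words` are made up for illustration and do not come from DrWatson or any other package:

```julia
# Keys known when writing the code: a struct (or NamedTuple) gives each field
# a fixed name and a concrete type.
struct SimParams
    alpha::Float64
    steps::Int
end

params_struct = SimParams(0.5, 100)
params_nt = (alpha = 0.5, steps = 100)

# Keys determined by the input at runtime: a Dict is the natural fit,
# because the set of words cannot be known in advance.
function count_words(text::AbstractString)
    counts = Dict{String,Int}()
    for word in split(text)
        key = String(word)
        counts[key] = get(counts, key, 0) + 1
    end
    return counts
end

counts = count_words("a b a c a")
```

Note that the Dict here still has concrete key and value types (`String` and `Int`); only the set of keys is open-ended.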


There is also a potential trade-off between compile time and runtime performance. If you have many different sets of parameters and store each in a named tuple, every method that takes them will be specialized (i.e. recompiled) for each distinct named tuple type. A Dict does not trigger this, as long as the key and value types stay the same. Once compiled, however, named tuples are likely faster. Note also that there are different kinds of dicts with different performance characteristics, e.g. LittleDict (little_dict.jl in JuliaCollections/OrderedCollections.jl on GitHub).
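To see why the specialization happens, it may help to compare the types involved. This is only a sketch with made-up parameter names (`alpha`, `beta`, `gamma`, `delta`):

```julia
# Two parameter sets with different field names are two different NamedTuple
# types, so a function called with each one gets compiled once per type.
p1 = (alpha = 0.5, beta = 1.0)
p2 = (gamma = 0.5, delta = 1.0)
same_nt_type = typeof(p1) == typeof(p2)      # false

# Two Dicts with different keys but the same key/value types share one type,
# so a single compiled method serves both.
d1 = Dict(:alpha => 0.5, :beta => 1.0)
d2 = Dict(:gamma => 0.5, :delta => 1.0)
same_dict_type = typeof(d1) == typeof(d2)    # true: both are Dict{Symbol, Float64}
```

The field names are part of a NamedTuple's type, which is what makes field access fast and also what forces the extra compilation.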


@jakobnissen also posted these tips on Slack. I asked him if he was okay with me reposting these comments, and he said he was fine with it.

There are a few issues at hand here.

  • First, Dicts are slow. “Slow” here may be 1 microsecond for read/write operations. For many user-facing applications this doesn’t matter, but you probably shouldn’t have any internals relying on Dicts, because they then become impossible to optimize (looking at you, Python!)
  • Second, they’re memory inefficient. As with speed, this rarely matters for user-facing code, but it does for internals.
  • Crucially, they are mutable. That makes it hard for both the programmer and the compiler to figure out exactly what kind of data they contain at a given point. With a named tuple, you know for sure which fields it contains at all times. I think this is the most important aspect: Dicts are hard to reason about
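One way to check whether the compiler can still figure out the types is to look at what it infers for a function's return value. The sketch below uses `Base.return_types`, which is not part of the documented public API; at the REPL, `@code_warntype` gives a more readable view of the same information. The functions `scale_nt` and `scale_dict` and the keys `a` and `b` are hypothetical:

```julia
scale_nt(p) = p.a * 2
scale_dict(d) = d[:a] * 2

nt = (a = 1.5, b = "label")
dict = Dict{Symbol,Any}(:a => 1.5, :b => "label")

# Ask the compiler what return type it infers for each argument type.
nt_inferred = first(Base.return_types(scale_nt, (typeof(nt),)))        # Float64
dict_inferred = first(Base.return_types(scale_dict, (typeof(dict),)))  # Any
```

An inferred `Any` means every later operation on that value needs a runtime type lookup. The problem in this sketch comes from the mixed value types forcing `Dict{Symbol,Any}`; a Dict whose values all share one concrete type, such as `Dict{Symbol,Float64}`, does not have this particular issue.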