I have a program which ingests JSON describing a list of objects. For the sake of example, let’s say each object is a Cartesian point with a name:
struct NamedPoint
    x::Float64
    y::Float64
    name::String
end
The actual use case is more complicated and not relevant here; the relevant feature is the pattern of bits-type fields like Float64 plus a String. In practice, for data with millions of input points, there are only ~10 distinct names. So it seems a bit of a waste for every NamedPoint to contain a String and thus not be isbits. Instead, each NamedPoint should be able to just contain a small integer indicating which name it has.
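To make the isbits distinction concrete, here is a minimal check. IndexedPoint and its name_id field are hypothetical names introduced only for this illustration; they are not part of the real code.

```julia
# Original layout: the String field is a reference to heap data,
# so the struct as a whole is not a bits type.
struct NamedPoint
    x::Float64
    y::Float64
    name::String
end

# Hypothetical alternative: a small integer code stands in for the name.
struct IndexedPoint
    x::Float64
    y::Float64
    name_id::UInt8
end

# Compare the two layouts.
named_is_bits = isbitstype(NamedPoint)
indexed_is_bits = isbitstype(IndexedPoint)
println("NamedPoint isbits:   ", named_is_bits)
println("IndexedPoint isbits: ", indexed_is_bits)
```

The integer-coded struct contains no references, which is the property the question is after.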
This seems intuitively similar to the idea behind IndirectArrays. But I don’t want an indirect or pooled array of just names; the code uses (and is way more readable with)
My thought now is to collect the unique names seen while parsing the JSON and build a dict UInt8 => name at parse time. Then each NamedPoint can hold a UInt8 instead of a String name, and will be isbits. Does a package implementing this better than “roll your own” already exist? Is there a different recommended solution? I don’t love this mapping dictionary approach, since the mapping has to be passed around to helper functions and a NamedPoint is no longer meaningful without the associated name mapping dictionary.
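For reference, the “roll your own” version I have in mind looks roughly like the sketch below. Everything in it (IndexedPoint, NamePool, intern!, pointname, and the raw stand-in for parsed JSON) is a hypothetical name made up for this post, and it assumes there are never more than 255 distinct names.

```julia
# Hypothetical isbits point: the name is replaced by a UInt8 code.
struct IndexedPoint
    x::Float64
    y::Float64
    name_id::UInt8
end

# Pool assigning each distinct name a code in order of first appearance.
struct NamePool
    names::Vector{String}        # code -> name
    codes::Dict{String,UInt8}    # name -> code
end
NamePool() = NamePool(String[], Dict{String,UInt8}())

# Return the code for `s`, registering it if it has not been seen yet.
function intern!(pool::NamePool, s::AbstractString)
    get!(pool.codes, s) do
        length(pool.names) < typemax(UInt8) || error("more than 255 distinct names")
        push!(pool.names, s)
        UInt8(length(pool.names))
    end
end

# Looking a name up again requires the pool, which is the drawback described above.
pointname(pool::NamePool, p::IndexedPoint) = pool.names[p.name_id]

# Stand-in for records parsed from JSON.
raw = [(x = 0.0, y = 1.0, name = "a"),
       (x = 2.0, y = 3.0, name = "b"),
       (x = 4.0, y = 5.0, name = "a")]

pool = NamePool()
points = [IndexedPoint(r.x, r.y, intern!(pool, r.name)) for r in raw]
```

Every helper that needs a name has to take the pool as an extra argument, e.g. pointname(pool, points[2]), which is exactly the part I would like to avoid.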
I welcome any advice on how to maximize performance here while keeping the code readable and maintainable. Thanks!