Efficiently Read JSON and Create DataFrame

Hi

There are a few JSON packages available: JSON.jl, JSON2.jl, JSON3.jl, LazyJSON.jl are ones I’m aware of.

See [ANN] JSON3.jl - Yet another JSON package for Julia
for some details.

My summary:

  • JSON.jl - The original. It handles JSON correctly, but it is slow and very memory intensive. Uses JSON.parse to parse into a Dict (see the sketch after this list).
  • LazyJSON.jl - Memory efficient, which also makes it relatively fast. Parsing gives a "dict"-like interface over the original string.
  • JSON2.jl & JSON3.jl - Claim to be fast. They parse directly into a user-provided type. At some point I looked at them and got the impression that one (or both?) was cheating a bit: as a speedup, it assumed the JSON fields would appear in the same order as the fields defined in the struct, and didn't check that the labels matched. This works if the JSON string was created by the same package, but could break if it was produced by another encoder. I'm not sure if this is still the case, or if I just misunderstood the code.
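
A minimal sketch of the three parsing styles, assuming recent versions of each package (in current JSON3 the struct support goes through StructTypes.jl, and Player is a made-up example type):

```julia
using JSON, LazyJSON, JSON3, StructTypes

str = """{"name": "alice", "score": 3.5}"""

# JSON.jl: eager parse into a Dict{String,Any}
d = JSON.parse(str)
d["name"]                       # "alice"

# LazyJSON.jl: a lazy view over the string with dict-like access;
# values are only scanned and converted when you touch them
v = LazyJSON.value(str)
v["score"]                      # 3.5

# JSON3.jl: parse straight into a user-defined struct
struct Player
    name::String
    score::Float64
end
StructTypes.StructType(::Type{Player}) = StructTypes.Struct()
p = JSON3.read(str, Player)     # Player("alice", 3.5)
```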

I would assume LazyJSON shines when you only need to access a subset of the JSON, but while testing the Unmarshal.jl package I got the impression that even when unmarshalling the whole object, LazyJSON could still outperform JSON; it depends on the size and complexity of the structure.
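
To illustrate the subset case, a small hypothetical example (the document and field names are made up): accessing a single path should only scan the string as far as needed, so the rest of the document is never materialized.

```julia
using LazyJSON

# Only the "meta" -> "id" path gets scanned; the "simstates"
# array is skipped entirely.
bigjson = """{"meta": {"id": 42}, "simstates": [{"t": 0.0, "x": 1.0}]}"""

id = LazyJSON.value(bigjson)["meta"]["id"]    # 42
```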

The Unmarshal.jl package can be used to convert from the JSON.jl or LazyJSON.jl dict interface to a Julia type object, which might be an alternative to what you're doing in:

simstates = vcat(DataFrame.(inDict["simstates"])...)

It is, however, focused on functionality rather than performance; in particular, when working with the original JSON.parse, the JSON.parse call seemed to dominate the timing compared to the Unmarshal step.
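
For illustration, a sketch of that alternative, assuming a hypothetical SimState element type (the field names here are made up and would need to match your actual JSON):

```julia
using JSON, Unmarshal, DataFrames

# Made-up element type for the "simstates" entries
struct SimState
    t::Float64
    x::Float64
end

inDict = JSON.parse("""{"simstates": [{"t": 0.0, "x": 1.0},
                                      {"t": 0.1, "x": 1.2}]}""")

# Unmarshal the vector of dicts into typed structs...
states = unmarshal(Vector{SimState}, inDict["simstates"])

# ...then build the DataFrame from the typed rows in one go,
# instead of vcat-ing one DataFrame per dict
simstates = DataFrame((t = s.t, x = s.x) for s in states)
```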