Efficiently Read JSON and Create DataFrame

Hi

There are a few JSON packages available: JSON.jl, JSON2.jl, JSON3.jl, LazyJSON.jl are ones I’m aware of.

See [ANN] JSON3.jl - Yet another JSON package for Julia
for some details.

My summary:

  • JSON.jl - The original. It handles JSON correctly, but it is slow and very memory intensive. Uses JSON.parse to parse into a Dict (see the sketch after this list).
  • LazyJSON.jl - Memory efficient, which also makes it relatively fast. Parsing gives a "dict"-like interface over the original string.
  • JSON2.jl & JSON3.jl - Claim to be fast. They parse directly into a user-provided type. At some point I looked at them and got the impression that one (or both?) was cheating a bit: as a speedup, it assumed the JSON fields would appear in the same order as the fields defined in the struct, and didn't check that the labels matched. This works if the JSON string was created by the same package, but could break if it was produced by another encoder. I'm not sure if this is still the case, or if I just misunderstood the code.
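
A minimal sketch of the three parsing styles, assuming recent versions of each package (in current JSON3 the struct support goes through StructTypes.jl, and Player is a made-up example type):

```julia
using JSON, LazyJSON, JSON3, StructTypes

str = """{"name": "alice", "score": 3.5}"""

# JSON.jl: eager parse into a Dict{String,Any}
d = JSON.parse(str)
d["name"]                       # "alice"

# LazyJSON.jl: a lazy view over the string with dict-like access;
# values are only scanned and converted when you touch them
v = LazyJSON.value(str)
v["score"]                      # 3.5

# JSON3.jl: parse straight into a user-defined struct
struct Player
    name::String
    score::Float64
end
StructTypes.StructType(::Type{Player}) = StructTypes.Struct()
p = JSON3.read(str, Player)     # Player("alice", 3.5)
```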

I would assume LazyJSON shines when you only need to access a subset of the JSON, but while testing the Unmarshal.jl package I got the impression that even when unmarshalling the whole object, LazyJSON could still outperform JSON; it depends on the size and complexity of the structure.
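
To illustrate the subset case, a small hypothetical example (the document and field names are made up): accessing a single path should only scan the string as far as needed, so the rest of the document is never materialized.

```julia
using LazyJSON

# Only the "meta" -> "id" path gets scanned; the "simstates"
# array is skipped entirely.
bigjson = """{"meta": {"id": 42}, "simstates": [{"t": 0.0, "x": 1.0}]}"""

id = LazyJSON.value(bigjson)["meta"]["id"]    # 42
```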

The Unmarshal.jl package can be used to convert from the JSON.jl or LazyJSON.jl dict interface to a Julia type object, which might be an alternative to what you're doing in:

simstates = vcat(DataFrame.(inDict["simstates"])...)

It is, however, focused on functionality rather than performance; in particular, when working with the original JSON.parse, the JSON.parse call seemed to dominate the timing compared to the Unmarshal step.
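
For illustration, a sketch of that alternative, assuming a hypothetical SimState element type (the field names here are made up and would need to match your actual JSON):

```julia
using JSON, Unmarshal, DataFrames

# Made-up element type for the "simstates" entries
struct SimState
    t::Float64
    x::Float64
end

inDict = JSON.parse("""{"simstates": [{"t": 0.0, "x": 1.0},
                                      {"t": 0.1, "x": 1.2}]}""")

# Unmarshal the vector of dicts into typed structs...
states = unmarshal(Vector{SimState}, inDict["simstates"])

# ...then build the DataFrame from the typed rows in one go,
# instead of vcat-ing one DataFrame per dict
simstates = DataFrame((t = s.t, x = s.x) for s in states)
```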