wrangling large json files

lwabeke · December 10, 2018, 8:07am

I believe the standard JSON.jl package will not work unless you have a huge amount of RAM, since it parses the whole file into a complete in-memory dictionary. I haven’t tried the other JSON packages (JSON2.jl and Json2.jl on GitHub).

It might be worthwhile to have a look at LazyJSON. Its description, which suggests it only parses the parts that you need, seems to match your requirement to be able to operate on 100GB files.

I’m assuming there are many entries on each level of the hierarchy. Conceptually, I think what you need is to maintain pointers to the start (and maybe the end) of each level of the hierarchy as you enter it in a depth-first manner, so that you can minimise what you need to reprocess when you want to jump to a different part. That could be a reusable library built from parts of the different JSON libraries, if it isn’t what LazyJSON already provides.
If the top level does not have many elements, it might be worthwhile to split the file into a few separate JSON files, each of which then contains only 2 levels of the hierarchy, making the parts easier to manage.
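
To make the "pointers to the start and end" idea concrete, here is a minimal sketch of one way it could look for the top level only. It is written in Python with just the standard library so it stays dependency-free; it is not how LazyJSON or any of the Julia packages work internally. The helper names (`skip_ws`, `skip_string`, `skip_value`, `index_top_level`) are made up for illustration, and the sketch assumes the input is valid JSON whose top level is an object.

```python
import json

WS = b" \t\r\n"

def skip_ws(buf, i):
    """Return the index of the first non-whitespace byte at or after i."""
    while i < len(buf) and buf[i] in WS:
        i += 1
    return i

def skip_string(buf, i):
    """buf[i] is an opening quote; return the index just past the closing quote."""
    i += 1
    while buf[i] != 0x22:                  # 0x22 is '"'
        i += 2 if buf[i] == 0x5C else 1    # 0x5C is '\': skip the escaped byte too
    return i + 1

def skip_value(buf, i):
    """Return the index just past the JSON value starting at i, without parsing it."""
    c = buf[i]
    if c == 0x22:
        return skip_string(buf, i)
    if c in b"{[":
        depth = 0
        while True:
            c = buf[i]
            if c == 0x22:                  # brackets inside strings must not count
                i = skip_string(buf, i)
                continue
            if c in b"{[":
                depth += 1
            elif c in b"}]":
                depth -= 1
            i += 1
            if depth == 0:
                return i
    # Number, true, false or null: runs until a delimiter or whitespace.
    while i < len(buf) and buf[i] not in b",}] \t\r\n":
        i += 1
    return i

def index_top_level(buf):
    """Map each top-level key to the (start, end) byte span of its value."""
    spans = {}
    i = skip_ws(buf, 0)
    if buf[i] != 0x7B:                     # 0x7B is '{'
        raise ValueError("expected a JSON object at the top level")
    i = skip_ws(buf, i + 1)
    while buf[i] != 0x7D:                  # 0x7D is '}'
        key_end = skip_string(buf, i)
        key = json.loads(buf[i:key_end])   # keys are small, so parse them fully
        i = skip_ws(buf, key_end)
        if buf[i] != 0x3A:                 # 0x3A is ':'
            raise ValueError("expected ':' after key")
        start = skip_ws(buf, i + 1)
        end = skip_value(buf, start)
        spans[key] = (start, end)
        i = skip_ws(buf, end)
        if buf[i] == 0x2C:                 # 0x2C is ','
            i = skip_ws(buf, i + 1)
    return spans

# Small stand-in for a huge file; note the '}' and the escaped quote inside strings.
doc = b'{"a": {"x": [1, 2, {"y": "}"}]}, "b": [10, 20], "c": "s\\"t"}'
spans = index_top_level(doc)

# Jump straight to one part and parse only that slice.
start, end = spans["b"]
b_value = json.loads(doc[start:end])
```

The scan still reads every byte once, but it allocates nothing for the values it skips, and afterwards any top-level entry can be parsed on its own from its span. For a really large file, `buf` could presumably be a memory-mapped file (`mmap.mmap` supports the same indexing and slicing) rather than a `bytes` object. The same spans would also do for the splitting idea: write each `buf[start:end]` slice to its own file and you get one valid JSON document per top-level entry.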
