wrangling large json files

I believe the standard JSON.jl package will not work, unless you have huge amounts of RAM, since it creates a complete dictionary of the JSON object. I haven’t tried the other JSON packages (JSON2.jl and Json2.jl on Github).

It might be worthwhile to have a look at LazyJSON it description here of seeming to only parse the parts that you need seems to match your requirement to be able to operate on 100GB files:

I’m assuming there are many entries on each level of the hierarchy. I would think that what you need conceptually is to maintain pointers to the start (and maybe end) of each level of the hierarchy you enter in a depth first type manner, then you can try to minimise what you need to reprocess when you want to jump to a different part. That might be a reusable library built from parts of the different JSON libraries if that isn’t what LazyJSON already provides.
If the number of elements at the top layer is not many, it might be worthwhile to split it into a few separate JSON files, which each then only contain 2 levels of the hierarchy, making parts easier to manage.

2 Likes