Parse a large array of small JSON objects?

Satvik · May 12, 2021, 7:01pm

Hi,

I have a DataFrame that contains a column of several million JSON strings, e.g.

julia> df = DataFrame(j=["""{"a": 1, "b": 2}"""])
julia> pretty_table(df)
┌──────────────────┐
│                j │
│           String │
├──────────────────┤
│ {"a": 1, "b": 2} │
└──────────────────┘
└──────────────────┘

The simple way to parse them is with something like df[!, "j"] = JSON3.read.(df.j). However, this appears to create a memory leak when I do it repeatedly on different datasets: even when df is no longer reachable, the memory stays in use and the program eventually hits an OOM error. This doesn’t happen if I skip the JSON parsing.

Is there a better practice for parsing a lot of small JSON objects that I’m not aware of? I saw that JSON3 uses a semi-lazy evaluation method, and was wondering if that’s confusing the garbage collector somehow.

ericphanson · May 12, 2021, 7:12pm

I think the default JSON3.Object, JSON3.Array, etc. will keep a reference to the original input they were parsed from, but you can tell JSON3 to parse into something else. E.g. JSON3.read(obj, Dict) says to parse obj into a regular Dict (which then does not share memory with the original input). If they all have the same structure, you could also define a struct, use StructTypes to say how to (de)serialize it, and then do JSON3.read(obj, MyStruct) to parse into that struct.
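
A minimal sketch of both options, assuming every object looks like the {"a": 1, "b": 2} example above. The struct name AB and the variable names are made up for illustration; they are not from the original post.

```julia
using JSON3, StructTypes

# Hypothetical struct matching the example objects in this thread.
struct AB
    a::Int
    b::Int
end

# Tell StructTypes that AB maps to a JSON object with keys matched by field name.
StructTypes.StructType(::Type{AB}) = StructTypes.Struct()

# Stand-in for the DataFrame column of JSON strings.
strs = ["""{"a": 1, "b": 2}""", """{"a": 3, "b": 4}"""]

# Option 1: parse each string into a plain Dict instead of a lazy JSON3.Object.
dicts = [JSON3.read(s, Dict) for s in strs]

# Option 2: parse each string straight into the struct.
structs = [JSON3.read(s, AB) for s in strs]
```

With a DataFrame, the same comprehension can be assigned back to the column, e.g. df.j = [JSON3.read(s, Dict) for s in df.j].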

Satvik · May 12, 2021, 8:36pm

JSON3.read(obj, Dict) solved the memory leak. Thanks!
