wrangling large json files

Thank you both!

So far the newline delimitation has not been a big problem. Something like the following behaves as expected, parsing one JSON object per line:

using JSON
using DataFrames

rows = open("my_file.ndjson", "r") do io
    # one JSON object per line; the do-block closes the file for us
    [JSON.parse(line) for line in eachline(io)]
end

DataFrame(rows)

Quite nicely, the DataFrame does not get upset by the presence of nested JSON in some of the columns.
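To illustrate (a toy sketch — the field names are made up, not from my actual data):

```julia
using JSON
using DataFrames

lines = [
    """{"id": 1, "meta": {"lang": "en"}}""",
    """{"id": 2, "meta": {"lang": "fr"}}""",
]

df = DataFrame([JSON.parse(l) for l in lines])

# The nested object simply lands in a Dict-valued column:
df.meta  # Vector of Dict{String,Any}
```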

So, one solution that would replicate the jq workflow would be to save each line to disk as I go through the ndjson — e.g., using CSV.write(...; append = true) — and then feed that to JuliaDB with loadtable() (it should work, right?).
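A minimal sketch of that append-as-you-go idea. The file names are placeholders, and I write a tiny ndjson first just to make the example self-contained; nested fields would presumably need flattening before the CSV step:

```julia
using JSON
using CSV
using DataFrames

# Write a tiny example ndjson (stand-in for the real file)
open("my_file.ndjson", "w") do io
    println(io, JSON.json(Dict("a" => 1, "b" => "x")))
    println(io, JSON.json(Dict("a" => 2, "b" => "y")))
end

# Stream line by line, appending each row to the CSV as we go,
# so the whole file never has to sit in memory at once
open("my_file.ndjson", "r") do io
    for (i, line) in enumerate(eachline(io))
        row = DataFrame([JSON.parse(line)])
        # append = false on the first row so the header gets written once
        CSV.write("my_file.csv", row; append = i > 1)
    end
end
```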

P.S. @lwabeke I’m not sure I understood correctly what you mean by “many entries on each level of the hierarchy”. The data is almost rectangular (fixed schema, 29 fields for each row, a few million rows).