DataFrames, best way to import from JSON format file

4aHxKzD · October 15, 2019, 2:43pm

Hi.

I’m coming from Python and would like to know what the current “standard” way is to import a JSON file into a DataFrame, i.e., the equivalent of pandas.read_json().

ExpandingMan · October 15, 2019, 2:58pm

As far as I know, there isn’t actually a universally standard JSON table format (though I’m sure countless non-standard, or aspiring-to-be-standard formats exist). That said, tables are either dicts of arrays or an array of dicts, so it seems likely that the JSON you’re reading is one of these.

DataFrames implements the Tables interface. The default constructor DataFrame therefore supports reading dicts of arrays directly into dataframes. In that case you can just do

using JSON, DataFrames
DataFrame(JSON.parse(data))  # data is the JSON text as a string

I don’t see any really simple way of reading an array of dicts. You could do

v = JSON.parse(data)
# keep only the keys that appear in every dict
cols = reduce(∩, keys.(v))
# build one column per shared key
DataFrame((Symbol(c)=>getindex.(v, c) for c ∈ cols)...)
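To make the two cases concrete, here is a minimal sketch that runs both approaches on the same small table. The sample strings `cols_json` and `rows_json` and the variable names are made up for illustration, and it assumes a DataFrames version whose `DataFrame` constructor accepts a dict with string keys. The keys are collected and sorted only so that the column order is predictable.

```julia
using JSON, DataFrames

# Hypothetical sample data: the same two-row table in both JSON layouts.
cols_json = """{"a": [1, 2], "b": ["x", "y"]}"""
rows_json = """[{"a": 1, "b": "x"}, {"a": 2, "b": "y"}]"""

# Dict of arrays: the parsed dict maps column names to vectors.
df_cols = DataFrame(JSON.parse(cols_json))

# Array of dicts: keep the keys shared by every row, then build one column per key.
rows = JSON.parse(rows_json)
shared = sort!(reduce(∩, collect.(keys.(rows))))
df_rows = DataFrame((Symbol(c) => getindex.(rows, c) for c in shared)...)
```

Both data frames should hold the same values. Note that a key missing from any row is dropped entirely by the intersection, so this only suits data where every row has the same fields.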

quinnj · October 15, 2019, 3:22pm

A few of us have been working on https://github.com/JuliaData/JSONTables.jl, which provides an implementation of the Tables.jl interface to/from JSON via the JSON3.jl package. Its use looks like:

using JSONTables, DataFrames

# turn a JSON string source into a DataFrame
# the JSON source can be an array of objects or an object of arrays
df = DataFrame(jsontable(json_source))

# write out a DataFrame as an array of objects
arraytable(df)

# write out a DataFrame as an object of arrays
objecttable(df)
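As a small round-trip sketch of the calls above: the sample string and the variable names are made up for illustration, and it assumes that `arraytable` and `objecttable`, when called with only a table, return the JSON as a `String`.

```julia
using JSONTables, DataFrames

# Hypothetical sample input: an array of objects, one object per row.
json_source = """[{"a": 1, "b": "x"}, {"a": 2, "b": "y"}]"""

df = DataFrame(jsontable(json_source))

# Serialize the same table in both layouts.
rows_str = arraytable(df)    # array of objects
cols_str = objecttable(df)   # object of arrays

# Either string can be read back with jsontable.
df_again = DataFrame(jsontable(cols_str))
```

Reading `cols_str` back should give the same columns as `df`, which shows that `jsontable` handles both layouts.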

jling · October 15, 2019, 6:00pm

Could you elaborate a bit on the relation between JSON, JSON2, and JSON3? I know JSON is maintained by a different ‘team’. What is the long-term goal for these three JSON packages?

quinnj · October 15, 2019, 10:08pm

Sure, here’s a quick history:

JSON.jl has been around the longest; it takes a very simple, straightforward approach to JSON parsing/writing, with nothing fancy or custom for native Julia objects.
JSON2.jl was started with the idea of using generated functions for custom Julia types to generate specialized JSON parsing/writing code that could be much faster than JSON.jl; it was an experiment that worked fairly well, but it can incur expensive compilation costs for complex/highly nested objects.
JSON3.jl is a successor of JSON2.jl that took some similar ideas (support for custom object serialization/deserialization), but aims to fix the compilation cost issue by being smarter, and also provides a hybrid lazy parsing approach for generic objects/arrays that is faster than any other solution.

I’ll probably deprecate JSON2 at some point, since JSON3.jl is really more of a “breaking release” version of it; there’s also been talk/interest of eventually merging JSON.jl and JSON3.jl, but I’m in favor of letting JSON3.jl bake for a while longer before doing so.

jling · October 15, 2019, 10:11pm

Deprecating JSON2 would make a lot of sense, since you probably don’t want to use the package name as a version number.

Merging JSON.jl and JSON3.jl would be even better. JSON is somewhat important for Julia’s audience (more than, let’s say, XML, and comparable to CSV, IMHO).

4aHxKzD · October 15, 2019, 10:55pm

@ExpandingMan @quinnj Thank you for your quick replies, you were very helpful.
I’m going to assume that using JSONTables is more generic and the way to go in most situations.
