DataFrames: best way to import from a JSON file

Hi.

I’m coming from Python and would like to know the current “standard” way to import a JSON file into a DataFrame, i.e., the equivalent of pandas.read_json().

As far as I know, there isn’t actually a universally standard JSON table format (though I’m sure countless non-standard, or aspiring-to-be-standard formats exist). That said, tables are either dicts of arrays or an array of dicts, so it seems likely that the JSON you’re reading is one of these.

DataFrames implements the Tables interface, so the default DataFrame constructor supports reading dicts of arrays directly into dataframes. In that case you can just do

DataFrame(JSON.parse(data))

I don’t see any really simple way of reading an array of dicts. You could do

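using DataFrames, JSON
v = JSON.parse(data)          # `data` is a JSON string holding an array of objects
cols = reduce(∩, keys.(v))    # keep only the keys present in every row
DataFrame((Symbol(c)=>getindex.(v, c) for c ∈ cols)...)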
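
One caveat with the intersection approach above: any key that is missing from some row gets dropped entirely. Here is a minimal sketch of an alternative, assuming you would rather keep every key and fill the gaps with missing. The helper name columns_from_rows is made up for illustration, and the rows are hard-coded to stand in for what JSON.parse would return, so the snippet needs no packages:

```julia
# Rows as they might come back from JSON.parse on an array of objects
# (hard-coded here; this is an assumption, not real parsed data).
rows = Dict{String,Any}[
    Dict("id" => 1, "name" => "a"),
    Dict("id" => 2, "name" => "b", "score" => 0.5),
    Dict("id" => 3),
]

# Hypothetical helper: one column per key seen in any row,
# with `missing` wherever a row lacks that key.
function columns_from_rows(rows)
    # Union (not intersection) of all keys, sorted for a stable column order.
    colnames = sort!(collect(reduce(union, keys.(rows))))
    return [Symbol(n) => [get(r, n, missing) for r in rows] for n in colnames]
end

cols = columns_from_rows(rows)
```

With DataFrames loaded, DataFrame(cols...) should then build the table in the same way as the splatted pairs above.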

A few of us have been working on https://github.com/JuliaData/JSONTables.jl, which provides an implementation of the Tables.jl interface to/from JSON via the JSON3.jl package. Its use looks like:

using DataFrames, JSONTables

# turn a json string source into a DataFrame
# json source can be array of objects, or object of arrays
df = DataFrame(jsontable(json_source))

# write out a DataFrame as an array of objects
arraytable(df)

# write out a DataFrame as an object of arrays
objecttable(df)
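
Since the original question was about a file rather than a string, here is a small sketch of that case, assuming it is fine to read the whole file into memory first. The file name people.json and its contents are invented for the example; the snippet writes the file itself so that it can be run as-is:

```julia
using DataFrames, JSONTables

# Write a small array-of-objects file so the sketch is self-contained.
path = joinpath(mktempdir(), "people.json")   # hypothetical file name
write(path, """[{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]""")

# Read the whole file as a String and hand it to jsontable.
df = DataFrame(jsontable(read(path, String)))

# Round-trip through the object-of-arrays layout to check both shapes parse.
df2 = DataFrame(jsontable(objecttable(df)))
```

Both df and df2 should end up with an id column and a name column.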

Could you elaborate a bit on the relation between JSON.jl, JSON2.jl, and JSON3.jl? I know JSON.jl is maintained by a different ‘team’. What is the long-term goal for these three JSON packages?

Sure, here’s a quick history:

  • JSON.jl has been around the longest; it takes a very simple, straightforward approach to JSON parsing/writing, with nothing fancy or custom for native Julia objects
  • JSON2.jl was started with the idea of using generated functions for custom Julia types to generate specialized JSON parsing/writing code that could be much faster than JSON.jl; it was an experiment that worked fairly well, but it can incur expensive compilation costs for complex/highly nested objects
  • JSON3.jl is a successor to JSON2.jl that took some similar ideas (support for custom object serialization/deserialization), but aims to fix the compilation cost issue by being smarter, and also provides a hybrid lazy parsing approach for generic objects/arrays that is faster than any other solution

I’ll probably deprecate JSON2.jl at some point, since JSON3.jl is really more of a “breaking release” version of it. There’s also been interest in eventually merging JSON.jl and JSON3.jl, but I’m in favor of letting JSON3.jl bake for a while longer before doing so.


This would make a lot of sense, since you probably don’t want to use the package name as a version number (:wink:)

This is even better. JSON is fairly important for Julia’s audience (more than, say, XML, and comparable to CSV, IMHO).

@ExpandingMan @quinnj Thank you for your quick replies, you were very helpful.
I’m going to assume that using JSONTables is more generic and the way to go in most situations.