Hi.
I’m coming from Python and would like to know what the current “standard” way is to import a JSON file into a DataFrame data structure, i.e., the equivalent of pandas.read_json().
As far as I know, there isn’t actually a universally standard JSON table format (though I’m sure countless non-standard, or aspiring-to-be-standard formats exist). That said, tables are either dicts of arrays or an array of dicts, so it seems likely that the JSON you’re reading is one of these.
DataFrames implements the Tables interface. The default constructor DataFrame therefore supports reading dicts of arrays directly into dataframes. In that case you can just do
DataFrame(JSON.parse(data))
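To make the one-liner above concrete, here is a minimal sketch with a made-up JSON string (the `data` contents are purely illustrative). It assumes the JSON.jl and DataFrames.jl packages are installed and that the top-level JSON value is an object whose values are equal-length arrays.

```julia
using JSON, DataFrames

# Hypothetical sample input: a JSON object whose values are equal-length arrays.
data = """{"name": ["Ada", "Grace"], "year": [1815, 1906]}"""

parsed = JSON.parse(data)   # dictionary with String keys, one entry per column
df = DataFrame(parsed)      # one column per key

# Parsed columns may carry a loose element type such as Any;
# identity.() re-collects the values into a concretely typed vector.
df.year = identity.(df.year)
```

Column order should not be relied on here, since it depends on how the parsed dictionary orders its keys.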
I don’t see any really simple way of reading an array of dicts. You could do
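v = JSON.parse(data)          # vector of dicts, one dict per row
cols = reduce(∩, keys.(v))    # keep only the keys present in every row
DataFrame((Symbol(c)=>getindex.(v, c) for c ∈ cols)...)    # one column per shared key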
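Here is the same approach run end to end on a small, made-up input, mainly to show what the key intersection does. The sample `data` is hypothetical; note that the key `"c"` appears in only one row, so it is dropped from the result.

```julia
using JSON, DataFrames

# Hypothetical sample input: an array of objects; "c" is missing from the first row.
data = """[{"a": 1, "b": "x"}, {"a": 2, "b": "y", "c": true}]"""

v = JSON.parse(data)            # vector of dicts, one per row
cols = reduce(∩, keys.(v))      # keys shared by every row: "a" and "b"

# Build one column per shared key by indexing every row dict with that key.
df = DataFrame((Symbol(c) => getindex.(v, c) for c in cols)...)
```

If you would rather keep partially present keys and fill the gaps with `missing`, the intersection would need to be replaced with a union plus a `get(row, c, missing)` lookup; that variant is not shown here.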
A few of us have been working on https://github.com/JuliaData/JSONTables.jl, which provides an implementation of the Tables.jl interface to/from JSON via the JSON3.jl package. Its use looks like this:
# turn a json string source into a DataFrame
# json source can be array of objects, or object of arrays
df = DataFrame(jsontable(json_source))
# write out a DataFrame as an array of objects
arraytable(df)
# write out a DataFrame as an object of arrays
objecttable(df)
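For a self-contained version of the snippet above, here is a small round-trip sketch. The `json_source` string is made up for illustration, and the sketch assumes that `arraytable` and `objecttable` return the JSON as a `String` when called without an IO argument.

```julia
using JSONTables, DataFrames

# Hypothetical sample input: an array of objects.
json_source = """[{"a": 1, "b": "x"}, {"a": 2, "b": "y"}]"""

# Read the JSON into a DataFrame through the Tables.jl interface.
df = DataFrame(jsontable(json_source))

# Serialize the DataFrame back to JSON in both layouts.
rows_json = arraytable(df)     # array of objects, one per row
cols_json = objecttable(df)    # object of arrays, one per column

# Either layout can be read back with the same jsontable call.
df2 = DataFrame(jsontable(cols_json))
```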
Could you elaborate a bit on the relationship between JSON.jl, JSON2.jl, and JSON3.jl? I know JSON is maintained by a different ‘team’. What is the long-term goal for these three JSON packages?
Sure, here’s a quick history:
I’ll probably deprecate JSON2 at some point, since JSON3.jl is really more of a “breaking release” version of it; there’s also been talk/interest of eventually merging JSON.jl and JSON3.jl, but I’m in favor of letting JSON3.jl bake for a while longer before doing so.
This would make a lot of sense, since you probably don’t want to use the package name as a version number.
This is even better. JSON is fairly important for Julia’s audience (more so than, say, XML, and comparable to CSV, IMHO).
@ExpandingMan @quinnj Thank you for your quick replies, you were very helpful.
I’m going to assume that using JSONTables is more generic and the way to go in most situations.