ExpandNestedData.jl is in the process of being registered!
Do you wish you could see your JSON data as a table? How about XML? Or a struct of structs? If so, ExpandNestedData is the tool for you. Just pass your object to `expand` and it will unpack any nested data into a columnar table.
Tl;Dr
using ExpandNestedData
using JSON3
using DataFrames
message = JSON3.read("""
{
    "a" : [
        {"b" : 1, "c" : 2},
        {"b" : 2},
        {"b" : [3, 4], "c" : 1},
        {"b" : []}
    ],
    "d" : 4
}
""")
expand(message) |> DataFrame
# returns
5×3 DataFrame
 Row │ d      a_b      a_c
     │ Int64  Int64?   Int64?
─────┼─────────────────────────
   1 │     4  missing  missing
   2 │     4        3        1
   3 │     4        4        1
   4 │     4        2  missing
   5 │     4        1        2
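If it helps to see why this input yields five rows, here is a rough, hand-rolled sketch of the flattening idea using plain `Dict`s and `Vector`s. It is not how ExpandNestedData is implemented: `sketch_rows` and `sketch_table` are hypothetical names made up for this illustration, the `_` name joining is simply copied from the output above, and the row order will not necessarily match what `expand` returns.

```julia
# Hypothetical helper (not part of ExpandNestedData): one Dict per output row.
function sketch_rows(x, name::String="")
    if x isa AbstractDict
        # Children of a dict are combined as a cross product of their rows.
        rows = [Dict{String,Any}()]
        for (k, v) in x
            child_name = isempty(name) ? string(k) : name * "_" * string(k)
            child = sketch_rows(v, child_name)
            rows = [merge(r, c) for r in rows for c in child]
        end
        return rows
    elseif x isa AbstractVector
        # An empty array still contributes one row, holding `missing`.
        isempty(x) && return [Dict{String,Any}(name => missing)]
        # Elements of an array are stacked as separate rows.
        return reduce(vcat, [sketch_rows(el, name) for el in x])
    else
        # A scalar is a single row with a single column.
        return [Dict{String,Any}(name => x)]
    end
end

# Hypothetical helper: turn the rows into columns, filling gaps with `missing`.
function sketch_table(x)
    rows = sketch_rows(x)
    cols = sort!(unique!(reduce(vcat, [collect(keys(r)) for r in rows])))
    return Dict(c => [get(r, c, missing) for r in rows] for c in cols)
end

# The same data as the JSON above, written as plain Julia containers.
message = Dict(
    "a" => [
        Dict("b" => 1, "c" => 2),
        Dict("b" => 2),
        Dict("b" => [3, 4], "c" => 1),
        Dict("b" => []),
    ],
    "d" => 4,
)

table = sketch_table(message)
```

Each element of `"a"` becomes at least one row, the `[3, 4]` array splits into two rows, and `"d"` is repeated on every row, which is where the five rows come from.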
`expand` has many useful kwargs that let you tweak how the columns are collected, how the column names are constructed, and whether the table is flat or the columns and returned rows are nested to match the structure of the source data. You can even designate which paths to include, ignoring branches of the input data that are not listed. See the docs for detailed descriptions of all options.
I've tested this package with XMLDict.jl and JSON3.jl extensively, and it handles a number of edge cases well, but if you find any bugs, please let me know!
Outstanding Goals
- Support for AbstractTrees.jl input (this would enable composability with Gumbo.jl and XML.jl)
- Use custom Table as input for compressing tabular data to nested data
- Widen arrays so column names match XPath expressions
- Parse XPath to ColumnDefinitions
- Dispatch on user-defined `get_keys` and `get_values` functions to traverse arbitrary custom types
Contributing
I'd love help with any of the outstanding goals or other features you think would be useful. Further, I'm sure there is still performance left on the table. Unpacking completely generic Dicts and Arrays has been challenging to type-stabilize, and I'm definitely open to feedback. `Core.jl` contains the central logic of unpacking the input, if you want to take a crack at it.