[ANN] ExpandNestedData.jl

mrufsvold · June 6, 2023, 2:16pm

ExpandNestedData.jl is in the process of being registered!

Do you wish you could see your JSON data as a table? How about XML? Or a struct of structs? If so, ExpandNestedData is the tool for you. Just pass your object to expand and it will unpack any nested data into a columnar table.

TL;DR

using ExpandNestedData 
using JSON3
using DataFrames

message = JSON3.read("""
    {
        "a" : [
            {"b" : 1, "c" : 2},
            {"b" : 2},
            {"b" : [3, 4], "c" : 1},
            {"b" : []}
        ],
        "d" : 4
    }
    """
)

expand(message) |> DataFrame
# returns
5×3 DataFrame
 Row │ d      a_b      a_c     
     │ Int64  Int64?   Int64?  
─────┼─────────────────────────
   1 │     4  missing  missing 
   2 │     4        3        1
   3 │     4        4        1
   4 │     4        2  missing 
   5 │     4        1        2
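
To make the shape of that result easier to follow, here is a rough, self-contained sketch of the flattening rules the example illustrates: sibling keys of an object become columns, array elements become rows, and an empty array still contributes a row of `missing`. This is not how ExpandNestedData is implemented. `join_name`, `flatten_rows`, and `to_columns` are hypothetical helpers written for illustration, and plain `Dict`s stand in for the parsed JSON.

```julia
# Illustrative sketch only: NOT the ExpandNestedData implementation.
# `join_name`, `flatten_rows` and `to_columns` are hypothetical helper names.

# Build a column name such as :a_b from a parent prefix and a child key.
join_name(prefix::Symbol, key) =
    prefix === Symbol("") ? Symbol(key) : Symbol(prefix, "_", key)

# Return a vector of rows, where each row maps column names to leaf values.
function flatten_rows(data, prefix::Symbol=Symbol(""))
    if data isa AbstractDict
        # Sibling keys describe the same record, so combine them column-wise
        # (a cross product of the rows produced by each child).
        rows = [Dict{Symbol,Any}()]
        for (key, value) in pairs(data)
            child_rows = flatten_rows(value, join_name(prefix, key))
            rows = [merge(r, c) for r in rows for c in child_rows]
        end
        return rows
    elseif data isa AbstractVector
        # Array elements are separate records, so stack them as rows.
        # An empty array still yields one row holding `missing`.
        isempty(data) && return [Dict{Symbol,Any}(prefix => missing)]
        return reduce(vcat, [flatten_rows(item, prefix) for item in data])
    else
        # A leaf value becomes a single cell in the current column.
        return [Dict{Symbol,Any}(prefix => data)]
    end
end

# Turn the rows into columns, filling absent cells with `missing`.
function to_columns(rows)
    colnames = unique(k for r in rows for k in keys(r))
    return Dict(name => [get(r, name, missing) for r in rows] for name in colnames)
end

# Same data as the JSON message above, written as plain Dicts.
message = Dict(
    "a" => [
        Dict("b" => 1, "c" => 2),
        Dict("b" => 2),
        Dict("b" => [3, 4], "c" => 1),
        Dict("b" => []),
    ],
    "d" => 4,
)

columns = to_columns(flatten_rows(message))
```

With this input the sketch produces five rows and the columns `d`, `a_b`, and `a_c`, holding the same values as the table above, although the row order differs from the package's output.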

expand has many useful kwargs. They let you tweak how the columns are collected, how the column names are constructed, and whether the table should be flat or the columns/returned rows should be nested to match the structure of the source data. You can even designate which paths to include, ignoring every other branch of the input data. See the docs for detailed descriptions of all options.

I’ve tested this package with XMLDict.jl and JSON3.jl extensively, and it handles a number of edge cases well, but if you find any bugs, please let me know!

Outstanding Goals

Support for AbstractTrees.jl input (this would enable composability with Gumbo.jl and XML.jl)
Use custom Table as input for compressing tabular data to nested data
Widen arrays so column names match XPath expressions
Parse XPath to ColumnDefinitions
Dispatch on user-defined get_keys and get_values functions to traverse arbitrary custom types

Contributing

I’d love help with any of the outstanding goals or other features you think would be useful. Further, I’m sure there is still performance left on the table. Unpacking completely generic Dicts and Arrays has been challenging to type-stabilize, and I’m definitely open to feedback. Core.jl contains the central logic of unpacking the input, if you want to take a crack at it.

mrufsvold · July 7, 2023, 8:49pm

Version 1.1.0 has been released!

There are no new features, but I did a major overhaul of the internals and brought down the allocations by ~200x. This is in no small part thanks to @Mason’s SumTypes.jl which, as I’ve said before, is totally amazing!

For just a small example of the improvement:

julia> using BenchmarkTools, ExpandNestedData

julia> small_dict = Dict(
           :a => 1,
           :b => "2",
           :c => Dict(:e => Symbol(3), :f => 4)
       );

julia> many_records = [small_dict for _ in 1:10_000];

# lazy_columns so we don't measure the time to collect the values
# nested so we aren't measuring reorganizing columns into a flat table
# This ensures we are only running the code I've refactored
julia> @btime ExpandNestedData.expand($many_records; lazy_columns=true, column_style=:nested);
  191.322 ms (2310855 allocations: 92.78 MiB)

Previously, this benchmark completely froze my REPL, so I can’t compare its exact performance improvement. But, last I checked, 191ms < Inf ms. If I remove the kwargs above, the run time only increases to 202ms, so we’ve seen major gains in the worst parts of the algorithm!

All this to say, if you need to consume lots and lots of nested data (looking at you, XML) and turn it into a table, this package’s performance shouldn’t hold you back anymore!

mrufsvold · August 1, 2025, 6:45pm

v2.0.1 has been released!

It is a breaking change because I am dropping the PooledArrays dependency: my lazy array type is now good enough to be returned without being materialized. I also changed things so that all methods of expand return a TypedTables.FlexTable, which improves type stability. Finally, I’m bumping the required Julia version to 1.10.

This also has some general improvements in terms of reducing compilation and dispatch time by using @nospecialize more strategically.
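
For readers who haven't used it, `@nospecialize` is a Base macro that hints to the compiler that it need not generate a separate specialization for each concrete argument type. The sketch below shows the general pattern on a hypothetical function, `count_leaves`, which is not part of ExpandNestedData. It only illustrates the kind of recursive traversal where such a hint can be useful.

```julia
# Hypothetical example, not taken from ExpandNestedData.
# `@nospecialize(data)` asks the compiler not to specialize this method on the
# concrete type of `data`, which can reduce compilation work when the function
# is called with many different container types.
function count_leaves(@nospecialize(data))
    if data isa AbstractDict || data isa AbstractVector
        total = 0
        # `values` works for both dictionaries and vectors.
        for child in values(data)
            total += count_leaves(child)
        end
        return total
    end
    # Anything that is not a container counts as one leaf.
    return 1
end

n_leaves = count_leaves(Dict(:a => 1, :b => [2, 3], :c => Dict(:d => "x")))
```

The trade-off is that code inside the function sees `data` as an unknown type and so relies on runtime checks such as `isa`. That makes the hint best suited to functions that already branch on the input type.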
