[Pre-ANN/RFC] ExpandNestedData.jl (Previously Normalize.jl)

I believe there are no existing functions that convert this “nested NamedTuple” of columns into different table types. Meanwhile, various table types support this conversion with exactly the same syntax, but the table constructor has to be inserted at each level:

```julia
using StructArrays   # or: using TypedTables; T = Table
T = StructArray

T(
    d = [4, 4, 4, 4],
    a = T(
        b = [1, 2, [3, 4], []],
        c = [2, missing, 1, missing],
    ),
)
```

works equally well for a range of table types, such as T = StructArray or T = Table (from TypedTables.jl).
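For illustration, here is a minimal sketch of a recursive helper that inserts the constructor at every level automatically. The names totable and ToyTable are hypothetical and not part of ExpandNestedData.jl; ToyTable is only a stand-in so the snippet runs without extra packages.

```julia
# Hypothetical helper (not part of any package): wrap every nested NamedTuple level
# of columns with the table constructor `T`, leaving leaf columns untouched.
totable(T, col) = col
totable(T, cols::NamedTuple) = T(map(c -> totable(T, c), cols))

# Stand-in table type so the sketch runs without extra packages.
struct ToyTable{NT<:NamedTuple}
    cols::NT
end

cols = (d = [4, 4, 4, 4],
        a = (b = [1, 2, [3, 4], []], c = [2, missing, 1, missing]))

t = totable(ToyTable, cols)
t.cols.a isa ToyTable   # true: the inner level was wrapped as well
```

Assuming StructArray and Table both accept a NamedTuple of columns, totable(StructArray, cols) or totable(Table, cols) should reproduce the hand-written nesting above.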

Thanks again, everyone, for your feedback!

I’ve renamed the project to ExpandNestedData.jl and the main function to expand. Thanks, @Dan!

Per @aplavin’s suggestion, I added an option that lets users choose between flat_columns and nested_columns. If you choose nested_columns, the result is a TypedTable whose columns are nested to match the hierarchy of the source data. I know it is idiomatic to accept Symbols for this kind of parameter, but I made an exported enum, ColumnStyle, because I like the built-in protections of the type system. Should I revert this decision and just use Symbols?
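To make the trade-off concrete, here is a small sketch contrasting the two options. column_layout is a hypothetical function, not the package’s actual signature; only the enum name ColumnStyle and its two values come from this post. A misspelled enum value fails as soon as it is referenced (UndefVarError), while a misspelled Symbol is only caught by an explicit runtime check.

```julia
# The option as an enum: only these two values exist.
@enum ColumnStyle flat_columns nested_columns

# Hypothetical function, not ExpandNestedData.jl's actual API.
column_layout(style::ColumnStyle) = style == flat_columns ? "flat" : "nested"

# Symbol-based alternative: any Symbol is accepted, so it must be validated at runtime.
function column_layout(style::Symbol)
    style in (:flat_columns, :nested_columns) ||
        throw(ArgumentError("unknown column style: $style"))
    # Convert the Symbol to the matching enum value and reuse the enum method.
    return column_layout(style === :flat_columns ? flat_columns : nested_columns)
end

column_layout(nested_columns)   # "nested"
column_layout(:flat_columns)    # "flat"
```

Accepting both, with a Symbol method that converts to the enum, is one possible middle ground.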

Long term, I’ve considered making a custom Tables.jl-compatible type that is nested internally but lets you access the columns as either flat or nested. However, I don’t have the bandwidth right now to implement the whole interface, so I’m going to lean on TypedTables for now :slight_smile:
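A rough sketch of the “nested internally, flat on demand” idea, using only Base Julia. NestedColumns, nested, and flat are hypothetical names, joining names with “_” is an assumed convention, and this does not implement the Tables.jl interface.

```julia
# Hypothetical sketch: store columns as a nested NamedTuple, expose them nested or flat.
struct NestedColumns{NT<:NamedTuple}
    cols::NT
end

# Nested view: the stored NamedTuple, unchanged.
nested(t::NestedColumns) = t.cols

# Flat view: one entry per leaf column, named by joining its path with "_"
# (an assumed naming convention).
function flat(t::NestedColumns)
    flatnames = Symbol[]
    vals = Any[]
    function walk(nt::NamedTuple, prefix::String)
        for (k, v) in pairs(nt)
            name = isempty(prefix) ? String(k) : string(prefix, "_", k)
            if v isa NamedTuple
                walk(v, name)          # descend into the next nesting level
            else
                push!(flatnames, Symbol(name))
                push!(vals, v)
            end
        end
    end
    walk(t.cols, "")
    return NamedTuple{Tuple(flatnames)}(Tuple(vals))
end

t = NestedColumns((d = [4, 4], a = (b = [1, 2], c = [2, missing])))
keys(flat(t))   # (:d, :a_b, :a_c)
```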

I still haven’t decided what the interface for a compress function should look like; I need to experiment a bit before publishing the feature, but I’ve added it to the roadmap. @quinnj’s Strapping.jl seems to largely fill the need for that utility. I’m considering pulling in Strapping.jl as a dependency and having compress take a table and some ColumnDefinitions, then use those to build StructTypes that drive Strapping.construct. That’s tentative, though.
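A plain-Julia sketch of what a compress step could do, assuming a ColumnDefinition simply maps a flat column name to a path in the nested output. ColumnDefinition, setpath, and compress here are hypothetical and deliberately avoid Strapping.jl; the plan described above would instead derive StructTypes and call Strapping.construct.

```julia
# Hypothetical: maps one flat input column to a path in the nested output.
struct ColumnDefinition
    column::Symbol          # name of the flat column in the input
    path::Vector{Symbol}    # where its value goes in each nested row
end

# Insert `value` at `path` inside a nested NamedTuple, creating levels as needed.
function setpath(nt::NamedTuple, path::Vector{Symbol}, value)
    key = first(path)
    if length(path) == 1
        return merge(nt, NamedTuple{(key,)}((value,)))
    end
    inner = haskey(nt, key) ? getfield(nt, key) : NamedTuple()
    return merge(nt, NamedTuple{(key,)}((setpath(inner, path[2:end], value),)))
end

# Rebuild one nested NamedTuple per row from a NamedTuple of equal-length columns.
function compress(cols::NamedTuple, defs::Vector{ColumnDefinition})
    n = length(first(cols))
    return [foldl((row, d) -> setpath(row, d.path, cols[d.column][i]), defs;
                  init = NamedTuple())
            for i in 1:n]
end

cols = (d = [4, 4], a_b = [1, 2], a_c = [2, missing])
defs = [ColumnDefinition(:d, [:d]),
        ColumnDefinition(:a_b, [:a, :b]),
        ColumnDefinition(:a_c, [:a, :c])]
rows = compress(cols, defs)
```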

I’ll register the package and post another ANN once I have something workable.

Thanks again!
