Transform! to destructure NamedTuple into columns

Christopher_Fisher · January 21, 2022, 3:26pm

Hi all,

I would like to destructure a DataFrame column of NamedTuples into separate columns, using the keys as column names. For a similar problem with an array, I used something like transform!(df, :col => identity => new_names). Is there a way I can take this:

2×2 DataFrame
 Row │ x         y              
     │ Float64   NamedTup…      
─────┼──────────────────────────
   1 │ 0.222043  (a = 1, b = 2)
   2 │ 0.72646   (a = 3, b = 5)

and obtain this:

2×4 DataFrame
 Row │ x         y               a      b     
     │ Float64   NamedTup…       Int64  Int64 
─────┼────────────────────────────────────────
   1 │ 0.222043  (a = 1, b = 2)      1      2
   2 │ 0.72646   (a = 3, b = 5)      3      5

Thanks!

MWE

using DataFrames

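# Input: a Float64 column and a column of NamedTuples
df = DataFrame(x = rand(2), y = [(a = 1, b = 2), (a = 3, b = 5)])

# Desired result: the fields of y expanded into columns a and b
df_new = DataFrame(x = df.x, y = df.y, a = [1, 3], b = [2, 5])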

sijo · January 21, 2022, 3:32pm

You can use AsTable:

julia> transform(df, :y => AsTable)
2×4 DataFrame
 Row │ x         y               a      b     
     │ Float64   NamedTup…       Int64  Int64 
─────┼────────────────────────────────────────
   1 │ 0.459213  (a = 1, b = 2)      1      2
   2 │ 0.241038  (a = 3, b = 5)      3      5
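Since the question asks about transform!, here is a minimal sketch of the in-place variant, plus a version that picks the new column names explicitly. The names :c and :d and the variable df2 are only illustrative, and the second call assumes that a vector of target names is treated like AsTable with those names substituted for the keys:

```julia
using DataFrames

df = DataFrame(x = rand(2), y = [(a = 1, b = 2), (a = 3, b = 5)])

# In-place variant: adds columns a and b to df itself
transform!(df, :y => AsTable)

# Explicit target names (hypothetical :c and :d) instead of the NamedTuple keys
df2 = transform(df, :y => identity => [:c, :d])
```

If the original y column is no longer needed, select(df, :x, :y => AsTable) should keep only x and the expanded columns.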

Christopher_Fisher · January 21, 2022, 3:45pm

Very nice. Thanks!

rafael.guerra · January 21, 2022, 4:32pm

This seems to be equivalent to:

hcat(df, DataFrame(df.y))
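A quick way to check the equivalence on the small example, as a sketch (fixed x values are used instead of rand so the two results are comparable; the variable names are only illustrative):

```julia
using DataFrames

df = DataFrame(x = [0.1, 0.2], y = [(a = 1, b = 2), (a = 3, b = 5)])

via_transform = transform(df, :y => AsTable)
via_hcat = hcat(df, DataFrame(df.y))

# Compare column names and contents of the two results
same = via_transform == via_hcat
```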

bkamins · January 21, 2022, 4:36pm

Yes, but it will do more allocations (which is a minor issue but still might be relevant occasionally).

rafael.guerra · January 21, 2022, 4:41pm

Obviously I am doing something wrong, but for the small OP example I actually see fewer allocations?

sijo · January 21, 2022, 5:03pm

Ah yes, I also see fewer allocations with your solution… @bkamins?

By the way, since [a b] is fancy syntax for hcat(a, b), you can also write:

[df DataFrame(df.y)]

bkamins · January 21, 2022, 5:15pm

Ah - you are right:

julia> df = repeat(DataFrame(x=1, y=(a=1,b=2)), 10^8);

julia> @time transform(df, :y => AsTable);
  5.982615 seconds (200.00 M allocations: 9.686 GiB, 7.61% gc time)

julia> @time [df DataFrame(df.y)];
  1.345369 seconds (61 allocations: 5.215 GiB, 6.20% gc time)

This means that I need to optimize the internals of transform.

The following guarantees that there is no aliasing between source and target, while avoiding unnecessary allocations:

julia> @time hcat(copy(df), DataFrame(df.y), copycols=false);
  1.231519 seconds (67 allocations: 3.725 GiB, 35.97% gc time)
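To illustrate what aliasing means here, a small sketch on a two-row table. It assumes that hcat with copycols=false reuses the column vectors of its arguments rather than copying them; the names aliased and safe are only illustrative:

```julia
using DataFrames

df = DataFrame(x = [0.1, 0.2], y = [(a = 1, b = 2), (a = 3, b = 5)])

# Without copying df first, the result shares column vectors with df
aliased = hcat(df, DataFrame(df.y), copycols = false)
shares_x = aliased.x === df.x

# Copying df up front gives the result its own column vectors
safe = hcat(copy(df), DataFrame(df.y), copycols = false)
independent_x = safe.x !== df.x
```

DataFrame(df.y) already allocates fresh vectors for a and b, so only the columns coming from df need the explicit copy.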
