Destructuring object into DataFrame

Hi all,

I have an array of objects that I would like to convert to a DataFrame. Each object has some arrays and some individual values. I would like to destructure the arrays into individual rows and columns and repeat the individual values while maintaining their types. Here is a minimal example:

using DataFrames

mutable struct M 
    id::Int
    vals::Array{<:Number,2}
end

ms = [M(i, rand(2,2)) for i in 1:2]
df = DataFrame(ms)

output

 Row │ id     vals
     │ Int64  Array…
─────┼──────────────────────────────────────────
   1 │     1  [0.618182 0.00825483; 0.646895 0…
   2 │     2  [0.450556 0.462105; 0.889429 0.8…

After exploring various approaches this is the closest solution:

rows = mapreduce(x -> [fill(x.id, size(x.vals,1)) x.vals], vcat, ms)
new_df = DataFrame(rows, :auto)

output


4×3 DataFrame
 Row │ x1       x2        x3
     │ Float64  Float64   Float64
─────┼───────────────────────────────
   1 │     1.0  0.618182  0.00825483
   2 │     1.0  0.646895  0.0341765
   3 │     2.0  0.450556  0.462105
   4 │     2.0  0.889429  0.843017

The main problem is that x1 is not a column of Int. Is there a better/less hacky way to achieve this goal?

I should also note that I do not know the width of the array vals. So I cannot hard code column types. However, they will be the same width for each object.

flatten(combine(df, :id, :vals => ByRow(collect∘eachcol) => AsTable), Not(:id))

or if you know :id is unique:

combine(groupby(df, :id), :vals => only => AsTable)
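
For anyone reading along, here is a small sketch of why the grouped version keeps id as Int. In the mapreduce attempt, concatenating the Int ids with the Float64 matrix promotes everything to Float64, whereas combine keeps the grouping column separate and only expands the matrix. The fixed numbers below are made up so the result is reproducible, and I build df directly from columns rather than from the struct M, which is an assumption for brevity:

```julia
using DataFrames

# Stand-in for DataFrame(ms): one Int id and one 2x2 matrix per row.
# The values are arbitrary and chosen only for illustration.
df = DataFrame(id = [1, 2],
               vals = [[0.1 0.2; 0.3 0.4], [0.5 0.6; 0.7 0.8]])

# Each group holds exactly one matrix; `only` extracts it and AsTable
# expands its columns into auto-named columns x1, x2, ...
wide = combine(groupby(df, :id), :vals => only => AsTable)

# The grouping column is repeated per matrix row and keeps its Int type.
println(eltype(wide.id))
println(names(wide))
```

Note that this relies on :id being unique, since `only` throws if a group contains more than one matrix.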

Thank you so much! I knew there was a better solution. I just didn't know how to find it.

Sorry to bump this. I've been struggling to extend this to my use case. I have a vector that I would like to add as a named column, but I receive an error about an unrecognized column selector. Here is a MWE:

using DataFrames

mutable struct M 
    id::Int
    vals::Array{<:Number,2}
    vect::Vector{<:Number}
end

ms = [M(i, rand(2,2), rand(2)) for i in 1:2]
df = DataFrame(ms)
combine(
    groupby(df, :id), 
    :vals => only => AsTable, 
    :vect => only => AsTable => :vect
)

How can I fix this error? Thanks!

combine(                      
    groupby(df, :id),         
    :vals => only => AsTable, 
    :vect => only => :vect    
)                             

AsTable is only needed for auto-generation of multiple column names. You could write e.g.:

combine(                             
    groupby(df, :id),                
    :vals => only => [:mat1, :mat2], 
    :vect => only => :vect           
)                                    

to give names to your matrix columns.
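
If the width of the matrix is not known ahead of time, the names can probably be generated from the data instead of being hard coded. A rough sketch, assuming every matrix has the same width (as stated earlier in the thread); matnames and the "mat" prefix are just illustrative choices, and the numbers are made up:

```julia
using DataFrames

# Stand-in for DataFrame(ms) with a matrix column and a vector column.
df = DataFrame(id = [1, 2],
               vals = [[0.1 0.2; 0.3 0.4], [0.5 0.6; 0.7 0.8]],
               vect = [[1.0, 2.0], [3.0, 4.0]])

# Read the width from the first matrix and build :mat1, :mat2, ...
ncols = size(first(df.vals), 2)
matnames = [Symbol("mat", j) for j in 1:ncols]

named = combine(
    groupby(df, :id),
    :vals => only => matnames,  # one output column per matrix column
    :vect => only => :vect      # a single vector needs only a single name
)

println(names(named))
```

The vector in each group has to be as long as the matrix has rows, otherwise combine cannot line the columns up.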

I did not realize that about AsTable. This actually makes things a lot simpler because I do know the column names. Thanks again!

AsTable is mostly for unnesting, e.g., NamedTuples that already have column names you want to retain; see "Nesting and unnesting columns in DataFrames.jl" on Bogumił Kamiński's blog.
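
A minimal sketch of that unnesting case, with a made-up column nt of NamedTuples (the column and field names are assumptions for illustration only):

```julia
using DataFrames

# Each row stores a NamedTuple whose fields already carry the desired names.
df = DataFrame(id = [1, 2],
               nt = [(a = 1, b = 2.0), (a = 3, b = 4.0)])

# AsTable takes the output column names from the NamedTuple fields,
# so no names have to be supplied by hand.
unnested = select(df, :id, :nt => identity => AsTable)

println(names(unnested))
```

Here the new columns are called a and b because those are the field names, in contrast to the auto-generated x1, x2 you get when expanding a matrix.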
