How to transform DataFrame back to its data source

alexgr · November 20, 2022, 12:52pm

How would you go back from a DataFrame to a vector of Records, given this code?

using DataFrames

struct Record
    A::UInt32
    B::Float32
    C::Float32
end

# bytes read from a stream (a Vector{UInt8}); random bytes stand in here
buffer = rand(UInt8, sizeof(Record) * 5)

# reinterpret the bytes as a vector of 5 Records (no copy)
records = reinterpret(Record, buffer)

# build a DataFrame with columns A, B, C
df = DataFrame(records)

# how to get back from the DataFrame to a vector of Records?

bertschi · November 20, 2022, 1:38pm

The following will work:

[Record(x...) for x in eachrow(df)]

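It is a bit dangerous, though, because the data frame columns have to be in the correct order. For example, the following is not what you want, but it may or may not run depending on the types: the values are converted positionally, so it either throws an InexactError or silently puts values into the wrong fields: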

[Record(x...) for x in eachrow(select(df, :B, :A, :C))]
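To see why positional splatting is fragile, here is a Base-only sketch in which named tuples stand in for the rows of select(df, :B, :A, :C) (an assumption, to avoid the DataFrames dependency). The default Record constructor converts each argument to the declared field type by position, so a non-integral Float32 landing in the UInt32 field throws, while an integral one is accepted silently in the wrong field.

```julia
struct Record
    A::UInt32
    B::Float32
    C::Float32
end

# Stand-ins for permuted rows: values arrive in B, A, C order.
row_bad    = (B = 1.5f0, A = UInt32(7), C = 3.0f0)
row_sneaky = (B = 2.0f0, A = UInt32(7), C = 3.0f0)

# Positional splatting ignores the names: B (1.5f0) is converted to UInt32 for field A.
threw = try
    Record(row_bad...)
    false
catch err
    err isa InexactError
end

# B == 2.0f0 converts cleanly to UInt32, so the wrong value ends up in A with no error.
wrong = Record(row_sneaky...)
```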

The best way would probably be to define a Record constructor that reconstructs a Record from a data frame row:

Record(row::DataFrameRow) = Record(row.A, row.B, row.C)

Record.(eachrow(select(df, :B, :A, :C)))  # does the right thing even with permuted columns
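The same name-based idea can be sketched without DataFrames by defining the extra constructor on NamedTuple instead of DataFrameRow (a hypothetical substitute chosen only because both support row.A-style field access). Since the fields are looked up by name, the order of the incoming columns does not matter, and broadcasting the constructor over the rows rebuilds a Vector{Record}.

```julia
struct Record
    A::UInt32
    B::Float32
    C::Float32
end

# Hypothetical analogue of Record(row::DataFrameRow): pick fields by name, not position.
Record(row::NamedTuple) = Record(row.A, row.B, row.C)

# Rows whose columns arrive in the permuted order B, A, C.
rows = [(B = 1.5f0, A = UInt32(1), C = 2.5f0),
        (B = 3.5f0, A = UInt32(2), C = 4.5f0)]

# Broadcasting the constructor over the rows yields a Vector{Record}.
recs = Record.(rows)
```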

aplavin · November 20, 2022, 4:04pm

This means you need to add the whole, heavy DataFrames dependency wherever you define Record.
A cleaner, more general, dependency-free approach is to define a keyword constructor:

# Base.@kwdef also generates the keyword constructor Record(; A, B, C)
Base.@kwdef struct Record
    A::UInt32
    B::Float32
    C::Float32
end

Then,

[Record(; x...) for x in eachrow(df)]

works no matter the column order.
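A minimal Base-only sketch of the keyword-constructor idea, again using named tuples in place of data frame rows (an assumption made so the example is self-contained). Base.@kwdef adds Record(; A, B, C), so splatting a named tuple as keywords matches values to fields by name, whatever their order.

```julia
# Base.@kwdef also adds a keyword constructor Record(; A, B, C).
Base.@kwdef struct Record
    A::UInt32
    B::Float32
    C::Float32
end

# Named tuples standing in for rows, with columns in a different order than the fields.
rows = [(C = 3.0f0, A = UInt32(1), B = 2.0f0),
        (C = 6.0f0, A = UInt32(4), B = 5.0f0)]

# Keyword splatting matches by name, so the column order is irrelevant.
recs = [Record(; r...) for r in rows]
```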

rocco_sprmnt21 · November 20, 2022, 8:31pm

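using Tables
# rowtable builds a Vector of NamedTuples; reinterpret reuses that memory as Records.
# Requires the columns A, B, C in that order, with eltypes UInt32, Float32, Float32.
reinterpret(Record, rowtable(df))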
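Why this is fast, and why it depends on column order, can be sketched without DataFrames or Tables. Tables.rowtable returns a Vector of NamedTuples, and a named tuple with field types (UInt32, Float32, Float32) is an isbits type with the same 12-byte layout as Record, so reinterpret only relabels the memory with no per-row construction. The vector below is built by hand as an assumed stand-in for rowtable(df); reinterpret matches by position and byte layout only, and the names are ignored.

```julia
struct Record
    A::UInt32
    B::Float32
    C::Float32
end

# Hand-built stand-in for Tables.rowtable(df): a Vector of NamedTuples.
rows = [(A = UInt32(1), B = 2.0f0, C = 3.0f0),
        (A = UInt32(4), B = 5.0f0, C = 6.0f0)]

# Same field types in the same order => same isbits layout (12 bytes, no padding).
@assert isbitstype(eltype(rows)) && isbitstype(Record)
@assert sizeof(eltype(rows)) == sizeof(Record)

# View of the same memory, now typed as Record (collect it if you need a Vector{Record}).
recs = reinterpret(Record, rows)

# Permuted columns still reinterpret without error, but A now holds the bits of B.
permuted = [(B = 2.0f0, A = UInt32(1), C = 3.0f0)]
bad = reinterpret(Record, permuted)
```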

bertschi · November 20, 2022, 8:36pm

Re the DataFrames dependency: not necessarily, as you can define the method later and elsewhere, e.g., in the code that already works with data frames and needs that functionality.
I would probably also prefer to define constructors close to the definition of the struct, but Julia allows other options here as well.
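A small sketch of "later and elsewhere": the struct lives in its own module (the module name RecordDefs is hypothetical) with no tabular dependency, and a constructor method is added afterwards from outside it. A NamedTuple again stands in for DataFrameRow as an assumption.

```julia
# The struct lives in its own module, which has no DataFrames dependency.
module RecordDefs
struct Record
    A::UInt32
    B::Float32
    C::Float32
end
end

# Later, in code that already works with tabular rows, add a constructor method.
# NamedTuple is a hypothetical stand-in for DataFrameRow here.
RecordDefs.Record(row::NamedTuple) = RecordDefs.Record(row.A, row.B, row.C)

rec = RecordDefs.Record((C = 3.0f0, B = 2.0f0, A = UInt32(1)))
```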

Agreed on the keyword constructor; it is also more general, as named tuples are more widespread and not tied to data frames.

alexgr · November 22, 2022, 8:53pm

Thank you all. I learned something from all of you.

Since column order is not a concern in my use case, I ended up using @rocco_sprmnt21’s solution because it is much faster.

rafael.guerra · November 22, 2022, 10:20pm

Couldn’t we write your second line simply like this:

Record.(eachrow(df))

This would then be faster than the other solutions posted so far.

bertschi · November 22, 2022, 11:36pm

Sure, I just wanted to show that the reconstruction is correct even when you permute the columns.

Related topics

Topic	Category: tags	Replies	Views	Activity
Construct Julia Dataframe from row data	New to Julia: question, dataframes, data_structures	11	6212	March 21, 2020
DataFrame construction from array of tuples	General Usage: data	12	7117	November 28, 2022
How to convert a dataframe into a 1-D vector, line by line?	General Usage: dataframes, vector	6	92	November 14, 2024
Create data frame using values from vector as columns	General Usage: dataframes, reshaping, matrix	6	453	October 26, 2023
Transforming DataFrame from column of vector	General Usage: dataframes	3	246	December 1, 2022