Extracting row of DataFrame directly as NamedTuple?

ethanminfang · October 1, 2019, 9:20pm

Hi all,

I guess that’s a stupid question(?)

I simulated a model with different sets of parameters. To save time (so that I don't repeatedly run the same code with different parameters), I saved the parameters into a DataFrame, row by row.

Now what if I just want to extract one row of it (say row 5) directly as a NamedTuple?

Thanks,
Ethan

nilshg · October 2, 2019, 12:58pm

Is this what you’re after?

julia> using DataFrames

julia> df = DataFrame(a = 1:10, b = 1:10)
10×2 DataFrame
│ Row │ a     │ b     │
│     │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1   │ 1     │ 1     │
│ 2   │ 2     │ 2     │
│ 3   │ 3     │ 3     │
│ 4   │ 4     │ 4     │
│ 5   │ 5     │ 5     │
│ 6   │ 6     │ 6     │
│ 7   │ 7     │ 7     │
│ 8   │ 8     │ 8     │
│ 9   │ 9     │ 9     │
│ 10  │ 10    │ 10    │

julia> copy(df[5, :])
(a = 5, b = 5)

Extract from docstring:

help?> DataFrameRow
(...)
Indexing is one-dimensional like specifying a column of a DataFrame. 
You can also access the data in a DataFrameRow using the getproperty 
and setproperty! functions and convert it to a NamedTuple using the 
copy function.
(...)
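
To tie this back to the original use case (one parameter set per row), here is a minimal sketch. The `params` table and the `simulate` function are hypothetical stand-ins, not anything from the original post; the only DataFrames feature relied on is `copy` on a `DataFrameRow`, as described in the docstring above.

```julia
using DataFrames

# Hypothetical parameter table: one row per parameter set.
params = DataFrame(alpha = [0.1, 0.2, 0.3], beta = [10, 20, 30])

# Hypothetical model taking its parameters as keyword arguments;
# it stands in for the real simulation.
simulate(; alpha, beta) = alpha * beta

row = params[2, :]   # DataFrameRow: a view into row 2 of `params`
nt = copy(row)       # NamedTuple: (alpha = 0.2, beta = 20)

# A plain NamedTuple can be splatted into keyword arguments.
result = simulate(; nt...)
```

Since `nt` is an ordinary NamedTuple, it no longer refers to `params` and can be passed around or stored freely.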

bkamins · October 2, 2019, 1:20pm

Also, DataFrameRow behaves like a mutable NamedTuple, so most of the time you should probably be fine without converting it to a NamedTuple.
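
A small sketch of what "mutable" means here, reusing the `df` from the earlier reply: a DataFrameRow is a view, so assigning to one of its fields writes through to the parent DataFrame, while a NamedTuple made with `copy` is an independent snapshot. The variable names are only for illustration.

```julia
using DataFrames

df = DataFrame(a = 1:10, b = 1:10)

row = df[5, :]         # DataFrameRow: a view into row 5 of `df`
snapshot = copy(row)   # NamedTuple: independent of `df`

row.a = 100            # setproperty! on the row updates `df` itself

# Field access works the same way on both objects.
row.b
snapshot.b
```

After the assignment, `df.a[5]` is 100, whereas `snapshot.a` still holds the old value 5.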

Finally you can easily convert a DataFrame into a vector of NamedTuples:

julia> df = DataFrame(rand(4,5))
4×5 DataFrame
│ Row │ x1       │ x2       │ x3       │ x4        │ x5       │
│     │ Float64  │ Float64  │ Float64  │ Float64   │ Float64  │
├─────┼──────────┼──────────┼──────────┼───────────┼──────────┤
│ 1   │ 0.770968 │ 0.560951 │ 0.866555 │ 0.415779  │ 0.592464 │
│ 2   │ 0.540743 │ 0.130965 │ 0.753823 │ 0.0484519 │ 0.29774  │
│ 3   │ 0.58207  │ 0.251234 │ 0.839407 │ 0.198445  │ 0.64087  │
│ 4   │ 0.380907 │ 0.639851 │ 0.219417 │ 0.499336  │ 0.549085 │

julia> using Tables

julia> Tables.rowtable(df)
4-element Array{NamedTuple{(:x1, :x2, :x3, :x4, :x5),NTuple{5,Float64}},1}:
 (x1 = 0.770968397202171, x2 = 0.5609505403103048, x3 = 0.8665553646186814, x4 = 0.4157788264006259, x5 = 0.5924636685911997)
 (x1 = 0.5407429997531747, x2 = 0.13096466013137342, x3 = 0.7538231604145154, x4 = 0.048451924943883506, x5 = 0.2977397808434288)
 (x1 = 0.582069831435476, x2 = 0.25123376929999, x3 = 0.8394071952281461, x4 = 0.1984448483279182, x5 = 0.6408697174304954)
 (x1 = 0.38090740524465483, x2 = 0.6398505002703665, x3 = 0.21941720362172124, x4 = 0.49933624062983384, x5 = 0.5490849304331029)
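
For reference, a self-contained sketch of the same conversion. It assumes Tables.jl is installed in the active environment so that `using Tables` works, and it uses a small hand-written table instead of random data. (In recent DataFrames.jl releases the matrix constructor needs an explicit column-name argument, e.g. `DataFrame(rand(4, 5), :auto)`.) Broadcasting `copy` over `eachrow(df)` is shown as an alternative that should give the same result without loading Tables explicitly.

```julia
using DataFrames
using Tables   # assumption: Tables.jl is available in the environment

df = DataFrame(x1 = [0.1, 0.2], x2 = [1.0, 2.0])

# Vector of NamedTuples, one per row.
rows_a = Tables.rowtable(df)

# Alternative: convert each DataFrameRow with `copy`.
rows_b = copy.(eachrow(df))
```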

aplavin · October 2, 2019, 1:27pm

Using DataFrameRow as a NamedTuple is pretty limited - e.g. you cannot unpack it as f(; row...), you cannot use tools for merging rows or deleting fields, and so on. And then there is performance, of course - getting a field from a DataFrameRow is several times slower than from a plain named tuple.
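
To illustrate the kind of NamedTuple tooling meant here, a short sketch using only Base functions on the NamedTuple obtained earlier in the thread. The extra field `c` is made up for the example.

```julia
# The NamedTuple returned by copy(df[5, :]) earlier in the thread.
nt = (a = 5, b = 5)

# Merging: add (or overwrite) fields from another NamedTuple.
extended = merge(nt, (c = 0.5,))

# "Deleting" fields: build a new NamedTuple that keeps only the listed names.
only_a = NamedTuple{(:a,)}(nt)
```

Both operations return new NamedTuples; `nt` itself is immutable and stays unchanged.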

ethanminfang · October 2, 2019, 1:41pm

Yes. It is. thx!
