Extracting a row of a DataFrame directly as a NamedTuple?

Hi all,

This may be a basic question.

I simulated a model with different sets of parameters. To avoid repeatedly running the same code with different parameters, I saved each parameter set as a row of a DataFrame.

Now, how can I extract a single row of it (say row 5) directly as a NamedTuple?

Thanks,
Ethan

Is this what you’re after?

julia> using DataFrames

julia> df = DataFrame(a = 1:10, b = 1:10)
10×2 DataFrame
│ Row │ a     │ b     │
│     │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1   │ 1     │ 1     │
│ 2   │ 2     │ 2     │
│ 3   │ 3     │ 3     │
│ 4   │ 4     │ 4     │
│ 5   │ 5     │ 5     │
│ 6   │ 6     │ 6     │
│ 7   │ 7     │ 7     │
│ 8   │ 8     │ 8     │
│ 9   │ 9     │ 9     │
│ 10  │ 10    │ 10    │

julia> copy(df[5, :])
(a = 5, b = 5)

Extract from docstring:

help?> DataFrameRow
(...)
Indexing is one-dimensional like specifying a column of a DataFrame. 
You can also access the data in a DataFrameRow using the getproperty 
and setproperty! functions and convert it to a NamedTuple using the 
copy function.
(...)
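
For the parameter-table use case, a minimal sketch of the conversion might look like the following. The column names `alpha` and `beta` are made up for illustration, and I'm assuming a reasonably recent DataFrames.jl, where `NamedTuple(row)` should give the same result as `copy(row)`.

```julia
using DataFrames

# Hypothetical parameter table: one row per parameter set
df = DataFrame(alpha = (1:10) ./ 10, beta = 1:10)

row = df[5, :]          # a DataFrameRow, i.e. a view into df
nt  = copy(row)         # a plain NamedTuple, detached from df
nt2 = NamedTuple(row)   # same conversion via the constructor

@show nt isa NamedTuple   # true
@show nt.beta             # 5
```

After this, `nt.alpha` and `nt.beta` are ordinary field accesses on an immutable NamedTuple.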

Also, a DataFrameRow behaves like a mutable NamedTuple, so most of the time you should be fine without converting it to a NamedTuple.
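
To illustrate what "mutable" means here, a small sketch: writing to a field of the row changes the parent DataFrame, while a NamedTuple made with `copy` is an independent snapshot. The variable names are only for illustration.

```julia
using DataFrames

df = DataFrame(a = 1:10, b = 1:10)

row = df[5, :]   # view into row 5 of df
nt  = copy(row)  # independent NamedTuple snapshot

row.a = 100      # writes through to df

@show df[5, :a]  # 100
@show nt.a       # 5, unaffected by the write
```

So if you need the parameters to stay fixed while the table changes, converting first is the safer choice.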

Finally, you can easily convert a DataFrame into a vector of NamedTuples:

julia> df = DataFrame(rand(4,5))
4×5 DataFrame
│ Row │ x1       │ x2       │ x3       │ x4        │ x5       │
│     │ Float64  │ Float64  │ Float64  │ Float64   │ Float64  │
├─────┼──────────┼──────────┼──────────┼───────────┼──────────┤
│ 1   │ 0.770968 │ 0.560951 │ 0.866555 │ 0.415779  │ 0.592464 │
│ 2   │ 0.540743 │ 0.130965 │ 0.753823 │ 0.0484519 │ 0.29774  │
│ 3   │ 0.58207  │ 0.251234 │ 0.839407 │ 0.198445  │ 0.64087  │
│ 4   │ 0.380907 │ 0.639851 │ 0.219417 │ 0.499336  │ 0.549085 │

julia> Tables.rowtable(df)
4-element Array{NamedTuple{(:x1, :x2, :x3, :x4, :x5),NTuple{5,Float64}},1}:
 (x1 = 0.770968397202171, x2 = 0.5609505403103048, x3 = 0.8665553646186814, x4 = 0.4157788264006259, x5 = 0.5924636685911997)
 (x1 = 0.5407429997531747, x2 = 0.13096466013137342, x3 = 0.7538231604145154, x4 = 0.048451924943883506, x5 = 0.2977397808434288)
 (x1 = 0.582069831435476, x2 = 0.25123376929999, x3 = 0.8394071952281461, x4 = 0.1984448483279182, x5 = 0.6408697174304954)
 (x1 = 0.38090740524465483, x2 = 0.6398505002703665, x3 = 0.21941720362172124, x4 = 0.49933624062983384, x5 = 0.5490849304331029)
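
An equivalent spelling that uses only `eachrow` and `copy` is sketched below, together with a loop over the resulting parameter sets. `run_model` is a hypothetical stand-in for the actual simulation, not anything from DataFrames.

```julia
using DataFrames

df = DataFrame(a = 1:3, b = [0.5, 1.0, 1.5])

# One NamedTuple per row
param_sets = [copy(r) for r in eachrow(df)]

# Hypothetical stand-in for a model run taking one parameter set
run_model(p) = p.a * p.b

results = [run_model(p) for p in param_sets]
@show results   # [0.5, 2.0, 4.5]
```

Each element of `param_sets` is a plain NamedTuple, so the model code never has to know about DataFrames.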

Using a DataFrameRow in place of a NamedTuple is fairly limited: for example, you cannot splat it as f(; row...), and tools such as merging rows or dropping fields are not available. There is also the performance cost: getting a field from a DataFrameRow is several times slower than getting it from a plain NamedTuple.
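
For reference, here is a rough sketch of the NamedTuple operations I mean, applied to a converted row. The function `f` and the extra field `c` are hypothetical, just to have something to call.

```julia
using DataFrames

df = DataFrame(a = 1:10, b = 1:10)
nt = copy(df[5, :])               # (a = 5, b = 5)

# Hypothetical function taking the parameters as keywords
f(; a, b) = a + 2b

total   = f(; nt...)              # keyword splatting: 5 + 2*5
merged  = merge(nt, (c = 0.5,))   # add a field
smaller = NamedTuple{(:a,)}(nt)   # keep only field a

@show total     # 15
@show merged    # (a = 5, b = 5, c = 0.5)
@show smaller   # (a = 5,)
```

All three operations are plain Base functionality on NamedTuples, which is why converting once up front can be convenient.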

Yes, it is. Thanks!