Dataframe to nested tuple?

Lincoln_Hannah · December 3, 2020, 11:28am

I'd like to transform this:

using DataFrames
DataFrame(GroupLetter = ['A','A','B','B'], GroupID = [1,1,2,2], Col1 = [1,2,3,4], Col2 = [5,6,7,8])

│ GroupLetter │ GroupID │ Col1  │ Col2  │
├─────────────┼─────────┼───────┼───────┤
│ 'A'         │ 1       │ 1     │ 5     │
│ 'A'         │ 1       │ 2     │ 6     │
│ 'B'         │ 2       │ 3     │ 7     │
│ 'B'         │ 2       │ 4     │ 8     │

Into this:

data = ( 
    A = (GroupID = 1, DF = DataFrame(Col1 = [1,2], Col2 = [5,6]) ),
    B = (GroupID = 2, DF = DataFrame(Col1 = [3,4], Col2 = [7,8]) )
)

So I can write

data.A.GroupID

and

data.A.DF

giving

│ Col1  │ Col2  │
│ 1     │ 5     │
│ 2     │ 6     │

Is there an easy way to do this?
And can the nested tuple structure be written to a file and read back from it?
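A minimal sketch of one way to build the requested nested NamedTuple and save it to a file. It assumes DataFrames.jl 1.x and that GroupID is constant within each GroupLetter group. `nest` is a hypothetical helper name, not a DataFrames function. For the file question it uses Julia's Serialization standard library, whose format is not guaranteed to be readable by a different Julia version, so treat it as a same-version cache rather than an archive format.

```julia
using DataFrames, Serialization

df = DataFrame(GroupLetter = ['A', 'A', 'B', 'B'], GroupID = [1, 1, 2, 2],
               Col1 = [1, 2, 3, 4], Col2 = [5, 6, 7, 8])

# Hypothetical helper: build (A = (GroupID = 1, DF = ...), B = (...)) from df.
# Assumes GroupID is constant within each GroupLetter group.
function nest(df::AbstractDataFrame)
    gdf = groupby(df, :GroupLetter; sort = true)  # sort = true gives a stable A, B, ... field order
    names = Symbol[]       # outer field names, e.g. :A, :B
    vals = NamedTuple[]    # one (GroupID = ..., DF = ...) per group
    for (key, sdf) in pairs(gdf)
        push!(names, Symbol(key.GroupLetter))  # Char 'A' -> Symbol :A
        push!(vals, (GroupID = first(sdf.GroupID),
                     DF = select(sdf, Not([:GroupLetter, :GroupID]))))  # copy of the remaining columns
    end
    return NamedTuple{Tuple(names)}(Tuple(vals))
end

data = nest(df)
data.A.GroupID   # 1
data.A.DF        # 2×2 DataFrame with Col1 = [1, 2], Col2 = [5, 6]

# Round-trip through a file with the Serialization standard library.
path = tempname()
serialize(path, data)
restored = deserialize(path)
```

Because the outer field names are only known at run time, the sketch collects them first and then calls the `NamedTuple{names}(values)` constructor. For data that must outlive a Julia upgrade, writing each inner DataFrame to a portable format would be the safer choice.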

fbanning · December 3, 2020, 11:48am

julia> data = DataFrame( GroupLetter = ['A','A','B','B'] , GroupID = [1,1,2,2], Col1 = [1,2,3,4], Col2 = [5,6,7,8] )
4×4 DataFrame
│ Row │ GroupLetter │ GroupID │ Col1  │ Col2  │
│     │ Char        │ Int64   │ Int64 │ Int64 │
├─────┼─────────────┼─────────┼───────┼───────┤
│ 1   │ 'A'         │ 1       │ 1     │ 5     │
│ 2   │ 'A'         │ 1       │ 2     │ 6     │
│ 3   │ 'B'         │ 2       │ 3     │ 7     │
│ 4   │ 'B'         │ 2       │ 4     │ 8     │

julia> gdf = groupby(data, ["GroupLetter", "GroupID"])
GroupedDataFrame with 2 groups based on keys: GroupLetter, GroupID
First Group (2 rows): GroupLetter = 'A', GroupID = 1
│ Row │ GroupLetter │ GroupID │ Col1  │ Col2  │
│     │ Char        │ Int64   │ Int64 │ Int64 │
├─────┼─────────────┼─────────┼───────┼───────┤
│ 1   │ 'A'         │ 1       │ 1     │ 5     │
│ 2   │ 'A'         │ 1       │ 2     │ 6     │
⋮
Last Group (2 rows): GroupLetter = 'B', GroupID = 2
│ Row │ GroupLetter │ GroupID │ Col1  │ Col2  │
│     │ Char        │ Int64   │ Int64 │ Int64 │
├─────┼─────────────┼─────────┼───────┼───────┤
│ 1   │ 'B'         │ 2       │ 3     │ 7     │
│ 2   │ 'B'         │ 2       │ 4     │ 8     │

julia> gdf[(GroupLetter = 'A', GroupID = 1)]
2×4 SubDataFrame
│ Row │ GroupLetter │ GroupID │ Col1  │ Col2  │
│     │ Char        │ Int64   │ Int64 │ Int64 │
├─────┼─────────────┼─────────┼───────┼───────┤
│ 1   │ 'A'         │ 1       │ 1     │ 5     │
│ 2   │ 'A'         │ 1       │ 2     │ 6     │

Not quite what you asked for, but maybe close enough that you'll like it.
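To get closer to the requested `data.A.DF`, one option (a sketch, assuming DataFrames.jl 1.x) is to drop the grouping columns from the looked-up group. `groupcols` returns the grouping column names, and `select` on a SubDataFrame returns a new DataFrame rather than a view.

```julia
using DataFrames

df = DataFrame(GroupLetter = ['A', 'A', 'B', 'B'], GroupID = [1, 1, 2, 2],
               Col1 = [1, 2, 3, 4], Col2 = [5, 6, 7, 8])
gdf = groupby(df, [:GroupLetter, :GroupID])

# One group, looked up by key; this is a SubDataFrame (a view into df).
sdf = gdf[(GroupLetter = 'A', GroupID = 1)]

# Drop the grouping columns; the result is a new DataFrame with Col1 and Col2 only.
a_df = select(sdf, Not(groupcols(gdf)))
```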

Edit: When doing this kind of thing, remember that piping (conveniently via Pipe.jl or Chain.jl) is always more ~~performant~~ concise.

ericphanson · December 3, 2020, 1:52pm

I don’t mean to derail this, but piping should not be any more or less performant than other ways of writing the same code; it’s just syntax.
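A small Base-only illustration of that point: `x |> f` simply calls `f(x)`, so the piped and unpiped lines below perform exactly the same calls. Macro-based packages such as Pipe.jl and Chain.jl likewise rewrite their input into ordinary function calls before it is compiled (a general description, not tied to a specific package version).

```julia
v = [3, 1, 2]

a = sort(v)               # ordinary call
b = v |> sort             # the same call written as a pipe
c = v |> sort |> reverse  # nested calls read left to right: reverse(sort(v))
```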

fbanning · December 3, 2020, 2:39pm

Bad wording, sorry.

bkamins · December 3, 2020, 2:44pm

Or gdf[('A', 1)] if you want to avoid passing the grouping column names.
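A quick check that the positional and named forms select the same group, assuming DataFrames.jl 1.x: with a plain tuple the values are matched to the grouping columns in order, with a NamedTuple they are matched by column name.

```julia
using DataFrames

df = DataFrame(GroupLetter = ['A', 'A', 'B', 'B'], GroupID = [1, 1, 2, 2],
               Col1 = [1, 2, 3, 4], Col2 = [5, 6, 7, 8])
gdf = groupby(df, [:GroupLetter, :GroupID])

by_position = gdf[('A', 1)]                          # matched to (GroupLetter, GroupID) in order
by_name     = gdf[(GroupLetter = 'A', GroupID = 1)]  # matched by column name
```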

Lincoln_Hannah · December 3, 2020, 11:14pm

Thank you both.
It seems that if there is only one grouping column, you need a trailing comma to make a one-element tuple:

gdf = groupby(data, :GroupLetter)

gdf[('A',)]
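Why the comma matters: in Julia, parentheses alone only group an expression, so `('A')` is just the Char `'A'`, while `('A',)` is a `Tuple{Char}`. Since GroupLetter holds Char values, the lookup key is `'A'`; a Symbol such as `:A` would not match. A minimal sketch, assuming DataFrames.jl 1.x:

```julia
using DataFrames

df = DataFrame(GroupLetter = ['A', 'A', 'B', 'B'], GroupID = [1, 1, 2, 2],
               Col1 = [1, 2, 3, 4], Col2 = [5, 6, 7, 8])
gdf = groupby(df, :GroupLetter)

x = ('A')     # parentheses only group: x is the Char 'A'
t = ('A',)    # the trailing comma makes a one-element tuple

grp = gdf[t]  # the rows whose GroupLetter is 'A'
```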

pdeffebach · December 3, 2020, 11:29pm

Yeah, tuple(x) looks slightly less clumsy imo
