Unstack using row numbers

feanor12 · August 15, 2021, 10:15am

I have a table with multiple column pairs holding x and y values. If I stack them, I can add columns based on the column names for each pair, but to unstack them I need the original row numbers again.
What is the best way to do this?
At the moment I have tried something like this:

using DataFrames
using Chain
data=DataFrame("400_x"=>rand(5),"400_y"=>rand(5),"450_x"=>rand(5),"450_y"=>rand(5))
data[!,:Id]=1:nrow(data)
new_data = @chain data begin
           stack(Not(:Id))
           transform(:variable=>ByRow(x->parse(Int,split(x,"_")[1]))=>:temp)
           transform(:variable=>ByRow(x->split(x,"_")[2])=>:axis)
           unstack([:Id,:temp],:axis,:value)
           select(Not(:Id))
           end

Update:
Now that I have access to my laptop I checked the example code and it should work now. Basically I want to convert a table like this:

5×4 DataFrame
 Row │ 400_x      400_y      450_x     450_y
     │ Float64    Float64    Float64   Float64
─────┼───────────────────────────────────────────
   1 │ 0.619571   0.965287   0.206373  0.908058
   2 │ 0.147488   0.534391   0.539502  0.0534432
   3 │ 0.252395   0.148259   0.779658  0.0337005
   4 │ 0.0718452  0.0305076  0.760252  0.946083
   5 │ 0.921851   0.0851036  0.890209  0.930951

into something like this:

 Row │ temp   x          y
     │ Int64  Float64?   Float64?
─────┼─────────────────────────────
   1 │   400  0.619571   0.965287
   2 │   400  0.147488   0.534391
   3 │   400  0.252395   0.148259
   4 │   400  0.0718452  0.0305076
   5 │   400  0.921851   0.0851036
   6 │   450  0.206373   0.908058
   7 │   450  0.539502   0.0534432
   8 │   450  0.779658   0.0337005
   9 │   450  0.760252   0.946083
  10 │   450  0.890209   0.930951

I just feel like I am doing it in an over-complicated way. So maybe one of the data gurus can give me some hints on how to improve it.

sijo · August 15, 2021, 11:49am

It’s hard to understand what you want without working code and without an example of input and desired output, but maybe this will help:

julia> data = DataFrame(["400_x", "400_y", "450_x", "450_y"] .=> rand.(3))
3×4 DataFrame
 Row │ 400_x     400_y     450_x     450_y    
     │ Float64   Float64   Float64   Float64  
─────┼────────────────────────────────────────
   1 │ 0.858738  0.306062  0.376122  0.964485
   2 │ 0.330242  0.874829  0.983844  0.232338
   3 │ 0.582333  0.381427  0.893386  0.85312

julia> data.Id = 1:nrow(data);

julia> data_long = stack(data, Not(:Id))
12×3 DataFrame
 Row │ Id     variable  value    
     │ Int64  String    Float64  
─────┼───────────────────────────
   1 │     1  400_x     0.858738
   2 │     2  400_x     0.330242
   3 │     3  400_x     0.582333
   4 │     1  400_y     0.306062
   5 │     2  400_y     0.874829
   6 │     3  400_y     0.381427
   7 │     1  450_x     0.376122
   8 │     2  450_x     0.983844
   9 │     3  450_x     0.893386
  10 │     1  450_y     0.964485
  11 │     2  450_y     0.232338
  12 │     3  450_y     0.85312

julia> disallowmissing!(unstack(data_long, :Id, :variable, :value))
3×5 DataFrame
 Row │ Id     400_x     400_y     450_x     450_y    
     │ Int64  Float64   Float64   Float64   Float64  
─────┼───────────────────────────────────────────────
   1 │     1  0.858738  0.306062  0.376122  0.964485
   2 │     2  0.330242  0.874829  0.983844  0.232338
   3 │     3  0.582333  0.381427  0.893386  0.85312

feanor12 · August 16, 2021, 2:16pm

I am sorry for the messy example. I just had my phone on hand and could not check it, but now the example code should work.

pdeffebach · August 16, 2021, 2:27pm

I think your solution seems fine. Can’t think of a more intuitive way to do it.

sijo · August 16, 2021, 6:19pm

I agree with @pdeffebach. Just note that you don’t need two transforms (and two split calls); a single transform can emit both columns:

julia> data = DataFrame(["400_x", "400_y", "450_x", "450_y"] .=> rand.(3))
3×4 DataFrame
 Row │ 400_x     400_y      450_x     450_y    
     │ Float64   Float64    Float64   Float64  
─────┼─────────────────────────────────────────
   1 │ 0.316218  0.0977267  0.608225  0.633215
   2 │ 0.672883  0.50661    0.617916  0.960232
   3 │ 0.238429  0.437142   0.496646  0.736143

julia> data.Id = 1:nrow(data);

julia> data_long = stack(data, Not(:Id))
12×3 DataFrame
 Row │ Id     variable  value     
     │ Int64  String    Float64   
─────┼────────────────────────────
   1 │     1  400_x     0.316218
   2 │     2  400_x     0.672883
   3 │     3  400_x     0.238429
   4 │     1  400_y     0.0977267
   5 │     2  400_y     0.50661
   6 │     3  400_y     0.437142
   7 │     1  450_x     0.608225
   8 │     2  450_x     0.617916
   9 │     3  450_x     0.496646
  10 │     1  450_y     0.633215
  11 │     2  450_y     0.960232
  12 │     3  450_y     0.736143

julia> data_long2 = transform(data_long, :variable => ByRow(x->split(x, "_")) => [:temp, :variable])
12×4 DataFrame
 Row │ Id     variable   value      temp      
     │ Int64  SubStrin…  Float64    SubStrin… 
─────┼────────────────────────────────────────
   1 │     1  x          0.316218   400
   2 │     2  x          0.672883   400
   3 │     3  x          0.238429   400
   4 │     1  y          0.0977267  400
   5 │     2  y          0.50661    400
   6 │     3  y          0.437142   400
   7 │     1  x          0.608225   450
   8 │     2  x          0.617916   450
   9 │     3  x          0.496646   450
  10 │     1  y          0.633215   450
  11 │     2  y          0.960232   450
  12 │     3  y          0.736143   450

julia> unstack(data_long2)
6×4 DataFrame
 Row │ Id     temp       x         y         
     │ Int64  SubStrin…  Float64?  Float64?  
─────┼───────────────────────────────────────
   1 │     1  400        0.316218  0.0977267
   2 │     2  400        0.672883  0.50661
   3 │     3  400        0.238429  0.437142
   4 │     1  450        0.608225  0.633215
   5 │     2  450        0.617916  0.960232
   6 │     3  450        0.496646  0.736143
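
To get from this result to the table requested in the original post, what remains is to parse temp as an integer, drop the Id helper column, and remove the Missing union from x and y. Here is a minimal sketch of the whole pipeline. It assumes fixed illustrative values instead of rand, and it writes the second half of the name to a separate column called axis (a name taken from the original post) rather than overwriting variable:

```julia
using DataFrames

# Illustrative fixed values standing in for rand(3); not data from the thread.
data = DataFrame("400_x" => [1.0, 2.0, 3.0], "400_y" => [4.0, 5.0, 6.0],
                 "450_x" => [7.0, 8.0, 9.0], "450_y" => [10.0, 11.0, 12.0])
data.Id = 1:nrow(data)                      # row key needed to unstack later

long = stack(data, Not(:Id))                # columns: Id, variable, value

# One transform, one split: "400_x" becomes temp = "400", axis = "x"
long = transform(long, :variable => ByRow(v -> split(v, "_")) => [:temp, :axis])
long.temp = parse.(Int, long.temp)          # SubString -> Int

# (Id, temp) identifies a row; axis supplies the new column names
wide = unstack(long, [:Id, :temp], :axis, :value)

# Drop the helper key and turn Float64? back into Float64
result = disallowmissing!(select(wide, Not(:Id)))
```

Keeping axis separate leaves the stacked variable column intact for inspection, at the cost of having to name the key, column, and value arguments explicitly in unstack.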

feanor12 · August 16, 2021, 7:08pm

Thank you! So by overwriting the variable column and changing the transform to emit multiple columns it already looks a bit better.
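
For comparison, the same reshaping can probably also be done without stack and unstack, and therefore without an Id column, by building one small table per temperature and concatenating them. This is only a sketch and nobody in the thread proposed it. It assumes every column is named `<temp>_x` or `<temp>_y` and that each temperature has both, and the values are illustrative:

```julia
using DataFrames

# Illustrative fixed values; not data from the thread.
data = DataFrame("400_x" => [1.0, 2.0], "400_y" => [3.0, 4.0],
                 "450_x" => [5.0, 6.0], "450_y" => [7.0, 8.0])

# Temperature prefixes ("400", "450") in order of first appearance
temps = unique(first.(split.(names(data), "_")))

# One table per temperature; the scalar temp is repeated to match x and y
parts = [DataFrame(temp = parse(Int, t),
                   x = data[!, "$(t)_x"],
                   y = data[!, "$(t)_y"]) for t in temps]

result = reduce(vcat, parts)
```

Because x and y are taken from the same rows of the wide table, no row key is needed and the columns stay Float64. The stack/unstack version is more general when the set of suffixes is not known in advance.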
