Unstack using row numbers

I have a table with multiple column pairs I dictating x and y values. If I stack them I can add columns bases on the column names for each pair, but do unstack them I need the original row numbers again.
What is the best way to do this?
At the moment I tried something like this

using DataFrames
using Chain
data=DataFrame("400_x"=>rand(5),"400_y"=>rand(5),"450_x"=>rand(5),"450_y"=>rand(5))
data[!,:Id]=1:nrow(data)
new_data = @chain data begin
           stack(Not(:Id))
           transform(:variable=>ByRow(x->parse(Int,split(x,"_")[1]))=>:temp)
           transform(:variable=>ByRow(x->split(x,"_")[2])=>:axis)
           unstack([:Id,:temp],:axis,:value)
           select(Not(:Id))
           end

Update:
Now that I have access to my laptop I checked the example code and it should work now. Basically I want to convert a table like this:

5Γ—4 DataFrame
 Row β”‚ 400_x      400_y      450_x     450_y
     β”‚ Float64    Float64    Float64   Float64
─────┼───────────────────────────────────────────
   1 β”‚ 0.619571   0.965287   0.206373  0.908058
   2 β”‚ 0.147488   0.534391   0.539502  0.0534432
   3 β”‚ 0.252395   0.148259   0.779658  0.0337005
   4 β”‚ 0.0718452  0.0305076  0.760252  0.946083
   5 β”‚ 0.921851   0.0851036  0.890209  0.930951

into something like this:

 Row β”‚ temp       x          y
     β”‚ Int64  Float64?   Float64?
─────┼─────────────────────────────────
   1 β”‚ 400        0.619571   0.965287
   2 β”‚ 400        0.147488   0.534391
   3 β”‚ 400        0.252395   0.148259
   4 β”‚ 400        0.0718452  0.0305076
   5 β”‚ 400        0.921851   0.0851036
   6 β”‚ 450        0.206373   0.908058
   7 β”‚ 450        0.539502   0.0534432
   8 β”‚ 450        0.779658   0.0337005
   9 β”‚ 450        0.760252   0.946083
  10 β”‚ 450        0.890209   0.930951

I just feel like I am doing it in an over-complicated way. So maybe one of the data gurus can give me some hints on how to improve it. :slight_smile:

It’s hard to understand what you want without working code and without an example of input and desired output, but maybe this will help:

julia> data = DataFrame(["400_x", "400_y", "450_x", "450_y"] .=> rand.(3))
3Γ—4 DataFrame
 Row β”‚ 400_x     400_y     450_x     450_y    
     β”‚ Float64   Float64   Float64   Float64  
─────┼────────────────────────────────────────
   1 β”‚ 0.858738  0.306062  0.376122  0.964485
   2 β”‚ 0.330242  0.874829  0.983844  0.232338
   3 β”‚ 0.582333  0.381427  0.893386  0.85312

julia> data.Id = 1:nrow(data);

julia> data_long = stack(data, Not(:Id))
12Γ—3 DataFrame
 Row β”‚ Id     variable  value    
     β”‚ Int64  String    Float64  
─────┼───────────────────────────
   1 β”‚     1  400_x     0.858738
   2 β”‚     2  400_x     0.330242
   3 β”‚     3  400_x     0.582333
   4 β”‚     1  400_y     0.306062
   5 β”‚     2  400_y     0.874829
   6 β”‚     3  400_y     0.381427
   7 β”‚     1  450_x     0.376122
   8 β”‚     2  450_x     0.983844
   9 β”‚     3  450_x     0.893386
  10 β”‚     1  450_y     0.964485
  11 β”‚     2  450_y     0.232338
  12 β”‚     3  450_y     0.85312

julia> disallowmissing!(unstack(data_long, :Id, :variable, :value))
3Γ—5 DataFrame
 Row β”‚ Id     400_x     400_y     450_x     450_y    
     β”‚ Int64  Float64   Float64   Float64   Float64  
─────┼───────────────────────────────────────────────
   1 β”‚     1  0.858738  0.306062  0.376122  0.964485
   2 β”‚     2  0.330242  0.874829  0.983844  0.232338
   3 β”‚     3  0.582333  0.381427  0.893386  0.85312
2 Likes

I am sorry for the messy example. I just had my phone on hand and could not check it, but now the example code should work.

I think your solution seems fine. Can’t think of a more intuitive way to do it.

1 Like

I agree with @pdeffebach . Just note that you don’t need two transforms (and two split calls):

julia> data = DataFrame(["400_x", "400_y", "450_x", "450_y"] .=> rand.(3))
3Γ—4 DataFrame
 Row β”‚ 400_x     400_y      450_x     450_y    
     β”‚ Float64   Float64    Float64   Float64  
─────┼─────────────────────────────────────────
   1 β”‚ 0.316218  0.0977267  0.608225  0.633215
   2 β”‚ 0.672883  0.50661    0.617916  0.960232
   3 β”‚ 0.238429  0.437142   0.496646  0.736143

julia> data.Id = 1:nrow(data);

julia> data_long = stack(data, Not(:Id))
12Γ—3 DataFrame
 Row β”‚ Id     variable  value     
     β”‚ Int64  String    Float64   
─────┼────────────────────────────
   1 β”‚     1  400_x     0.316218
   2 β”‚     2  400_x     0.672883
   3 β”‚     3  400_x     0.238429
   4 β”‚     1  400_y     0.0977267
   5 β”‚     2  400_y     0.50661
   6 β”‚     3  400_y     0.437142
   7 β”‚     1  450_x     0.608225
   8 β”‚     2  450_x     0.617916
   9 β”‚     3  450_x     0.496646
  10 β”‚     1  450_y     0.633215
  11 β”‚     2  450_y     0.960232
  12 β”‚     3  450_y     0.736143

julia> data_long2 = transform(data_long, :variable => ByRow(x->split(x, "_")) => [:temp, :variable])
12Γ—4 DataFrame
 Row β”‚ Id     variable   value      temp      
     β”‚ Int64  SubStrin…  Float64    SubStrin… 
─────┼────────────────────────────────────────
   1 β”‚     1  x          0.316218   400
   2 β”‚     2  x          0.672883   400
   3 β”‚     3  x          0.238429   400
   4 β”‚     1  y          0.0977267  400
   5 β”‚     2  y          0.50661    400
   6 β”‚     3  y          0.437142   400
   7 β”‚     1  x          0.608225   450
   8 β”‚     2  x          0.617916   450
   9 β”‚     3  x          0.496646   450
  10 β”‚     1  y          0.633215   450
  11 β”‚     2  y          0.960232   450
  12 β”‚     3  y          0.736143   450

julia> unstack(data_long2)
6Γ—4 DataFrame
 Row β”‚ Id     temp       x         y         
     β”‚ Int64  SubStrin…  Float64?  Float64?  
─────┼───────────────────────────────────────
   1 β”‚     1  400        0.316218  0.0977267
   2 β”‚     2  400        0.672883  0.50661
   3 β”‚     3  400        0.238429  0.437142
   4 β”‚     1  450        0.608225  0.633215
   5 β”‚     2  450        0.617916  0.960232
   6 β”‚     3  450        0.496646  0.736143

Thank you! So by overwriting the variable column and changing the transform to emit multiple columns it already looks a bit better.