I have a table with multiple column pairs I dictating x and y values. If I stack them I can add columns bases on the column names for each pair, but do unstack them I need the original row numbers again.
What is the best way to do this?
At the moment I tried something like this
using DataFrames
using Chain
data=DataFrame("400_x"=>rand(5),"400_y"=>rand(5),"450_x"=>rand(5),"450_y"=>rand(5))
data[!,:Id]=1:nrow(data)
new_data = @chain data begin
stack(Not(:Id))
transform(:variable=>ByRow(x->parse(Int,split(x,"_")[1]))=>:temp)
transform(:variable=>ByRow(x->split(x,"_")[2])=>:axis)
unstack([:Id,:temp],:axis,:value)
select(Not(:Id))
end
Update:
Now that I have access to my laptop I checked the example code and it should work now. Basically I want to convert a table like this:
5Γ4 DataFrame
Row β 400_x 400_y 450_x 450_y
β Float64 Float64 Float64 Float64
ββββββΌβββββββββββββββββββββββββββββββββββββββββββ
1 β 0.619571 0.965287 0.206373 0.908058
2 β 0.147488 0.534391 0.539502 0.0534432
3 β 0.252395 0.148259 0.779658 0.0337005
4 β 0.0718452 0.0305076 0.760252 0.946083
5 β 0.921851 0.0851036 0.890209 0.930951
into something like this:
Row β temp x y
β Int64 Float64? Float64?
ββββββΌβββββββββββββββββββββββββββββββββ
1 β 400 0.619571 0.965287
2 β 400 0.147488 0.534391
3 β 400 0.252395 0.148259
4 β 400 0.0718452 0.0305076
5 β 400 0.921851 0.0851036
6 β 450 0.206373 0.908058
7 β 450 0.539502 0.0534432
8 β 450 0.779658 0.0337005
9 β 450 0.760252 0.946083
10 β 450 0.890209 0.930951
I just feel like I am doing it in an over-complicated way. So maybe one of the data gurus can give me some hints on how to improve it.
sijo
August 15, 2021, 11:49am
2
Itβs hard to understand what you want without working code and without an example of input and desired output, but maybe this will help:
julia> data = DataFrame(["400_x", "400_y", "450_x", "450_y"] .=> rand.(3))
3Γ4 DataFrame
Row β 400_x 400_y 450_x 450_y
β Float64 Float64 Float64 Float64
ββββββΌββββββββββββββββββββββββββββββββββββββββ
1 β 0.858738 0.306062 0.376122 0.964485
2 β 0.330242 0.874829 0.983844 0.232338
3 β 0.582333 0.381427 0.893386 0.85312
julia> data.Id = 1:nrow(data);
julia> data_long = stack(data, Not(:Id))
12Γ3 DataFrame
Row β Id variable value
β Int64 String Float64
ββββββΌβββββββββββββββββββββββββββ
1 β 1 400_x 0.858738
2 β 2 400_x 0.330242
3 β 3 400_x 0.582333
4 β 1 400_y 0.306062
5 β 2 400_y 0.874829
6 β 3 400_y 0.381427
7 β 1 450_x 0.376122
8 β 2 450_x 0.983844
9 β 3 450_x 0.893386
10 β 1 450_y 0.964485
11 β 2 450_y 0.232338
12 β 3 450_y 0.85312
julia> disallowmissing!(unstack(data_long, :Id, :variable, :value))
3Γ5 DataFrame
Row β Id 400_x 400_y 450_x 450_y
β Int64 Float64 Float64 Float64 Float64
ββββββΌβββββββββββββββββββββββββββββββββββββββββββββββ
1 β 1 0.858738 0.306062 0.376122 0.964485
2 β 2 0.330242 0.874829 0.983844 0.232338
3 β 3 0.582333 0.381427 0.893386 0.85312
2 Likes
I am sorry for the messy example. I just had my phone on hand and could not check it, but now the example code should work.
I think your solution seems fine. Canβt think of a more intuitive way to do it.
1 Like
sijo
August 16, 2021, 6:19pm
5
I agree with @pdeffebach . Just note that you donβt need two transforms (and two split
calls):
julia> data = DataFrame(["400_x", "400_y", "450_x", "450_y"] .=> rand.(3))
3Γ4 DataFrame
Row β 400_x 400_y 450_x 450_y
β Float64 Float64 Float64 Float64
ββββββΌβββββββββββββββββββββββββββββββββββββββββ
1 β 0.316218 0.0977267 0.608225 0.633215
2 β 0.672883 0.50661 0.617916 0.960232
3 β 0.238429 0.437142 0.496646 0.736143
julia> data.Id = 1:nrow(data);
julia> data_long = stack(data, Not(:Id))
12Γ3 DataFrame
Row β Id variable value
β Int64 String Float64
ββββββΌββββββββββββββββββββββββββββ
1 β 1 400_x 0.316218
2 β 2 400_x 0.672883
3 β 3 400_x 0.238429
4 β 1 400_y 0.0977267
5 β 2 400_y 0.50661
6 β 3 400_y 0.437142
7 β 1 450_x 0.608225
8 β 2 450_x 0.617916
9 β 3 450_x 0.496646
10 β 1 450_y 0.633215
11 β 2 450_y 0.960232
12 β 3 450_y 0.736143
julia> data_long2 = transform(data_long, :variable => ByRow(x->split(x, "_")) => [:temp, :variable])
12Γ4 DataFrame
Row β Id variable value temp
β Int64 SubStrinβ¦ Float64 SubStrinβ¦
ββββββΌββββββββββββββββββββββββββββββββββββββββ
1 β 1 x 0.316218 400
2 β 2 x 0.672883 400
3 β 3 x 0.238429 400
4 β 1 y 0.0977267 400
5 β 2 y 0.50661 400
6 β 3 y 0.437142 400
7 β 1 x 0.608225 450
8 β 2 x 0.617916 450
9 β 3 x 0.496646 450
10 β 1 y 0.633215 450
11 β 2 y 0.960232 450
12 β 3 y 0.736143 450
julia> unstack(data_long2)
6Γ4 DataFrame
Row β Id temp x y
β Int64 SubStrinβ¦ Float64? Float64?
ββββββΌβββββββββββββββββββββββββββββββββββββββ
1 β 1 400 0.316218 0.0977267
2 β 2 400 0.672883 0.50661
3 β 3 400 0.238429 0.437142
4 β 1 450 0.608225 0.633215
5 β 2 450 0.617916 0.960232
6 β 3 450 0.496646 0.736143
Thank you! So by overwriting the variable column and changing the transform to emit multiple columns it already looks a bit better.