Performing multiple melt/stack operations on a dataset simultaneously


#1

I’m importing some R data so decided to learn DataFrames manipulation in julia instead of doing it in R. I have a wide dataframe with some initial columns followed by four variables that span 12 columns, one for each month. I want them in long format arranged with month as a column. I used stack() to successfully transform for the first monthly variable from 12 columns to 12 rows with:

df = stack(df, 5:16)

Then I tried to do all of the 12 column blocks simultaneously with:

df = stack(df, [5:16, 17:28, 29:40, 41:52])

but there was no method defined for an array of ranges.

Any ideas about a similarly powerful way to do multiple stack operations at the same time?


#2

Use [5:16; 17:28; 29:40; 41:52] instead. With commas, you create an array of ranges instead of an array of integers.


#3

Thanks! that does transform all columns ranges at once, but it appends each set of columns into the same two variable/value columns, not multiple sets of variable/value columns in the same rows. I guess I was expecting too much power built in, it’s probably not even possible in R! Maybe I’ll write a pull request that allows the comma syntax for multiple stacks in the same rows.


#4

Is this what you want (assuming that [:x1,:x2,:x3,:x4] are names of variables you do not want to stack)?

reduce((x,y) -> join(x,y,on=[:x1,:x2,:x3,:x4], makeunique=true), getindex.(stack.(x,  [5:16, 17:28, 29:40, 41:52]),[1:6]))

alternatively you can drop those not-stacked variables if you do not need them:

hcat(getindex.(stack.(x,  [5:16, 17:28, 29:40, 41:52]),[1:2])..., makeunique=true)

#5

Unless your really pressed for performance, the easiest way is to stack the months then spread the 4 variables. Are the column names something like var_1_month_2 so that you can split out a separate column for month?


#6

Thanks the reduce/join approach looks good.