Performing multiple melt/stack operations on a dataset simultaneously

I’m importing some R data so decided to learn DataFrames manipulation in julia instead of doing it in R. I have a wide dataframe with some initial columns followed by four variables that span 12 columns, one for each month. I want them in long format arranged with month as a column. I used stack() to successfully transform for the first monthly variable from 12 columns to 12 rows with:

df = stack(df, 5:16)

Then I tried to do all of the 12 column blocks simultaneously with:

df = stack(df, [5:16, 17:28, 29:40, 41:52])

but there was no method defined for an array of ranges.

Any ideas about a similarly powerful way to do multiple stack operations at the same time?

1 Like

Use [5:16; 17:28; 29:40; 41:52] instead. With commas, you create an array of ranges instead of an array of integers.

Thanks! that does transform all columns ranges at once, but it appends each set of columns into the same two variable/value columns, not multiple sets of variable/value columns in the same rows. I guess I was expecting too much power built in, it’s probably not even possible in R! Maybe I’ll write a pull request that allows the comma syntax for multiple stacks in the same rows.

Is this what you want (assuming that [:x1,:x2,:x3,:x4] are names of variables you do not want to stack)?

reduce((x,y) -> join(x,y,on=[:x1,:x2,:x3,:x4], makeunique=true), getindex.(stack.(x,  [5:16, 17:28, 29:40, 41:52]),[1:6]))

alternatively you can drop those not-stacked variables if you do not need them:

hcat(getindex.(stack.(x,  [5:16, 17:28, 29:40, 41:52]),[1:2])..., makeunique=true)
1 Like

Unless your really pressed for performance, the easiest way is to stack the months then spread the 4 variables. Are the column names something like var_1_month_2 so that you can split out a separate column for month?

Thanks the reduce/join approach looks good.