Stack hierarchical data

question

#1

Hi,

I have some survey data which are organized hierarchically and I would like to transform them into a rectangular dataset. For example using JuliaDB:

t = table([1,1,1,2,2,2], [1,2,2,2,4,5],[10,20,10,5,20,10], names=[:ID, :Activity,:Duration])
Table with 6 rows, 3 columns:
ID  Activity  Duration
──────────────────────
1   1         10
1   2         20
1   2         10
2   2         5
2   4         20
2   5         10

Each activity can happen more than once per ID and some activities are not reported if the duration is zero. I would like to create a new variable for all possible Activities (which are 100) and sum the duration to something like this:

Table with 2 rows, 6 columns:
ID  x_1  x_2  x_3  x_4  x_5
───────────────────────────
1   10   30   0    0    0
2   0    5    0    20   10

I have tried the following:

duration = select(t,:Duration)
activity = select(t,:Activity)

for i in 1:5
           @eval $(Symbol("x_$i")) = zeros(Int,size(activity,1))
end

for j=1:size(activity,1)
    for i=1:5
        if activity[j]==i
            @eval $(Symbol("x_$i[$j]")) =duration[$j]
        else
            @eval $(Symbol("x_$i[$j]"))=0
        end
    end
end

and then use groupreduce using summation by ID and push the arrays into a new table.

However, the loop doesn’t work and all the x Arrays remain zero. Other variations such as

@eval $(Symbol("x_$i[$j]")) =duration[j]
ERROR: UndefVarError: j not defined

or

@eval $(Symbol("x_$i[$j]")) = $(Symbol("duration[$j]"))
ERROR: UndefVarError: duration[1] not defined

produce errors or return zeros.

Is there a way to make this work or a more efficient way of doing this without creating so many columns? For some activities of interest, I did it manually but since I would like to obtain the information for all the 100 activities it is not a feasible strategy.

The tricky part is that I need the information for the number of individuals that didn’t do some activities. The total duration per activity is straightforward and can be done without the above loops simply by groupreduce.

Thank you very much in advance!