Hello, I have a Dataframe with a frequency matrix generate about events in relation to a user.
I need to reduce this matrix in unique USER_ID and sum each EVENT CATEGORY - events categories correspond to category 1,2,3,4.
I have tried using often functions in relation with DataFrames, for example, group by, filter or map but I canβt obtain a unique ID with the sum of each category values.
Thanks for the help!
nilshg
September 16, 2020, 6:16am
2
If I understand you correctly you just want to groupby
the USER_ID
and then sum the columns? You can do this:
julia> using DataFrames
julia> df = hcat(DataFrame(USER_ID = rand(100:110, 20)), DataFrame(rand(0:1, 20, 10)))
20Γ11 DataFrame
β Row β USER_ID β x1 β x2 β x3 β x4 β x5 β x6 β x7 β x8 β x9 β x10 β
β β Int64 β Int64 β Int64 β Int64 β Int64 β Int64 β Int64 β Int64 β Int64 β Int64 β Int64 β
βββββββΌββββββββββΌββββββββΌββββββββΌββββββββΌββββββββΌββββββββΌββββββββΌββββββββΌββββββββΌββββββββΌββββββββ€
β 1 β 107 β 0 β 0 β 0 β 1 β 1 β 1 β 1 β 0 β 1 β 0 β
β 2 β 100 β 0 β 1 β 1 β 0 β 0 β 1 β 0 β 0 β 0 β 1 β
β 3 β 109 β 0 β 1 β 0 β 1 β 1 β 1 β 1 β 0 β 0 β 0 β
β 4 β 102 β 1 β 1 β 1 β 1 β 0 β 0 β 1 β 1 β 1 β 1 β
β 5 β 103 β 1 β 1 β 0 β 0 β 1 β 0 β 0 β 0 β 1 β 1 β
β 6 β 109 β 1 β 0 β 1 β 0 β 1 β 1 β 0 β 0 β 1 β 0 β
β 7 β 101 β 1 β 1 β 0 β 1 β 1 β 1 β 0 β 0 β 1 β 0 β
β 8 β 101 β 0 β 0 β 0 β 0 β 1 β 1 β 1 β 0 β 0 β 0 β
β 9 β 102 β 0 β 0 β 1 β 1 β 1 β 1 β 1 β 1 β 1 β 0 β
β 10 β 106 β 0 β 0 β 1 β 0 β 0 β 0 β 0 β 0 β 1 β 0 β
β 11 β 100 β 1 β 1 β 0 β 0 β 1 β 1 β 1 β 1 β 0 β 0 β
β 12 β 105 β 1 β 0 β 0 β 0 β 0 β 0 β 0 β 0 β 1 β 1 β
β 13 β 101 β 0 β 1 β 1 β 1 β 1 β 1 β 0 β 0 β 1 β 1 β
β 14 β 109 β 0 β 0 β 1 β 1 β 1 β 1 β 0 β 0 β 1 β 0 β
β 15 β 107 β 0 β 1 β 1 β 1 β 1 β 1 β 0 β 1 β 1 β 1 β
β 16 β 101 β 0 β 1 β 1 β 1 β 1 β 0 β 0 β 1 β 0 β 0 β
β 17 β 102 β 0 β 0 β 1 β 0 β 0 β 1 β 1 β 1 β 1 β 0 β
β 18 β 104 β 0 β 1 β 1 β 0 β 0 β 0 β 0 β 0 β 0 β 1 β
β 19 β 105 β 0 β 0 β 0 β 0 β 0 β 1 β 1 β 1 β 1 β 0 β
β 20 β 104 β 1 β 0 β 0 β 1 β 1 β 1 β 0 β 1 β 0 β 1 β
julia> to_group = names(df[!, Not(:USER_ID)])
10-element Array{String,1}:
"x1"
"x2"
"x3"
"x4"
"x5"
"x6"
"x7"
"x8"
"x9"
"x10"
julia> combine(groupby(df, :USER_ID), to_group .=> sum .=> to_group)
9Γ11 DataFrame
β Row β USER_ID β x1 β x2 β x3 β x4 β x5 β x6 β x7 β x8 β x9 β x10 β
β β Int64 β Int64 β Int64 β Int64 β Int64 β Int64 β Int64 β Int64 β Int64 β Int64 β Int64 β
βββββββΌββββββββββΌββββββββΌββββββββΌββββββββΌββββββββΌββββββββΌββββββββΌββββββββΌββββββββΌββββββββΌββββββββ€
β 1 β 107 β 0 β 1 β 1 β 2 β 2 β 2 β 1 β 1 β 2 β 1 β
β 2 β 100 β 1 β 2 β 1 β 0 β 1 β 2 β 1 β 1 β 0 β 1 β
β 3 β 109 β 1 β 1 β 2 β 2 β 3 β 3 β 1 β 0 β 2 β 0 β
β 4 β 102 β 1 β 1 β 3 β 2 β 1 β 2 β 3 β 3 β 3 β 1 β
β 5 β 103 β 1 β 1 β 0 β 0 β 1 β 0 β 0 β 0 β 1 β 1 β
β 6 β 101 β 1 β 3 β 2 β 3 β 4 β 3 β 1 β 1 β 2 β 1 β
β 7 β 106 β 0 β 0 β 1 β 0 β 0 β 0 β 0 β 0 β 1 β 0 β
β 8 β 105 β 1 β 0 β 0 β 0 β 0 β 1 β 1 β 1 β 2 β 1 β
β 9 β 104 β 1 β 1 β 1 β 1 β 1 β 1 β 0 β 1 β 0 β 2 β
1 Like
xiaodai
September 16, 2020, 6:17am
3
An MWE would help Please read: make it easier to help you
Is this what u mean?
using DataFrames
using DataConvenience: @>
@> df begin
groupby(:USER_ID)
combine( 1:12 .=> sum)
end
1 Like
Thanks @nilshg , that is the solution!!
1 Like