In the documentation for Dataframe.eachcol, I noticed these two examples:
julia> sum.(eachcol(df))
2-element Array{Int64,1}:
10
50
julia> collect(eachcol(df, true))
2-element Array{Pair{Symbol,AbstractArray{T,1} where T},1}:
:x => [1, 2, 3, 4]
:y => [11, 12, 13, 14]
Is there a way to combine these two examples where you return pairs as in the second example, but each AbstractArray is replaced with its sum? I tried:
collect(sum.(eachcol(df,true)))
But that didn't work, I think because sum is being applied to the Pairs and not to the AbstractArrays.
I'm relatively new to coding, so let me know if you need clarification or more information.
At the end of the day, I just want the name of each column in a dataframe paired with its sum.
Welcome to Julia's Discourse!
Disclaimer: I don't use DataFrames.
This gives the combination of the two operations, as requested:
names(df) .=> sum.(eachcol(df))
If you put this in a dataframe, it will give you another dataframe
DataFrame(names(df) .=> sum.(eachcol(df)))
2×2 DataFrame
│ Row │ first  │ second │
│     │ Symbol │ Int64  │
├─────┼────────┼────────┤
│ 1   │ x      │ 10     │
│ 2   │ y      │ 50     │
Another way, putting the result in a Dict, gives you the transpose
DataFrame(Dict(names(df) .=> sum.(eachcol(df))))
1×2 DataFrame
│ Row │ x     │ y     │
│     │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1   │ 10    │ 50    │
For a quick and dirty manipulation I think this is fine, but for big datasets my intuition tells me that something is off with this approach.
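If you would rather work on the pairs directly, as in the attempt from the question, one option is a comprehension that keeps each name and sums only the column. This is a minimal sketch, not something taken from the DataFrames docs: `column_pairs` is a hypothetical stand-in for `collect(eachcol(df, true))`, built by hand so the snippet runs without DataFrames.

```julia
# Hypothetical stand-in for collect(eachcol(df, true)):
# a vector of name => column pairs matching the example in the question.
column_pairs = [:x => [1, 2, 3, 4], :y => [11, 12, 13, 14]]

# Destructure each pair into its name and its column,
# then rebuild the pair with the column replaced by its sum.
column_sums = [name => sum(col) for (name, col) in column_pairs]
```

Calling `sum` on the whole pair fails because it tries to add the name to the column; destructuring first avoids that.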
Thanks Andres! A couple follow-ups:
- Just to make sure I'm interpreting this correctly: the expression takes two arrays (one of the DataFrame column names and one of the sums of each column), then broadcasts each value of the first array as a key for the corresponding value in the second array. Am I reading that right?
- You mentioned you don't use DataFrames. Is there a different data structure you would personally use instead?
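The reading in the first bullet can be checked without DataFrames at all. Here is a small sketch with plain vectors, where `col_names` and `col_sums` are hypothetical stand-ins for `names(df)` and `sum.(eachcol(df))`, with values taken from the output shown earlier in the thread:

```julia
# Hypothetical stand-ins for the two arrays in the expression.
col_names = [:x, :y]   # plays the role of names(df)
col_sums  = [10, 50]   # plays the role of sum.(eachcol(df))

# .=> applies => elementwise: the i-th name is paired with the i-th sum.
paired = col_names .=> col_sums

# A vector of pairs can be turned into a Dict keyed by column name.
lookup = Dict(paired)
```

Here `paired` is a vector of `Pair`s, one per column, which is the shape the question asked for.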