Combine two capabilities of DataFrames.eachcol

In the documentation for DataFrames.eachcol, I noticed these two examples:

julia> sum.(eachcol(df))
2-element Array{Int64,1}:
 10
 50

julia> collect(eachcol(df, true))
2-element Array{Pair{Symbol,AbstractArray{T,1} where T},1}:
 :x => [1, 2, 3, 4]
 :y => [11, 12, 13, 14]

Is there a way to combine these two examples where you return pairs as in the second example, but each AbstractArray is replaced with its sum? I tried:

collect(sum.(eachcol(df,true)))

But that didn’t work, I think because sum is being applied to the Pairs and not to the AbstractArrays.
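One way to see the idea in isolation is to unpack each pair and sum only the column part. The snippet below is a minimal sketch in plain Julia: `cols` is a hand-written stand-in for what `collect(eachcol(df, true))` returned above, and the names `cols` and `sums` are made up for illustration.

```julia
# Stand-in for collect(eachcol(df, true)): a vector of name => column pairs.
# (Assumption: same data as the documentation example quoted above.)
cols = [:x => [1, 2, 3, 4], :y => [11, 12, 13, 14]]

# Destructure each pair into its name and its column,
# sum only the column, and rebuild the pair.
sums = [name => sum(col) for (name, col) in cols]
```

Here `sum` only ever sees the column vector, never the `Pair` itself.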

I’m relatively new to coding, so let me know if you need clarification or more information. :nerd_face:

At the end of the day, I just want the name of each column in a dataframe paired with its sum.


Welcome to Julia’s Discourse!

Disclaimer: I don’t use DataFrames myself, but this gives the combination of the two operations you asked for:

names(df) .=> sum.(eachcol(df))

If you pass this to the DataFrame constructor, it gives you another dataframe:

DataFrame(names(df) .=> sum.(eachcol(df)))

2×2 DataFrame
│ Row │ first  │ second │
│     │ Symbol │ Int64  │
├─────┼────────┼────────┤
│ 1   │ x      │ 10     │
│ 2   │ y      │ 50     │

Another way, putting the result in a Dict, gives you the transpose

DataFrame(Dict(names(df) .=> sum.(eachcol(df))))

1×2 DataFrame
│ Row │ x     │ y     │
│     │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1   │ 10    │ 50    │

For a quick-and-dirty manipulation I think this is fine, but for big datasets my intuition tells me that something is off :sweat_smile:


Thanks Andres! A couple follow-ups:

  • Just to make sure I’m interpreting this correctly: the expression takes two arrays (one of the dataframe’s column names and one of the sums of each column), then broadcasts over them so that each value in the first array becomes the key for the corresponding value in the second array. Am I reading that right?
  • You mentioned you don’t use dataframes, is there a different data structure you would personally use instead?
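
On the first point, the elementwise pairing can be checked without a dataframe at all. This is a small sketch in plain Julia, where `col_names` and `col_sums` are hand-written stand-ins for `names(df)` and `sum.(eachcol(df))`; the variable names are hypothetical.

```julia
# Stand-ins for names(df) and sum.(eachcol(df)) from the example above.
col_names = [:x, :y]
col_sums = [10, 50]

# The dotted operator .=> applies => elementwise:
# the i-th name is paired with the i-th sum.
name_sum_pairs = col_names .=> col_sums

# A vector of pairs can be turned into a Dict for lookup by column name.
lookup = Dict(name_sum_pairs)
```

Note that a `Dict` does not promise any particular key order, so the vector of pairs is the safer choice if the column order matters.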