DataFramesMeta selecting multiple columns

Using DataFramesMeta (or others), can I select multiple columns of choice and do some calculations?

Here is some mock data. I would like to subset (group) by column x and then calculate column-wise means. This version works:

df = DataFrame(x = [1,1,1,2,3,2,3,2,3,], y1 = [2,1,2,1,2,1,2,1,2], y2 = [5,7,6,5,7,6,5,7,6], y3 = [4,2,3,4,2,3,4,2,3])
l = [:y1, :y2, :y3]
by(df, :x, d->mean(Array{Float64,2}(d[l]); dims=1))

3×4 DataFrame
│ Row │ x │ x1      │ x2  │ x3  │
├─────┼───┼─────────┼─────┼─────┤
│ 1   │ 1 │ 1.66667 │ 6.0 │ 3.0 │
│ 2   │ 2 │ 1.0     │ 6.0 │ 3.0 │
│ 3   │ 3 │ 2.0     │ 6.0 │ 3.0 │

I also wanted to do this using @linq so that it can be chained with other operations, but my attempt didn’t work:

@linq df |> by(:x, mean(Array{Float64,2}(l; dims=2)))

I want to:

  1. subset a data frame by column x
  2. select columns of choice (y1, y2, y3)
  3. calculate column-wise means for y1, y2, y3

Any ideas?

Thanks,

Can you clarify what you mean by “select multiple columns of choice and do some calculations”?

It’s also unclear why you have Array{Int,2} in your function call. What are you trying to achieve?

@pdeffebach Sorry about that. I edited my post; I think this is a clearer description.

I wanted to

  1. subset a data frame by column x
  2. select columns of choice (y1, y2, y3)
  3. calculate column-wise means for y1, y2, y3

Ah, I see. What you are asking for is difficult to do with DataFramesMeta. It’s better to use plain DataFrames functions instead:

cols = [:y1, :y2, :y3]
by(df, :x) do d
    # build a one-row DataFrame holding the mean of each selected column
    DataFrame([mean(d[col]) for col in cols]', cols)
end
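For anyone reading this on a newer setup: as far as I know, `by` is no longer available in DataFrames.jl 1.x, and the usual replacement is `groupby` followed by `combine`. The following is only a sketch under that assumption, reusing the mock data from the question; `result` is just an illustrative name.

```julia
using DataFrames, Statistics

# Mock data from the question
df = DataFrame(x  = [1, 1, 1, 2, 3, 2, 3, 2, 3],
               y1 = [2, 1, 2, 1, 2, 1, 2, 1, 2],
               y2 = [5, 7, 6, 5, 7, 6, 5, 7, 6],
               y3 = [4, 2, 3, 4, 2, 3, 4, 2, 3])

# Columns of choice
cols = [:y1, :y2, :y3]

# Group by x, then take the mean of each selected column.
# `cols .=> mean` expands to [:y1 => mean, :y2 => mean, :y3 => mean];
# renamecols=false keeps the original column names instead of y1_mean etc.
result = combine(groupby(df, :x), cols .=> mean; renamecols = false)
```

Because `combine(groupby(...), ...)` is an ordinary function call, it can be chained with other steps using `|>` or a piping macro.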

@pdeffebach Thanks!

colwise and aggregate are also useful for this operation.
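Note that `colwise` and `aggregate` have, to my knowledge, also been dropped from DataFrames.jl 1.x. A rough present-day analogue, assuming a 1.x installation, might look like the sketch below; `value_cols`, `agg` and `col_means` are illustrative names, not part of any API.

```julia
using DataFrames, Statistics

# Mock data from the question
df = DataFrame(x  = [1, 1, 1, 2, 3, 2, 3, 2, 3],
               y1 = [2, 1, 2, 1, 2, 1, 2, 1, 2],
               y2 = [5, 7, 6, 5, 7, 6, 5, 7, 6],
               y3 = [4, 2, 3, 4, 2, 3, 4, 2, 3])

# Every column except the grouping key, selected by name
value_cols = names(df, Not(:x))

# Roughly what aggregate(df, :x, mean) did: per-group mean of all
# non-key columns (output columns get the default <name>_mean suffix)
agg = combine(groupby(df, :x), value_cols .=> mean)

# Roughly what colwise(mean, ...) did: one mean per column, ignoring groups
col_means = map(mean, eachcol(df[:, value_cols]))
```

The first call covers the grouped case asked about here; the second is the ungrouped, whole-table version.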