DataFrame creating a rowmean for 95 columns

korilium · March 7, 2021, 12:55pm

Hello,

I am quite new to Julia and I was wondering how to compute a row mean on a DataFrame. For arrays this is quite easy, but I cannot figure out how to do it with a DataFrame, because most transformations work on columns instead of rows. I tried to transpose the data using the permutedims function, but I got an error:


test = SP500Return[week2:end,groupedbeta[1].name]

permutedims(test, 1)

ArgumentError: src_namescol must have eltype `Symbol` or `<:AbstractString`

I have 95 columns, which I have grouped into 10 portfolios based on the betas of the stocks (each column is a stock in my portfolio), so renaming them becomes quite annoying. Does anybody have an idea how to solve this?
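For comparison, the array case mentioned above is usually handled with `mean` and its `dims` keyword from the Statistics standard library. A minimal sketch with a made-up 4×3 matrix (rows as weeks, columns as stocks; the numbers are illustrative only):

```julia
using Statistics  # standard library: provides mean

# Made-up data: 4 rows (weeks) x 3 columns (stocks).
A = [1.0 2.0 3.0;
     4.0 5.0 6.0;
     7.0 8.0 9.0;
     0.0 0.0 3.0]

# dims=2 averages across the columns, giving one mean per row (a 4x1 matrix).
rowmeans = mean(A; dims=2)

# vec flattens the 4x1 result into an ordinary Vector.
v = vec(rowmeans)
println(v)  # [2.0, 5.0, 8.0, 1.0]
```

On a DataFrame one could copy the selected columns into such a matrix first with `Matrix(df[:, cols])`, at the cost of an extra copy; the replies below keep the computation inside DataFrames instead.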

I also tried:

combine(SP500Return[week2:end,groupedbeta[1].name], groupedbeta[1].name .=> ByRow(mean))

But then it just returns the DataFrame with the same values back.
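A likely explanation for that result: `cols .=> ByRow(mean)` builds one transformation per column, and `ByRow(mean)` then calls `mean` on each single element of that column. The mean of one number is the number itself, so nothing gets averaged across columns. This plain-Julia sketch (Statistics only, with a made-up column) mimics that per-element call:

```julia
using Statistics

# The mean of a single number is that number.
m = mean(3.0)
println(m)  # 3.0

# ByRow(mean) applied to ONE column calls mean on every element separately.
# The plain-Julia equivalent of that per-element call on a made-up column:
col = [0.01, -0.02, 0.03]
percolumn = map(mean, col)
println(percolumn == col)  # true: each value comes back unchanged
```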

Thank you in advance

pdeffebach · March 7, 2021, 4:06pm

Sorry, I’m a bit confused about what you are asking.

Do you mean something like Stata’s rowmean?

egen x = rowmean(`vars')

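There is no transpose for DataFrames in the array sense: permutedims needs a column whose values become the new column names, which is what that error is complaining about.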

I think you want AsTable:

julia> using DataFrames, Statistics
julia> df = DataFrame(rand(1000, 100), :auto);
julia> transform(df, AsTable(Between(:x50, :x100)) => ByRow(mean) => :mean_50_100)
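The same pattern should also work with an explicit list of names, which fits the question's case of portfolios defined by a vector of column names. A minimal sketch, assuming a small made-up DataFrame; the tickers and `portfolio1` are hypothetical stand-ins for `SP500Return` and `groupedbeta[1].name`:

```julia
using DataFrames, Statistics

# Hypothetical stand-in for SP500Return: 5 weeks of returns for 4 stocks.
df = DataFrame(AAPL = [0.01, 0.02, -0.01, 0.00, 0.03],
               MSFT = [0.02, 0.00, 0.01, 0.01, 0.02],
               XOM  = [-0.01, 0.01, 0.00, 0.02, 0.01],
               JPM  = [0.00, 0.03, 0.02, -0.01, 0.00])

# Hypothetical portfolio membership: a plain vector of column names,
# playing the role of groupedbeta[1].name in the question.
portfolio1 = ["AAPL", "MSFT", "XOM"]

# AsTable(portfolio1) hands each row to the function as a NamedTuple of
# those columns, so ByRow(mean) averages across them within each row.
transform!(df, AsTable(portfolio1) => ByRow(mean) => "portfolio1_mean")
println(df.portfolio1_mean)
```

With the real data, `AsTable(groupedbeta[1].name)` would take the place of `AsTable(portfolio1)`, assuming `name` holds column names as strings or symbols (the indexing in the question suggests it does).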

tbeason · March 7, 2021, 4:23pm

Yeah, the standard way to do this is:

transform!(df,["col1","col2","col3"] => ByRow(mean) => "meancol")

where you just need to update the strings to correspond to your list of columns and the name of your new column. The AsTable in the above answer is a handy shortcut instead of listing the column names 1 by 1.

pdeffebach · March 7, 2021, 4:28pm

This will fail

julia> mean(1, 2, 3)
ERROR: MethodError: no method matching mean(::Int64, ::Int64, ::Int64)

You need AsTable so that the function receives a single NamedTuple per row instead of one argument per column.
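To see the difference in plain Julia (Statistics only): a NamedTuple, which is what AsTable hands to the function for each row, iterates over its values, so `mean` accepts it, while the three-argument call from the error above has no method. The field names below are placeholders:

```julia
using Statistics

# With AsTable, ByRow(mean) receives one NamedTuple per row, like this one.
row = (col1 = 1, col2 = 2, col3 = 3)

# A NamedTuple iterates over its values, so mean can consume it directly.
println(mean(row))  # 2.0

# Without AsTable, the values arrive as separate arguments, which fails:
try
    mean(1, 2, 3)
catch err
    println(err isa MethodError)  # true
end
```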

tbeason · March 7, 2021, 4:42pm

You’re right! I guess that’s just how I want it to work. I think I made that mistake just a few days ago too…

EDIT: It is just a limitation of the mean method (it does not take its inputs as separate arguments), though, not of the approach here.

korilium · March 7, 2021, 4:57pm

Thank you for your responses!

tbeason · March 7, 2021, 4:57pm

For posterity, I will add the fix to my earlier solution. It is annoying that this is needed, but you can turn the values into a tuple before passing them to mean:

transform!(df,["col1","col2","col3"] => ByRow(mean) ∘ ByRow(tuple) => "meancol")
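A slightly shorter variant, which should behave the same, composes inside a single ByRow: `ByRow(mean ∘ tuple)`. The plain-Julia sketch below (Statistics only) shows what that composed function does with separate arguments; `rowmean` and `rowmean2` are hypothetical names for illustration:

```julia
using Statistics

# tuple(a, b, c) bundles separate arguments into one Tuple and ∘ composes,
# so (mean ∘ tuple)(a, b, c) == mean((a, b, c)).
rowmean = mean ∘ tuple
println(rowmean(1, 2, 3))  # 2.0

# An equivalent named function that collects varargs into a tuple.
rowmean2(xs...) = mean(xs)
println(rowmean2(0.5, 1.5))  # 1.0
```

In a DataFrame call this would read `transform!(df, ["col1","col2","col3"] => ByRow(mean ∘ tuple) => "meancol")`, with the same placeholder column names as above.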
