Row-wise mean of columns in a DataFrame

hdavid16 · August 13, 2021, 8:07pm

I saw a post where @bkamins explained how to add a column holding the row-wise sum of selected columns ("(Julia) Assigning DataFrame column sum to a new column" on Stack Overflow). Is there a way to use transform to do the same, but with the mean?

I know I can do something like:

df.mean = vec(mean(Array(df), dims=2))

Another way with transform is:

transform(df, names(df) => ByRow((i...) -> mean(i)))

Is there a cleaner way to do this with transform?

pdeffebach · August 13, 2021, 8:11pm

Definitely don’t do this. It will allocate tons of memory.

You want

transform(df, AsTable(:) => ByRow(mean) => :rowmean)
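As a minimal sketch of what that call does, here it is on a small made-up frame (the column names `a`, `b`, `c` and the values are only for illustration). `AsTable` hands each row to `mean` as a NamedTuple, and the selector inside `AsTable` controls which columns take part.

```julia
using DataFrames, Statistics

# Hypothetical toy data; the column names are only for illustration.
df = DataFrame(a = [1.0, 2.0], b = [3.0, 4.0], c = [5.0, 6.0])

# AsTable(:) passes each row to `mean` as a NamedTuple of all columns.
out = transform(df, AsTable(:) => ByRow(mean) => :rowmean)

# Restrict the mean to a subset of columns by changing the selector.
out_ab = transform(df, AsTable([:a, :b]) => ByRow(mean) => :mean_ab)
```

`transform` returns a new data frame and leaves `df` untouched; `transform!` would add the column in place.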

BTW, are you coming from Stata by chance? This is a common operation in Stata.

Also, the above will run into trouble if you have many many columns because it constructs a named tuple. Making it better has been the object of extensive discussion. See here.

hdavid16 · August 13, 2021, 8:14pm

Awesome, thanks!

Not from Stata, just trying to plot the time-series output of multiple stochastic simulations (mean +/- 1 std).

rafael.guerra · August 13, 2021, 10:58pm

In this case, taking the mean of the underlying matrix and broadcasting it into a new column seems to be faster and to allocate less than using transform?

using DataFrames, Statistics, BenchmarkTools
df = DataFrame(rand(1000,1000), :auto)
dg = deepcopy(df)
@btime transform($df, AsTable(:) => ByRow(mean) => :rowmean)  # 322.4 ms (1015669 allocations: 77.59 MiB)
@btime dg.rowmean .= mean(Array($dg), dims=2)    # 1.887 ms (1020 allocations: 7.67 MiB)
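A side note on the matrix route, sketched on made-up data rather than the 1000×1000 frame: `mean(..., dims=2)` returns an n×1 matrix, so one option is to wrap it in `vec` and assign a plain vector. The sketch also computes the mean once, before the new column exists, so the result column is not part of the average.

```julia
using DataFrames, Statistics

# Hypothetical toy data standing in for the large benchmark frame.
df = DataFrame(a = [1.0, 2.0], b = [3.0, 4.0], c = [5.0, 6.0])

# Matrix(df) copies the columns into a dense matrix; dims=2 averages across columns.
# vec drops the trailing singleton dimension so the column is stored as a Vector.
df.rowmean = vec(mean(Matrix(df), dims=2))
```

This assumes all columns share a numeric element type that fits in one matrix; mixed-type frames would need a column selection first.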

pdeffebach · August 13, 2021, 11:58pm

Yeah, 1000 columns is definitely large enough for AsTable to cause problems.

reduce(+, eachcol(df)) ./ ncol(df)

should be faster than both.
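A small sketch of that column-wise reduction on made-up data, with a cross-check against the dense-matrix result. It adds whole columns together and divides once, so no per-row NamedTuple is built. It assumes every column is numeric and contains no missing values.

```julia
using DataFrames, Statistics

# Hypothetical toy data; the column names are only for illustration.
df = DataFrame(a = [1.0, 2.0], b = [3.0, 4.0], c = [5.0, 6.0])

# Add the columns together element by element, then divide by the column count.
rowmean = reduce(+, eachcol(df)) ./ ncol(df)

# Cross-check against the dense-matrix computation.
reference = vec(mean(Matrix(df), dims=2))
@assert rowmean ≈ reference

# Store the result as a new column.
df.rowmean = rowmean
```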
