Row-wise mean of columns in a DataFrame

I saw a post where @bkamins explained how to add a column holding the row-wise sum of selected columns ("(Julia) Assigning DataFrame column sum to a new column" on Stack Overflow). Is there a way to use transform to do the same, but with the mean?

I know I can do something like:

df.mean = vec(mean(Array(df), dims=2))  # needs `using Statistics`; vec turns the n×1 matrix into a column vector

Another way with transform is:

transform(df, names(df) => ByRow((i...) -> mean(i)))

Is there a cleaner way to do this with transform?

Definitely don’t do this. It will allocate tons of memory.

You want

transform(df, AsTable(:) => ByRow(mean) => :rowmean)

BTW, are you coming from Stata by chance? This is a common operation in Stata.

Also, the above will run into trouble if you have a very large number of columns, because it constructs a named tuple with one field per column for every row. Making it better has been the subject of extensive discussion. See here.
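
For reference, here is a minimal sketch of the `AsTable` form on a toy table. The column names `a`, `b`, `c` and the values are made up for illustration, and it assumes all columns are numeric with no `missing` values. The second call is only there to show that each row reaches the function as a `NamedTuple`, which is why very wide tables get expensive.

```julia
using DataFrames, Statistics

# Hypothetical toy data; any all-numeric DataFrame would do.
df = DataFrame(a = [1.0, 2.0], b = [3.0, 4.0], c = [5.0, 9.0])

# Row-wise mean over all columns, stored in a new column :rowmean.
out = transform(df, AsTable(:) => ByRow(mean) => :rowmean)

# Check what the function receives: one NamedTuple per row,
# with one field per source column.
check = select(df, AsTable(:) => ByRow(r -> r isa NamedTuple) => :is_namedtuple)
```

To restrict the mean to some columns, a column selector such as `AsTable([:a, :b])` should work in place of `AsTable(:)`.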

Awesome, thanks!

Not from Stata, just trying to plot the time series output of multiple stochastic simulations (mean +/- 1 std).

In this case, broadcasting seems to be faster & allocate less than using transform?

using DataFrames, BenchmarkTools, Statistics  # Statistics provides mean
df = DataFrame(rand(1000,1000), :auto)
dg = deepcopy(df)
@btime transform($df, AsTable(:) => ByRow(mean) => :rowmean)  # 322.4 ms (1015669 allocations: 77.59 MiB)
@btime dg.rowmean .= mean(Array($dg), dims=2)    # 1.887 ms (1020 allocations: 7.67 MiB)

Yeah, 1000 is definitely large enough for AsTable to cause problems.

reduce(+, eachcol(df)) ./ ncol(df)

should be faster than both.
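
A small sketch of why the column-wise reduction works, on made-up data: `eachcol(df)` yields the column vectors, `reduce(+, ...)` adds them elementwise into a single vector of row sums, and dividing by `ncol(df)` gives the row means. It assumes every column is numeric with no `missing` values. The matrix-based version is included only as a sanity check.

```julia
using DataFrames, Statistics

# Hypothetical toy data.
df = DataFrame(a = [1.0, 2.0], b = [3.0, 4.0], c = [5.0, 9.0])

# Sum the columns elementwise, then divide by the number of columns.
rowmean = reduce(+, eachcol(df)) ./ ncol(df)

# Same result via a matrix copy; vec turns the n×1 matrix into a vector.
rowmean_matrix = vec(mean(Matrix(df), dims=2))

# Compute first, attach afterwards, so the new column is not
# itself included in the mean.
df.rowmean = rowmean
```

Note that this only covers the mean. For something like a row-wise standard deviation, one of the other approaches is still needed.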
