I saw a post where @bkamins explained how to add a column holding the row-wise sum of selected columns ("(Julia) Assigning DataFrame column sum to a new column" on Stack Overflow). Is there a way to do the same with transform, but computing the mean instead?
I know I can do something like:
df.mean = vec(mean(Array(df), dims=2))
Another way with transform is:
transform(df, names(df) => ByRow((i...) -> mean(i)))
Is there a cleaner way to do this with transform?
Definitely don’t do this. It will allocate tons of memory.
You want
transform(df, AsTable(:) => ByRow(mean) => :rowmean)
BTW, are you coming from Stata by chance? This is a common operation in Stata.
Also, the above will run into trouble if you have a very large number of columns, because it constructs a named tuple with one field per column. Making this better has been the subject of extensive discussion. See here.
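For anyone reading along, here is a minimal sketch of the AsTable pattern on a tiny made-up table. The column names a, b, c and the output names rowmean and mean_ab are placeholders chosen for illustration, not anything from the original question.

```julia
using DataFrames, Statistics

# Small made-up table, for illustration only
df = DataFrame(a = [1.0, 2.0], b = [3.0, 4.0], c = [5.0, 6.0])

# AsTable(:) passes each row to the function as a NamedTuple,
# e.g. (a = 1.0, b = 3.0, c = 5.0), and mean iterates over its values
out = transform(df, AsTable(:) => ByRow(mean) => :rowmean)

# Any column selector works inside AsTable, so the mean can be
# restricted to a subset of the columns
out2 = transform(df, AsTable([:a, :b]) => ByRow(mean) => :mean_ab)
```

transform returns a new data frame and leaves df untouched; transform! would add the column in place instead.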
Awesome, thanks!
Not from Stata, just trying to plot the timeseries output of multiple stochastic simulations (mean +/- 1 std)
In this case, broadcasting seems to be faster & allocate less than using transform?
using DataFrames, BenchmarkTools, Statistics
df = DataFrame(rand(1000,1000), :auto)
dg = deepcopy(df)
@btime transform($df, AsTable(:) => ByRow(mean) => :rowmean) # 322.4 ms (1015669 allocations: 77.59 MiB)
@btime dg.rowmean .= mean(Array($dg), dims=2) # 1.887 ms (1020 allocations: 7.67 MiB)
Yeah, 1000 is definitely large enough for AsTable to cause problems.
reduce(+, eachcol(df)) ./ ncol(df)
should be faster than both.
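A rough sketch of how that could be used to actually add the column, assuming every column is numeric and there are no missing values. The small random table and the names rowmean and reference are just for illustration.

```julia
using DataFrames, Statistics

# Small random table, for illustration only
df = DataFrame(rand(5, 4), :auto)

# Add the columns elementwise, then divide by the number of columns.
# This is computed before the new column is attached, so ncol(df)
# still counts only the original columns.
rowmean = reduce(+, eachcol(df)) ./ ncol(df)

# Reference result via a dense matrix; vec turns the n×1 matrix into a vector
reference = vec(mean(Matrix(df), dims = 2))

# Attach the result as a new column
df.rowmean = rowmean
```

The two results should agree up to floating-point rounding, so compare them with ≈ rather than ==.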