I saw a post where @bkamins explained how to add a column that has the sum of the rows in selected columns ((Julia) Assigning DataFrame column sum to a new column - Stack Overflow). Is there a way to use transform to do the same, but with the mean operation?
I know I can do something like:
df.mean = vec(mean(Array(df), dims=2)) # vec flattens the N×1 matrix, since a column must be a vector
Another way with transform is:
transform(df, names(df) => ByRow((i...) -> mean(i)))
Is there a cleaner way to do this with transform?
Definitely don’t do this. It will allocate tons of memory.
You want
transform(df, AsTable(:) => ByRow(mean) => :rowmean)
BTW, are you coming from Stata by chance? This is a common operation in Stata.
Also, the above will run into trouble if you have many many columns because it constructs a named tuple. Making it better has been the object of extensive discussion. See here.
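As a minimal sketch of the AsTable pattern above, here it is on a small made-up DataFrame (the column names a, b, c and the output names are illustrative, not from the original question). It assumes all selected columns are numeric and have no missing values:

```julia
using DataFrames, Statistics

# Hypothetical toy data: three numeric columns, two rows.
df = DataFrame(a = [1.0, 2.0], b = [3.0, 4.0], c = [5.0, 6.0])

# Row-wise mean over every column; each row is passed to `mean` as a NamedTuple.
out = transform(df, AsTable(:) => ByRow(mean) => :rowmean)

# Row-wise mean over a subset of columns only.
out_ab = transform(df, AsTable([:a, :b]) => ByRow(mean) => :mean_ab)
```

Restricting the selector to the columns you need also keeps the per-row named tuple small.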
Awesome, thanks!
Not from Stata, just trying to plot the timeseries output of multiple stochastic simulations (mean +/- 1 std)
In this case, broadcasting seems to be faster & allocate less than using transform?
using DataFrames, BenchmarkTools, Statistics
df = DataFrame(rand(1000,1000), :auto)
dg = deepcopy(df)
@btime transform($df, AsTable(:) => ByRow(mean) => :rowmean) # 322.4 ms (1015669 allocations: 77.59 MiB)
@btime dg.rowmean .= mean(Array($dg), dims=2) # 1.887 ms (1020 allocations: 7.67 MiB)
Yeah, 1000 is definitely large enough for AsTable to cause problems.
reduce(+, eachcol(df)) ./ ncol(df)
should be faster than both.
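A small sketch of the reduce approach on made-up data, checked against the matrix version (the toy columns and variable names are illustrative). It assumes every column is numeric with no missing values, and makes no timing claim:

```julia
using DataFrames, Statistics

# Hypothetical toy data: three numeric columns, two rows.
df = DataFrame(a = [1.0, 2.0], b = [3.0, 4.0], c = [5.0, 6.0])

# Add the columns together elementwise, then divide by the number of columns.
rowmean_reduce = reduce(+, eachcol(df)) ./ ncol(df)

# Reference result: convert to a matrix, average along dims=2, flatten to a vector.
rowmean_matrix = vec(mean(Matrix(df), dims = 2))

# Store the result only after computing it, so the new column
# is not itself included in the mean.
df.rowmean = rowmean_reduce
```

Computing the vector first and assigning afterwards matters if the expression is re-run: once rowmean exists in df, eachcol(df) and ncol(df) would include it.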