# Row-wise mean of columns in a DataFrame

I saw a post where @bkamins explained how to add a column holding the row-wise sum of selected columns ("(Julia) Assigning DataFrame column sum to a new column" on Stack Overflow). Is there a way to use `transform` to do the same, but with the `mean` operation?

I know I can do something like:

```julia
using Statistics
# mean(...; dims=2) returns an n×1 matrix; vec turns it into a column vector
df.mean = vec(mean(Array(df), dims=2))
```

Another way with `transform` is:

```julia
transform(df, names(df) => ByRow((i...) -> mean(i)))
```

Is there a cleaner way to do this with `transform`?

Definitely don’t do this. It will allocate tons of memory.

You want

```julia
transform(df, AsTable(:) => ByRow(mean) => :rowmean)
```
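For anyone following along, here is a minimal sketch of that call on a toy table. The column names `a`, `b`, `c` and the result names `out_all` and `out_ab` are made up for illustration; passing a vector of column names to `AsTable` is one way to restrict the mean to selected columns.

```julia
using DataFrames, Statistics

# Toy data; the column names are purely illustrative.
df = DataFrame(a = [1.0, 2.0], b = [3.0, 4.0], c = [5.0, 9.0])

# Row-wise mean over all columns: each row is passed to `mean` as a named tuple.
out_all = transform(df, AsTable(:) => ByRow(mean) => :rowmean)

# Row-wise mean over a subset of columns only.
out_ab = transform(df, AsTable([:a, :b]) => ByRow(mean) => :rowmean)
```

`transform` returns a new data frame and leaves `df` untouched; `transform!` would add the column in place.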

BTW, are you coming from Stata by chance? This is a common operation in Stata.

Also, the above will run into trouble if you have a very large number of columns, because it constructs a named tuple for each row. Making this better has been the subject of extensive discussion. See here.


Awesome, thanks!

Not from Stata, just trying to plot the time series output of multiple stochastic simulations (mean ± 1 std).

In this case, broadcasting seems to be faster and to allocate less than `transform`?

```julia
using DataFrames, BenchmarkTools, Statistics
df = DataFrame(rand(1000,1000), :auto)
dg = deepcopy(df)
@btime transform($df, AsTable(:) => ByRow(mean) => :rowmean)  # 322.4 ms (1015669 allocations: 77.59 MiB)
@btime dg.rowmean .= mean(Array($dg), dims=2)    # 1.887 ms (1020 allocations: 7.67 MiB)
```

Yeah, `1000` columns is definitely large enough for `AsTable` to cause problems.

```julia
reduce(+, eachcol(df)) ./ ncol(df)
```

should be faster than both.
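As a rough sketch of how this might be wired up for the mean ± 1 std use case mentioned earlier in the thread: the toy data and the names `rowmean` and `rowstd` are assumptions for illustration, and the `std` line falls back on the matrix-based approach rather than anything suggested above.

```julia
using DataFrames, Statistics

# Toy data; the column names are purely illustrative.
df = DataFrame(a = [1.0, 2.0], b = [3.0, 4.0], c = [5.0, 9.0])

# Add the column vectors elementwise, then divide by the number of columns.
rowmean = reduce(+, eachcol(df)) ./ ncol(df)

# Cross-check against the matrix-based version (vec flattens the n×1 result).
@assert rowmean ≈ vec(mean(Matrix(df), dims = 2))

# Row-wise standard deviation via a matrix copy, for the ± 1 std band.
rowstd = vec(std(Matrix(df), dims = 2))

# Attach the results only after both are computed, so the new columns
# are not themselves included in the row statistics.
df.rowmean = rowmean
df.rowstd = rowstd
```

The `reduce` version never builds a named tuple per row, which is presumably why it scales better with many columns; it does assume all columns are numeric and free of `missing` values.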
