Drop array dimension with Statistics.mean etc

I’m using aggregations such as Statistics.mean on multidimensional arrays, reducing along a specific dimension. The default behavior is to return an Array with the same number of dimensions, but with only one entry along the aggregated dimension. That is:

julia> using Statistics

julia> A = [1 2; 3 4]
2×2 Array{Int64,2}:
 1  2
 3  4

julia> mean(A)
2.5

julia> mean(A, dims=1)
1×2 Array{Float64,2}:
 2.0  3.0

julia> mean(A, dims=1)[1, :]
2-element Array{Float64,1}:
 2.0
 3.0

Since I usually want an array with a reduced number of dimensions, I index the result with [1, :]. But I find this annoying and somewhat error-prone.

Question 1: What work-around do you know for this use-case?
Question 2: Why would you prefer the current behavior of mean over the one I want?

For completeness' sake, if I want to take the row-wise means, I’m using

julia> mean(A, dims=2)[:, 1]
2-element Array{Float64,1}:
 1.5
 3.5

Note that when the keyword argument dims is not used, a scalar value is returned, not a 1×1 Array.

You can easily do this:

julia> dropmean(A; dims=:) = dims isa Colon ? mean(A) : dropdims(mean(A; dims=dims); dims=dims)
dropmean (generic function with 1 method)

julia> A = rand(5, 7);

julia> dropmean(A, dims=1)
7-element Array{Float64,1}:
 0.37404294862855475
 0.26512340497828324
 0.4450455597397507 
 0.5827853682438383 
 0.7073441513338029 
 0.5301565274545441 
 0.4448844069896148 
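
The same pattern is not specific to mean. As a rough sketch (dropreduce is a made-up name, not something from Statistics or Base), one could wrap any reduction that accepts a dims keyword, and dims can also be a tuple. For plain matrices, vec or mapping over eachcol are possible alternatives:

```julia
using Statistics

# Hypothetical helper: reduce with `f` along `dims`, then drop those dimensions.
# `dims` may be an Int or a tuple of Ints, as accepted by `dropdims`.
dropreduce(f, A; dims) = dropdims(f(A; dims=dims); dims=dims)

A = [1 2; 3 4]

col_means = dropreduce(mean, A; dims=1)   # column means as a vector
row_sums  = dropreduce(sum, A; dims=2)    # row sums as a vector

# For matrices only, these give the same column means:
col_means_vec = vec(mean(A, dims=1))
col_means_map = map(mean, eachcol(A))     # `eachcol` needs Julia 1.1 or later

# Reducing over several dimensions at once also works:
B = reshape(collect(1.0:24.0), 2, 3, 4)
m = dropreduce(mean, B; dims=(1, 3))      # one value per index of dimension 2
```

Note that vec flattens everything, so it only matches the dropdims result when the input is a matrix.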

To your question 2, how about this?

julia> Anorm = A ./ sum(A, dims=1)
5×7 Array{Float64,2}:
 0.358107   0.643074   0.411948   0.154884   0.26041    0.355657   0.17027   
 0.0519266  0.180166   0.136706   0.209475   0.0600134  0.20015    0.00529412
 0.272589   0.12818    0.126928   0.286857   0.260759   0.035573   0.443182  
 0.120199   0.0335493  0.29278    0.0304732  0.203559   0.362079   0.0478683 
 0.197178   0.0150304  0.0316369  0.31831    0.215259   0.0465401  0.333385  

julia> sum(Anorm, dims=1)
1×7 Array{Float64,2}:
 1.0  1.0  1.0  1.0  1.0  1.0  1.0

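Keeping the reduced dimension (with size 1) ensures alignment: the result still broadcasts against the original array, so A ./ sum(A, dims=1) works without any reshaping.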
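
To illustrate what can go wrong once the dimension is dropped, here is a small sketch on a square matrix (the variable names are only for illustration). A plain vector is treated as a column when broadcasting, so dividing by the dropped column sums scales the rows instead:

```julia
A = [1.0 2.0; 3.0 4.0]

colsums = sum(A, dims=1)                   # 1×2 matrix
Anorm   = A ./ colsums                     # each column divided by its own sum

dropped = dropdims(colsums; dims=1)        # 2-element vector
skewed  = A ./ dropped                     # row i divided by dropped[i]: not the same thing

realigned = A ./ reshape(dropped, 1, :)    # restoring the 1×2 shape fixes it
```

With a non-square array such as the 5×7 one above, the dropped version would throw a DimensionMismatch instead, which is at least easier to notice.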

Thanks, I didn’t know about dropdims, which makes it easy, indeed.
I was worried that mymean would need to look at the number of dimensions of the input array to produce the appropriate [:, :, :, ...].
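
For what it's worth, if one did want the indexing route for an arbitrary number of dimensions, selectdim from Base seems to avoid building the [:, :, :, ...] by hand. A minimal sketch, assuming a single integer dimension d (mean_along is a hypothetical name):

```julia
using Statistics

# Hypothetical helper: mean along a single dimension `d`, with that dimension removed.
# `selectdim(X, d, 1)` is equivalent to `view(X, :, ..., 1, ..., :)` with the 1 in position d.
mean_along(A, d::Integer) = selectdim(mean(A, dims=d), d, 1)

B = reshape(collect(1.0:24.0), 2, 3, 4)
m = mean_along(B, 2)     # 2×4 result; a view, so use `collect` or `copy` if an Array is needed
```

Unlike dropdims, this returns a view rather than an Array, and it handles only one dimension at a time.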