Statistics on random matrices

Hi,
I am generating a lot of random matrices and want to study their statistical properties. Right now I am storing them in a 3-dimensional array, where the first two dimensions are the matrix sizes, let’s say square matrices of size D, and the third dimension is the number of realizations of random matrices. E.g.

D = 10
realizations = 20
matrices(D,D,realizations)

holds 20 10x10 matrices. Now I want to calculate some statistics for each entry of the random matrices, say the mean of each entry or the variance of each entry. I did this in two ways. First, by explicitly writing out the formulas for mean and variance and then using them for each entry, e.g.

mu = zeros(D,D)
for i in 1:realizations
  mu .+= matrices[:,:,i]
end
mu ./= realizations

Second, converting the D x D x realizations array into a DxD matrix of vectors of length realizations. And then using the functions mean and var from Statistics.

matrices_2(D,D)
mean.(matrices)

The array conversion hit performance really hard, because I did it naively in a loop.

How would achieve my goal, calculate some statistics for each entry of the random matrices, by using functions from Statistics or any other package, and not have to allocate a new array?

EDIT: The answer for mean and var is the marked answer, while for higher moment estimations (and general estimations) see the post of @nilshg Statistics on random matrices.

Functions like mean, var, std, and median take an optional keyword argument for computing things elementwise along a given dimension of a multidimensional array.

using Statistics

X = rand(2, 2, 10)

mean(X, dims = 3)
var(X, dims = 3)
std(X, dims = 3)
median(X, dims = 3)
3 Likes

@johnczito Thanks for the quick answer! I somehow missed that functionality by going through the documentation.

The result is a 2×2×1 Array. Is there a more elegant way to reduce the dimensions by 1 and get a 2x2 Array, than calling

mean(X, dims = 3)[:,:,1]

?

The dropdims function will do the trick.

using Statistics

X = rand(2, 2, 10)

julia> M = mean(X, dims = 3)
2×2×1 Array{Float64,3}:
[:, :, 1] =
 0.500996  0.491739
 0.545284  0.582671

julia> dropdims(M, dims = 3)
2×2 Array{Float64,2}:
 0.500996  0.491739
 0.545284  0.582671
3 Likes

@johnczito Perfect, thanks!

@johnczito Sorry to disturb you again. I looked into StatsBase and the functions to calculate skewness and kurtosis seem to miss the above mentioned functionality! Is there a workaround?

You can generally apply a function to different slices of a higher dimensional array using mapslices, see e.g. my answer here: https://stackoverflow.com/questions/62215025/julia-get-range-minimum-maximum-values-of-a-multidimensional-array-along-sp/62215277

3 Likes