Correlation

statistics

#1

I have an array of vectors and I want to compute the correlation between their components. Currently, I am computing this as

mean([X[j]*X[j]' for j=1:n_batch_iters])

which gives the desired result, but I was curious if this is the most efficient approach.
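
For reference, here is a self-contained version of the approach above. The sizes N and d, the random data, and the name S are illustrative assumptions, not taken from the original post:

```julia
using Statistics  # mean
using Random      # MersenneTwister

# Hypothetical data: N vectors of length d (sizes chosen only for illustration).
rng = MersenneTwister(0)
N, d = 1_000, 3
X = [randn(rng, d) for _ in 1:N]

# The approach from the question: average of the outer products X[j]*X[j]'.
# The comprehension allocates N separate d×d matrices before `mean` averages them.
S = mean([X[j] * X[j]' for j in 1:N])
```

S is a symmetric d×d matrix whose (i, j) entry is the average of the products of components i and j.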


#2

Your question is not clear; please give the dimensions of this array and say which correlation you are interested in. In any case, try to use Julia's cor(x,y) function if you can.
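
As a sketch of what cor computes (assuming random illustrative data and hypothetical names x1, x2, r12, m12): cor from the Statistics standard library returns the Pearson correlation, which is centered and scaled to lie in [-1, 1], so it is not the same quantity as the raw average product computed in the question:

```julia
using Statistics  # cor, mean
using Random      # MersenneTwister

rng = MersenneTwister(1)
N, d = 1_000, 3
X = [randn(rng, d) for _ in 1:N]

# Collect components 1 and 2 across all N vectors.
x1 = [v[1] for v in X]
x2 = [v[2] for v in X]

# Pearson correlation: subtracts the means and divides by the standard deviations.
r12 = cor(x1, x2)

# Raw average product of the two components (no centering, no scaling).
m12 = mean(x1 .* x2)
```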


#3

Each X[j] is itself an array of length d and there are a total of N of them. Essentially, what I want to compute is

\frac{1}{N}\sum_{n=1}^N X_n^{(i)} X_n^{(j)}

for each pair (i, j), where X_n^{(i)} denotes the i-th component of X[n].
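
In matrix form, stacking the vectors as the columns of a d×N matrix M turns the whole set of sums into one product, M*M'/N, whose (i, j) entry is exactly the expression above. A minimal sketch, with illustrative random data and hypothetical names M, S and S_loop:

```julia
using Random  # MersenneTwister

rng = MersenneTwister(2)
N, d = 1_000, 3
X = [randn(rng, d) for _ in 1:N]

# Stack the vectors as columns: M is d×N, with M[i, n] == X[n][i].
M = reduce(hcat, X)

# Entry (i, j) of M*M' is the sum over n of X_n^(i) * X_n^(j); divide by N to average.
S = (M * M') / N

# The same quantity written as an explicit double sum, for comparison.
S_loop = [sum(X[n][i] * X[n][j] for n in 1:N) / N for i in 1:d, j in 1:d]
```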


#4

The covariance of a random variable with itself is the variance, so var(X)?


#5

var does not appear to take an array of arrays as its argument.


#6

So what do you do in this case?

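  1. You can start with a matrix instead of an array of arrays and use cov directly (var on a matrix returns variances, not the full covariance matrix), or
  2. You can stick with your original implementation, which computes the uncentered second moment and therefore matches the covariance only for zero-mean vectors.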
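
The difference between the two options can be checked numerically. This is a sketch under illustrative assumptions (random data shifted to a nonzero mean; the names are hypothetical): cov with dims=2 treats each column as one observation, and corrected=false uses the 1/N normalization from the formula. The two results then differ by the outer product of the mean vector:

```julia
using Statistics  # cov, mean
using Random      # MersenneTwister

rng = MersenneTwister(3)
N, d = 1_000, 3
# Hypothetical data with a nonzero mean, to show where the two options differ.
X = [randn(rng, d) .+ 2.0 for _ in 1:N]
M = reduce(hcat, X)  # d×N, one observation per column

# Option 1: covariance matrix of the components (centered, divided by N).
C = cov(M; dims=2, corrected=false)

# Option 2: the original mean of outer products (uncentered second moment).
S = mean([x * x' for x in X])

# Mean vector of the components; C == S - mu*mu' up to rounding.
mu = vec(mean(M; dims=2))
```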

#7

Any thoughts on the efficiency of one method over the other?
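
Two lower-allocation alternatives, sketched under the same illustrative assumptions (the function names are hypothetical): stack once and call the five-argument mul! from LinearAlgebra (available since Julia 1.3), or accumulate into a single preallocated d×d buffer instead of materializing N outer-product matrices:

```julia
using LinearAlgebra  # mul!
using Random         # MersenneTwister

rng = MersenneTwister(4)
N, d = 1_000, 3
X = [randn(rng, d) for _ in 1:N]

# Alternative A: stack once, then a single in-place matrix product
# (dispatched to BLAS for Float64 data). Allocates M and the d×d result.
function second_moment_matrix(X)
    M = reduce(hcat, X)                  # d×N
    S = similar(M, size(M, 1), size(M, 1))
    mul!(S, M, M', 1 / length(X), 0)     # S = (1/N) * M * M'
    return S
end

# Alternative B: accumulate products into one preallocated d×d buffer,
# avoiding the N temporary matrices created by the comprehension.
function second_moment_loop(X)
    d = length(first(X))
    S = zeros(eltype(first(X)), d, d)
    for x in X, j in 1:d, i in 1:d       # i innermost: column-major friendly
        S[i, j] += x[i] * x[j]
    end
    return S ./ length(X)
end
```

Which variant is fastest will depend on N, d and the BLAS library in use.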


#8

Try it and see?
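
A rough way to try it, assuming illustrative sizes and hypothetical helper names. BenchmarkTools.jl's @btime gives more reliable numbers than a single @elapsed, but this sketch only uses Base and the Statistics standard library:

```julia
using Statistics  # mean
using Random      # MersenneTwister

rng = MersenneTwister(5)
N, d = 10_000, 10   # illustrative sizes; timings depend on N, d and hardware
X = [randn(rng, d) for _ in 1:N]

outer_mean(X) = mean([x * x' for x in X])                      # original approach
matrix_form(X) = (M = reduce(hcat, X); (M * M') / length(X))   # stacked approach

# Warm up once so that compilation time is not included in the measurement.
outer_mean(X); matrix_form(X)

t_outer = @elapsed outer_mean(X)
t_matrix = @elapsed matrix_form(X)
println("outer products: $(t_outer) s, matrix product: $(t_matrix) s")
```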