The equivalent of R's var() for matrix?

I am trying to reproduce the following R code in Julia:

simdata_correlation <- function(coefs=c(0.2, 0.5), n=100){

  # Generate outcome
  y <- rnorm(n, 0, 1)

  n_var <- length(coefs)
  X <- scale(matrix(rnorm(n*(n_var), 0, 1), ncol=n_var))
  X <- cbind(y, X)

  # find the current correlation matrix
  cor_0 <- var(X)

  # cholesky decomposition to get independence
  chol_0 <- solve(chol(cor_0))

  X <-  X %*% chol_0

  # create new correlation structure (zeros can be replaced with other r vals)
  coefs_structure <- diag(x = 1, nrow=n_var+1, ncol=n_var+1, names = TRUE)
  coefs_structure[-1, 1] <- coefs
  coefs_structure[1, -1] <- coefs

  X <- X %*% chol(coefs_structure) * sd(y) + mean(y)
  X <- X[,-1]


  return(list(X=X, y=y))
}

However, I am unsuccessful. This might be because the R’s var function returns a 3x3 matrix, while Julia’s Statistics.var returns a value. Therefore, I’ve used cov instead of var. Unfortunately, the results are wrong…

R’s output:

> foo = simulate_data(c(0.2, 0.5))
> cor(foo$y, foo$X)
     [,1] [,2]
[1,]  0.2  0.5

Julia’s attempt:

import LinearAlgebra, Distributions, Random, Statistics

function simdata_correlation(coefs=[0.2, 0.5], n=100)

  # Generate outcome
  y = Random.rand(Distributions.Normal(0, 1), n)

  n_var = length(coefs)
  X = Random.rand(Distributions.Normal(0, 1), n, n_var)
  X = hcat(y, X)

  # find the current correlation matrix
  cor_structure = Statistics.cov(X)
  chol = LinearAlgebra.cholesky(cor_structure).U

  # cholesky decomposition to get independence
  X = X * chol

  # create new correlation structure (zeros can be replaced with other r vals)
  coefs_structure = LinearAlgebra.eye(n_var+1, n_var+1)
  coefs_structure[1, 2:end] = coefs
  coefs_structure[2:end, 1] = coefs

  X = X * LinearAlgebra.cholesky(coefs_structure).U
  X = X * Statistics.std(y) * Statistics.mean(y)

  X = X[:, 2:end]


  return y, X

end

Julia’s output

y, X = simdata_correlation([0.2, 0.5])

cor(y, X)
> -0.293738  -0.329608

What am I doing wrong?

You have two bugs in your Julia function

  1. X = X * chol -> X = X / chol
  2. X = X * Statistics.std(y) * Statistics.mean(y) -> X = X * Statistics.std(y) .+ Statistics.mean(y)
2 Likes

awesome thanks a lot!