I am trying to reproduce the following R code in Julia:
simdata_correlation <- function(coefs=c(0.2, 0.5), n=100){
# Generate outcome
y <- rnorm(n, 0, 1)
n_var <- length(coefs)
X <- scale(matrix(rnorm(n*(n_var), 0, 1), ncol=n_var))
X <- cbind(y, X)
# find the current correlation matrix
cor_0 <- var(X)
# cholesky decomposition to get independence
chol_0 <- solve(chol(cor_0))
X <- X %*% chol_0
# create new correlation structure (zeros can be replaced with other r vals)
coefs_structure <- diag(x = 1, nrow=n_var+1, ncol=n_var+1, names = TRUE)
coefs_structure[-1, 1] <- coefs
coefs_structure[1, -1] <- coefs
X <- X %*% chol(coefs_structure) * sd(y) + mean(y)
X <- X[,-1]
return(list(X=X, y=y))
}
However, I am unsuccessful. This might be because the R’s var
function returns a 3x3 matrix, while Julia’s Statistics.var
returns a value. Therefore, I’ve used cov
instead of var
. Unfortunately, the results are wrong…
R’s output:
> foo = simulate_data(c(0.2, 0.5))
> cor(foo$y, foo$X)
[,1] [,2]
[1,] 0.2 0.5
Julia’s attempt:
import LinearAlgebra, Distributions, Random, Statistics
function simdata_correlation(coefs=[0.2, 0.5], n=100)
# Generate outcome
y = Random.rand(Distributions.Normal(0, 1), n)
n_var = length(coefs)
X = Random.rand(Distributions.Normal(0, 1), n, n_var)
X = hcat(y, X)
# find the current correlation matrix
cor_structure = Statistics.cov(X)
chol = LinearAlgebra.cholesky(cor_structure).U
# cholesky decomposition to get independence
X = X * chol
# create new correlation structure (zeros can be replaced with other r vals)
coefs_structure = LinearAlgebra.eye(n_var+1, n_var+1)
coefs_structure[1, 2:end] = coefs
coefs_structure[2:end, 1] = coefs
X = X * LinearAlgebra.cholesky(coefs_structure).U
X = X * Statistics.std(y) * Statistics.mean(y)
X = X[:, 2:end]
return y, X
end
Julia’s output
y, X = simdata_correlation([0.2, 0.5])
cor(y, X)
> -0.293738 -0.329608
What am I doing wrong?