I’m trying to sample correlation matrices using the vine method explained here:

How to generate a large full-rank random correlation matrix with some strong correlations present? - Cross Validated (which is based on the Lewandowski–Kurowicka–Joe (LKJ) 2009 paper).

```julia
using Random
using LinearAlgebra

Random.seed!(123)

function random_correlation_matrix(d::Int)
    P = fill(0.0, d, d)          # partial correlations
    S = Matrix{Float64}(I, d, d) # resulting correlation matrix
    for k in 1:(d - 1), i in (k + 1):d
        P[k, i] = (rand() - 0.5) * 2 # scale to [-1, 1]
        p = P[k, i]
        # convert the partial correlation back to a raw correlation
        for l in (k - 1):-1:1
            p = p * sqrt((1 - P[l, i]^2) * (1 - P[l, k]^2)) + P[l, i] * P[l, k]
        end
        S[k, i] = p
        S[i, k] = p
    end
    # random permutation so the vine ordering leaves no visible pattern
    permutation = shuffle(1:d)
    return S[permutation, permutation]
end

S1 = random_correlation_matrix(10);
S2 = random_correlation_matrix(1000);
```

This works fine for lower dimensions `d`, but fails for higher dimensions:

```julia
isposdef(S1) # true
isposdef(S2) # false
```

This seems to be due to accumulating floating-point rounding errors, as the smallest eigenvalues are all close to 0 but negative:

```julia
minimum(eigvals(S2)) # -8.46e-14
```

Is there a method to heal an *almost* positive semidefinite matrix like this (or tweaks to make the algorithm more robust)?
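One tweak I've been considering (my own idea, not from the linked answer): blend the matrix with the identity. This keeps the diagonal exactly 1 and shifts every eigenvalue upward, so any blend factor larger than `|λmin|` makes the result positive definite. A minimal sketch (the function name and default `ε` are my own choices):

```julia
using LinearAlgebra

# Sketch: (S + ε*I) / (1 + ε) maps each eigenvalue λ to (λ + ε) / (1 + ε),
# so any ε > |λmin| yields a positive definite matrix, while each diagonal
# entry stays exactly 1. Name `jitter_correlation` is hypothetical.
jitter_correlation(S::AbstractMatrix{<:Real}; ε = 1e-12) = (S + ε * I) / (1 + ε)
```

With eigenvalues around `-8.46e-14`, even `ε = 1e-12` should be enough, and the correlations are perturbed by at most `ε`.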

I’ve found a `cor_nearPD` function in the Bigsimr.jl package that does exactly what I want, but the problem seems simple enough that I’d like to avoid adding another dependency.

I also tried doing an eigendecomposition and simply replacing the negative eigenvalues in `D`, but that didn’t work out because I lack the linear-algebra background to know what I’m doing.
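In case it helps, here is roughly what I was going for with the eigenvalue-clipping idea: clamp the negative eigenvalues to a tiny positive floor, reassemble, then rescale so the diagonal is back to 1 (the function name and the floor value are my own guesses, so I may well be misusing this):

```julia
using LinearAlgebra

# Sketch: repair an almost-PSD correlation matrix by clipping eigenvalues.
# Assumes S is symmetric with unit diagonal and only tiny negative eigenvalues.
function clip_to_correlation(S::AbstractMatrix{<:Real}; floor_val = 1e-12)
    F = eigen(Symmetric(S))                   # real symmetric eigendecomposition
    λ = max.(F.values, floor_val)             # clamp negatives up to a tiny floor
    S2 = F.vectors * Diagonal(λ) * F.vectors' # reassemble; diagonal drifts slightly
    d = 1 ./ sqrt.(diag(S2))
    return Matrix(Symmetric(Diagonal(d) * S2 * Diagonal(d))) # restore unit diagonal
end
```

The rescaling step at the end is what I suspect I was missing: after replacing eigenvalues, the diagonal entries are no longer exactly 1, so the result isn't a correlation matrix until it is rescaled.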