Wrong cov2cor()?

seems like StatsBase.cov2cor() is not doing correctly. First, it takes 2 arguments: the covariance matrix and a “a vector of standard deviations”, which should NOT be required. (noted that in cor2cov() we do need both of these two arguments).

now, doing some experiments, seems like cov2cor() could NOT recover the correlation matrix:

using StatsBase
s = [2.0, 3.0]
corr = [1.0 0.2; -0.5 1.0]
covv = transpose(s .* transpose(corr) ) .* s

julia> covv
2×2 Array{Float64,2}:
  4.0  1.2
 -3.0  9.0

julia> corr
2×2 Array{Float64,2}:
  1.0  0.2
 -0.5  1.0

julia> StatsBase.cov2cor(corr, s)
2×2 Array{Float64,2}:
  1.0        -0.0833333
 -0.0833333   1.0

julia> StatsBase.cov2cor(corr, s * 0.5)
2×2 Array{Float64,2}:
  1.0       -0.333333
 -0.333333   1.0

julia> StatsBase.cov2cor(corr, [1.0, 1.0])
2×2 Array{Float64,2}:
  1.0  -0.5
 -0.5   1.0

^^^ all correlation matrices obtained by StatsBase.cov2cor() are incorrect. Noted that the results are always symmetric: which is wrong.

R does give the correct correlation matrix. It takes one argument only:

> cov2cor(matrix(c(4, -3, 1.2, 9), nr = 2))
     [,1] [,2]
[1,]  1.0  0.2
[2,] -0.5  1.0

Moreover, going into the source code, while covcor() is implemented inside cov.jl under StatsBase, it actually calls covcor!() that is implemented inside Statistics.jl under Statistics. It’s kind of strange, isn’t it?

Finally, I don’t see the implementation of covcor!() is correct …

3 Likes

I would like to propose the following based on R’s implementation:

function cov2cor_proposing(C::AbstractMatrix)
    (nr, nc) = size(C)
    @assert nr == nc

    s = sqrt.(1.0 ./ LinearAlgebra.diag(C) )
    corr = transpose(s .* transpose(C) ) .* s
    corr[LinearAlgebra.diagind(corr) ] .= 1.0

    return corr
end

julia> cov2cor_proposing(covv)
2×2 Array{Float64,2}:
  1.0  0.2
 -0.5  1.0
1 Like

I agree that it should not be necessary to provide the standard deviations to ‘cov2cor’.

However, the current implementation does provide the correct result .This works fine:

import StatsBase

s = [2.0, 3.0]
corr = [1.0 -0.5; -0.5 1.0]
covv = (s * s') .* corr

StatsBase.cov2cor(covv, s)

The problem in your example is that you used a ill-defined, unsymmetric covariance matrix (which should be positive semidefinite, check with ‘LinearAlgebra.isposdef’)

why don’t we use a simpler implementation that works in all cases?

I’ve opened an issue in StatsBase: `cov2cor` why must the standard deviations be given as argument? · Issue #444 · JuliaStats/StatsBase.jl · GitHub

3 Likes