Wrong cov2cor()?

tomtom · January 9, 2019, 12:20pm

seems like StatsBase.cov2cor() is not doing correctly. First, it takes 2 arguments: the covariance matrix and a “a vector of standard deviations”, which should NOT be required. (noted that in cor2cov() we do need both of these two arguments).

now, doing some experiments, seems like cov2cor() could NOT recover the correlation matrix:

using StatsBase
s = [2.0, 3.0]
corr = [1.0 0.2; -0.5 1.0]
covv = transpose(s .* transpose(corr) ) .* s

julia> covv
2×2 Array{Float64,2}:
  4.0  1.2
 -3.0  9.0

julia> corr
2×2 Array{Float64,2}:
  1.0  0.2
 -0.5  1.0

julia> StatsBase.cov2cor(corr, s)
2×2 Array{Float64,2}:
  1.0        -0.0833333
 -0.0833333   1.0

julia> StatsBase.cov2cor(corr, s * 0.5)
2×2 Array{Float64,2}:
  1.0       -0.333333
 -0.333333   1.0

julia> StatsBase.cov2cor(corr, [1.0, 1.0])
2×2 Array{Float64,2}:
  1.0  -0.5
 -0.5   1.0

^^^ all correlation matrices obtained by StatsBase.cov2cor() are incorrect. Noted that the results are always symmetric: which is wrong.

R does give the correct correlation matrix. It takes one argument only:

> cov2cor(matrix(c(4, -3, 1.2, 9), nr = 2))
     [,1] [,2]
[1,]  1.0  0.2
[2,] -0.5  1.0

Moreover, going into the source code, while covcor() is implemented inside cov.jl under StatsBase, it actually calls covcor!() that is implemented inside Statistics.jl under Statistics. It’s kind of strange, isn’t it?

Finally, I don’t see the implementation of covcor!() is correct …

tomtom · January 9, 2019, 1:44pm

I would like to propose the following based on R’s implementation:

function cov2cor_proposing(C::AbstractMatrix)
    (nr, nc) = size(C)
    @assert nr == nc

    s = sqrt.(1.0 ./ LinearAlgebra.diag(C) )
    corr = transpose(s .* transpose(C) ) .* s
    corr[LinearAlgebra.diagind(corr) ] .= 1.0

    return corr
end

julia> cov2cor_proposing(covv)
2×2 Array{Float64,2}:
  1.0  0.2
 -0.5  1.0

scheidan · January 9, 2019, 1:52pm

I agree that it should not be necessary to provide the standard deviations to ‘cov2cor’.

However, the current implementation does provide the correct result .This works fine:

import StatsBase

s = [2.0, 3.0]
corr = [1.0 -0.5; -0.5 1.0]
covv = (s * s') .* corr

StatsBase.cov2cor(covv, s)

The problem in your example is that you used a ill-defined, unsymmetric covariance matrix (which should be positive semidefinite, check with ‘LinearAlgebra.isposdef’)

tomtom · January 9, 2019, 2:07pm

why don’t we use a simpler implementation that works in all cases?

scheidan · January 9, 2019, 4:23pm

I’ve opened an issue in StatsBase: `cov2cor` why must the standard deviations be given as argument? · Issue #444 · JuliaStats/StatsBase.jl · GitHub

Topic		Replies	Views
What wrong i cov() in Julia 5.0? General Usage	8	1465	September 2, 2017
The equivalent of R's var() for matrix? Statistics	2	570	August 27, 2018
Generate Covariance Matrix with Pre-Specified Restrictions Statistics question	6	862	July 23, 2020
What is the current best practice for population formula for cov? General Usage	5	692	December 3, 2016
Whats the easiest way to create correlation matrices in Julia? New to Julia question , statistics	5	4282	November 5, 2021

Wrong cov2cor()?

Related topics