CDF of a copula from Copulas.jl?

I am experimenting with Copulas.jl to see if I could use it for my work. I am in general impressed. However, I found I could not get the CDF value from a joint distribution constructed by the copula. Getting the PDF value is fine, though. The following is a MWE.

using Copulas, Distributions

x = Normal(0, 1)
y = Normal(0, 2)
C = GaussianCopula([1 0.5; 0.5 1])
D = SklarDist(C, (x,y))

pdf(D, ([1.0, 1.0])) # this is fine
cdf(D, ([1.0, 1.0])) # this produces error

The last line produces the method error: "ERROR: MethodError: no method matching cdf(::GaussianCopula{2, Matrix{Float64}}, ::Vector{Float64})".

Any suggestion would be appreciated.

1 Like

cdf is more complicated function, if you have just one-variate distribution you can use QuadGK.jl, for multivariate - HCubature.jl or Quasi-Monte-Carlo approaches like in MvNormalCDF.jl

Indeed, CDF values can be computed using numerical integrations. The reason I brought it up is because the packageā€™s document indicates that the value could be obtained using the usual cdf() function, and that (as I just found out) it actually returns cdf values for Archimedean copulas though not for the more-often-used Gaussian copulas (which is what in my original question).

This is stated in the packageā€™s doc: ā€œThis package brings most standard copula features into native Julia: random number generation, pdf and cdf, fitting, copula-based multivariate distributions through Sklarā€™s theorem, etc., while fully complying with the Distributions.jl APIā€¦ā€

The following code shows that it does return cdf values for Archimedean copulas.

julia> using Copulas, Distributions

julia> x = Normal(0,1); y = Normal(0,2);

julia> C = GumbelCopula(2, 1.2)  # a type of Archimedean copula
GumbelCopula{2, Float64}(Īø=1.2)

julia> D = SklarDist(C, (x,y))
SklarDist{GumbelCopula{2, Float64}, Tuple{Normal{Float64}, Normal{Float64}}}(
C: GumbelCopula{2, Float64}(Īø=1.2)
m: (Normal{Float64}(Ī¼=0.0, Ļƒ=1.0), Normal{Float64}(Ī¼=0.0, Ļƒ=2.0))
)

julia> cdf(D, ([1.0, 1.0]))
0.6132230493600664

So, the cdf() works for Archimedean copulas but not for Gaussian copulas. Maybe there is a feature not been implemented yet? Or a bug? @lrnv

2 Likes

Indeed, this is simply not implemented yet, the package is not complete at all. Should be a quick fix though, could you open an issue on the repo about it, Iā€™ll take a look as soon as I can ?

Thanks for catching it !

1 Like

I fixed Copulas.jl (in the latest master commit), so that it provides a more accurate error: we rely on Distributions.jlā€™s implementation of the multivariate Gaussian and Student distributions to handle the complexity behind the Gaussian and Student copulas, and Distributions.jl has no multivariate Gaussian cdf.

Here is a Github issue for this : CDF of Multivariate Normal Distribution Ā· Issue #260 Ā· JuliaStats/Distributions.jl Ā· GitHub

Also, seems relevant there : MVN cdf by blackeneth Ā· Pull Request #114 Ā· JuliaStats/StatsFuns.jl Ā· GitHub

But it does not look like it is going to be fixed soon. If you find somewhere an integration routine that computes the cdf of a multivariate Gaussian, I am willing to include it as a temporary workaround though, until those issues are fixed in StatsFun.jl and Distributions.jl

I oppened an issue myself on Copulas.jl there : Elliptical CDF Ā· Issue #15 Ā· lrnv/Copulas.jl Ā· GitHub to cross-link everything.

1 Like

Thanks for looking into the issue! I saw that you already open an issue in the repo.

Itā€™s somewhat surprising to know that the cdf of multivariate Gaussians has not been implemented in Distributions.jl and thus cannot be (more or less) easily applied by Copulas.jl. On the other hand, as @PharmCat mentioned in the 2nd post above, the cdf can now be calculated using packages like MvNormalCDF.jl. Could Copulas.jl use this or other packages to fix the problem?

1 Like

Hi!
Unlikely that MvNormal cdf will be implemented in Distributions.jl or StatsFuns.jl , this needs multidimensional integration solution. @blackeneth provide Quasi-Monte-Carlo solution as it done in MATLAB and I make package (MvNormalCDF.jl) and include some improvements. Maybe this is most robust solution now.

PS if some things needs to be implemented in MvNormalCDF - just open issue, I will try to do that

1 Like

Yes we should try with MvNormalCdf.jl. you can make a PR if you want or wait until tomorrow, Iā€™ll do it.

1 Like

Thanks to @PharmCatā€™s volunteer. I have opened an issue.

@lrnv , @HJW019

If I understand correctly - anyway there is no analytical solution for this type, so if cdf will be implemented - it will be numerical method. Quasi-Monte-Carlo as I understand have sense in high-dimentional cases in your case any numerical method can be used.

If I understand correctly (point me if Iā€™m wrong) - this code could be used to get cdf:

using Copulas, Distributions, BenchmarkTools
import Cubature, HCubature

x = Normal(0, 1)
y = Normal(0, 2)
C = GaussianCopula([1 0.5; 0.5 1])
D = SklarDist(C, (x,y))

pdf(D, [1.0, 1.0]) # this is fine
#cdf(D, ([1.0, 1.0]))

func(x) = pdf(D, [x[1], x[2]])

HCubature.hcubature(func, [-8.0, -8.0], [8.0, 8.0], rtol=sqrt(eps()))
#(0.9999366573436514, 1.4884341341974467e-8)
Cubature.hcubature(func, [-8.0, -8.0], [8.0, 8.0], reltol=sqrt(eps()))
#(0.9999366573436526, 1.4884341344610422e-8)
Cubature.pcubature(func, [-8.0, -8.0], [8.0, 8.0], reltol=sqrt(eps()))
#(0.9999366575163308, 1.2989609388114332e-14)

@btime HCubature.hcubature(func, [-8.0, -8.0], [8.0, 8.0], rtol=sqrt(eps()))
#95.819 ms (791157 allocations: 36.17 MiB)

@btime Cubature.hcubature(func, [-8.0, -8.0], [8.0, 8.0], reltol=sqrt(eps()))
#83.939 ms (688260 allocations: 32.91 MiB)

@btime Cubature.pcubature(func, [-8.0, -8.0], [8.0, 8.0], reltol=sqrt(eps()))
#16.128 ms (125790 allocations: 6.01 MiB)
1 Like

Thanks; thatā€™s helpful. On the other hand, I was thinking of making cdf() workable to all elliptical copulas in Copulas.jl, as it currently does for the Archimedean copulas. So that, for example, cdf(D, ([0.1, 0.2])) would return the CDF value where D is constructed by GaussianCopula. It probably means that the HCubature or MvNormalCDF may have to be called by Copulas.

In terms of HCubature (used before) vs MvNormalCDF (never used): My experience with an old problem was that HCubature is quite good (fast and accurate) if the dimension is like <=4. For higher dimensions, the convergence rate of the cubature method deteriorates quickly and the estimation becomes very slow, leaving the quasi-Monte Carlo method the only choice (I coded myself using low discrepancy sequences such as the Halton). So, if Copulas.jl is to implement cdf() for elliptical copulas, the quasi-Monte Carlo method may be the choice for its generality, unless the package decides to support the cdf() up to a certain dimension.

Just to clarify: my original intent of this thread is not too much about a one-time calculation of the CDF, but more about using pdf() and cdf() in a consistent way in Copulas.jl. Sorry that it was not made clear in the original post.

(edited to add the last paragraph)

1 Like

If I understand correctly :smile: cdf win MvNormalCDF can be calculated:

using MvNormalCDF
using Copulas, Distributions

x = Normal(0, 1)
y = Normal(0, 2)
C = GaussianCopula([1 0.5; 0.5 1])
D = SklarDist(C, (x,y))

####
Ī¼ = mean.(D.m)
Ļƒ = collect(std.(D.m))
Ī£ = cor2cov(D.C.Ī£, Ļƒ)
MvNormalCDF.mvnormcdf(Ī¼, Ī£, [-1.0, -2.0], [3.0, 4.0], m = 6000)
#(0.7216874201273508, 4.10731083253638e-5)
####
func(x) = pdf(D, [x[1], x[2]])
Cubature.pcubature(func, [-1.0, -2.0], [3.0, 4.0], reltol=sqrt(eps()))
#(0.7217122929674319, 8.450906641144229e-12)

But I think this solution canā€™t be applied to other distributions. MvNormalCDF based on MATLAB qsimvnv() function by Alan Genz with using Richtmyer generators and use propeties of MvNormal distribution (see ref). Other multivariate types should implement general Quasi-Monte-Carlo integration tecniques. So I didnā€™t found packages for QMC multidimentional integration, may be someone wants to make it one day.

ref: ā€œNumerical Computation of Multivariate Normal Probabilitiesā€, in J. of Computational and Graphical Stat., 1(1992), pp. 141-149, by Alan Genz, WSU Math, PO Box 643113, Pullman, WA 99164-3113

2 Likes

What about computing the LU decomposition of the covariance matrix and using the univariate gaussian cdf ? I think this might be faster, and I do not really want to depend on Matlab to stay type agnostic. I think it might even work for any elliptical.

Iā€™ll give it a shot asap.

1 Like

I donā€™t know :sweat_smile: ā€¦ When I said that MvNormalCDF based on MATLAB - I mean than code was rewritten from MATLAB utilitie - ofcource MvNormalCDF is pure Julia package with minimal dependencies.

I found package MonteCarloIntegration.jl , I didnā€™t found QuasiMonteCarlo integration, but this QuasiMonteCarlo can helps. Maybe I can make mini-mackage for QuasiMonteCarloIntegratio if it may bu usefull.

1 Like

It should now be fixed on 0.1.5, currently registering.

@HJW019, I think you have now the behavior you wanted for all elliptical copulas, which is IMHO the correct behavior of the package. It might be slow for some cases (especially high dimensionalā€¦), but should be correct. It uses Cubatures.jl, but if you want to propose a version with QMC you can of course.

For Gaussian copulas, I used the fact that uncorrelated Gaussian random vectors and independent, which is maybe not the most performing algorithm, but its clean, and does ā€œonlyā€ require a linear solve and a matrix square root.

Thanks @PharmCat and @HJW019 for your input, do not hesitate to tell me if it works as expected !

1 Like

Wonderful! Thanks for the contribution. Now I think the package has all the basic functions for a typical copula analysis. Thanks again.

You are welcome, please come back if something is wrong !

++

I did a quick check on the updated cdf() and was puzzled by the finding. The result of cdf() on the Gaussian copula is inconsistent with results from three alternative ways of computing the CDF. Meanwhile, the three alternatives are consistent among themselves.

using Copulas, BivariateCopulas, Distributions, StatsBase, MvNormalCDF, Cubature

rho = 0.5
ub1, ub2 = -0.1, 0.1
x = Normal(0, 1)
y = Normal(0, 2)

#### Copulas.jl

C1 = Copulas.GaussianCopula([1 rho; rho 1]) 
D1 = SklarDist(C1, (x,y))
cdf(D1, [ub1, ub2]) # 0.24018536633697982

#### alternative 1: Cubature on D1

func(x) = pdf(D1, [x[1], x[2]])
Cubature.pcubature(func, [-15, -15], [ub1, ub2], reltol=sqrt(eps())) # (0.32190029773358525, 3.3549441003088987e-11)

#### alternative 2: MvNormalCDF on D1

Ī¼ = mean.(D1.m)
Ļƒ = collect(std.(D1.m))
Ī£ = cor2cov(D1.C.Ī£, Ļƒ)
MvNormalCDF.mvnormcdf(Ī¼, Ī£, [-Inf, -Inf], [ub1, ub2], m = 10000) # (0.3219097660568473, 1.5325442373789225e-5)

#### alternative 3: BivariateCopulas.jl

C2 = BivariateCopulas.Gaussian(rho)
D2 = C2(x, y);
cdf(D2, ub1, ub2) # 0.3219002977336174

As you can see, the first result of 0.24018536633697982 is quite different from others.

I looked at the source code:

function Distributions.cdf(C::CT,u) where {CT<:Copula}
    f(x) = pdf(C,x)
    z = zeros(eltype(u),length(C))
    return Cubature.pcubature(f,z,u,reltol=sqrt(eps()))[1]
end

The line on z seems to indicate that the lower bound of the integration is a vector of 0. However, I think it should be the lower bounds of the marginals of X and Y. In the above example where both of the marginals are normal, the domain should be in the range of (-Inf, u).

How to fix the program? I am scratching my head, as I donā€™t know how to pull out the lower bounds of user-specified marginals.

Indeed, there is a bug. This is troubling as the lower bound of the copula pdf should be zero and not -Infā€¦ The copula pdf is not integrating on the marginal scales.

Iā€™ll make a test out of your example and try to fix it. Thanks again :slight_smile: