CDF of a copula from Copulas.jl?

HJW019 · August 15, 2022, 4:22pm

I am experimenting with Copulas.jl to see if I could use it for my work. I am in general impressed. However, I found I could not get the CDF value from a joint distribution constructed by the copula. Getting the PDF value is fine, though. The following is a MWE.

using Copulas, Distributions

x = Normal(0, 1)
y = Normal(0, 2)
C = GaussianCopula([1 0.5; 0.5 1])
D = SklarDist(C, (x,y))

pdf(D, ([1.0, 1.0])) # this is fine
cdf(D, ([1.0, 1.0])) # this produces error

The last line produces the method error: "ERROR: MethodError: no method matching cdf(::GaussianCopula{2, Matrix{Float64}}, ::Vector{Float64})".

Any suggestion would be appreciated.

PharmCat · August 16, 2022, 10:33am

cdf is more complicated function, if you have just one-variate distribution you can use QuadGK.jl, for multivariate - HCubature.jl or Quasi-Monte-Carlo approaches like in MvNormalCDF.jl

HJW019 · August 16, 2022, 12:56pm

Indeed, CDF values can be computed using numerical integrations. The reason I brought it up is because the package’s document indicates that the value could be obtained using the usual cdf() function, and that (as I just found out) it actually returns cdf values for Archimedean copulas though not for the more-often-used Gaussian copulas (which is what in my original question).

This is stated in the package’s doc: “This package brings most standard copula features into native Julia: random number generation, pdf and cdf, fitting, copula-based multivariate distributions through Sklar’s theorem, etc., while fully complying with the Distributions.jl API…”

The following code shows that it does return cdf values for Archimedean copulas.

julia> using Copulas, Distributions

julia> x = Normal(0,1); y = Normal(0,2);

julia> C = GumbelCopula(2, 1.2)  # a type of Archimedean copula
GumbelCopula{2, Float64}(θ=1.2)

julia> D = SklarDist(C, (x,y))
SklarDist{GumbelCopula{2, Float64}, Tuple{Normal{Float64}, Normal{Float64}}}(
C: GumbelCopula{2, Float64}(θ=1.2)
m: (Normal{Float64}(μ=0.0, σ=1.0), Normal{Float64}(μ=0.0, σ=2.0))
)

julia> cdf(D, ([1.0, 1.0]))
0.6132230493600664

So, the cdf() works for Archimedean copulas but not for Gaussian copulas. Maybe there is a feature not been implemented yet? Or a bug? @lrnv

lrnv · August 16, 2022, 3:31pm

Indeed, this is simply not implemented yet, the package is not complete at all. Should be a quick fix though, could you open an issue on the repo about it, I’ll take a look as soon as I can ?

Thanks for catching it !

lrnv · August 16, 2022, 4:18pm

I fixed Copulas.jl (in the latest master commit), so that it provides a more accurate error: we rely on Distributions.jl’s implementation of the multivariate Gaussian and Student distributions to handle the complexity behind the Gaussian and Student copulas, and Distributions.jl has no multivariate Gaussian cdf.

Here is a Github issue for this : CDF of Multivariate Normal Distribution · Issue #260 · JuliaStats/Distributions.jl · GitHub

Also, seems relevant there : MVN cdf by blackeneth · Pull Request #114 · JuliaStats/StatsFuns.jl · GitHub

But it does not look like it is going to be fixed soon. If you find somewhere an integration routine that computes the cdf of a multivariate Gaussian, I am willing to include it as a temporary workaround though, until those issues are fixed in StatsFun.jl and Distributions.jl

I oppened an issue myself on Copulas.jl there : Elliptical CDF · Issue #15 · lrnv/Copulas.jl · GitHub to cross-link everything.

HJW019 · August 16, 2022, 4:55pm

Thanks for looking into the issue! I saw that you already open an issue in the repo.

It’s somewhat surprising to know that the cdf of multivariate Gaussians has not been implemented in Distributions.jl and thus cannot be (more or less) easily applied by Copulas.jl. On the other hand, as @PharmCat mentioned in the 2nd post above, the cdf can now be calculated using packages like MvNormalCDF.jl. Could Copulas.jl use this or other packages to fix the problem?

PharmCat · August 16, 2022, 6:10pm

Hi!
Unlikely that MvNormal cdf will be implemented in Distributions.jl or StatsFuns.jl , this needs multidimensional integration solution. @blackeneth provide Quasi-Monte-Carlo solution as it done in MATLAB and I make package (MvNormalCDF.jl) and include some improvements. Maybe this is most robust solution now.

PS if some things needs to be implemented in MvNormalCDF - just open issue, I will try to do that

lrnv · August 16, 2022, 7:36pm

Yes we should try with MvNormalCdf.jl. you can make a PR if you want or wait until tomorrow, I’ll do it.

HJW019 · August 17, 2022, 2:02am

Thanks to @PharmCat’s volunteer. I have opened an issue.

PharmCat · August 17, 2022, 1:30pm

@lrnv , @HJW019

If I understand correctly - anyway there is no analytical solution for this type, so if cdf will be implemented - it will be numerical method. Quasi-Monte-Carlo as I understand have sense in high-dimentional cases in your case any numerical method can be used.

If I understand correctly (point me if I’m wrong) - this code could be used to get cdf:

using Copulas, Distributions, BenchmarkTools
import Cubature, HCubature

x = Normal(0, 1)
y = Normal(0, 2)
C = GaussianCopula([1 0.5; 0.5 1])
D = SklarDist(C, (x,y))

pdf(D, [1.0, 1.0]) # this is fine
#cdf(D, ([1.0, 1.0]))

func(x) = pdf(D, [x[1], x[2]])

HCubature.hcubature(func, [-8.0, -8.0], [8.0, 8.0], rtol=sqrt(eps()))
#(0.9999366573436514, 1.4884341341974467e-8)
Cubature.hcubature(func, [-8.0, -8.0], [8.0, 8.0], reltol=sqrt(eps()))
#(0.9999366573436526, 1.4884341344610422e-8)
Cubature.pcubature(func, [-8.0, -8.0], [8.0, 8.0], reltol=sqrt(eps()))
#(0.9999366575163308, 1.2989609388114332e-14)

@btime HCubature.hcubature(func, [-8.0, -8.0], [8.0, 8.0], rtol=sqrt(eps()))
#95.819 ms (791157 allocations: 36.17 MiB)

@btime Cubature.hcubature(func, [-8.0, -8.0], [8.0, 8.0], reltol=sqrt(eps()))
#83.939 ms (688260 allocations: 32.91 MiB)

@btime Cubature.pcubature(func, [-8.0, -8.0], [8.0, 8.0], reltol=sqrt(eps()))
#16.128 ms (125790 allocations: 6.01 MiB)

HJW019 · August 17, 2022, 2:08pm

Thanks; that’s helpful. On the other hand, I was thinking of making cdf() workable to all elliptical copulas in Copulas.jl, as it currently does for the Archimedean copulas. So that, for example, cdf(D, ([0.1, 0.2])) would return the CDF value where D is constructed by GaussianCopula. It probably means that the HCubature or MvNormalCDF may have to be called by Copulas.

In terms of HCubature (used before) vs MvNormalCDF (never used): My experience with an old problem was that HCubature is quite good (fast and accurate) if the dimension is like <=4. For higher dimensions, the convergence rate of the cubature method deteriorates quickly and the estimation becomes very slow, leaving the quasi-Monte Carlo method the only choice (I coded myself using low discrepancy sequences such as the Halton). So, if Copulas.jl is to implement cdf() for elliptical copulas, the quasi-Monte Carlo method may be the choice for its generality, unless the package decides to support the cdf() up to a certain dimension.

Just to clarify: my original intent of this thread is not too much about a one-time calculation of the CDF, but more about using pdf() and cdf() in a consistent way in Copulas.jl. Sorry that it was not made clear in the original post.

(edited to add the last paragraph)

PharmCat · August 17, 2022, 4:36pm

If I understand correctly cdf win MvNormalCDF can be calculated:

using MvNormalCDF
using Copulas, Distributions

x = Normal(0, 1)
y = Normal(0, 2)
C = GaussianCopula([1 0.5; 0.5 1])
D = SklarDist(C, (x,y))

####
μ = mean.(D.m)
σ = collect(std.(D.m))
Σ = cor2cov(D.C.Σ, σ)
MvNormalCDF.mvnormcdf(μ, Σ, [-1.0, -2.0], [3.0, 4.0], m = 6000)
#(0.7216874201273508, 4.10731083253638e-5)
####
func(x) = pdf(D, [x[1], x[2]])
Cubature.pcubature(func, [-1.0, -2.0], [3.0, 4.0], reltol=sqrt(eps()))
#(0.7217122929674319, 8.450906641144229e-12)

But I think this solution can’t be applied to other distributions. MvNormalCDF based on MATLAB qsimvnv() function by Alan Genz with using Richtmyer generators and use propeties of MvNormal distribution (see ref). Other multivariate types should implement general Quasi-Monte-Carlo integration tecniques. So I didn’t found packages for QMC multidimentional integration, may be someone wants to make it one day.

ref: “Numerical Computation of Multivariate Normal Probabilities”, in J. of Computational and Graphical Stat., 1(1992), pp. 141-149, by Alan Genz, WSU Math, PO Box 643113, Pullman, WA 99164-3113

lrnv · August 17, 2022, 9:39pm

What about computing the LU decomposition of the covariance matrix and using the univariate gaussian cdf ? I think this might be faster, and I do not really want to depend on Matlab to stay type agnostic. I think it might even work for any elliptical.

I’ll give it a shot asap.

PharmCat · August 17, 2022, 10:00pm

I don’t know … When I said that MvNormalCDF based on MATLAB - I mean than code was rewritten from MATLAB utilitie - ofcource MvNormalCDF is pure Julia package with minimal dependencies.

PharmCat · August 18, 2022, 11:52am

I found package MonteCarloIntegration.jl , I didn’t found QuasiMonteCarlo integration, but this QuasiMonteCarlo can helps. Maybe I can make mini-mackage for QuasiMonteCarloIntegratio if it may bu usefull.

lrnv · August 18, 2022, 2:46pm

It should now be fixed on 0.1.5, currently registering.

@HJW019, I think you have now the behavior you wanted for all elliptical copulas, which is IMHO the correct behavior of the package. It might be slow for some cases (especially high dimensional…), but should be correct. It uses Cubatures.jl, but if you want to propose a version with QMC you can of course.

For Gaussian copulas, I used the fact that uncorrelated Gaussian random vectors and independent, which is maybe not the most performing algorithm, but its clean, and does “only” require a linear solve and a matrix square root.

Thanks @PharmCat and @HJW019 for your input, do not hesitate to tell me if it works as expected !

HJW019 · August 18, 2022, 3:45pm

Wonderful! Thanks for the contribution. Now I think the package has all the basic functions for a typical copula analysis. Thanks again.

lrnv · August 18, 2022, 4:03pm

You are welcome, please come back if something is wrong !

++

HJW019 · August 18, 2022, 5:36pm

I did a quick check on the updated cdf() and was puzzled by the finding. The result of cdf() on the Gaussian copula is inconsistent with results from three alternative ways of computing the CDF. Meanwhile, the three alternatives are consistent among themselves.

using Copulas, BivariateCopulas, Distributions, StatsBase, MvNormalCDF, Cubature

rho = 0.5
ub1, ub2 = -0.1, 0.1
x = Normal(0, 1)
y = Normal(0, 2)

#### Copulas.jl

C1 = Copulas.GaussianCopula([1 rho; rho 1]) 
D1 = SklarDist(C1, (x,y))
cdf(D1, [ub1, ub2]) # 0.24018536633697982

#### alternative 1: Cubature on D1

func(x) = pdf(D1, [x[1], x[2]])
Cubature.pcubature(func, [-15, -15], [ub1, ub2], reltol=sqrt(eps())) # (0.32190029773358525, 3.3549441003088987e-11)

#### alternative 2: MvNormalCDF on D1

μ = mean.(D1.m)
σ = collect(std.(D1.m))
Σ = cor2cov(D1.C.Σ, σ)
MvNormalCDF.mvnormcdf(μ, Σ, [-Inf, -Inf], [ub1, ub2], m = 10000) # (0.3219097660568473, 1.5325442373789225e-5)

#### alternative 3: BivariateCopulas.jl

C2 = BivariateCopulas.Gaussian(rho)
D2 = C2(x, y);
cdf(D2, ub1, ub2) # 0.3219002977336174

As you can see, the first result of 0.24018536633697982 is quite different from others.

I looked at the source code:

function Distributions.cdf(C::CT,u) where {CT<:Copula}
    f(x) = pdf(C,x)
    z = zeros(eltype(u),length(C))
    return Cubature.pcubature(f,z,u,reltol=sqrt(eps()))[1]
end

The line on z seems to indicate that the lower bound of the integration is a vector of 0. However, I think it should be the lower bounds of the marginals of X and Y. In the above example where both of the marginals are normal, the domain should be in the range of (-Inf, u).

How to fix the program? I am scratching my head, as I don’t know how to pull out the lower bounds of user-specified marginals.

lrnv · August 18, 2022, 10:34pm

Indeed, there is a bug. This is troubling as the lower bound of the copula pdf should be zero and not -Inf… The copula pdf is not integrating on the marginal scales.

I’ll make a test out of your example and try to fix it. Thanks again

Topic		Replies	Views
[ANN] Copulas.jl : A fully `Distributions.jl`-compliant copula package Package Announcements package , announcement , distributions , copula	31	3768	September 3, 2024
Cdf of multivariate normal in distributions.jl Statistics	24	10450	April 24, 2023
MVN CDF -- have it coded, need help getting integrating into Distributions.jl Statistics	18	2848	March 25, 2022
Copulas in Julia? Statistics question	17	4508	August 6, 2022
[ANN] MvNormalCDF.jl - multivariate normal CDF Package Announcements package , announcement , statistics	1	847	November 12, 2021

CDF of a copula from Copulas.jl?

Related topics