[ANN] Copulas.jl : A fully `Distributions.jl`-compliant copula package

TLDR: I am very pleased to announce my first registered package: Copulas.jl !

What is it

This package aims at bringing into native Julia most of the standard copula features: random number generation, fitting, construction of copula-based multivariate distributions through Sklar’s theorem, etc. while fully complying with the Distributions.jl API (after all, copulas are distributions functions).

Usually, people that use and work with copulas turn to R, because of the amazing R package copula.
While still perfectly maintained and updated today, the R package copula is full of obscured, heavily optimized, fast C code on one hand, and obscure, heavily optimized slow R code on the other hand (pun intended :wink: ).

This is an attempt to provide a very light, fast, reliable and maintainable copula implementation in native Julia (in particular, type-agnostic, so it’ll work with arbitrary type of floats like Float32 for speed, BigFloats or DoubleFloats or MultiFloats for precision), with correct SIMD’sation, etc.

Two of the exported types are of most importance:

  • Copula : this is an abstract mother type for all our copulas.
  • SklarDist : Allows to construct a multivariate distribution by specifying the copula and the marginals, through Sklar’s theorem.

What is already implemented

The API we implemented contains random number generations, cdf and pdf evaluations, and the fit function from Distributions.jl. Typical use case might look like this:

using Copulas, Distributions, Random
X₁ = Gamma(2,3)
X₂ = Pareto()
X₃ = LogNormal(0,1)
C = FranckCopula(3,7) # A 3-variate Franck Copula with θ = 7
D = SklarDist(C,(X₁,X₂,X₃)) # The final distribution

# This generates a (3,1000)-sized dataset from the multivariate distribution D
simu = rand(D,1000)

# While the following estimates the parameters of the model from a dataset : 
D̂ = fit(SklarDist{ClaytonCopula,Tuple{Gamma,Normal,LogNormal}}, simu)
# Increase the number of observations to get a beter fit  (well, might not work. can you spot why ?)

Atop from the very neat SklarDist type, available copulas are :

  • EmpiricalCopula
  • GaussianCopula
  • TCopula
  • ArchimedeanCopula (general, for any generator)
  • ClaytonCopula (as an instantiation of ArchimedeanCopula, see after)

Next ones to be implemented will probably be :

  • A few more archimedeans : FranckCopula, AMHCopula, JoeCopula, GumbelCopula
  • Nested archimedeans (general, with the possibility to nest any family with any family, assuming it is possible, with parameter checks.)
  • Bernstein copula and more general Beta copula as smoothing of the Empirical copula.
  • CheckerboardCopula (and more generally PatchworkCopula)

Adding a new ArchimedeanCopula is very easy. The Clayton implementation is as short as :

struct ClaytonCopula{d,T} <: ArchimedeanCopula{d}
    θ::T
end
ClaytonCopula(d,θ)            = ClaytonCopula{d,typeof(θ)}(θ)     # Constructor
radial_dist(C::ClaytonCopula) = Distributions.Gamma(1/C.θ,1)      # Radial distribution
ϕ(C::ClaytonCopula,      t)   = (1+sign(C.θ)*t)^(-1/C.θ)          # Generator
ϕ⁻¹(C::ClaytonCopula,      t) = sign(C.θ)*(t^(-C.θ)-1)            # Inverse Generator
τ(C::ClaytonCopula)           = C.θ/(C.θ+2)                       # θ -> τ
τ⁻¹(::Type{ClaytonCopula},τ)  = 2τ/(1-τ)                          # τ -> θ

The Archimedean API is modular :

  • To sample an archimedean, only radial_dist and ϕ are needed.
  • To evaluate the cdf, only ϕ and ϕ⁻¹ are needed
  • Currently to fit the copula, τ⁻¹ is needed as we use inverted tau moment method. But we plan on also implementing inverse rho and MLE (density needed).

Both FranckCopula, AMHCopula, JoeCopula, GumbelCopula and many others are waiting for you to implement these methods !

Dev roadmap

With only a few dozens of lines of code, in a few hours, I reproduced a large part of the R copula package, which used thousands of lines of code for the same functionality…

However, the release still lacks documentation and tests, which are on top of my todo-list. Nested archimedean copulas with very complient nesting (no more restriction to only nest from the same family, as in R) is my next goal. Later, I’d like Bernstein & Beta copulas, Checkerboards & Patchworks, etc… The rest of the todo-list can be found in the package’s README.

I hope you’ll enjoy the package !

62 Likes

Exactly my experience with PowerAnalysis.jl versus R’s pwr package! I could avoid a lot of code duplication via Julia’s expressiveness. Julia is not all about raw throughput. Also about simplicity. Happy to hear that it worked out for Copulas.jl too

11 Likes

The ease I had to code certain functionalities really surprised me, as I remember coding some S4 classes in R (in another life) that were really cumbersome… When I started I thought this was a project for the summer, and in a week-end I did half of it. I realize now that using the Distributions.jl framework is actually what allowed me to do this as efficiently.

5 Likes

Super nice to know there is a Copulas.jl package now. Thanks for the contribution @lrnv!

I had the exact same experience writing CoDa.jl, the equivalent of R’s compositions package. It is so nice to code in Julia!

5 Likes

Thanks ! As i said, this is only a first draft and many things are lacking in the package… starting with a propper testing suite :wink: But my goal is parity with the R package and even beyond.

2 Likes

This is great! I’ll be sure to check it out in the near future.

1 Like

For the unitialisated… Copula (probability theory) - Wikipedia
Also the links to the two leading packages in R and Python

2 Likes

@sylvaticus quoting you in Op, hope you dont mind.

1 Like

nice one

1 Like

Version 0.1.1 is out

This first patch version brings a few more copulas to the table : FranckCopula , AMHCopula , JoeCopula , GumbelCopula. All of them use the Archimedean Frailty representation for simulations.

The documentation now contains the definition of all exported copulas.

Next step : testing :slight_smile:

5 Likes

I use the DatagenCopulaBased package in some of my projects. @lrnv You seem to be also actively involved in that project. Do you recommend switching to this package? Are there any major differences between the two packages I should be aware of before switching?

I’m not much involved, I just fixed a few things that were bothering me the other week and then decided that a more general package might be needed. However:

  • DatagenCopulaBased.jl is for the moment much better tested than Copulas.jl, therefore I’ll recommend to keep using it for now (and open issues / PR onto Copulas.jl for missing features if you have time / want, you’ll always be welcome :slight_smile: ). Hopefully in the few next months, the testing and feature levels of Copulas.jl will increase and allow you to switch over.

  • The main difference between the two is that DatagenCopulaBased.jl, as its name suggests, only provides random number generations, while Copulas.jl provides full interactivity with Distributions.jl’s methods, which allows to use Distributions.fit(), and possibility to interact with the broader ecosystem such as Turing.jl, StatsPlots.jl etc. There is also the new SklarDist{} type, which is also a Distributions.jl-complient type which is AFAIK completely new in the ecosystem.


PS: There are also more copulas available in DatagenCopulaBased. One of my goals is indeed parity, which will come someday, some of the code will be ported from one package to the other soon.


PPS: The SklarDist construct allows you to do stuff as :

julia> MyDist = SklarDist(SurvivalCopula(FrankCopula(4,7),(2,4)),(LogNormal(),Pareto(),Gamma(),Normal()));

julia> data = rand(MyDist,1000)
4×1000 Matrix{Float64}:
  2.35294     0.457584    2.44058   0.840622   1.81306   0.630984  …   2.22521     0.496158  0.525926   0.701049  0.87533      4.92817
  1.47767    42.4113      1.55615  14.7199     1.75489   1.01644      13.6979     80.3027    1.40588    1.51737   1.40788      1.16863
  3.40522     0.0811707   2.19233   0.154554   2.5537    0.342503      0.0466783   0.134426  0.943388   0.853033  0.557026     1.49137
 -0.0871605   1.1993     -2.18685   0.626663  -0.553239  0.133756      0.668708    0.81423   0.271098  -0.295031  0.00926165  -0.891382

julia> MyCop = SurvivalCopula{4,ClaytonCopula,(2,4)};

julia> MyMarginals = Tuple{LogNormal,Pareto,Gamma,Normal};

julia> fitted_model = fit(SklarDist{MyCop,MyMarginals},data)

Which works correctly, and estimates the Copula, taking into acount the fact that it is Survival in dimensions 2 and 4, (meaning that the pseudo-observations are flipped), together with the margins. I think SklarDist{Cop,Margins} is the right name for this distribution.

You could also prefer to keep the empirical copula in the multivariate distribution, which you can easily do via :

julia> MyCop = EmpiricalCopula;

julia> fitted_model = fit(SklarDist{MyCop,MyMarginals},data)
3 Likes

Version 0.1.4 is out.

It contains only a few bug-fixes, and one new feature : implementation of a generic algorithm for Archimedean densities. This allows to leverage Distributions.jl’s compatability with the broader ecosystem, i.e. to do MCMC with Turing.jl on complicated models as follows:

using Copulas, Distributions, Random, Turing, Plots, StatsPlots

Random.seed!(123)
D = SklarDist(ClaytonCopula(3,7), (Exponential(1.0), Pareto(3.0), Exponential(2.0)))
draws = rand(D, 2_000)

@model function copula(X)
    # Priors
    θ  ~ TruncatedNormal(1.0, 1.0, 0, Inf)
    θ₁ ~ TruncatedNormal(1.0, 1.0, 0, Inf)
    θ₂ ~ TruncatedNormal(1.0, 1.0, 0, Inf)
    θ₃ ~ TruncatedNormal(1.0, 1.0, 0, Inf)

    # Marginals distributions and copula
    X₁ = Exponential(θ₁)
    X₂ = Pareto(θ₂)
    X₃ = Exponential(θ₃)
    C = ClaytonCopula(3,θ)
    D = SklarDist(C, (X₁, X₂, X₃))
    Turing.Turing.@addlogprob! loglikelihood(D, X)
end

sampler = NUTS() # MH() works too
chain = sample(copula(draws), sampler, MCMCThreads(), 1_00, 4)
p = plot(chain)
savefig(p, "plot_archimedeans.png")

10 Likes

@lrnv - the Turing compatibility in particular looks great!

How can I download version 0.1.4 of your package? The Julia package manager suggests that v0.1.0 is the latest available.

@Domenic_Di_Francesco Thanks, glad you enjoy it !

This is weird… Did you try ] up ? I think it might be a compatibility issue in your environment.

The most up to date version is 0.1.6, and the Julia package manager should be able to install it. Updating your packages should be enough to resolve the compatibility issue

1 Like

@lrnv - hi, just to confirm that after various updates the issue is resolved for me.
Thank you again for this contribution! The straightforward integration with Distributions and Turing is fantastic.

Glad to hear it then.

Indeed ! The best part is the ridiculous amount of code that was needed for that to happen.

1 Like

@lrnv thank you for your contribution on this. This is great!
Is it possible to implement a regression version of this? Namely, how can I specify a regression model for the θ, θ₁, θ₂?

I created a small example (see below). But it seems to not work very well. I am not sure what I am doing wrong. Any help would be greatly appreciated.


using Random
Random.seed!(43);

n = 200
p = 2

X = randn(n, p)
β1 = randn(p)

Y = X*hcat(β1,β1) + transpose(rand(SklarDist(GaussianCopula([1 .8; .8 1]), (Normal(0.0, 1.), Normal(0.0,1.0))), n))#transpose(rand(MultivariateNormal(zeros(2), [1 0.8; 0.8 1]),n))

#-- LOok at the data
cornerplot(Y, compact=true)

#-- Model
@model function copulaReg(Y, X)

    n, K = size(X)


    # Priors
    ρ ~ truncated(Normal(0,.3), -1, 1) #-- Corelated
    θ  ~ Normal(0.,1.)
    θ₁ ~ Normal(0.,1.)
    θ₂ ~ Normal(0.,1.)

    β1 ~ filldist(Normal(0., 1.), K)
    β2 ~ filldist(Normal(0., 1.), K)
     β ~ filldist(Normal(0., 1.), K)

    #-- Regression parameters
    μ1 = zeros(n)
    μ2 = zeros(n)


    for i in 1:n
        μ1[i] = θ₁ .+ sum(X[i,:].* β1)
        μ2[i] = θ₂ .+ sum(X[i,:].* β2)

        #Turing.Turing.@addlogprob! loglikelihood(SklarDist(GaussianCopula([1. ρ; ρ 1.0]), (Normal(μ1[i], σ1), Normal(μ2[i], σ2))), Y[i,:])
        Turing.Turing.@addlogprob! loglikelihood(SklarDist(GaussianCopula([1. ρ; ρ 1.0]), (Normal(μ1[i], 1.), Normal(μ2[i], 1.))), Y[i,:])
    end
end

#-- sampler
sampler = NUTS() # doe not work
sampler = PG(10) # works but gives weird results!!
#-- sample from the posterior
chain = sample(copulaReg(Y, X), sampler, MCMCThreads(), 2_000, 3)

describe(chain)

@Kokora Yes, this regression version should work out of the box.

Unfortunately, I tried everything I could and still gets the same error every time. Could you please open a topic specific to this problem in the Specific Domains / Probabilistic programming so that we could draw attention from people that might know more than me the internals of Turing.jl ? That would be great.

@lrn Thank you for looking at this. I will go ahead create a topic specific as you suggested.

2 Likes