Issues implementing copula regression using Copulas.jl and Turing.jl

I am trying to implement a copula regression model in Julia by combining Turing.jl and Copulas.jl.

The author of Copulas.jl provides a nice working example, reproduced below:

using Copulas, Distributions, Random, Turing, Plots, StatsPlots

Random.seed!(123)
D = SklarDist(ClaytonCopula(3,7), (Exponential(1.0), Pareto(3.0), Exponential(2.0)))
draws = rand(D, 2_000)

@model function copula(X)
    # Priors
    θ  ~ TruncatedNormal(1.0, 1.0, 0, Inf)
    θ₁ ~ TruncatedNormal(1.0, 1.0, 0, Inf)
    θ₂ ~ TruncatedNormal(1.0, 1.0, 0, Inf)
    θ₃ ~ TruncatedNormal(1.0, 1.0, 0, Inf)

    # Marginals distributions and copula
    X₁ = Exponential(θ₁)
    X₂ = Pareto(θ₂)
    X₃ = Exponential(θ₃)
    C = ClaytonCopula(3,θ)
    D = SklarDist(C, (X₁, X₂, X₃))
    Turing.@addlogprob! loglikelihood(D, X)
end

sampler = NUTS() # MH() works too
chain = sample(copula(draws), sampler, MCMCThreads(), 100, 4)
plot(chain)

I would like to implement a regression equivalent of that model; a simple example is below. But the NUTS and HMC samplers do not work on it. PG works but does not produce reliable results.

using Copulas, Distributions, Random, Turing, Plots, StatsPlots
Random.seed!(43);

n = 200
p = 2

X = randn(n, p)
β1 = randn(p)

#-- Two outcomes sharing the same coefficients, with Gaussian-copula errors
Y = X*hcat(β1, β1) + transpose(rand(SklarDist(GaussianCopula([1 .8; .8 1]), (Normal(0.0, 1.0), Normal(0.0, 1.0))), n))

#-- Look at the data
cornerplot(Y, compact=true)

#-- Model
@model function copulaReg(Y, X)

    n, K = size(X)


    # Priors
    ρ ~ truncated(Normal(0,.3), -1, 1) #-- Correlation parameter
    θ  ~ Normal(0.,1.)
    θ₁ ~ Normal(0.,1.)
    θ₂ ~ Normal(0.,1.)

    β1 ~ filldist(Normal(0., 1.), K)
    β2 ~ filldist(Normal(0., 1.), K)
     β ~ filldist(Normal(0., 1.), K)

    #-- Regression parameters
    μ1 = zeros(n)
    μ2 = zeros(n)


    for i in 1:n
        μ1[i] = θ₁ .+ sum(X[i,:].* β1)
        μ2[i] = θ₂ .+ sum(X[i,:].* β2)

        #Turing.@addlogprob! loglikelihood(SklarDist(GaussianCopula([1. ρ; ρ 1.0]), (Normal(μ1[i], σ1), Normal(μ2[i], σ2))), Y[i,:])
        Turing.@addlogprob! loglikelihood(SklarDist(GaussianCopula([1. ρ; ρ 1.0]), (Normal(μ1[i], 1.), Normal(μ2[i], 1.))), Y[i,:])
    end
end

#-- sampler
sampler = NUTS() # does not work
sampler = PG(10) # works but gives weird results!
#-- sample from the posterior
chain = sample(copulaReg(Y, X), sampler, MCMCThreads(), 2_000, 3)

describe(chain)

I am not sure why NUTS and HMC are not working here. Any help would be greatly appreciated. Thank you.


Your code is not generic enough: zeros creates an array of Float64, which fails when Turing tries to take gradients with ForwardDiff (the default AD backend), because ForwardDiff.Dual values cannot be written into a Float64 array. Try μ1 = zeros(typeof(θ₁), n) (same for μ2), or skip the arrays entirely and use local variables within the loop, i.e., μ1 = θ₁ .+ sum(X[i,:].* β1) (same for μ2).
I also needed to change the correlation matrix of the Gaussian copula to GaussianCopula([1+eps ρ; ρ 1+eps]) for some small eps, e.g., 1e-7, in order to avoid numerical issues when ρ gets close to one.
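
For concreteness, here is a minimal sketch of the corrected model combining both fixes (local variables inside the loop, plus the eps-padded correlation matrix), reusing the priors from the model above; the model name is illustrative:

@model function copulaRegFixed(Y, X)
    n, K = size(X)
    ϵ = 1e-7

    # Priors
    ρ  ~ truncated(Normal(0, .3), -1, 1)
    θ₁ ~ Normal(0., 1.)
    θ₂ ~ Normal(0., 1.)
    β1 ~ filldist(Normal(0., 1.), K)
    β2 ~ filldist(Normal(0., 1.), K)

    for i in 1:n
        # Local scalars stay generic, so ForwardDiff.Dual values flow through
        μ1 = θ₁ + sum(X[i,:] .* β1)
        μ2 = θ₂ + sum(X[i,:] .* β2)
        # Padded diagonal avoids numerical issues as ρ approaches ±1
        C = GaussianCopula([1+ϵ ρ; ρ 1+ϵ])
        Turing.@addlogprob! loglikelihood(SklarDist(C, (Normal(μ1, 1.), Normal(μ2, 1.))), Y[i,:])
    end
end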


@bertschi haha, I see now! That works perfectly! Thank you.

@kokora, if I may ask, what was the purpose of this MWE?

The trick with eps works great. Do you think changing the prior to ρ ~ truncated(Normal(0,.3), -0.99, 0.99) would also work? In some sense this is not an 'uninformative' prior, but if the posterior piles up against the truncation, that would indicate the model needs adjustment.
Having diagonal entries > 1 in a correlation matrix feels too hacky to me.

The matrix you input is re-normalized into a proper correlation matrix by the copula constructor. Thus, this +eps really is a contraction of ρ by some amount, which is not great: the variable ρ no longer means anything.

Truncating ρ's prior, if it works, would indeed be more sensible, since the value obtained for it would still be interpretable.


Perhaps an error/exception would be better. This thread might be exactly the example showing why raising an error for a non-proper correlation matrix is preferable. The 'projection' onto a correlation matrix can be done outside the copula constructor (maybe a CorrelationMatrix type is even needed).

Relevant link: https://en.wikipedia.org/wiki/Correlation#Nearest_valid_correlation_matrix
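
For reference, a rough sketch of one simple projection onto correlation matrices (clip negative eigenvalues, then rescale to unit diagonal); this is a crude heuristic, not Higham's alternating-projections algorithm from the link:

using LinearAlgebra

function nearest_correlation(A::AbstractMatrix)
    S = Symmetric((A + A') / 2)          # symmetrize
    λ, V = eigen(S)
    Σ = V * Diagonal(max.(λ, 0)) * V'    # clip to the nearest PSD matrix
    d = sqrt.(diag(Σ))
    return Σ ./ (d * d')                 # rescale to unit diagonal
end

nearest_correlation([1.0 1.2; 1.2 1.0])  # example with an off-diagonal entry > 1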

You might be right.

Moreover, with NUTS(), even though the results look great, I still get a bunch of warnings:

┌ Warning: The current proposal will be rejected due to numerical error(s).
│   isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)      
└ @ AdvancedHMC C:\Users\lrnv\.julia\packages\AdvancedHMC\iWHPQ\src\hamiltonian.jl:47

These do not look great. What should I do about them?

Got those too. I didn't dig deep enough to understand them either.

@Dan Good point! I can also constrain that prior away from -1 and 1.
I am also thinking about specifying a regression model for ρ, so that ρ = tanh(XB)*0.99 with a prior on B instead. The 0.99 factor keeps us away from -1 and 1.
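
Something like this minimal sketch of the idea, assuming the same simulated (Y, X) as before (the model name and the unit error scales are illustrative):

@model function copulaRegTanh(Y, X)
    n, K = size(X)
    θ  ~ Normal(0., 1.)
    θ₁ ~ Normal(0., 1.)
    θ₂ ~ Normal(0., 1.)
    β1 ~ filldist(Normal(0., 1.), K)
    β2 ~ filldist(Normal(0., 1.), K)
    B  ~ filldist(Normal(0., 1.), K)

    for i in 1:n
        μ1 = θ₁ + sum(X[i,:] .* β1)
        μ2 = θ₂ + sum(X[i,:] .* β2)
        # tanh maps the linear predictor into (-1, 1); 0.99 keeps ρ off the boundary
        ρ = tanh(θ + sum(X[i,:] .* B)) * 0.99
        Turing.@addlogprob! loglikelihood(SklarDist(GaussianCopula([1.0 ρ; ρ 1.0]), (Normal(μ1, 1.), Normal(μ2, 1.))), Y[i,:])
    end
end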

@lrnv I concur; it is not at all clear what these warning messages are about. Maybe I should reopen this and mark it as not fully resolved yet?

@Kokora I cannot help you with the warnings, but maybe someone else can. On the other hand, regarding the prior on ρ, you might also try copula families other than the Gaussian on your real data.

If you use the tanh formula, it is perhaps not necessary to limit it to 0.99, since the problematic value is only reached 'at infinity'. It also seems tanh is being used as a bijector from (-Inf, Inf) to (-1, 1), so it might be interesting to study the Bijectors package to formally make it into one. That might also help some gradient calculations.
https://github.com/TuringLang/Bijectors.jl

(I’m learning as I go too).

UPDATE (after a bit of the aforementioned 'learning'):
Bijectors.jl already provides just such a 'link' (the statistics term for this invertible function). So:

using Bijectors
cb = Bijectors.CorrBijector()

And now:

julia> Inverse(cb)([0 0.4; 0 0])
2×2 Matrix{Float64}:
 1.0       0.379949
 0.379949  1.0

This bijector also works for larger matrices. Basically, the 0.4 in the matrix plays the role of XB, and the exact tanh transform is applied. For larger correlation matrices, a bigger strictly upper triangular matrix can be supplied.
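
For example, for a 3×3 correlation matrix, the strict upper triangle holds the unconstrained values (the numbers here are made up):

Inverse(cb)([0 0.4 -0.2; 0 0 0.7; 0 0 0])  # returns a valid 3×3 correlation matrix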

@lrnv Yes, I plan to implement other copulas. In particular, I wanted to implement models for the rotated Clayton copulas, defined as follows:

C90(u1, u2) = u2 − C(1 − u1, u2),
C180(u1, u2) = u1 + u2 − 1 + C(1 − u1, 1 − u2),
C270(u1, u2) = u1 − C(u1, 1 − u2),

where C is the standard Clayton Copula.
See this paper here
These are defined on page 6 of the paper.
I think these will take some thinking to implement!
Any ideas?

Ok, I see your point; it took me a little while to get it. But it does not seem to work. See the small example below:

using Copulas, Distributions, Random, Turing, Bijectors, Plots, StatsPlots
n = 200
p = 2

X = randn(n, p)
β1 = randn(p)

Y = X*hcat(β1, β1) + transpose(rand(SklarDist(GaussianCopula([1 .8; .8 1]), (Normal(0.0, 1.0), Normal(0.0, 1.0))), n))
# equivalently: transpose(rand(MultivariateNormal(zeros(2), [1 0.8; 0.8 1]), n))



#-- Model without bijector for Tanh
@model function copulaReg3(Y, X)

    n, K = size(X)
    cb = Bijectors.CorrBijector()

    # Priors
    #ρ ~ truncated(Normal(0,.3), -.99, 0.99) #-- Correlation parameter (replaced by the tanh link below)
    θ  ~ Normal(0.,1.)
    θ₁ ~ Normal(0.,1.)
    θ₂ ~ Normal(0.,1.)

    β1 ~ filldist(Normal(0., 1.), K)
    β2 ~ filldist(Normal(0., 1.), K)
     β ~ filldist(Normal(0., 1.), K)

    #-- Regression parameters
    μ1 = zeros(typeof(θ₁), n)
    μ2 = zeros(typeof(θ₂), n)
    μ3 = zeros(typeof(θ), n)
     ρ = zeros(typeof(θ), n)


    for i in 1:n
        μ1[i] = θ₁+sum(X[i,:].* β1)
        μ2[i] = θ₂+sum(X[i,:].* β2)
        μ3[i] = θ+sum(X[i,:].* β)
        ρ[i] = Inverse(cb)([0. μ3[i]; 0. 0.])[2,1]

        #Turing.@addlogprob! loglikelihood(SklarDist(GaussianCopula([1. ρ; ρ 1.0]), (Normal(μ1[i], σ1), Normal(μ2[i], σ2))), Y[i,:])
        Turing.@addlogprob! loglikelihood(SklarDist(GaussianCopula([1 ρ[i]; ρ[i] 1]), (Normal(μ1[i], 1.), Normal(μ2[i], 1.))), Y[i,:])
    end
end

#-- sampler
sampler = NUTS()

#-- sample from the posterior
chain = sample(copulaReg3(Y, X), sampler, MCMCThreads(), 2_000, 3)

unless there is something I am missing.

For that, you might use the SurvivalCopula type, which is meant to flip the directions of any copula:

using Copulas
C = ClaytonCopula(2,3.0) # bivariate clayton with theta = 3.0
C90 = SurvivalCopula(C,(1,)) # flips the first dimension
C270 = SurvivalCopula(C,(2,)) # flips only the second dimension. 
C180 = SurvivalCopula(C,(1,2)) # flips both dimensions.

You might want to update to Copulas.jl v0.1.7 for that to work correctly.
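
As a quick sanity check against the formulas above (copulas in Copulas.jl are Distributions.jl multivariate distributions, so cdf should work on them):

using Distributions

u = [0.3, 0.7]
cdf(C180, u) ≈ u[1] + u[2] - 1 + cdf(C, 1 .- u)  # C180(u1, u2) = u1 + u2 − 1 + C(1 − u1, 1 − u2)
cdf(C90, u)  ≈ u[2] - cdf(C, [1 - u[1], u[2]])   # C90(u1, u2) = u2 − C(1 − u1, u2)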


@lrnv Exactly what I needed. It works nicely! Thank you.


One question: does Copulas.jl work with the OrderedLogistic distribution?

I have this small example below

using Copulas, Distributions, Random, Turing, Plots, StatsPlots
using DataFrames
using AlgebraOfGraphics
using StatsFuns #: logit
using Bijectors
using LazyArrays
using LinearAlgebra

#--- I am trying to fit the following copula model
#-- Parameters
Random.seed!(123)
n = 500 # sample size
p = 3   # number of covariates
β = randn(p)
X = hcat(randn(n, p-1), zeros(n, 1))
lp = X*β
#-- Simulate data from a Clayton copula model

C180 = SurvivalCopula(ClaytonCopula(2, 3.0), (1, 2)) # bivariate Clayton with θ = 3.0, rotated 180°

Y = zeros(n, 2) ## Store the data
for i in 1:n
    Y[i,:] = rand( SklarDist(C180, (Normal(lp[i], 1.), Logistic(lp[i],1.0))), 1)
end


#-- Plot the data
cornerplot( Y, compact=true)

#--- Copula model for this joint data
#-- Clayton Copula Model 180 Rotate
@model function Model1Clay180(Y, X)

    n, K = size(X)


    # Priors
    θ  ~ Normal(0.,1.)
    θ₁ ~ Normal(0.,1.)
    θ₂ ~ Normal(0.,1.)

    β1 ~ filldist(Normal(0., 1.), K)
    β2 ~ filldist(Normal(0., 1.), K)
     β ~ filldist(Normal(0., 1.), K)

    # Regression parameters
    μ1 = zeros(typeof(θ₁), n)
    μ2 = zeros(typeof(θ₂), n)
     ρ = zeros(typeof(θ), n)


    # Linear predictors and per-observation copula log-likelihood
    for i in 1:n
        μ1[i] = θ₁ .+ sum(X[i,:] .* β1)
        μ2[i] = θ₂ .+ sum(X[i,:] .* β2)
        ρ[i] = exp(θ .+ sum(X[i,:] .* β))

        Turing.@addlogprob! loglikelihood(SklarDist(SurvivalCopula(ClaytonCopula(2, ρ[i]), (1,2)), (Normal(μ1[i], 1.), Logistic(μ2[i], 1.))), Y[i,:])
       
    end
end

# Fit the model to the data
sampler = NUTS()

# sample from the posterior

Model1chain = sample(Model1Clay180(Y, X), sampler, MCMCThreads(), 2_000, 3)



# Second part of the problem
# Suppose Y[:,2] is not fully observed; we only observe categories obtained by discretizing Y[:,2]
using CategoricalArrays

Ycat = cut(Y[:,2], [-Inf, -1.5, 0, 1.5, Inf], labels = [1,2,3,4])

#--- Our model becomes the following (using the OrderedLogistic distribution)

#-- Clayton Copula Model 180 Rotate
@model function Model2Clay180(Y1, Y2, X)

    n, K = size(X)
    ncateg = maximum(Y2)
     
    # Priors
    cutpoints ~ Bijectors.ordered(filldist(Normal(0., 10.), ncateg - 1))
    θ  ~ Normal(0.,1.)
    θ₁ ~ Normal(0.,1.)
    θ₂ ~ Normal(0.,1.)

    β1 ~ filldist(Normal(0., 1.), K)
    β2 ~ filldist(Normal(0., 1.), K)
     β ~ filldist(Normal(0., 1.), K)

    # Regression parameters
    μ1 = zeros(typeof(θ₁), n)
    μ2 = zeros(typeof(θ₂), n)
     ρ = zeros(typeof(θ), n)


    # Linear predictors and per-observation copula log-likelihood
    for i in 1:n
        μ1[i] = θ₁ .+ sum(X[i,:] .* β1)
        μ2[i] = θ₂ .+ sum(X[i,:] .* β2)
        ρ[i] = exp(θ .+ sum(X[i,:] .* β))

        Turing.@addlogprob! loglikelihood(SklarDist(SurvivalCopula(ClaytonCopula(2, ρ[i]), (1,2)), (Normal(μ1[i], 1.), OrderedLogistic(μ2[i], cutpoints))), [Y1[i], Y2[i]])
       
    end
end

#-- Fit the model
Model2chain = sample(Model2Clay180(Y[:,1], Ycat, X), sampler, MCMCThreads(), 2_000, 3)

This does not work at all for me.

Hmm…

OrderedLogistic seems to come from Turing.jl, but the cdf corresponding to this distribution is not implemented. We could add it ourselves, but that would be type piracy:

# cdf of an OrderedLogistic at x: the pmf at x plus the pmf of every atom below x
function Distributions.cdf(d::OrderedLogistic{T, Vector{T2}}, x::T3) where {T<:Real, T2<:Real, T3<:Real}
    rez = pdf(d, x)
    for atom in support(d)
        if atom < x
            rez += pdf(d, atom)
        end
    end
    return rez
end

But then I still get numerical errors from Turing.unsafe_logpdf_ordered_logistic

I can assure you that this should work, but this distribution does not correctly implement the interface from Create New Samplers and Distributions · Distributions.jl; namely, the cdf function is missing. Maybe we could ping Turing.jl with a GitHub issue to fix this.

Another option is to go to Distributions.jl and ask for a generic fallback for the cdf of a discrete distribution given its pdf, which does not seem to exist right now (I don't know why).
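
Such a fallback could look something like this (a hypothetical sketch, not actual Distributions.jl code):

using Distributions

# Hypothetical generic fallback: the cdf of a discrete distribution with
# countable support, obtained by summing the pmf over all atoms ≤ x.
function generic_discrete_cdf(d::DiscreteUnivariateDistribution, x::Real)
    return sum((pdf(d, a) for a in support(d) if a <= x); init = 0.0)
end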

EDIT: As a side note, your model takes ages to run on my machine. I am not familiar enough with Turing to know whether that is normal, but maybe there are ways to make it a lot faster?

@lrnv Thank you for your comment. I still seem to run into issues even when I use your snippet of code.
I have another idea. What if I define the copula on Y1 and a latent variable Y2st ~ Logistic(μ2, 1), and supplement the likelihood with Ycat (the ordinal categorical data) through the OrderedLogistic likelihood? I tried to implement that below, but it does not seem to work either. I might be doing something terribly wrong here.


#-- Clayton Copula Model 180 Rotate
@model function Model3Clay180(Y1, Y2, X)

    n, K = size(X)
    ncateg = 4 # maximum(Y2)
     
    # Priors
    cutpoints ~ Bijectors.ordered(filldist(Normal(0., 10.), ncateg - 1))
    θ  ~ Normal(0.,1.)
    θ₁ ~ Normal(0.,1.)
    θ₂ ~ Normal(0.,1.)

    β1 ~ filldist(Normal(0., 1.), K)
    β2 ~ filldist(Normal(0., 1.), K)
     β ~ filldist(Normal(0., 1.), K)

    # Regression parameters
    μ1 = zeros(typeof(θ₁), n)
    μ2 = zeros(typeof(θ₂), n)
     ρ = zeros(typeof(θ), n)
     Y2st ~ filldist(Logistic(0, 1.), n)


    # Linear predictors, ordinal likelihood, and copula log-likelihood
    for i in 1:n
        μ1[i] = θ₁ .+ sum(X[i,:] .* β1)
        μ2[i] = θ₂ .+ sum(X[i,:] .* β2)
        ρ[i] = exp(θ .+ sum(X[i,:] .* β))
        Y2[i] ~ OrderedLogistic(Y2st[i] + μ2[i], cutpoints)

        # Now the copula is between Y1 (observed) and Y2st (latent)
        Turing.@addlogprob! loglikelihood(SklarDist(SurvivalCopula(ClaytonCopula(2, ρ[i]), (1,2)), (Normal(μ1[i], 1.), Logistic(μ2[i], 1.))), [Y1[i], Y2st[i] + μ2[i]])

    end
end