# Issues implementing Copula regression using copula and Turing

I am trying to implement a copula regression model by leveraging Turing and copulas in Julia.

The author of copula implemented a nice working example as below

``````using Copulas, Distributions, Random, Turing, Plots, StatsPlots

Random.seed!(123)
D = SklarDist(ClaytonCopula(3,7), (Exponential(1.0), Pareto(3.0), Exponential(2.0)))
draws = rand(D, 2_000)

@model function copula(X)
# Priors
θ  ~ TruncatedNormal(1.0, 1.0, 0, Inf)
θ₁ ~ TruncatedNormal(1.0, 1.0, 0, Inf)
θ₂ ~ TruncatedNormal(1.0, 1.0, 0, Inf)
θ₃ ~ TruncatedNormal(1.0, 1.0, 0, Inf)

# Marginals distributions and copula
X₁ = Exponential(θ₁)
X₂ = Pareto(θ₂)
X₃ = Exponential(θ₃)
C = ClaytonCopula(3,θ)
D = SklarDist(C, (X₁, X₂, X₃))
end

sampler = NUTS() # MH() works too
chain = sample(copula(draws), sampler, MCMCThreads(), 1_00, 4)
plot(chain)

``````

I would like to implement a regression equivalent of that model. I have a simple example below. But NUTS and HMC samplers do not work. PG works but does not produce a reliable results.

``````using Random
Random.seed!(43);

n = 200
p = 2

X = randn(n, p)
β1 = randn(p)

Y = X*hcat(β1,β1) + transpose(rand(SklarDist(GaussianCopula([1 .8; .8 1]), (Normal(0.0, 1.), Normal(0.0,1.0))), n))

#-- Look at the data
cornerplot(Y, compact=true)

#-- Model
@model function copulaReg(Y, X)

n, K = size(X)

# Priors
ρ ~ truncated(Normal(0,.3), -1, 1) #-- Corelated
θ  ~ Normal(0.,1.)
θ₁ ~ Normal(0.,1.)
θ₂ ~ Normal(0.,1.)

β1 ~ filldist(Normal(0., 1.), K)
β2 ~ filldist(Normal(0., 1.), K)
β ~ filldist(Normal(0., 1.), K)

#-- Regression parameters
μ1 = zeros(n)
μ2 = zeros(n)

for i in 1:n
μ1[i] = θ₁ .+ sum(X[i,:].* β1)
μ2[i] = θ₂ .+ sum(X[i,:].* β2)

#Turing.Turing.@addlogprob! loglikelihood(SklarDist(GaussianCopula([1. ρ; ρ 1.0]), (Normal(μ1[i], σ1), Normal(μ2[i], σ2))), Y[i,:])
Turing.Turing.@addlogprob! loglikelihood(SklarDist(GaussianCopula([1. ρ; ρ 1.0]), (Normal(μ1[i], 1.), Normal(μ2[i], 1.))), Y[i,:])
end
end

#-- sampler
sampler = NUTS() # doe not work
sampler = PG(10) # works but gives weird results!!
#-- sample from the posterior
chain = sample(copulaReg(Y, X), sampler, MCMCThreads(), 2_000, 3)

describe(chain)
``````

I am not sure why NUTS and HMC are not working on this. Any help on this will be greatly appreciated. Thank you.

2 Likes

Your code is not generic enough, i.e., `zeros` creates an array of `Float64` and thus fails when Turing tries to take gradients using ForwardDiff by default as `ForwardDiff.Dual` cannot be written to those float arrays. Try with `μ1 = zeros(typeof(θ₁), n)` (same for `μ2`) or don’t use these arrays at all, but simply use local variables within the loop, i.e., `μ1 = θ₁ .+ sum(X[i,:].* β1)` (same for `μ2`).
I also needed to fix the correlation matrix of the Gaussian copula to `GaussianCopula([1+eps ρ; ρ 1+eps])` for some small `eps`, e.g., `1e-7`, in order to fix numerical issues when `ρ` gets close to one.

3 Likes

@bertschi haha - I see now! That works just perfect! Thank you.

@kokora, if I may what was the purpose of this MWE ?

The trick with `eps` works great. Do you think changing `ρ` to `ρ ~ truncated(Normal(0,.3), -0.99, 0.99)` would also work? In some sense this is not an ‘uninformed’ prior, but if the posterior bangs against the truncation, it would indicate the model needs adjustment.
I think having diagonal > 1 on correlation matrix feels too tricky.

The matrix you input is re-normalized to be a propper correlation matrix by the copula constructor. Thus, this `+eps` really is a contraction of `rho` by some amount, and it is not great as the variable rho does not mean anything anymore.

Truncating `rho`’s prior, if it works, would be indeed more sensible as the value obtained for it would still be interpretable.

1 Like

Perhaps an error/exception is better. This thread might be the example to show why an error/exception for a non-proper correlation matrix is better. The ‘projection’ to a correlation matrix can be done outside the copula constructor (maybe even a CorrelationMatrix type is needed).

You might be right.

Moreover, with `NUTS()`, even if resulsts look great, I still got a bunch of warnings :

``````┌ Warning: The current proposal will be rejected due to numerical error(s).
│   isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
``````

Which do not look great. What should i do about these ?

Got those too. Didn’t dig enough to understand either.

@Dan Good point! I can also constrain that prior away from -1 and 1.
I am also thinking about specifying a regression model for ρ so that ρ = tanh(XB)*0.99 with prior on B instead. That 0.99 allows us to stay away from -1 and 1.

@lrnv I conquer - It is not at all clear what these warnings message are about? Maybe I should reopen this and mark as not fully resolved yet?

@Kokora About the warnings I cannot help you, but maybe someone else can. On the other hand, about the prior on `ρ`, you might also try other copula families than the Gaussian on your real data.

If using `tanh` formula, it is not necessary perhaps to limit to `0.99` as the problematic value is only reached ‘at infinity’. It also seems `tanh` is being used as a bijector from (-Inf,Inf) to (-1,1). So it might be interesting to study Bijector package to formally make this into one. It might somehow help some gradient calculations.
https://github.com/TuringLang/Bijectors.jl

(I’m learning as I go too).

UPDATE (after a bit of aforementioned ‘learning’):
`Bijectors.jl` already provides just such a ‘link’ (statistics name for this invertible function). So:

``````using Bijectors
cb = Bijectors.CorrBijector()
``````

And now:

``````julia> Inverse(cb)([0 0.4; 0 0])
2×2 Matrix{Float64}:
1.0       0.379949
0.379949  1.0
``````

This bijector is good for larger matrices. Basically, the `0.4` in the matrix is the `XB` and the exact `tanh` transform is done. For larger correlation matrices, bigger upper triangular matrices can be supplied.

@lrnv Yes, I plan to implement other Copula. I wanted to implement the following models for the rotated Clayton copulas defined as follows:

C90(u1, u2) = u2 − C(1 − u1, u2),
C180(u1, u2) = u1 + u2 − 1 + C(1 − u1, 1 − u2),
C270(u1, u2) = u1 − C(u1, 1 − u2),

where C is the standard Clayton Copula.
See this paper here
These are defined on page 6 of the paper.
I think these will take some thinking to do!
Any idea?

Ok I see your point. It took a little to get it. But it does not seem to work. See a small example below

``````using Copulas, Distributions, Random, Turing, Plots, StatsPlots
n = 200
p = 2

X = randn(n, p)
β1 = randn(p)

Y = X*hcat(β1,β1) + transpose(rand(SklarDist(GaussianCopula([1 .8; .8 1]), (Normal(0.0, 1.), Normal(0.0,1.0))), n))#transpose(rand(MultivariateNormal(zeros(2), [1 0.8; 0.8 1]),n))

#-- Model without bijector for Tanh
@model function copulaReg3(Y, X)

n, K = size(X)
cb = Bijectors.CorrBijector()

# Priors
#ρ ~ truncated(Normal(0,.3), -.99, 0.99) #-- Corelated
θ  ~ Normal(0.,1.)
θ₁ ~ Normal(0.,1.)
θ₂ ~ Normal(0.,1.)

β1 ~ filldist(Normal(0., 1.), K)
β2 ~ filldist(Normal(0., 1.), K)
β ~ filldist(Normal(0., 1.), K)

#-- Regression parameters
μ1 = zeros(typeof(θ₁), n)
μ2 = zeros(typeof(θ₂), n)
μ3 = zeros(typeof(θ₂), n)
ρ = zeros(typeof(θ), n)

for i in 1:n
μ1[i] = θ₁+sum(X[i,:].* β1)
μ2[i] = θ₂+sum(X[i,:].* β2)
μ3[i] = θ+sum(X[i,:].* β)
ρ[i] = Inverse(cb)([0. μ3[i]; 0. 0.])[2,1]

#Turing.Turing.@addlogprob! loglikelihood(SklarDist(GaussianCopula([1. ρ; ρ 1.0]), (Normal(μ1[i], σ1), Normal(μ2[i], σ2))), Y[i,:])
Turing.Turing.@addlogprob! loglikelihood(SklarDist(GaussianCopula([1 ρ[i]; ρ[i] 1]), (Normal(μ1[i], 1.), Normal(μ2[i], 1.))), Y[i,:])
end
end

#-- sampler
sampler = NUTS()

#-- sample from the posterior
chain = sample(copulaReg3(Y, X), sampler, MCMCThreads(), 2_000, 3)

``````

unless there is something I am missing.

For that, you might use the `SurvivalCopula` class, that is meant to flip directions of any copula:

``````using Copulas
C = ClaytonCopula(2,3.0) # bivariate clayton with theta = 3.0
C90 = SurvivalCopula(C,(1,)) # flips the first dimension
C270 = SurvivalCopula(C,(2,)) # flips only the second dimension.
C180 = SurvivalCopula(C,(1,2)) # flips both dimensions.
``````

You might want to update to Copulas.jl v0.1.7 for that to work correctly.

1 Like

@lrnv Exactly what I needed. It works just nicely! Thank you.

1 Like

One question - does Copula work with OrderedLogistic distribution?

I have this small example below

``````using Copulas, Distributions, Random, Turing, Plots, StatsPlots
using DataFrames
#using CairoMakie
using AlgebraOfGraphics
using Distributions
using StatsFuns#: logit
using Bijectors
using LazyArrays
using LinearAlgebra
using Random

#--- I am trying to fit the following copula model
#-- Parameters
seed!(123)
n = 500 # sample sizes
p = 3  # Number of caovariates
β = randn(p)
X = hcat(randn(n, p-1), zeros(n, 1))
lp = X*β
#-- Simulate data from a Clayton copula model

C180 = SurvivalCopula(ClaytonCopula(2,3.0),(1,2)) #ClaytonCopula(2,3.0) # bivariate clayton with theta = 3.0
#D = SklarDist(C180, (Normal(0.0, 1.), Normal(0.0,1.0)))
#D180 = SklarDist(C180, (Normal(0.0, 1.), Normal(0.0,1.0)))

Y = zeros(n, 2) ## Store the data
for i in 1:n
Y[i,:] = rand( SklarDist(C180, (Normal(lp[i], 1.), Logistic(lp[i],1.0))), 1)
end

#-- Plot the data
cornerplot( Y, compact=true)

#--- Copula model for this joint data
#-- Clayton Copula Model 180 Rotate
@model function Model1Clay180(Y, X)

n, K = size(X)

# Priors
θ  ~ Normal(0.,1.)
θ₁ ~ Normal(0.,1.)
θ₂ ~ Normal(0.,1.)

β1 ~ filldist(Normal(0., 1.), K)
β2 ~ filldist(Normal(0., 1.), K)
β ~ filldist(Normal(0., 1.), K)

# Regression parameters
μ1 = zeros(typeof(θ₁), n)
μ2 = zeros(typeof(θ₂), n)
ρ = zeros(typeof(θ), n)

# Implement  Shrinkage Priors
for i in 1:n
μ1[i] = θ₁ .+ sum(X[i,:].* β1)
μ2[i] = θ₂ .+ sum(X[i,:].* β2)
ρ[i] = exp(θ .+ sum(X[i,:].* β))

Turing.Turing.@addlogprob! loglikelihood(SklarDist(SurvivalCopula(ClaytonCopula(2,ρ[i]),(1,2)), (Normal(μ1[i], 1.), Logistic(μ2[i], 1.))), Y[i,:])

end
end

# Fit the model to the data
sampler = NUTS()

# sample from the posterior

Model1chain = sample(Model1Clay180(Y, X), sampler, MCMCThreads(), 2_000, 3)

# Second par of the problem
# SUppose Y[:,2] is not full observed but we only observe categories obtained by dichotomizing Y2
using CategoricalArrays

Ycat = cut(Y[:,2], [-Inf, -1.5, 0, 1.5, Inf], labels = [1,2,3,4])

#--- Our model becomes (using OrderedLogistic distribition)

#-- Clayton Copula Model 180 Rotate
@model function Model2Clay180(Y1, Y2, X)

n, K = size(X)
ncateg = maximum(Y2)

# Priors
cutpoints ~ Bijectors.ordered(filldist(Normal(0., 10.), ncateg - 1))
θ  ~ Normal(0.,1.)
θ₁ ~ Normal(0.,1.)
θ₂ ~ Normal(0.,1.)

β1 ~ filldist(Normal(0., 1.), K)
β2 ~ filldist(Normal(0., 1.), K)
β ~ filldist(Normal(0., 1.), K)

# Regression parameters
μ1 = zeros(typeof(θ₁), n)
μ2 = zeros(typeof(θ₂), n)
ρ = zeros(typeof(θ), n)

# Implement  Shrinkage Priors
for i in 1:n
μ1[i] = θ₁ .+ sum(X[i,:].* β1)
μ2[i] = θ₂ .+ sum(X[i,:].* β2)
ρ[i] = exp(θ .+ sum(X[i,:].* β))

Turing.Turing.@addlogprob! loglikelihood(SklarDist(SurvivalCopula(ClaytonCopula(2,ρ[i]),(1,2)), (Normal(μ1[i], 1.), OrderedLogistic(μ2[i],cutpoints))), [Y1[i],Y2[i]] )

end
end

#-- Fit the model
Model2chain = sample(Model2Clay180(Y[:,1], Ycat, X), sampler, MCMCThreads(), 2_000, 3)
``````

This does not at all for me.

Hum…

`OrderedLogistic` seem to come from `Turing.jl`, but the cdf corresponding to this distribution is not implemented. We could add it ourselves, but that’ll be type piracy:

``````function Distributions.cdf(d::OrderedLogistic{T, Vector{T2}}, x::T3) where {T<:Real, T2<:Real, T3<:Real}
rez = pdf(d,x)
for atom in support(d)
if atom < x
rez += pdf(d,atom)
end
end
return rez
end
``````

But then I still get numerical errors from `Turing.unsafe_logpdf_ordered_logistic`

I can assure you that this should work, but this distribution does not implement the interface from Create New Samplers and Distributions · Distributions.jl correctly, namely the `cdf` function is not implemented. Maybe we could ping `Turing.jl` with a github issue to fix this.

Another option is to go to `Distributions.jl` and ask for a generic fallback for the cdf of a discrete distribution knowing the pdf, which seems not to be implemented right now (i dont know why).

EDIT: As a side note, your model takes ages to run on my machine. I am not familiar enough with Turing to know if it is normal, but maybe there are things to do to make it a lot faster ?

@lrnv Thank you for your comment. I seem to still run into issues even when I use your snippet of code.
I have another idea. What if I define the copula based on Y1 and a latent variable Y2st = logistic(mu2, 1). Also I supplement the likelihood with the Ycat (ordinal categorical data) using the OrderedLogistic likelihood. I tried to implement that below. But it does not seem to work either. I might be doing something terrible wrong here.

``````
#-- Clayton Copula Model 180 Rotate
@model function Model3Clay180(Y1, Y2, X)

n, K = size(X)
ncateg = 4#maximum(Y2)

# Priors
cutpoints ~ Bijectors.ordered(filldist(Normal(0., 10.), ncateg - 1))
θ  ~ Normal(0.,1.)
θ₁ ~ Normal(0.,1.)
θ₂ ~ Normal(0.,1.)

β1 ~ filldist(Normal(0., 1.), K)
β2 ~ filldist(Normal(0., 1.), K)
β ~ filldist(Normal(0., 1.), K)

# Regression parameters
μ1 = zeros(typeof(θ₁), n)
μ2 = zeros(typeof(θ₂), n)
ρ = zeros(typeof(θ), n)
Y2st ~ filldist(Logistic(0, 1.), n)

# Implement  Shrinkage Priors
for i in 1:n
μ1[i] = θ₁ .+ sum(X[i,:].* β1)
μ2[i] = θ₂ .+ sum(X[i,:].* β2)
ρ[i] = exp(θ .+ sum(X[i,:].* β))
Y2[i] ~ OrderedLogistic(Y2st[i]+μ2[i],cutpoints)

#Now the copula is between Y1 (observed) and Y2st (latent)
Turing.Turing.@addlogprob! loglikelihood(SklarDist(SurvivalCopula(ClaytonCopula(2,ρ[i]),(1,2)), (Normal(μ1[i], 1.), Logistic(μ2[i],1.))), [Y1[i],Y2st[i]+μ2[i]] )

end
end

``````