How to sample ordered variables with different distributions in turing

gonzalo · January 7, 2024, 4:55am

Hi julia bros,

I want to use turing to sample ordered variables each with different distributions. For example,

X~Normal(0,1) Y~Exp(1), Z~Uniform(0,1)

and require

X>Y>Z

I know this can be done by rejection sampling but in case there are lots of ordered variables the efficiency is very low. Is there a way in turing or other packages in julia to help?

Gonzalo

bertschi · January 7, 2024, 12:27pm

Would try something like

@model function demo()
    Z ~ Uniform(0, 1)
    Y ~ truncated(Exponential(1); lower = Z)
    X ~ truncated(Normal(); lower = Y)
    [X, Y, Z]
end

chain = sample(demo(), NUTS(), 1_000; progress = true)

Not sure if it works correctly though, as it returns some samples violating the constraints … maybe a bug?

Dan · January 7, 2024, 1:33pm

chain = sample(demo(), SMC(), 1_000; progress = true);

Seems to return non-violating samples.

gonzalo · January 8, 2024, 2:25am

Thank you. In your code Z ~ Uniform(0,1) and the others are conditioned on its value but that’s not accurate because Z’s distribution is not affected. A rejection sampling can show that X,Y,Z’s distributions are not what they are specified at the beginning.

bertschi · January 8, 2024, 6:56pm

Well, yes, that was my interpretation of your definition.
Seems you want a different density though. In this case, you will probably need to define a transformation from \mathbb{R}^3 \to (x, y, z) where x \in [0, 1], y > x, z > y and then use Turing.@addlogprob! to explicitly define the desired density – including the log-Jacobian of the transformation – or define a custom Bijector and use this to implicitly transform the desired density. Have never done this in Turing though.

bertschi · January 8, 2024, 7:46pm

Following up, does this do what you want?

struct CustomDistribution <: ContinuousMultivariateDistribution
end

Base.length(c::CustomDistribution) = 3

Bijectors.bijector(c::CustomDistribution) = Bijectors.Stacked([Bijectors.Logit(0, 1), identity, identity]) ∘ inverse(Bijectors.OrderedBijector())

Distributions.logpdf(c::CustomDistribution, v::AbstractVector) = sum(logpdf.([Uniform(0, 1), Exponential(1), Normal(0, 1)], v))

@model function demo()
    xyz ~ CustomDistribution()
end

chain = sample(demo(), NUTS(), 1_000; progress = true)

Note: This is just hacked together as MWE. Please check the docs of Distributions and Bijectors in more detail on how to define and use custom distributions!

Dan · January 8, 2024, 10:15pm

Is there a bijector which simply wraps the reals mod 1 ? It isn’t 1-1 but it is continuous and a local bijection.
Perhaps I should go and look at the Bijectors code, but here I am asking this question.

(as for the relevance, using such a uniform bijector will generate different sampling distribution, which may somehow correspond better to some desired distribution).

UPDATE: There is a problem though with using such a bijector, as it doesn’t conserve ordering (it is non-monotonic) and wouldn’t work in combination with the second bijector as suggested.

gonzalo · January 9, 2024, 4:46am

Thank you so much! That appears nicely solve my questions. I am not familiar with Bijectors but what you present gives the same result to rejection sampling.

However if the distributions for X,Y,Z are changed, for example X,Y,Z iid ~ Exponential(1), then I use your code and got mean(X) = 0.28 which however should be 1/3 (can be shown analytically or with rejection sampling).

function rejection_sampling_exp3(n_samples::Int)
samples_X = Float64
samples_Y = Float64
samples_Z = Float64
while length(samples_X) < n_samples
    x = rand(Exponential(1))
    y = rand(Exponential(1))
    z = rand(Exponential(1))
    
    if x < y < z
        push!(samples_X, x)
        push!(samples_Y, y)
        push!(samples_Z, z)
    end
end

return samples_X, samples_Y, samples_Z
end

n_samples = 100000

samples_X, samples_Y, samples_Z = rejection_sampling_exp3(n_samples)

mean_X = mean(samples_X)
mean_Y = mean(samples_Y)
mean_Z = mean(samples_Z)

println(“Mean X: $mean_X”)
println(“Mean Y: $mean_Y”)
println(“Mean Z: $mean_Z”)

I guess this is related to transforming exponentially distributed X~Exp(1) by the Bijectors.

Is there anything that I got wrong? Thanks so much!

Dan · January 9, 2024, 6:01am

This following does rejection sampling, but does it with Turing:

using Turing

@model function demo()
    Z ~ Uniform(0, 1)
    Y ~ Exponential(1)
    X ~ Normal()
    if !(Z < Y < X)
        DynamicPPL.@addlogprob! -Inf
        return
    end
    [X, Y, Z]
end

chain = sample(demo(), NUTS(), 1_000; progress = true)

The average of Z in my test came out to be 0.338

gonzalo · January 9, 2024, 9:04am

Thanks you much! working pretty well in my exampke. That’s very super clever way.

gonzalo · January 9, 2024, 10:34am

By the way, do you know how with VI this can be performed?

bertschi · January 9, 2024, 11:40am

As you mention yourself, it would not be 1-1, i.e., not a bijector. This will be a problem when trying to compute the Jacobian correction it has on the density – which is needed in your case as you want to define a density on the constrained, i.e., transformed, space.

bertschi · January 9, 2024, 12:04pm

In this case, you would need to change the bijector to match the support of your distribution:

Logit(0, 1): Transforms from a constrained space x \in [0, 1], i.e., the support of uniform, to an unconstrained value y \in \mathbb{R}:
```
julia> using Bijectors
with_logabsdet_jacobian(Bijectors.Logit(0, 1), 0.1)  # (unconstrained value, Jacobian correction)
```
Turing uses this to define a density p(y) on the unconstrained space which nevertheless has the desired density p(x) when mapped onto the constrained space. By change of variable, this requires to adjust the density by the log-abs-Jacobian as p(y) = p(x) |dx/dy| where dx/dy denotes the Jacobian of the inverse transformation. The Stan manual has a nice explanation on this, but also the source code of Bijectors is rather clear, i.e., just try @less logpdf(transformed(Uniform(0, 1), Bijectors.Logit(0, 1)), 0.2)

Long story short, in case of the Exponential distribution, you need a bijector from \mathbb{R}^+ \to \mathbb{R} instead, i.e., just replace Bijectors.Logit(0, 1)
by bijector(Exponential(1)) – should probably have used bijector(Uniform(0, 1)) in the first place.

Regarding VB: This should work out-of-the-box for the model using CustomDistribution.

Dan · January 9, 2024, 11:42pm

This is a bit of a tangent to the thread. But I think even though it is not a bijector, using this transformation is valid, and might be useful. The two caveats needed are:
(i) Continuity of density/transform on the stiched domains of the covering space - without which some algos might get confused (although this may not be a strict requirement)
(ii) A ‘uniform’ covering. For example the mod1 cover of [0,1) with R has the required symmetry (the covering can be partitioned into branches)
With these conditions, the stable distribution on the original space should be maintained (and the sampling will work as intended).

Such more general transformation would be a nice addition (and maybe Stan doesn’t have them).

bertschi · January 10, 2024, 7:34pm

Still not sure that something like mod 1 is a good idea: First, the posterior on the unconstrained space will be multi-modal as many points map onto the same constrained parameter and accordingly also have the same likelihood. Secondly, and even worse, this means that the posterior will not be proper as it cannot be integrated – without any additional density defined on the unconstrained space directly.
Why do you think such an approach would be better than a 1-1 mapping, e.g., via inverse logit?

Dan · January 10, 2024, 7:51pm

I have an intuition it might converge faster.
Speed is not so much of an issue here, but perhaps a different problem can amplify the speed difference.

bertschi · January 10, 2024, 7:55pm

Would be surprised by that – and in any case, it would only work as most practical samplers are not able to explore the true posterior which is improper and multi-modal.

Dan · January 10, 2024, 8:01pm

I don’t understand this comment fully:
Samplers are supposed to sample the distribution which is often multi-modal.
Samplers are not integrating, just sampling. They don’t really know about a global property of the distribution, like whether it is proper.
But I’m not really up to date with all the modern sampling tricks.
This is off-thread topic. So maybe we should let this idea slip through the cracks unless someone picks it up (maybe I will).

sethaxen · May 29, 2024, 8:06am

This is strictly not correct. MCMC algorithms are numerical integration methods. If certain conditions are satisfied, the resulting samples can be used to approximate certain expectations numerically. Moreover, when those conditions are satisfied, there’s a central limit theorem that bounds the error in the approximation of expectations. MCMC methods cannot sample from an improper distribution. “Improper distributions” are not distributions at all. One could never draw exact samples from an “improper distribution,” so it is also not possible to draw samples using MCMC methods. Multi-modal distributions are in principle no problem for MCMC, but in practice most widely used MCMC algorithms would require infinite time to jump between widely-separated modes. So when possible, it is a good idea to parametrize a model in such a way that it is unimodal.

Dan · May 29, 2024, 8:26am

Few things are.

According to Markov chain Monte Carlo - Wikipedia :
" In statistics, Markov chain Monte Carlo (MCMC ) is a class of algorithms used to draw samples from a probability distribution. " but it is true that:
" MCMC methods are primarily used for calculating numerical approximations ".

You can’t even draw from the Normal Distribution on a computer (everything is finite…).

Also inaccurate, and depends mostly on the width of the chain’s “bottlenecks” (see Mixing Time literature).

Topic		Replies	Views
How to sample many ordered iid variables in turing Probabilistic programming turing , distributions , sampling	2	89	March 24, 2025
Automatic transformation of constrained distributions in Turing.jl General Usage question , turing	0	144	June 14, 2023
How to use Bijectors.jl for model with simplex, am I doing right? Probabilistic programming question , turing	4	821	February 21, 2022
Is there a turing alternative to pm.Deterministic from pymc3? New to Julia turing , distributions	4	1660	June 25, 2020
Sample dependent variables in Turing Probabilistic programming question , turing	7	362	June 7, 2024

How to sample ordered variables with different distributions in turing

Related topics