I want to use Turing to sample ordered variables, each with a different distribution. For example,
X ~ Normal(0,1), Y ~ Exp(1), Z ~ Uniform(0,1)
and require
X > Y > Z
I know this can be done by rejection sampling, but when there are many ordered variables the efficiency is very low. Is there a way in Turing, or another package in Julia, that can help?
Thank you. In your code Z ~ Uniform(0, 1) and the others are conditioned on its value, but that's not accurate, because Z's distribution then remains unaffected by the constraint. Rejection sampling shows that the distributions of X, Y, Z are no longer what they were specified to be at the beginning.
Well, yes, that was my interpretation of your definition.
Seems you want a different density though. In this case, you will probably need to define a transformation from \mathbb{R}^3 to (x, y, z) with x \in [0, 1], y > x, z > y, and then either use Turing.@addlogprob! to explicitly define the desired density (including the log-Jacobian of the transformation) or define a custom Bijector and use it to implicitly transform the desired density. I have never done this in Turing though.
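For what it's worth, here is a rough, untested sketch of the first route; the particular transformation, the Flat priors on the raw parameters, and all names below are just my own choices for illustration, not an established recipe:

using Turing

@model function ordered_transform()
    # unconstrained "raw" parameters with flat priors (log-density zero)
    u1 ~ Flat()
    u2 ~ Flat()
    u3 ~ Flat()

    # transform to the constrained space: Z in (0, 1), Y > Z, X > Y
    Z = 1 / (1 + exp(-u1))
    Y = Z + exp(u2)
    X = Y + exp(u3)

    # log-abs-Jacobian of (u1, u2, u3) -> (Z, Y, X); the Jacobian is
    # triangular with diagonal entries Z * (1 - Z), exp(u2), exp(u3)
    logjac = log(Z) + log1p(-Z) + u2 + u3

    # desired density: product of the three marginals, restricted to the
    # ordered region (enforced by the transformation), plus the Jacobian term
    Turing.@addlogprob! logpdf(Uniform(0, 1), Z) +
                        logpdf(Exponential(1), Y) +
                        logpdf(Normal(), X) +
                        logjac

    return (X = X, Y = Y, Z = Z)
end

chain = sample(ordered_transform(), NUTS(), 2_000)
# the constrained values can be recovered afterwards, e.g. with
# generated_quantities(ordered_transform(), chain), if I recall the API correctly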
Note: this is just hacked together as an MWE. Please check the docs of Distributions and Bijectors in more detail for how to define and use custom distributions!
Is there a bijector which simply wraps the reals mod 1? It isn't 1-1, but it is continuous and a local bijection.
Perhaps I should go and look at the Bijectors code, but I am asking the question here first.
(As for the relevance: using such a uniform bijector will generate a different sampling distribution, which may correspond better to some desired distribution.)
UPDATE: There is a problem with using such a bijector, though: it does not preserve ordering (it is non-monotonic) and would not work in combination with the second bijector as suggested.
Thank you so much! That appears to nicely solve my question. I am not familiar with Bijectors, but what you present gives the same result as rejection sampling.
However, if the distributions of X, Y, Z are changed, for example to X, Y, Z iid ~ Exponential(1), then with your code I get mean(X) = 0.28, whereas it should be 1/3 (this can be shown analytically or with rejection sampling):
using Distributions

function rejection_sample(n_samples)
    samples_X, samples_Y, samples_Z = Float64[], Float64[], Float64[]
    while length(samples_X) < n_samples
        # draw three iid Exponential(1) proposals
        x = rand(Exponential(1))
        y = rand(Exponential(1))
        z = rand(Exponential(1))
        # keep the draw only if the ordering constraint holds
        if x < y < z
            push!(samples_X, x)
            push!(samples_Y, y)
            push!(samples_Z, z)
        end
    end
    return samples_X, samples_Y, samples_Z
end
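As a sanity check (a sketch; rejection_sample is just the name I gave the function above), the mean of X should come out around 1/3, because X here is the minimum of three iid Exponential(1) draws:

using Statistics

X, Y, Z = rejection_sample(100_000)
mean(X)   # ≈ 1/3: the minimum of three iid Exponential(1) draws has mean 1/3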
The following does rejection sampling as well, but does it with Turing:
using Turing

@model function demo()
    Z ~ Uniform(0, 1)
    Y ~ Exponential(1)
    X ~ Normal()
    if !(Z < Y < X)
        DynamicPPL.@addlogprob! -Inf
        return
    end
    [X, Y, Z]
end

chain = sample(demo(), NUTS(), 1_000; progress = true)
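To inspect the result (a small sketch, assuming the default MCMCChains chain object), one can look at the posterior means directly:

using Statistics

# posterior means of X, Y and Z under the ordering constraint
mean(chain[:X]), mean(chain[:Y]), mean(chain[:Z])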
As you mention yourself, it would not be 1-1, i.e., not a bijector. This will be a problem when trying to compute the Jacobian correction it has on the density, which is needed in your case as you want to define a density on the constrained, i.e., transformed, space.
Turing uses this to define a density p(y) on the unconstrained space which nevertheless has the desired density p(x) when mapped onto the constrained space. By the change of variables formula, this requires adjusting the density by the log-abs-Jacobian, i.e., p(y) = p(x) |dx/dy|, where dx/dy denotes the Jacobian of the inverse transformation. The Stan manual has a nice explanation of this, but the source code of Bijectors is also rather clear, i.e., just try @less logpdf(transformed(Uniform(0, 1), Bijectors.Logit(0, 1)), 0.2)
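To make the change of variables concrete, here is a small check (a sketch; I am assuming a Bijectors version where inverse is available, older versions used inv) that the transformed log-density is just the Uniform log-density plus the log-abs-Jacobian of the logit transform:

using Bijectors, Distributions

b = Bijectors.Logit(0.0, 1.0)        # maps (0, 1) to the real line
td = transformed(Uniform(0, 1), b)   # distribution of y = b(x) with x ~ Uniform(0, 1)

y = 0.2                              # a point in the unconstrained space
x = inverse(b)(y)                    # back to (0, 1); the inverse is the logistic function

logpdf(td, y)                                    # what Bijectors computes
logpdf(Uniform(0, 1), x) + log(x) + log1p(-x)    # p(x) |dx/dy| written out by hand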
Long story short, in the case of the Exponential distribution you need a bijector from \mathbb{R}^+ \to \mathbb{R} instead, i.e., just replace Bijectors.Logit(0, 1) with bijector(Exponential(1)); I should probably have used bijector(Uniform(0, 1)) in the first place.
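For reference, bijector(d) returns the transformation from the support of d to the whole real line (a quick sketch; the exact return types differ between Bijectors versions):

using Bijectors, Distributions

bijector(Exponential(1))   # a log-type transform from the positive reals to ℝ
bijector(Uniform(0, 1))    # a logit-type transform from (0, 1) to ℝ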
Regarding VB (variational Bayes): this should work out of the box for the model using CustomDistribution.
This is a bit of a tangent to the thread, but I think that even though it is not a bijector, using this transformation is valid and might be useful. The two caveats needed are:
(i) Continuity of the density/transform on the stitched domains of the covering space, without which some algorithms might get confused (although this may not be a strict requirement).
(ii) A "uniform" covering. For example, the mod-1 cover of [0, 1) by R has the required symmetry (the covering can be partitioned into branches).
With these conditions, the stable distribution on the original space should be maintained (and the sampling will work as intended).
Such more general transformations would be a nice addition (and maybe Stan doesn't have them).
Still not sure that something like mod 1 is a good idea: First, the posterior on the unconstrained space will be multi-modal, as many points map onto the same constrained parameter and accordingly also have the same likelihood. Secondly, and even worse, this means that the posterior will not be proper, as it cannot be integrated, unless an additional density is defined on the unconstrained space directly.
Why do you think such an approach would be better than a 1-1 mapping, e.g., via inverse logit?
I would be surprised by that, and in any case it would only work because most practical samplers are not able to explore the true posterior, which is improper and multi-modal.
I don't understand this comment fully:
Samplers are supposed to sample from the distribution, which is often multi-modal.
Samplers are not integrating, just sampling. They don't really know about a global property of the distribution, such as whether it is proper.
But I'm not really up to date with all the modern sampling tricks.
This is off-topic for the thread, so maybe we should let this idea slip through the cracks unless someone picks it up (maybe I will).
This is strictly not correct. MCMC algorithms are numerical integration methods. If certain conditions are satisfied, the resulting samples can be used to approximate certain expectations numerically. Moreover, when those conditions are satisfied, there's a central limit theorem that bounds the error in the approximation of expectations. MCMC methods cannot sample from an improper distribution. "Improper distributions" are not distributions at all. One could never draw exact samples from an "improper distribution," so it is also not possible to draw samples using MCMC methods. Multi-modal distributions are in principle no problem for MCMC, but in practice most widely used MCMC algorithms would require infinite time to jump between widely-separated modes. So when possible, it is a good idea to parametrize a model in such a way that it is unimodal.
According to Markov chain Monte Carlo - Wikipedia:
"In statistics, Markov chain Monte Carlo (MCMC) is a class of algorithms used to draw samples from a probability distribution." But it is also true that:
"MCMC methods are primarily used for calculating numerical approximations."
You can't even draw from the Normal distribution on a computer (everything is finite…).
Also inaccurate, and it depends mostly on the width of the chain's "bottlenecks" (see the mixing time literature).