Problem with a hypergeometric Turing model

mlanghinrichs · May 29, 2020, 4:01pm

Hi all,

I’m trying to get a bit more into probabilistic programming with Turing.jl. I want to build a simple hypergeometric model. Imagine you have a large box with Nred red and Nblue blue balls (makes N = Nred + Nblue balls in total) and you draw n=10 balls out of this box (four times) and get (3, 7), (4, 6), (4, 6) and (5, 5) red and blue balls in the samples. I want to sample the 3d posterior of (N, Nred, Nblue) (2d effectively, due to the constraint), given this data.

Please see my attempt of a model specification in Turing below. Unfortunately I get an error, because sometimes Nred is sampled >N, although it should be upper-bounded by N, leading to Nblue<0. Nblue<0 then causes the error in the hypergeometric distribution. These are typical outcomes of the print statements:

new:
83
26
57
new:
89
12
77
new:
11
18
-7

Does anyone know why this goes wrong?

Turing model:

using Turing
using StatsPlots

@model hypergeo(nred) = begin
    # fixed parameters
    n = 10 # number of balls in the drawn sample

    # priors
    N ~ DiscreteUniform(n, 100) # balls in the total population
    Nred ~ DiscreteUniform(0, N) # red balls in the total population
    Nblue = N - Nred # blue balls in the total population

    println("new:")
    println(N)
    println(Nred)
    println(Nblue)

    # observations
    @. nred ~ Hypergeometric(Nred, Nblue, n)
end

#  run sampler, collect results
chn = sample(hypergeo([3,4,4,5]), MH(), 5000)

# summarise results
describe(chn)

# plot results
p = plot(chn)

mohamed82008 · May 29, 2020, 4:47pm

Hi!

Your problem seems similar to Improper initial values when supports of the prior distribution are themselves random · Issue #1270 · TuringLang/Turing.jl · GitHub. Turing currently doesn’t support stochastic support for prior distributions but there is a neat workaround using a custom distribution like the example shown here Improper initial values when supports of the prior distribution are themselves random · Issue #1270 · TuringLang/Turing.jl · GitHub. In your case, the custom distribution will be a subtype of DiscreteMultivariateDistribution and you won’t need a bijector. This should probably work.

using Random, Distributions

struct MyDist <: DiscreteMultivariateDistribution
    n::Int
end
function Base.rand(rng::Random.AbstractRNG, d::MyDist)
    N = rand(rng, DiscreteUniform(d.n, 100))
    Nred = rand(rng, DiscreteUniform(0, N))
    return [N, Nred]
end
function Distributions.logpdf(d::MyDist, x::Vector{Int})
    return logpdf(DiscreteUniform(d.n, 100), x[1]) + logpdf(DiscreteUniform(0, x[1]), x[2])
end

mlanghinrichs · May 29, 2020, 5:12pm

thanks for your answer! But I’m a bit confused now. If I adapt the “Hello World” example from Turing a bit like this:

# Define a simple Normal model with unknown mean and variance.
@model gdemo(x, y) = begin
    s ~ InverseGamma(2, 3)
    m ~ Normal(s, 0.01)

    println("new:")
    println(s)
    println(m)
    
    x ~ Normal(m, sqrt(s))
    y ~ Normal(m, sqrt(s))
end

#  Run sampler, collect results
chn = sample(gdemo(1.5, 2), MH(), 5) # just 5 iteration for nice print output

I get the following output:

new:
2.326687447393811
2.330099688761129
new:
1.284186752463607
2.327211558566936
new:
0.9287803374710428
2.320031712933687
new:
2.334071342743244
2.3090054849030874
new:
0.8360725845595213
2.3227973835651685
new:
2.323124161080269
2.322218261627838

So basically m is sampled from a distribution with s’s first realisation (2.326687447393811, i.e. m~N(2.326687447393811, 0.01) ) or what?

So generally, Turing does currently not support priors that depend on any other random variable?

mohamed82008 · May 29, 2020, 5:18pm

No, prior distributions can depend on other random variables so long as the support of the distribution doesn’t. In your case, the support of the second distribution, i.e. the values that the distribution can sample, depends on the first random variable. In the tutorial, the support of the normal distriution [-Inf, Inf] doesn’t change when s changes.

Edit: the support of DiscreteUniform(0, N) is the integers between 0 and N.

mlanghinrichs · May 29, 2020, 5:24pm

yes, but what confuses me that even in my second example—where the support does not change since m ~ Normal(s, 0.01) will always have support [-Inf, Inf]—I think the samples for m and s still look weird.

In the second iteration we have

s = 1.284186752463607
m = 2.327211558566936

so s changes, but m still seems to be sampled from N(2.326687447393811, 0.01) (with the first realisation of s as the mean).

trappmartin · May 29, 2020, 5:26pm

Well, you are not printing the samples for each iteration here. You print the draws from the proposal distribution. You might want to look at the returned chain object instead.

trappmartin · May 29, 2020, 5:27pm

Also can’t you write your initial problem in terms of Bernoulli draws?

mlanghinrichs · May 29, 2020, 5:33pm

Alright, got it! When I looked at chn.value, m seems to “adapt” to s as you would expect. Thanks.

So as a conclusion, I will then just keep this in mind when I’m effecting the support of my priors.

mlanghinrichs · May 29, 2020, 5:41pm

You mean modelling this situation with replacement (binomial/Bernoulli) instead of without replacement (hypergeometric)?

For this I am not so sure because I’m actually interested in these quantities in the total population like N, Nred, and without replacement might be a more appropriate model in my specific case.

edit:

But thanks a lot so far. I will think about this and also try to understand these custom distr approaches.

mohamed82008 · May 29, 2020, 5:50pm

All you need is:

N_Nred ~ MyDist(n)
N = N_Nred[1]
Nred = N_Nred[2]

Topic		Replies	Views
Trouble translating a simple PyMC3 model to Turing New to Julia turing	39	5089	November 23, 2021
Sampling (no inference) from a mixture model (compound probability distribution) General Usage turing	3	308	January 11, 2024
Estimate posterior distribution of initial condition(s) Probabilistic programming turing , bayesian-inference	4	647	October 21, 2021
Turing.jl: fix some parameters to prior distribution Probabilistic programming	14	2119	June 10, 2020
Fixed distribution in Turing General Usage turing	2	312	September 11, 2023

Problem with a hypergeometric Turing model

Related topics