Trouble translating a simple PyMC3 model to Turing

I’ve been playing around with Turing solving this simple problem from Allen Downey’s Think Bayes:

23 grizzly bears visit a “trap” in the first season. In the second season, 19 bears visit it, 4 of which are repeat customers. We need to estimate the total number of bears.

This is the solution Downey gives using PyMC3:

k = 23
n = 19
x = 4

with pm.Model() as model6:
    N = pm.DiscreteUniform('N', 50, 500)
    y = pm.HyperGeometric('y', N=N, k=k, n=n, observed=x)
    trace6 = pm.sample(1000, **options)

The mean of the posterior distribution for N is 182.

Now, this should be the equivalent Julia code:

k = 23
n = 19
x = 4

@model function model6(y)
    N ~ DiscreteUniform(50, 500)
    y ~ Hypergeometric(k, N - k, n)
end
trace6 = sample(model6(x), IS(), 500);

(is IS the right sampler to use? NUTS crashes with an uninformative `InexactError`)

The mean of this distribution is 284.
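One thing worth ruling out is the parameterization: PyMC3’s `HyperGeometric(N=N, k=k, n=n)` and Distributions.jl’s `Hypergeometric(s, f, n)` name their parameters differently but share the same pmf once you set `s = k` and `f = N - k`, so the translation above looks right. A quick stdlib-Python check (the helper names are mine, not from either library):

```python
from math import comb

def pmf_pymc3(N, k, n, x):
    # PyMC3 convention: population N, k tagged, n drawn, x tagged among the draw
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)

def pmf_distributions_jl(s, f, n, x):
    # Distributions.jl convention: s successes, f failures, n drawn
    return comb(s, x) * comb(f, n - x) / comb(s + f, n)

k, n, x = 23, 19, 4
for N in (50, 109, 182, 500):
    assert pmf_pymc3(N, k, n, x) == pmf_distributions_jl(k, N - k, n, x)
```

So the model itself matches, which points at the sampler rather than the translation.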

Where am I going wrong?

Thanks

DD

You sample 1000 in the Python case and 500 in the Julia case. Did you just divide by the wrong thing?

The code here uses a single chain instead of 2.
The result doesn’t change if I increase the number of samples to 1000 or even 5000.

Switching to a different sampler (e.g. SMC) doesn’t help either.
This is the result I get (with SMC and 1000 samples, but as I said, this hardly changes the result):

and this is Downey’s result:

Hi,

NUTS does not work on discrete parameters. Your full error message may have included `Int(some_value)`, but I agree it is not the most informative.

I’m not sure what sampler would be best for this problem. What is used in PyMC3?

One unrelated point: you should pass variables to your model function rather than rely on globals. This is good practice in general, but in Julia specifically, untyped global variables often hurt performance because the compiler cannot infer their types ahead of execution.

Here is a better approach:

using Turing, Random
Random.seed!(505)
k = 23
n = 19
y = 4

@model function model6(y, k, n)
    N ~ DiscreteUniform(50, 500)
    y ~ Hypergeometric(k, N - k, n)
end
trace6 = sample(model6(y, k, n), IS(), 500);

I can reproduce your results. Indeed, the posterior distribution is not changing from the prior.
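That behaviour is consistent with how importance sampling works: `IS()` draws from the prior and stores importance weights (via the log evidence), so the unweighted chain statistics reflect the prior — note that 284 is close to the prior mean, (50 + 500)/2 = 275. A stdlib-Python sketch of the idea (the `likelihood` helper is mine):

```python
import random
from math import comb

def likelihood(N, k=23, n=19, x=4):
    # Hypergeometric pmf: probability of x tagged bears among the n seen, given N total
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)

random.seed(0)
draws = [random.randint(50, 500) for _ in range(100_000)]  # samples from the prior
weights = [likelihood(N) for N in draws]

raw_mean = sum(draws) / len(draws)  # stays at the prior mean, (50 + 500) / 2 = 275
weighted_mean = sum(N * w for N, w in zip(draws, weights)) / sum(weights)
# the *weighted* mean recovers the posterior mean (near the ~182 Downey reports)
```

So the 284 is not a different posterior; it is the prior, because the weights were never applied.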


PG seems to give reasonable results:

using Turing, Random
Random.seed!(505)
k = 23
n = 19
y = 4

@model function model6(y, k, n)
    N ~ DiscreteUniform(50, 500)
    y ~ Hypergeometric(k, N - k, n)
end
trace6 = sample(model6(y, k, n), PG(50), 1000)
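Since `N` is the only unknown and its support is finite, this particular posterior can also be computed exactly by enumeration, which makes a handy ground truth for judging any sampler. A stdlib-Python sketch (helper names are mine):

```python
from math import comb

k, n, x = 23, 19, 4

def likelihood(N):
    # Hypergeometric pmf of seeing x tagged bears among the n seen in season 2
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)

support = range(50, 501)                    # matches the DiscreteUniform(50, 500) prior
post = {N: likelihood(N) for N in support}  # flat prior, so posterior ∝ likelihood
Z = sum(post.values())
posterior_mean = sum(N * p for N, p in post.items()) / Z
```

The enumerated mean lands near the value Downey’s run reports, so a sampler whose mean sits at ~284 is clearly not exploring the posterior.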

For comparison, PyMC3 automatically picks Metropolis for this model:

Multiprocess sampling (2 chains in 2 jobs)
Metropolis: [N]
100.00% [4000/4000 00:00<00:00 Sampling 2 chains, 0 divergences]
Sampling 2 chains for 1_000 tune and 1_000 draw iterations (2_000 + 2_000 draws total) took 1 seconds.
The number of effective samples is smaller than 25% for some parameters.

It would be nice if Turing chose sensible defaults automatically.


Does PyMC3 always default to Metropolis-Hastings, or does it select a default based on your model’s properties?

Here is the Metropolis-Hastings version:

using Turing, StatsPlots, Random
Random.seed!(505)
k = 23
n = 19
y = 4

@model function model6(y, k, n)
    N ~ DiscreteUniform(50, 500)
    y ~ Hypergeometric(k, N - k, n)
end
trace6 = sample(model6(y, k, n), MH(), 1000)

plot(trace6)

Thanks!
But why aren’t IS and SMC working?

Good question. I am not exactly sure. It could be that they are not the best samplers for this problem, or that their hyperparameters need to be tweaked. Maybe one of the Turing devs will know.

It’s usually NUTS, I think.

So, apparently, it makes sensible decisions about which algorithm to use for a specific problem.
Why can’t we have similar functionality in Turing? This is mainly a function of the distributions used in the model, right? Turing can definitely deduce whether, e.g., a discrete random variable is involved.

Yeah, that’s a good question. I can’t speak for the Turing devs, but I suspect setting default samplers may not have a large benefit, and it could have the drawback of making the sampling process more opaque. NUTS is a reasonable default for continuous variables, but discrete parameters can be challenging and more problem-specific. In my experience, Metropolis-Hastings scales poorly and can fail outright, so I suspect it is a default that users frequently override. That said, most samplers ship with reasonable default settings where possible — NUTS, for example, uses the same defaults as Stan, and sometimes adjusting the target acceptance rate improves performance. Even with NUTS, the most challenging part can be finding the right transformations to enable efficient sampling. It would be nice if Turing selected an optimal transformation automatically, but I suspect that would be difficult to do :slightly_smiling_face:


Continuing this thread…

A similar issue occurs with this PyMC3 model (also from Think Bayes):

k10 = 20 - 3
k01 = 15 - 3
k11 = 3
num_seen = k01 + k10 + k11

with pm.Model() as model9:
    p0 = pm.Beta('p0', alpha=1, beta=1)
    p1 = pm.Beta('p1', alpha=1, beta=1)
    N = pm.DiscreteUniform('N', num_seen, 350)
    
    q0 = 1-p0
    q1 = 1-p1
    ps = [q0*q1, q0*p1, p0*q1, p0*p1]
    
    k00 = N - num_seen
    data = pm.math.stack((k00, k01, k10, k11))
    y = pm.Multinomial('y', n=N, p=ps, observed=data)
    trace9 = pm.sample(1000, **options)

PyMC3 chose a compound step, with NUTS for p0 and p1 and Metropolis for N:

Multiprocess sampling (2 chains in 2 jobs)
CompoundStep
>NUTS: [p1, p0]
>Metropolis: [N]
100.00% [4000/4000 00:02<00:00 Sampling 2 chains, 0 divergences]
Sampling 2 chains for 1_000 tune and 1_000 draw iterations (2_000 + 2_000 draws total) took 3 seconds.
The acceptance probability does not match the target. It is 0.5430480274605854, but should be close to 0.8. Try to increase the number of tuning steps.
The rhat statistic is larger than 1.05 for some parameters. This indicates slight problems during sampling.
The estimated number of effective samples is smaller than 200 for some parameters.

Now, to Julia:

@model function model9(k01, k10, k11)
    p0 ~ Beta(1, 1)
    p1 ~ Beta(1, 1)
    num_seen = k01 + k10 + k11
    N ~ DiscreteUniform(num_seen, 350)
    
    q0 = 1-p0
    q1 = 1-p1
    ps = [q0*q1, q0*p1, p0*q1, p0*p1]
    
    y = [N - num_seen, k01, k10, k11]
    y ~ Multinomial(N, ps)
end
    
trace9 = sample(model9(k01, k10, k11), PG(50), 500)

We get:

Sampling: 100%|█████████████████████████████████████████| Time: 0:00:02
Chains MCMC chain (500×9×1 Array{Float64, 3}):

Log evidence      = 0.0
Iterations        = 1:1:500
Number of chains  = 1
Samples per chain = 500
Wall duration     = 2.74 seconds
Compute duration  = 2.74 seconds
parameters        = p0, p1, N, y[1], y[2], y[3], y[4]
internals         = lp, logevidence

Summary Statistics
  parameters       mean       std   naive_se      mcse        ess      rhat    ⋯
      Symbol    Float64   Float64    Float64   Float64    Float64   Float64    ⋯

          p0     0.5122    0.2965     0.0133    0.0125   449.8449    0.9981    ⋯
          p1     0.5232    0.2838     0.0127    0.0105   449.0627    0.9988    ⋯
           N   190.5040   90.7350     4.0578    4.4045   514.1959    1.0049    ⋯
        y[1]    44.0380   48.9169     2.1876    2.1886   572.9458    1.0044    ⋯
        y[2]    47.2080   48.1557     2.1536    1.8543   469.3461    0.9985    ⋯
        y[3]    47.9620   54.4876     2.4368    2.3048   503.4060    0.9983    ⋯
        y[4]    51.2960   54.6575     2.4444    2.4123   430.4700    1.0021    ⋯
                                                                1 column omitted

Quantiles
  parameters      2.5%      25.0%      50.0%      75.0%      97.5% 
      Symbol   Float64    Float64    Float64    Float64    Float64 

          p0    0.0350     0.2440     0.5235     0.7708     0.9749
          p1    0.0447     0.2911     0.5222     0.7576     0.9844
           N   39.9500   108.0000   191.5000   267.0000   338.5250
        y[1]    0.0000     9.0000    25.0000    66.2500   180.7250
        y[2]    0.4750    10.0000    30.0000    68.0000   183.0500
        y[3]    0.0000     8.7500    27.0000    68.0000   199.2000
        y[4]    0.0000    11.0000    32.0000    72.0000   185.6750


This seems to be the same problem, but this time PG(50) doesn’t work… (Note that y[1]–y[4] show up under `parameters` in the chain summary, as if `y ~ Multinomial(N, ps)` were being sampled rather than conditioned on — since y is defined inside the model body instead of being passed in as data.)
Neither does MH(), which for some reason returns what looks like a normal distribution:

trace9 = sample(model9(k01, k10, k11), MH(), 500)

It doesn’t return a normal; it returns a chain that never moved from its initial position, and the density plot smoothed the resulting delta function into a normal shape.

Notice how the “normal” has mean 146 and essentially no mass beyond 145 or 147.
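That smoothing effect is easy to reproduce in isolation: a kernel density estimate of a chain that never moves is just the kernel itself — a Gaussian bump centred on the repeated value. A stdlib-Python sketch (the bandwidth is an arbitrary choice of mine):

```python
from math import exp, pi, sqrt

samples = [146] * 500  # a chain stuck at its initial value
bandwidth = 0.3        # arbitrary; density plots pick something comparable

def kde(t):
    # Gaussian kernel density estimate of `samples`, evaluated at t
    norm = len(samples) * bandwidth * sqrt(2 * pi)
    return sum(exp(-0.5 * ((t - s) / bandwidth) ** 2) for s in samples) / norm

# The resulting "distribution" peaks at 146 with almost no mass at 145 or 147.
```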

Sampling discrete distributions can be difficult, but I’ll try to take a look at this later today to see if I can come up with some suggestions.
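In the meantime, this model is small enough to brute-force: with the flat priors on p0, p1, and N, the posterior can be approximated on a grid, which gives a ground truth to compare the Turing and PyMC3 chains against. A stdlib-Python sketch (the grid resolution is an arbitrary choice of mine):

```python
from math import exp, lgamma, log

k10, k01, k11 = 20 - 3, 15 - 3, 3
num_seen = k01 + k10 + k11  # 32

def log_multinomial(N, counts, ps):
    # log pmf of Multinomial(N, ps) evaluated at the given counts
    out = lgamma(N + 1)
    for c, p in zip(counts, ps):
        out += c * log(p) - lgamma(c + 1)
    return out

pgrid = [(i + 0.5) / 40 for i in range(40)]  # midpoint grid over (0, 1)
marg_N = {}
for N in range(num_seen, 351):               # DiscreteUniform(num_seen, 350) prior
    counts = [N - num_seen, k01, k10, k11]
    total = 0.0
    for p0 in pgrid:
        for p1 in pgrid:
            ps = [(1 - p0) * (1 - p1), (1 - p0) * p1, p0 * (1 - p1), p0 * p1]
            total += exp(log_multinomial(N, counts, ps))
    marg_N[N] = total                        # unnormalised marginal posterior of N

Z = sum(marg_N.values())
posterior_mean_N = sum(N * w for N, w in marg_N.items()) / Z
```

Whatever sampler is used, its marginal for N should agree with this enumeration.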

PyMC chose mixed sampling, with p0 and p1 updated by NUTS and N updated by Metropolis. (Note that Metropolis is not just one algorithm; the question is what proposal it uses.)

You can do this kind of mixed sampling in Turing. I’m on my phone, so it’s hard to pull an exact link, but check out the website docs and look for compositional Gibbs sampling: https://turing.ml/dev/docs/using-turing/guide


I think this should work, but for me it segfaults :frowning:

using Turing, StatsPlots

@model function model9(k01, k10, k11)
    p0 ~ Beta(1, 1)
    p1 ~ Beta(1, 1)
    num_seen = k01 + k10 + k11
    N ~ DiscreteUniform(num_seen, 350)
    
    q0 = 1-p0
    q1 = 1-p1
    ps = [q0*q1, q0*p1, p0*q1, p0*p1]
    
    y = [N - num_seen, k01, k10, k11]
    y ~ Multinomial(N, ps)
end

k10 = 20-3
k01 = 15-3
k11 = 3

trace9 = sample(model9(k01, k10, k11), Gibbs(NUTS(1000, 0.8, :p0, :p1), PG(20, :N)), 1000)

density(trace9[:N])

I’m getting a gazillion warnings:

Warning: The current proposal will be rejected due to numerical error(s).
│   isfinite.((θ, r, ℓπ, ℓκ)) = (true, true, false, true)
└ @ AdvancedHMC ~/.julia/packages/AdvancedHMC/HQHnm/src/hamiltonian.jl:47

The sampled distributions are rubbish :frowning:

You may need to optimize the initial position first. Perhaps pick a value from the Python samples, just as a way to test whether you can initialize it and make it work.

The keyword argument for the initial position is wrong in the documentation; it should be `init_params=`.

OK, I tried to cheat and pass initial parameters that are very close to their means (according to PyMC3):

trace9 = sample(model9(k01, k10, k11), Gibbs(NUTS(1000,.8,:p0,:p1),PG(20,:N)), 1000, init_params=[0.2, 0.2, 100])

I’m still getting the warnings and results are even worse:

This does not strike me as a healthy chain (but at this point it should be clear that I’m not an expert in MCMC, Turing, or PyMC3).

Definitely not a healthy chain. But choosing the mean can be a bad idea; it can actually be far from a typical value. Try choosing the last sample used by PyMC instead.

Meanwhile, I’m updating some of my packages in hopes of eliminating the segfault so I can try it out. I’ve been meaning to try mixed continuous/discrete Gibbs sampling anyway.

OK, I’ve rerun the original on Colab:

Things aren’t that great here either, apparently:

The rhat statistic is larger than 1.4 for some parameters. The sampler did not converge.
The estimated number of effective samples is smaller than 200 for some parameters.

(it would be great if Turing gave similar diagnostics…)

Latest traces are:

{'p0_logodds__': -1.1201974288020848, 'p1_logodds__': -1.8108031775461457, 'N': 94, 'p0': 0.24597466437185683, 'p1': 0.1405410824792338}
{'p0_logodds__': -1.0759843501248898, 'p1_logodds__': -1.3370046772297803, 'N': 95, 'p0': 0.25426669304142646, 'p1': 0.208003069852844}
{'p0_logodds__': -1.309490295895163, 'p1_logodds__': -2.229973878615384, 'N': 97, 'p0': 0.21257214856225384, 'p1': 0.09709093087482638}
{'p0_logodds__': -1.2321949553810507, 'p1_logodds__': -2.4305296251833313, 'N': 96, 'p0': 0.22579748811056297, 'p1': 0.0808740895970291}

which aren’t far from the initial parameters I tried (p0=0.2, p1=0.2, N=100).

I’ve tried:

trace9 = sample(model9(k01, k10, k11), Gibbs(NUTS(1000,.8,:p0,:p1),PG(20,:N)), 1000, init_params=[0.22, 0.08, 95])

But no joy:
