How (if possible) to rewrite this simple censored Poisson model without Turing.@addlogprob!

I want to estimate the Poisson distribution parameter λ (usually λ < 1). I am given N observations; however, they are censored: I can only see whether the value was 0 or greater than zero, i.e.

Instead of observing the latent vector:
0 1 2 1 0 1 2 1 0 0 0 0 3
I see the aliased vector y:
0 1 1 1 0 1 1 1 0 0 0 0 1

Since only two values are possible, for efficiency I use a different data representation: N, the total number of observations, and nzero, the number of observations equal to zero. Of course nzero <= N.
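For example (a small sketch, assuming y is the 0/1 vector shown above), the summary pair can be computed as:

N = length(y)             # total number of observations
nzero = count(==(0), y)   # how many observations were censored at zero

My model then uses only these two numbers: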

@model function poisson_model_fast(nzero::Int, N::Int)
    λ ~ Uniform(0, 100)
    # Add the censored log-likelihood to the joint log density by hand:
    Turing.@addlogprob! logpdf(Poisson(λ), 0) * nzero          # the nzero zeros
    Turing.@addlogprob! logccdf(Poisson(λ), 0) * (N - nzero)   # the N - nzero positives
end

The problem is that manually rewriting the target density, although perfectly legal, seems hackish and overly complicated to me. I am the only person on the team who can write probabilistic programs, and I don’t want to confront my teammates with too steep a learning curve.

Do you have any suggestions?

I’ve tried the following model, which uses the y vector directly, but it gives wrong results. Do you know why?

@model function poisson_model(y::Vector{Int})
    λ ~ Uniform(0, 100)
    y ~ truncated(Poisson(λ), 0, 1)
end

Here’s the whole of my code:

using Random
using Distributions
using Turing
using StatsPlots

Random.seed!(3)

N = 1000
lat = rand(Poisson(3.0), N)    # latent counts with true λ = 3
y = ifelse.(lat .> 0, 1, 0)    # aliased observations: 1 means "greater than zero"
nzero = count(==(0), y)        # number of zeros actually seen

@model function poisson_model_fast(nzero::Int, N::Int)
    λ ~ Uniform(0, 100)
    # Add the censored log-likelihood to the joint log density by hand:
    Turing.@addlogprob! logpdf(Poisson(λ), 0) * nzero          # the nzero zeros
    Turing.@addlogprob! logccdf(Poisson(λ), 0) * (N - nzero)   # the N - nzero positives
end
model = poisson_model_fast(nzero, N);

nsamples = 10000
nchains = 8
sampler = MH()
chains = sample(model, sampler, MCMCThreads(), nsamples, nchains, discard_initial = 500, thinning=50);
display(chains)
plot(chains[["λ"]]; colordim=:parameter, legend=true)

For reference, here is (hopefully) the same model, “model.bug”, in rjags (the input is the vector y). You may notice it uses slightly different logic: it introduces a latent discrete vector lat and a different censoring specification, dinterval(lat[i], latmax[i]), which ties the observed y[i] to whether lat[i] exceeds the cutpoint latmax[i].

model{
    for(i in 1:N){
        y[i] ~ dinterval(lat[i], latmax[i])
        lat[i] ~ dpois(lambda)
    }
    lambda ~ dunif(0, 100)
}

You would drive it with the following R script:

library(R2jags)
chain_count <- parallel::detectCores()
n.iter <- 10000
N <- 1000
lat <- rpois(N, 3.)
y <- ifelse(lat > 0, 1, 0)
latmax <- ifelse(y == 0, 100, 0)
data <- list(N = N, y = y, latmax = latmax)  # collect the inputs the model references
out <- jags.parallel(data = data, inits = NULL, n.iter = n.iter,
                     parameters.to.save = 'lambda', model.file = 'model.bug',
                     n.chains = chain_count)

This part (the y ~ truncated(Poisson(λ), 0, 1) model) gives you incorrect results because it’s not the same as the DGP you specified above. A truncated Poisson is a different distribution from the one your data come from.
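To see the difference numerically, here is a quick sketch with Distributions.jl, using the λ = 3 from your simulation above. truncated(Poisson(λ), 0, 1) renormalizes the probability mass over {0, 1}, whereas your DGP aliases every positive count to 1:

using Distributions

λ = 3.0
d = Poisson(λ)
t = truncated(d, 0, 1)

pdf(t, 1)    # P(X = 1) / (P(X = 0) + P(X = 1)) ≈ 0.75 under the truncated model
ccdf(d, 0)   # P(X > 0) = 1 - exp(-λ) ≈ 0.95, which is what your data encode

Those two probabilities disagree, so the truncated model targets a different λ.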

To be honest, using @addlogprob! here is probably the best solution, short of deriving a likelihood for this “indicator Poisson” distribution thing you’ve got.

Thank you very much, @cpfiffer, for the answer!

Can you explain to me what DGP is? I could not find it on Google.

Another question: what exactly do you mean by «“indicator poisson” distribution»? I know what an indicator function is, but I’m not sure I know what you mean by an “indicator distribution”. I suspect I am missing an important keyword.

Do you think I can use a better sampling algorithm than Metropolis-Hastings? logccdf(Poisson(λ), 0) is differentiable, so in principle I should be able to use Hamiltonian MC (or NUTS), but I get a very long and completely cryptic error message when I try sampler = NUTS() (Mozilla Community Pastebin/V34xekjR).

DGP = data generating process.

As for the «“indicator poisson” distribution»: I’m trying to make up a name to describe your distribution. There is no important keyword you’re missing; this is a non-standard application and I’m just trying to find the words for it.

Yeah, MH is not the best sampler here. Your link is broken, though, so I can’t debug the NUTS error right now.

Maybe a simple way to implement this is with a Binomial distribution:

    nzero ~ Binomial(N, exp(-λ))

For a Poisson(λ) variable, the probability of a zero is P(X = 0) = exp(-λ), and the complement, P(X > 0) = 1 - exp(-λ), covers everything you alias to 1. Each observation is therefore a Bernoulli trial (zero or not), and a sum of independent Bernoulli variables is a Binomial.
(See https://en.wikipedia.org/wiki/Poisson_distribution)
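As a quick sanity check (a sketch with made-up values for λ, N and nzero), this Binomial log-likelihood agrees with the two @addlogprob! terms up to the log binomial coefficient, which does not depend on λ:

using Distributions

λ, N, nzero = 0.7, 1000, 497
lp_manual   = nzero * logpdf(Poisson(λ), 0) + (N - nzero) * logccdf(Poisson(λ), 0)
lp_binomial = logpdf(Binomial(N, exp(-λ)), nzero)

# The gap is exactly log(binomial(N, nzero)), a constant in λ:
lp_binomial - lp_manual ≈ Float64(log(binomial(big(N), big(nzero))))   # true

Additive constants drop out in MCMC, so both formulations yield the same posterior for λ.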

Yes, you are right! It would be a truncated Poisson if, during the data generation process, I were discarding values > 1, but instead I am aliasing them to 1. Thank you!!

What bad luck with the Mozilla Pastebin. :wink: Here’s a classic pastebin instead: “TaskFailedException nested task error: TaskFailedException” (Pastebin.com).

The @addlogprob! version:

    Turing.@addlogprob! logpdf(Poisson(λ), 0)*nzero
    Turing.@addlogprob! logccdf(Poisson(λ), 0)*(N-nzero)

and the Binomial version:

    nzero ~ Binomial(N, exp(-λ))

are identical bits of code. They effect the same change in the log-prob value (up to a constant, the log of the binomial coefficient, which doesn’t influence inference). If the comment above («Yes, you are right! …») was a reply to my post (it wasn’t marked as such), then I think I did not understand it, or maybe I’m confused about the question.

No, it wasn’t, indeed. It was a comment on @cpfiffer’s answer, about why y ~ truncated(Poisson(λ), 0, 1) was wrong. I am apparently struggling with the Discourse interface. Sorry for the confusion.

Here’s the refined model, written in a more user-friendly way. What is great is that the sampling errors I had with the NUTS sampler are gone!!

@model function binomial_model_fast(nzero::Int, N::Int)
    λ ~ Uniform(0, 100)
    # pdf(Poisson(λ), 0) equals exp(-λ), so this could equally be written
    # as `nzero ~ Binomial(N, exp(-λ))`:
    nzero ~ Binomial(N, pdf(Poisson(λ), 0))
end
model = binomial_model_fast(nzero, N);
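With this parameterization NUTS runs for me without errors. A minimal sketch, reusing nsamples and nchains from the script above:

chains = sample(model, NUTS(), MCMCThreads(), nsamples, nchains)
display(chains)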

Thank you, @Dan, for pointing out that this is actually a Binomial model. I am very grateful for this insight!


Very well done, @Dan!
