Hi all,
I’ve been trying to fit a model to truncated discrete data using Turing, with the aim of recovering the untruncated distribution’s parameter and then using the fitted model to predict new values at new truncation points.
Below is an MWE. The dummy data consists of observations that are at or above a per-observation truncation threshold:
```julia
using Turing, Distributions, DataFrames, DataFramesMeta, Random

Random.seed!(1234)

# Generate Poisson-distributed data
λ_true = 10.0    # true Poisson rate parameter
n_samples = 100  # number of samples
poisson_data = rand(Poisson(λ_true), n_samples)

# Per-observation truncation thresholds
λ_threshold = 7.0
trunc = rand(Poisson(λ_threshold), n_samples)

# Keep only observations at or above their threshold
poisson_data_truncated = DataFrame(obs_data = poisson_data, truncation = trunc)
poisson_data_truncated = @subset(poisson_data_truncated, :obs_data .>= :truncation)
```
The `T()` notation is from this link, whilst the `lower` kwarg is from a more recent post here. I then set up the Turing model:
```julia
@model function truncated_poisson_1(obs_data, trunc_thres)
    # Prior for λ
    λ ~ Exponential(5)
    T = typeof(λ)
    # Likelihood for each data point: truncated Poisson
    for i in 1:length(obs_data)
        obs_data[i] ~ truncated(Distributions.Poisson(λ); lower = T(trunc_thres[i]), upper = T(100))
    end
end
```
```julia
model_1 = truncated_poisson_1(poisson_data_truncated.obs_data, poisson_data_truncated.truncation)
chain_1 = sample(model_1, NUTS(), 10)
```
and got the following error:
```
MethodError: no method matching _gammalogccdf(::ForwardDiff.Dual{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 1}, ::ForwardDiff.Dual{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 1}, ::ForwardDiff.Dual{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 1})
```
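For what it’s worth, the error seems reproducible outside Turing by differentiating `logccdf` directly with ForwardDiff — this is just a diagnostic sketch on my setup, not part of the model:

```julia
using Distributions, ForwardDiff

# Differentiating the Poisson logpdf is fine:
# d/dλ logpdf(Poisson(λ), x) = x/λ - 1, so -0.5 at λ = 10, x = 5
g = ForwardDiff.derivative(λ -> logpdf(Poisson(λ), 5), 10.0)

# Differentiating logccdf hits the same missing _gammalogccdf method on my setup:
err = try
    ForwardDiff.derivative(λ -> logccdf(Poisson(λ), 7), 10.0)
    nothing
catch e
    e
end
```

So it looks like the pmf side of `Poisson` is AD-friendly but the (log)ccdf path is not.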

Is there anything wrong with my code above? I can switch to `Gibbs(MH())` for sampling and it works, but the convergence is quite poor, so I would much prefer `NUTS()` if possible.

I see from the earlier link that this approach worked with the `Exponential` distribution? I understand from various posts that `NUTS()` uses AD, which causes some issues, but I was under the impression from Distributions.jl that `logccdf` is available for all univariate distributions?

It seems like others from StatsFuns.jl and Distributions.jl tried doing something about it but then stopped here and here.
As a brute-force workaround, if I want to keep `NUTS()` and still be able to use `predict` down the line, I’ve come up with the following:
```julia
@model function truncated_poisson_2(obs_data, trunc_thres, response_data)
    λ ~ Exponential(1)
    for i in 1:length(obs_data)
        if obs_data[i] > trunc_thres[i]
            # Poisson log-likelihood
            logp = logpdf(Poisson(λ), obs_data[i])
            # Accumulate CDF(trunc_thres[i]) by summing the pmf, since
            # cdf(Poisson(λ), trunc_thres[i]) errors out under AD
            tmp_cdf = zero(λ)
            for j in 0:trunc_thres[i]
                tmp_cdf += pdf(Poisson(λ), j)
            end
            tmp_cdf = min(one(eltype(tmp_cdf)), tmp_cdf)  # just in case
            trunc_logp = log1p(-tmp_cdf)  # log(1 - CDF), the truncation normaliser
            Turing.@addlogprob! logp - trunc_logp
        end
        # For predicting data
        if response_data[i] === missing
            response_data[i] ~ Poisson(λ)
        end
    end
end
```
```julia
model_2 = truncated_poisson_2(poisson_data_truncated.obs_data, poisson_data_truncated.truncation, poisson_data_truncated.obs_data)
chain_2 = sample(model_2, NUTS(), MCMCThreads(), 100, 2)
```
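As a sanity check on the hand-rolled normaliser (with a plain `Float64` rate, where `logccdf` does work), `log1p(-CDF)` computed from the pmf sum agrees with `logccdf`:

```julia
using Distributions

λ, t = 10.0, 7
tmp_cdf = sum(pdf(Poisson(λ), j) for j in 0:t)
manual = log1p(-tmp_cdf)            # log(1 - CDF(t)) via the pmf sum
reference = logccdf(Poisson(λ), t)  # log P(X > t)
isapprox(manual, reference; atol = 1e-10)
```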
Quite messy. I’ll need to add censoring later on too, so I’m really hoping for something along the lines of:

```julia
response_data[i] ~ censored(truncated(Poisson(λ); lower = trunc_thres[i]); upper = cen_thres[i])
```
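With a fixed rate, `censored` does at least compose with `truncated` for me, and at points away from the censoring bound the density matches the plain truncated one — a quick check with made-up bounds, not a Turing model:

```julia
using Distributions

λ, lo, hi = 10.0, 7, 20
d = censored(truncated(Poisson(λ); lower = lo); upper = hi)

# Away from the censoring bound, the censored pmf equals the truncated pmf
pdf(d, 10) ≈ pdf(truncated(Poisson(λ); lower = lo), 10)
```

So the composition itself seems fine; the open question is whether it can be differentiated through inside a model.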
Lastly, I also tried the `Normal` distribution with the same syntax, and it seems to have worked fine:
```julia
n_samples = 100
m_true = 3.0
s_true = 0.5
normal_data = rand(Normal(m_true, s_true), n_samples)

μ_threshold = 2.5
trunc_2 = rand(Normal(μ_threshold, s_true), n_samples)
normal_data_truncated = DataFrame(obs_data = normal_data, truncation = trunc_2)
normal_data_truncated = @subset(normal_data_truncated, :obs_data .>= :truncation)

@model function truncated_norm_1(obs_data, trunc_thres)
    m ~ Exponential(1)
    s ~ Exponential(0.1)
    for i in 1:length(obs_data)
        obs_data[i] ~ truncated(Normal(m, s), trunc_thres[i], 100)
    end
end

model_3 = truncated_norm_1(normal_data_truncated.obs_data, normal_data_truncated.truncation)
chain_3 = sample(model_3, NUTS(), 10)
```