Hi all,
I’ve been trying to fit a model to truncated discrete data using Turing, with the aim of recovering the non-truncated distribution’s parameter and then using the fitted model to predict new values at new truncation points.
Below is a minimal working example. The dummy data consists of observations that are at or above a per-observation truncation threshold:
```julia
using Random, Distributions, DataFrames, DataFramesMeta, Turing

Random.seed!(1234)

# Generate Poisson-distributed data
λ_true = 10.0    # True Poisson rate parameter
n_samples = 100  # Number of samples
poisson_data = rand(Poisson(λ_true), n_samples)

# Per-observation truncation thresholds
λ_threshold = 7.0
trunc = rand(Poisson(λ_threshold), n_samples)

# Keep only observations at or above their threshold
poisson_data_truncated = DataFrame(obs_data = poisson_data, truncation = trunc)
poisson_data_truncated = @subset(poisson_data_truncated, :obs_data .>= :truncation)
```
The `T()` notation is from this link, whilst the `lower` kwarg is from a more recent post here.
I then set up the Turing model:
```julia
@model function truncated_poisson_1(obs_data, trunc_thres)
    # Prior for λ
    λ ~ Exponential(5)
    T = typeof(λ)
    # Likelihood for each data point: truncated Poisson
    for i in eachindex(obs_data)
        obs_data[i] ~ truncated(Distributions.Poisson(λ); lower = T(trunc_thres[i]), upper = T(100))
    end
end
```
```julia
model_1 = truncated_poisson_1(poisson_data_truncated.obs_data, poisson_data_truncated.truncation)
chain_1 = sample(model_1, NUTS(), 10)
```
and got the following error:
```
MethodError: no method matching _gammalogccdf(::ForwardDiff.Dual{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 1}, ::ForwardDiff.Dual{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 1}, ::ForwardDiff.Dual{ForwardDiff.Tag{DynamicPPL.DynamicPPLTag, Float64}, Float64, 1})
```
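If it helps with the diagnosis, I believe the same MethodError can be triggered outside Turing; my understanding (an assumption on my part) is that the truncated `logpdf` needs the CDF at the bounds, which for `Poisson` bottoms out in StatsFuns’ gamma functions with no method for `Dual` numbers:

```julia
using ForwardDiff, Distributions

# Differentiates fine: the plain Poisson logpdf is AD-friendly
ForwardDiff.derivative(λ -> logpdf(Poisson(λ), 10), 10.0)

# Should throw the same MethodError: building/evaluating the
# truncated distribution calls logcdf/logccdf with a Dual-valued λ
ForwardDiff.derivative(λ -> logpdf(truncated(Poisson(λ); lower = 7.0), 10), 10.0)
```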
- Is there anything wrong with my code above? I can switch to `Gibbs(MH())` for sampling and it works, but the convergence is quite poor, so I would much prefer `NUTS()` if possible.
- I see from the earlier link that it worked with the `Exponential` distribution? I understand from various posts that `NUTS()` uses AD, which causes some issues, but I was under the impression from Distributions.jl that `logccdf` is available for all univariate distributions?
- It seems like others on the StatsFuns.jl and Distributions.jl side tried doing something about it but then stopped, here and here.
As a brute-force approach, if I want to keep `NUTS()` and still be able to use `predict` down the line, I’ve come up with the workaround below:
```julia
@model function truncated_poisson_2(obs_data, trunc_thres, response_data)
    λ ~ Exponential(1)
    for i in eachindex(obs_data)
        if obs_data[i] > trunc_thres[i]
            # Poisson log-likelihood
            logp = logpdf(Poisson(λ), obs_data[i])
            # Manually accumulate P(X <= threshold) term by term
            tmp_cdf = zero(λ)
            for j in 0:trunc_thres[i]
                tmp_cdf += pdf(Poisson(λ), j)
            end
            tmp_cdf = min(one(eltype(tmp_cdf)), tmp_cdf) # just in case
            trunc_logp = log1p(-tmp_cdf) # errors out if log1p(-cdf(Poisson(λ), trunc_thres[i]))??
            Turing.@addlogprob! logp - trunc_logp
        end
        # for predicting data
        if response_data[i] === missing
            response_data[i] ~ Poisson(λ)
        end
    end
end
```
```julia
model_2 = truncated_poisson_2(poisson_data_truncated.obs_data, poisson_data_truncated.truncation, poisson_data_truncated.obs_data)
chain_2 = sample(model_2, NUTS(), MCMCThreads(), 100, 2)
```
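For reference, the prediction step I have in mind with this model would look something like the sketch below (the `Union{Missing, Int}` response vector is my assumption for how to flag rows to predict; as written, the predictions come from the untruncated `Poisson(λ)` branch):

```julia
# Sketch: pass a response vector of `missing`s so that the
# `response_data[i] ~ Poisson(λ)` branch fires for every row
n = nrow(poisson_data_truncated)
pred_model = truncated_poisson_2(
    poisson_data_truncated.obs_data,
    poisson_data_truncated.truncation,
    Vector{Union{Missing, Int}}(missing, n),
)
predictions = predict(pred_model, chain_2)
```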
Quite messy. I’ll need to add censoring later on too, so I’m really hoping for something along the lines of:
```julia
response_data[i] ~ censored(truncated(Poisson(λ); lower = trunc_thres[i]), upper = cen_thres[i])
```
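In the meantime, I imagine right-censoring could be handled manually in the same spirit as the truncation term above. A sketch (the `logccdf_at` helper is hypothetical, and the `logsumexp`/`log1mexp` pattern from LogExpFunctions would also tidy up the hand-rolled CDF sum in `truncated_poisson_2`):

```julia
using Distributions
using LogExpFunctions: logsumexp, log1mexp

# Hypothetical helper: log P(X >= c) for a discrete distribution `d`
# with support starting at 0, computed stably in log space
function logccdf_at(d, c)
    c <= 0 && return zero(logpdf(d, 0))  # P(X >= c) = 1 below the support
    log1mexp(logsumexp([logpdf(d, j) for j in 0:(c - 1)]))
end

# Inside the model, a right-censored observation at cen_thres[i]
# would then contribute:
#   Turing.@addlogprob! logccdf_at(Poisson(λ), cen_thres[i])
```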
Lastly, I also tried the `Normal` distribution with the same syntax, and it seems to have worked fine:
```julia
n_samples = 100
m_true = 3.0
s_true = 0.5
normal_data = rand(Normal(m_true, s_true), n_samples)

# Per-observation truncation thresholds
μ_threshold = 2.5
trunc_2 = rand(Normal(μ_threshold, s_true), n_samples)

normal_data_truncated = DataFrame(obs_data = normal_data, truncation = trunc_2)
normal_data_truncated = @subset(normal_data_truncated, :obs_data .>= :truncation)

@model function truncated_norm_1(obs_data, trunc_thres)
    m ~ Exponential(1)
    s ~ Exponential(0.1)
    for i in eachindex(obs_data)
        obs_data[i] ~ truncated(Normal(m, s), trunc_thres[i], 100)
    end
end

model_3 = truncated_norm_1(normal_data_truncated.obs_data, normal_data_truncated.truncation)
chain_3 = sample(model_3, NUTS(), 10)
```
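A quick sanity check I’d run on that last model (with far more than 10 draws, of course):

```julia
using Statistics: mean

# Posterior means vs. the true values; expect ≈ 3.0 and ≈ 0.5
# once the chain is long enough
mean(chain_3[:m]), mean(chain_3[:s])
```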