Hi there Turing.jl peoples. Still new as ever to Bayes stuff, but having a blast learning (when I can). I ran into a situation where I want to explore censored modelling.
I’ve scoured the documentation in Turing.jl and didn’t see any examples or obvious ways to go about doing it.
Hm, I think if you want to draw the censored data from a distribution, you can leave the datapoint as missing while sampling. For example, if you had data
y = [1.0, missing]
and a model
@model function f(y)
    m ~ Normal(0, 1)
    for i in 1:length(y)
        y[i] ~ Normal(m, 1)
    end
end
You’ll actually end up imputing the distribution of y[2] since it’s a missing value. It’s equivalent to it being a parameter in the model.
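Roughly, I'd expect usage like this (not run, but the imputed y[2] should show up in the chain alongside m):

y = [1.0, missing]
chain = sample(f(y), NUTS(), 1000)
# m is a model parameter; y[2] was missing, so it gets imputed and stored in the chain too
chain[:m]
chain[Symbol("y[2]")]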
First of all - really awesome that Turing supports modifying posterior pdfs via missing values. Had no idea, but that's very useful and convenient.
So I guess if our data is left- or right-censored, we could do something like the following?
@model function f(y, l_censored, r_censored)
    m ~ Normal(0, 1)
    y ~ filldist(Normal(m, 1), length(y))
    for (n, censorand) in enumerate(l_censored)
        missing ~ truncated(Normal(m, 1), censorand, Inf)
    end
    for (n, censorand) in enumerate(r_censored)
        missing ~ truncated(Normal(m, 1), -Inf, censorand)
    end
end
(consider this pseudocode, I didn't run it)
Or is there a more preferred way of dealing with this?
We check for missing values and treat them differently. The check happens at compile time if the whole variable is missing, or at runtime if it's a Vector{Union{T, Missing}}, for example.
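For example (just to illustrate the runtime case), a data vector like this has an element type that includes Missing, so each entry is checked when the model runs:

y = Union{Float64, Missing}[1.0, missing, 2.5]   # missing entries get sampled, numbers are observed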
@cscherrer - I agree this is a really good way to challenge a PPL. I'd love to see how Soss evolves to handle this case. It seems niche at first, but, as we've chatted about, it does show up a lot. Having Julia support missing values as a first-class citizen makes this very interesting. I REALLY like the design decision made in Turing, but I feel like left/right censoring may benefit from a convenience function? Also not sure how derivative-based samplers would work here - my guess is they'd break?
In a lot of applications, a sensor with a limited range will just clamp the result, literally like Julia's clamp function. So instead of missing, an overflow might lead to lots of repetitions of whatever the max value is.
I wouldn’t think this would be that hard to implement, and I’m kind of surprised it’s not already in Distributions, especially given Truncated is there. That would probably be my approach. Implement it using Truncated as a guide, and then you could use the result in any PPL.
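Roughly, I'm picturing something like this (just a sketch built from Distributions' logpdf/logcdf/logccdf; the censored_loglik name and interface are made up, not an existing Distributions API):

using Distributions

# Hypothetical helper: log-likelihood of one observation y under distribution d,
# when the sensor clamps values to the interval [lower, upper].
function censored_loglik(d, y, lower, upper)
    if y <= lower
        return logcdf(d, lower)    # left-censored: log P(Y <= lower)
    elseif y >= upper
        return logccdf(d, upper)   # right-censored: log P(Y > upper)
    else
        return logpdf(d, y)        # fully observed
    end
end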
So depending on how you do the censoring, you get vastly different solutions:
x_l = rand(truncated(Normal(0, 1), 1.0, Inf), 300)

@model function missing_trunc(l_censored)
    m ~ Normal(0, 1)
    for n in 1:length(l_censored)
        missing ~ truncated(Normal(m, 1), l_censored[n], Inf)
    end
end

Model = missing_trunc(x_l)
chain1 = sample(Model, PG(100), 300)

@model function l_censor_trunc(l_censored)
    m ~ Normal(0, 1)
    for n in 1:length(l_censored)
        l_censored[n] ~ truncated(Normal(m, 1), l_censored[n], Inf)
    end
end

Model = l_censor_trunc(x_l)
chain2 = sample(Model, PG(100), 300)
I think you're right in that the correct way to do this is via l_censored[n] ~, but the predictions it offers don't make sense to me. Using missing gives a value I'd expect, but weirdly Turing appears to be treating the term missing as a variable?
Hm. Not so sure if what I’ve suggested is correct or not at this point…
Your x_l doesn't have any missing values, so it's all treated as observations. The only parameter/random variable inferred in this case is m. The first model doesn't make any sense in Turing; it just creates a random variable called missing and you are sampling it multiple times, which should error in Turing. If it doesn't, please open an issue. The second model is the correct one.
Turing doesn't throw an error, so I'll open an issue - but it's okay.
I agree it is correct. Kind of hard to get my head around what a left/right-censored value really means, I guess. I feel like it's incorrect to represent it this way; it almost seems like it needs a hyperprior? But I'll come to terms with it :).
I played around with this a few months ago and had something working with the logpdf, the logcdf, and the logccdf. In the model function I brought in a variable named ‘censor’ indicating the type of censoring (observed, left, or right). Something like this:
for i = 1:length(years)
    if censor[i] == 0        # observed
        Turing.acclogp!(_varinfo, logpdf(Weibull(beta, eta), years[i]) * weights[i])
    elseif censor[i] == 1    # right censored
        Turing.acclogp!(_varinfo, logccdf(Weibull(beta, eta), years[i]) * weights[i])
    elseif censor[i] == -1   # left censored
        Turing.acclogp!(_varinfo, logcdf(Weibull(beta, eta), years[i]) * weights[i])
    end
end
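For completeness, a self-contained version might look something like this (the priors and argument names here are placeholders for illustration, not the ones from my actual model):

using Turing

@model function weibull_censored(years, censor, weights)
    # illustrative priors; the real ones should depend on the application
    beta ~ truncated(Normal(0, 5), 0, Inf)
    eta ~ truncated(Normal(0, 10), 0, Inf)
    for i = 1:length(years)
        if censor[i] == 0        # observed
            Turing.acclogp!(_varinfo, logpdf(Weibull(beta, eta), years[i]) * weights[i])
        elseif censor[i] == 1    # right censored
            Turing.acclogp!(_varinfo, logccdf(Weibull(beta, eta), years[i]) * weights[i])
        elseif censor[i] == -1   # left censored
            Turing.acclogp!(_varinfo, logcdf(Weibull(beta, eta), years[i]) * weights[i])
        end
    end
end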
I’d check this blog post out for something similar with Stan:
That's the syntax I was looking for. Okay, neat - now I can compare the missing approach to directly adding to the logp. Once I do this I'll probably write up the results in a brief post, pick a "solution", and we can go on our way.
Can I ask why you opted to use Turing.acclogp instead of sampling from the Weibull directly? Shouldn't we only need to use that in cases where things are censored?
Alright, so now I am back to thinking that missing ~ makes sense... @mohamed82008 can you talk me out of it? So the idea is, Turing handles missing ~ or the index into an array containing missings as a special case? So we can sample a value that's left/right-censored by using a truncated distribution - why not. But we aren't going to claim that the posterior distribution should map to the censored value? So missing does make sense, and it is giving me - at least in this contrived case - a very similar distribution for our parameter m as using acclogp directly.
x_l = rand(truncated(Normal(0, 1), 1.0, Inf), 300)

@model function missing_trunc(l_censored)
    m ~ Normal(0, 1)
    for n in 1:length(l_censored)
        missing ~ truncated(Normal(m, 1), l_censored[n], Inf)
    end
end

Model = missing_trunc(x_l)
chain1 = sample(Model, PG(100), 300)

@model function l_censor_trunc(l_censored)
    m ~ Normal(0, 1)
    for n in 1:length(l_censored)
        l_censored[n] ~ truncated(Normal(m, 1), l_censored[n], Inf)
    end
end

Model = l_censor_trunc(x_l)
chain2 = sample(Model, PG(100), 300)

@model function logpccdf(l_censored)
    m ~ Normal(0, 1)
    for n in 1:length(l_censored)
        Turing.acclogp!(_varinfo, logccdf(Normal(m, 1), l_censored[n]))
    end
end

Model = logpccdf(x_l)
chain3 = sample(Model, PG(100), 300)
"missing_distribution" => median( get_params( chain1 ).m )
"censored value" => median( get_params( chain2 ).m )
"logp_hacking" => median( get_params( chain3 ).m )
It’s doing the wrong thing! I wrote the DSL code so I know
It’s not very special. A scalar input variable that’s missing will be treated the same way. But using missing like that actually defines a variable called missing. So think of it as x or something. The “wrong” part is that you are defining the variable multiple times in a loop. You should use an array to tell Turing that these are different variables. This array will be partly numbers (observations) and partly Base.missing which will be sampled.
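Something like this is what I mean (illustrative values; the missing entries get sampled, the numbers are treated as observations):

using Turing

y = Union{Float64, Missing}[1.3, missing, 0.4, missing]

@model function g(y)
    m ~ Normal(0, 1)
    for i in 1:length(y)
        y[i] ~ Normal(m, 1)   # missing entries are imputed, numbers are observed
    end
end

chain = sample(g(y), NUTS(), 1000)   # chain contains m, y[2], and y[4]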
Hi. I'm working on an example of censored regression. I'm trying to adapt the approach outlined by @joshualeond. I know the basics of what we are doing here… we are not treating censored data as missing, we are not imputing what their values are; instead we are manually incrementing the model log probability. But I am pretty new to Turing, so I would appreciate some implementation help if possible.
Here is a MWE of a censored linear regression that I have so far.
using Turing, StatsPlots

function censor_data(x, y, L, U)
    censor = fill(0, length(x))
    for i in 1:length(x)
        if y[i] > U
            censor[i] = 1
            y[i] = U
        elseif y[i] < L
            censor[i] = -1
            y[i] = L
        else
            censor[i] = 0
        end
    end
    return x, y, censor
end
@model function censored_regression(x, y, censor)
    m ~ Normal(0, 2)
    c ~ Normal(0, 2)
    σ ~ truncated(Normal(0, 20), 0, Inf)
    for i in 1:length(x)
        if censor[i] == 0       # observed
            Turing.acclogp!(_varinfo, logpdf(Normal(m*x[i]+c, σ), y[i]))
        elseif censor[i] == 1   # right/upper censored
            Turing.acclogp!(_varinfo, logccdf(Normal(m*x[i]+c, σ), y[i]))
        elseif censor[i] == -1  # left/lower censored
            Turing.acclogp!(_varinfo, logcdf(Normal(m*x[i]+c, σ), y[i]))
        end
    end
end
# generate data
m, c, σ, N = 1, 0, 2, 200
x = rand(Uniform(-10, 10), N)
y = rand.(Normal.(m.*x.+c, σ))
# Apply censoring
L, U = -5, 5
x, y, censor = censor_data(x, y, L, U)
chain = sample(censored_regression(x, y, censor), NUTS(), 2000)
plot(chain)
I think I’m definitely learning here, as the model does actually run! The chains look good.
However, the parameter estimates are biased (i.e. m is quite far below its true value of 1). So I don't think the model is doing what it is intended to do. It is as if it is just doing regular regression, so I expect the issue is with the model definition. I would appreciate any insights here.
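For reference, this is how I'm checking the bias (just reading posterior means off the chain; the true values are m = 1, c = 0, σ = 2):

println("m: ", mean(chain[:m]), "  c: ", mean(chain[:c]), "  σ: ", mean(chain[:σ]))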
function visualise(x, y, chain, subplot, true_params)
    m, c = true_params
    scatter!(x, y, color=:grey, markerstrokewidth=0, label=:none, subplot=subplot)
    xi = [minimum(x), maximum(x)]
    n_samples = length(chain[:m])
    samples_to_draw = 100
    samples_to_plot = sample([1:n_samples;], samples_to_draw; replace=false)
    for i in samples_to_plot
        plot!(xi, chain[:m][i] .* xi .+ chain[:c][i], color=:black, alpha=0.1, label=false, subplot=subplot)
    end
    plot!(xi, mean(chain[:m]) .* xi .+ mean(chain[:c]), color=:red, lw=2, label="posterior mean",
          ylim=[minimum(y)*1.5, maximum(y)*1.5], subplot=subplot)
    plot!(xi, m .* xi .+ c, color=:black, lw=3, label="true",
          ylim=[minimum(y)*1.5, maximum(y)*1.5], subplot=subplot)
    plot!(tick_direction=:out)
end
visualise(x, y, chain, 1, (m, c))