[Turing.jl] Censored Values

Hi there Turing.jl peoples. Still new as ever to Bayes stuff, but having a blast learning(when I can). I ran into a situation where I want to explore censored modelling.

I’ve scoured the documentation in Turing.jl and didn’t see any examples or obvious ways to go about doing it.

PyMC3 Syntax: https://docs.pymc.io/notebooks/censored_data.html
STAN Syntax: https://mc-stan.org/docs/2_18/stan-users-guide/truncated-or-censored-data.html

Is there a preferred way to go about doing this in Turing? I have an idea - but, would rather hear from the pros :).

1 Like

Hm, I think if you want to draw the censored data from a distribution, you can leave the datapoint as missing while sampling. For example, if you had data

y = [1.0, missing]

and a model

@model function f(y)
    m ~ Normal(0, 1)
    for i in 1:length(y)
        y[i] ~ Normal(m, 1)
    end
end

You’ll actually end up imputing the distribution of y[2] since it’s a missing value. It’s equivalent to it being a parameter in the model.

Not sure if that answered your question?

3 Likes

First of all - really awesome Turing supports modifying posterior pdf’s via missing values. Had no idea, but that’s very useful and convenient.

so I guess if our data is left or right censored, we could do something like the following?

@model function f(y, l_censored, r_censored)
    m ~ Normal(0, 1)
    y ~ filldist(Normal(m, 1), length(y))
    for (n,censorand) in enumerate(l_censored)
        missing ~ truncated(Normal(m, 1), censorand, Inf)
    end
    for (n, censorand) in enumerate(r_censored)
        missing ~ truncated(Normal(m, 1), -Inf, censorand)
    end
end

(consider this pseudocode I didn’t run it)

Or is there a more preferred way of dealing with this?

That’s really cool. Is there specific to handling missing values in a nice way, or does something about the technical approach get you that for free?

2 Likes

you probably want to have l_censored[n] here and similarly for the other one

1 Like

We check for missing values and treat them differently. The check happens at compile-time if the whole variable is missing or at runtime if it’s a Vector{T, Missing} for example.

3 Likes

Nice. Soss should be able to do this in principle, but I haven’t worked out how I’d want to have the data structures to make imputation efficient.

Censored days problems like @anon92994695 's are MNAR. Representing the missingness process for this case is a nice PPL challenge.

1 Like

@cscherrer - I agree this is a really good way to challenge a PPL. I’d love to see how Soss evolves to handle this case. It seems niche at first, but, as we’ve chit-chatted it does show up a lot. Having Julia support missing values as a first class citizen makes this very interesting. I REALLY like the design decision made in Turing, but I feel like left/right censoring may benefit from a convenience function? Also not sure how derivative based samplers would work here - my guess is they’d pop?

Guess again :slight_smile:

1 Like

In a lot of applications, a sensor with a limited range will just clamp the result, literally like Julia’s clamp function. So instead of missing, an overflows might lead to lots of repetitions of whatever the max value is.

I wouldn’t think this would be that hard to implement, and I’m kind of surprised it’s not already in Distributions, especially given Truncated is there. That would probably be my approach. Implement it using Truncated as a guide, and then you could use the result in any PPL.

2 Likes

So depending on how you do the censoring you get vastly different solutions

x_l = rand(truncated(Normal(0,1), 1.0, Inf), 300)

@model function missing_trunc(l_censored)
    m ~ Normal(0, 1)
    for n in 1:length(l_censored)
        missing ~ truncated(Normal(m, 1), l_censored[n], Inf)
    end
end
Model = missing_trunc( x_l )
chain1 = sample(Model, PG(100), 300)

@model function l_censor_trunc(l_censored)
    m ~ Normal(0, 1)
    for n in 1:length(l_censored)
        l_censored[n] ~ truncated(Normal(m, 1), l_censored[n], Inf)
    end
end
Model = l_censor_trunc( x_l )
chain2 = sample(Model, PG(100), 300)

I think you’re right in that the correct way to do this is via l_censored[n] ~, but the predictions it offers don’t make sense to me. Using Missing gives a value I’d expect, but weirdly Turing appears to be treating the term missing as a variable?

Hm. Not so sure if what I’ve suggested is correct or not at this point…

You x_l doesn’t have any missing values, so it’s all treated as observations. The only parameter/random variable inferred in this case is m. The first model doesn’t make any sense in Turing, it just creates a random variable called missing and you are sampling it multiple times which should error in Turing. If it doesn’t, please open an issue. The second model is the correct one.

1 Like

Turing doesn’t throw an error so I’ll open an issue - but, it’s okay.

I agree it is correct. Kind of hard to get my head around what a left/right censored value really means I guess. I feel like its incorrect to represent it this way, it almost seems like it needs a hyperprior? But I’ll come to terms with it :).

I played around with this a few months ago and had something working with the logpdf, the logcdf, and the logccdf. In the model function I brought in a variable named ‘censor’ related to what type of censoring (left/right). Something like this:

    for i = 1:length(years)
        if censor[i] == 0 # observed
            Turing.acclogp!(_varinfo, logpdf(Weibull(beta, eta), years[i]) * weights[i])
        elseif censor[i] == 1 # right censored
            Turing.acclogp!(_varinfo, logccdf(Weibull(beta, eta), years[i]) * weights[i])
        elseif censor[i] == -1 # left censored
            Turing.acclogp!(_varinfo, logcdf(Weibull(beta, eta), years[i]) * weights[i])
        end
    end

I’d check this blog post out for something similar with Stan:

1 Like

That’s the syntax I was looking for. Okay neat- now I can compare the missing approach to directly adding to the logp. Once I do this I’ll probably write about the results in a brief post pick a “solution” and we can go on our way.

Can I ask why you opted to use Turing.acclogp instead of sampling from the weibull directly? Shouldn’t we only need to use that in cases where things are censored?

Alright so now I am back to thinking that missing~ makes sense… @mohamed82008 can you talk me out of it? So the idea is, Turing handles missing ~ or the index into an array containing missings as a special case? So we can sample a value thats left/right censored, by using a truncated distribution - why not. But, we aren’t going to claim that the posterior distribution should map to the censored value? So Missing does make sense, and it is giving me - at least in this contrived case a very similar distribution for our parameter m as using acclogp directly.

x_l = rand(truncated(Normal(0,1), 1.0, Inf), 300)

@model function missing_trunc(l_censored)
    m ~ Normal(0, 1)
    for n in 1:length(l_censored)
        missing ~ truncated(Normal(m, 1), l_censored[n], Inf)
    end
end
Model = missing_trunc( x_l )
chain1 = sample(Model, PG(100), 300)

@model function l_censor_trunc(l_censored)
    m ~ Normal(0, 1)
    for n in 1:length(l_censored)
        l_censored[n] ~ truncated(Normal(m, 1), l_censored[n], Inf)
    end
end
Model = l_censor_trunc( x_l )
chain2 = sample(Model, PG(100), 300)

@model function logpccdf(l_censored)
    m ~ Normal(0, 1)
    for n in 1:length(l_censored)
        Turing.acclogp!(_varinfo, logccdf(Normal(m, 1), l_censored[n] ) )
    end
end
Model = logpccdf( x_l )
chain3 = sample(Model, PG(100), 300)

"missing_distribution" => median( get_params( chain1 ).m )
"censored value" => median( get_params( chain2 ).m )
"logp_hacking" => median( get_params( chain3 ).m )

It’s doing the wrong thing! I wrote the DSL code so I know :slight_smile:

It’s not very special. A scalar input variable that’s missing will be treated the same way. But using missing like that actually defines a variable called missing. So think of it as x or something. The “wrong” part is that you are defining the variable multiple times in a loop. You should use an array to tell Turing that these are different variables. This array will be partly numbers (observations) and partly Base.missing which will be sampled.

1 Like

Alright we’re good now - thanks for the low down :).

Hi. I’m working on an example of censored regression. I’m trying to adapt the approach outlined by @joshualeond. I know the basics of what we are doing here… we are not treating censored data as missing, we are not imputing what their values are, instead we are manually incrementing the model log probability. But I am pretty new to Turing, so would appreciate some implementation help if possible?

Here is a MWE of a censored linear regression that I have so far.

using Turing, StatsPlots

function censor_data(x, y, L, U)
    censor = fill(0, length(x))
    for i in 1:length(x)
        if y[i] > U
            censor[i] = 1
            y[i] = U
        elseif y[i] < L
            censor[i] = -1
            y[i] = L
        else
            censor[i] = 0
        end
    end
    return x, y, censor
end

@model function censored_regression(x, y, censor)
    m ~ Normal(0, 2)
    c ~ Normal(0, 2)
    σ ~ truncated(Normal(0, 20), 0, Inf)
    for i in 1:length(x)
        if censor[i] == 0  # observed
            Turing.acclogp!(_varinfo, logpdf(Normal(m*x[i]+c, σ), y[i]))
        elseif censor[i] == 1 # right/upper censored
            Turing.acclogp!(_varinfo, logccdf(Normal(m*x[i]+c, σ), y[i]))
        elseif censor[i] == -1 # left/lower censored
            Turing.acclogp!(_varinfo, logcdf(Normal(m*x[i]+c, σ), y[i]))
        end
    end
end

# generate data
m, c, σ, N = 1, 0, 2, 200
x = rand(Uniform(-10, 10), N)
y = rand.(Normal.(m.*x.+c, σ))
# Apply censoring
L, U = -5, 5
x, y, censor = censor_data(x, y, L, U)

chain = sample(censored_regression(x, y, censor), NUTS(), 2000)
plot(chain)

I think I’m definitely learning here, as the model does actually run! The chains look good.

However, the parameter estimates are biased (ie m is quite far below it’s true value of 1). So I don’t think the model is doing what it is intended to do. It is as if it is just doing regular regression, so I expect this issue is with the model definition. I would appreciate any insights here.

function visualise(x, y, chain, subplot, true_params)
    m, c = true_params
    scatter!(x, y, color=:grey, markerstrokewidth=0, label=:none, subplot=subplot)
    xi = [minimum(x), maximum(x)]
    n_samples = length(chain[:m])
    samples_to_draw = 100
    samples_to_plot = sample([1:n_samples;], samples_to_draw; replace=false)
    for i in samples_to_plot
        plot!(xi, chain[:m][i].*xi.+chain[:c][i], color=:black, alpha=0.1, label=false, subplot=subplot)
    end
    plot!(xi, mean(chain[:m]).*xi.+mean(chain[:c]), color=:red, lw=2, label="posterior mean", ylim=[minimum(y)*1.5, maximum(y)*1.5], subplot=subplot)
    plot!(xi, m.*xi.+c, color=:black, lw=3, label="true", ylim=[minimum(y)*1.5, maximum(y)*1.5], subplot=subplot)
    plot!(tick_direction=:out)
end

visualise(x, y, chain, 1, (m, c))

2 Likes

Ha! My bad… it does actually work! I think it was just a bad run / dataset. Repeatedly running with different generated datasets seems to work fine :slight_smile:

1 Like