# Turing: censored regression example

Building on my previous example of truncated regression, I thought I’d do a worked example of censored regression. To clarify, censoring is where observations below a lower bound and/or above an upper bound are recorded as being at those bounds. An example would be weighing something heavier than the maximum weight a measuring device can handle, and recording it as the maximum weight on the dial.

If we think about that in the context of a simple linear regression model, we can make censoring concrete by defining it in a function:

``````function censor_data(x, y, L, U)
    censor = fill(0, length(x))
    yc = copy(y)
    for i in 1:length(x)
        if y[i] > U
            censor[i] = 1
            yc[i] = U
        elseif y[i] < L
            censor[i] = -1
            yc[i] = L
        else
            censor[i] = 0
        end
    end
    return x, yc, censor
end
``````

The vector `censor` has values `0` for uncensored data, `-1` for left/lower censored data, and `1` for right/upper censored data. This is necessary for the censored regression model that we’ll construct below.
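To make the encoding concrete, here is a tiny standalone example (the function is restated compactly so the snippet runs on its own; the data values are made up for illustration):

``````function censor_data(x, y, L, U)
    censor = fill(0, length(x))
    yc = copy(y)
    for i in 1:length(x)
        if y[i] > U
            censor[i], yc[i] = 1, U
        elseif y[i] < L
            censor[i], yc[i] = -1, L
        end
    end
    return x, yc, censor
end

x = [1.0, 2.0, 3.0, 4.0]
y = [-7.0, 0.0, 3.0, 9.0]
_, yc, censor = censor_data(x, y, -5, 5)
# yc     == [-5.0, 0.0, 3.0, 5.0]   values clipped to the bounds
# censor == [-1, 0, 0, 1]           -1 left-censored, 0 observed, 1 right-censored
``````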

We can then generate some fake data, censor it, and visualise the process of censoring:

``````using Distributions

# generate data
m, c, σ, N = 1, 0, 2, 200
x = rand(Uniform(-10, 10), N)
y = rand.(Normal.(m .* x .+ c, σ))

# apply censoring
L, U = -5, 5
xc, yc, censor = censor_data(x, y, L, U)
``````

Our censored regression model can be defined as follows. I borrowed from the suggestion by @joshualeond in this post.

``````using Turing

@model function censored_regression(x, y, censor)
    m ~ Normal(0, 2)
    c ~ Normal(0, 2)
    σ ~ truncated(Normal(0, 20), 0, Inf)
    for i in 1:length(x)
        if censor[i] == 0  # observed
            Turing.acclogp!(_varinfo, logpdf(Normal(m*x[i]+c, σ), y[i]))
        elseif censor[i] == 1  # right/upper censored
            Turing.acclogp!(_varinfo, logccdf(Normal(m*x[i]+c, σ), y[i]))
        elseif censor[i] == -1  # left/lower censored
            Turing.acclogp!(_varinfo, logcdf(Normal(m*x[i]+c, σ), y[i]))
        end
    end
end
``````

We can then sample and check the chains:

``````using StatsPlots

chain = sample(censored_regression(xc, yc, censor), NUTS(), 2000)
plot(chain)
``````

Everything looks good. The posterior for `m` is pretty close to the true value, and the posterior for `c` is slightly low, but nothing overly problematic.

Visualising the posterior predictive confirms that the model is a good fit to the data. By using the censored regression model, we have avoided the bias in estimating the slope that would have occurred with regular regression alone.


Very nice!

It’s interesting: I usually think of PPLs as a way to combine existing distributions (or measures). So I’d probably approach this by first defining a new `Censored` measure, and then using that. It’s especially fun that `Censored` would be singular wrt Lebesgue measure, so we’d need to represent it as a superposition of Lebesgue and Dirac measures.

I’ll need to see about getting this set up in MeasureTheory.jl, then we can compare that with your Turing implementation.

Thanks @cscherrer. I’m less of a mathematician/statistician than you, so I might not be following you exactly. But are you suggesting mirroring `truncated()` with a new `censored` function that works with distributions? From the user / model builder perspective, being able to write something like

``````y[i] ~ censored(Normal(m*x[i]+c, σ), L, U)
``````
``````

would be pretty amazing.
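As an aside for later readers: recent versions of Distributions.jl do ship a `censored(d, lower, upper)` constructor, so something very close to that syntax is now available. A minimal sketch:

``````using Distributions

d = censored(Normal(0, 1), -1, 1)

# All of P(X ≤ -1) for the parent Normal collapses onto an atom at -1,
# so the censored CDF at the lower bound equals the parent's CDF there.
cdf(d, -1.0)  # ≈ cdf(Normal(0, 1), -1.0)
``````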


Sorry if I was a little cryptic, but it sounds like you got the idea.

On the “singular wrt Lebesgue measure” thing… Say you took billions of samples from a censored distribution, and tried to build histograms. The samples at the limits of the distribution would tend to overwhelm the rest. The distribution is concentrated there, so in a sense the units don’t work out properly, and it’s a mess.
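That piling-up of mass at the bounds is easy to see numerically. A standalone sketch using only Base Julia (the ±1 bounds and sample size are arbitrary choices for illustration):

``````# Censor a large standard-normal sample at ±1 and measure the atoms.
# clamp() plays the role of the measuring device hitting its limits.
using Random
Random.seed!(1)

z = clamp.(randn(1_000_000), -1.0, 1.0)
frac_at_bounds = (count(==(-1.0), z) + count(==(1.0), z)) / length(z)
# Roughly 2 * (1 - Φ(1)) ≈ 0.32 of the sample sits exactly on the two bounds,
# which is precisely the point mass a plain density histogram can't represent.
``````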

But this is the kind of use case we’re targeting for MeasureTheory, so it makes for a nice test.


The problem you’re describing is modelled with Tobit regression in my world. That would be an interesting thing to add to one of the econometrics packages.

Yep, I believe (but don’t know for sure) that this is doing exactly the same as Tobit regression, but with priors over `m`, `c`, and `σ`.

The lines

``````for i in 1:length(x)
    if censor[i] == 0  # observed
        Turing.acclogp!(_varinfo, logpdf(Normal(m*x[i]+c, σ), y[i]))
    elseif censor[i] == 1  # right/upper censored
        Turing.acclogp!(_varinfo, logccdf(Normal(m*x[i]+c, σ), y[i]))
    elseif censor[i] == -1  # left/lower censored
        Turing.acclogp!(_varinfo, logcdf(Normal(m*x[i]+c, σ), y[i]))
    end
end
``````

modify the likelihood in the same way as the Tobit model, I believe. But happy to be corrected on that.
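For reference, the log-likelihood those three branches accumulate is the standard Tobit form. Writing $\mu_i = m x_i + c$, with $\phi$ and $\Phi$ the standard normal pdf and cdf:

$$
\log L = \sum_{i:\,\text{observed}} \log\!\left[\frac{1}{\sigma}\,\phi\!\left(\frac{y_i - \mu_i}{\sigma}\right)\right]
+ \sum_{i:\, y_i \ge U} \log\!\left[1 - \Phi\!\left(\frac{U - \mu_i}{\sigma}\right)\right]
+ \sum_{i:\, y_i \le L} \log \Phi\!\left(\frac{L - \mu_i}{\sigma}\right)
$$

The `logpdf`, `logccdf`, and `logcdf` calls correspond to the three terms in order (on the censored branches `y[i]` has already been clipped to `U` or `L`, so evaluating at `y[i]` and at the bound is the same thing).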

You can use `Turing.@addlogprob!` instead of `Turing.acclogp!`:

``````Turing.@addlogprob! logpdf(Normal(m*x[i]+c, σ), y[i])
``````

The macro just rewrites the expression to a `Turing.acclogp!` call, so you don’t have to reference (and hence aren’t warned about) the internal variable `_varinfo`.


That’s pretty nice. I’ll edit the post to use that approach.